Multiple independent colonizations into the Congo Basin during the continental radiation of African Mastacembelus spiny eels

There has been recent interest in the origin and assembly of continental biotas based on densely sampled species‐level clades, however, studies from African freshwaters are few so that the commonality of macroevolutionary patterns and processes among continental clades remain to be tested. Within the Afrotropics, the Congo Basin contains the highest diversity of riverine fishes, yet it is unclear how this fauna was assembled. To address this, and the diversification dynamics of a continental radiation, we focus on African Mastacembelus spiny eels.


| INTRODUCTION
Continental scale species radiations represent excellent opportunities to elucidate patterns and mechanisms responsible for the generation of diversity across heterogeneous landscapes and over geologically complex time periods (e.g. Daniels, Phiri, Klaus, Albrecht, & Cumberlidge, 2015;Day et al., 2013;Derryberry et al., 2011;De-Silva, Elias, Willmott, Mallet, & Day, 2016;Liedtke et al., 2016;Schenk, Rowe, & Steppan, 2013). Speciation and the persistence of diversity through time may be particularly evident in freshwater systems given landscape heterogeneity through fragmentation of aquatic networks as well as ephemerality (Seehausen & Wagner, 2014). Despite river drainage evolution having been shown to shape species distributions and phylogeographical relationships of fish clades (e.g. Goodier, Cotterill, O'Ryan, Skelton, & de Wit, 2011;Montoya-Burgos, 2003;Morris et al., 2016;Near & Keck, 2005), densely sampled species-level studies at the continental scale are lacking. This is particularly pertinent for the Afrotropics, where the processes generating riverine ichthyofauna diversity are largely untested.
The spatial and temporal complexity of continental systems contrast to insular systems where radiations are often described as adaptive, and ecological processes ultimately limit the build up of diversity (e.g. Harmon, Melville, Larson, & Losos, 2008;Reddy, Driskell, Rabosky, Hackett, & Schulenberg, 2012). While such a scenario is not always identified in insular systems (e.g. Harmon et al., 2010), it remains unclear to what degree ecological processes limit the generation of diversity in radiations that span continents (Harmon & Harrison, 2015; but see Rabosky & Hurlbert, 2015). Several recent individual taxon studies of tropical species radiations spanning continents (Day et al., 2013;Derryberry et al., 2011;Liedtke et al., 2016) highlight a near constant rate of diversification. Explanations offered for an absence of density-dependent diversification in these truly continental radiations maybe because widely distributed clades are not limited by ecological opportunities (see Kisel, McInnes, Toomey, & Orme, 2011) or are too young to have reached their ecological limit (Derryberry et al., 2011).
Within the Afrotropics, the dense and extensive hydrological network of the Congo Basin, covering 4 million km 2 and only second in size to the Amazon Basin, contains the highest diversity of riverine fishes in Africa (Stiassny, Brummett, Harrison, Monsembula, & Mamonekene, 2011). Despite such elevated levels of species richness (c. 1200 species) and endemism (>80%) (Harrison, Brummett, & Stiassny, 2016), it is unclear how this diversity arose and is maintained. Evolutionary studies focused at the population-level have shown the importance of hydrographic barriers that prevent mixing between freshwater populations, even on a microscale, facilitating diversification (Alter, Munshi-South, & Stiassny, 2017;Markert, Schelly, & Stiassny, 2010). However, the role of the Congo Basin in generating and shaping present day Afrotropical riverine faunas, specifically its importance as a source (i.e. allowing only dispersal out of the Congo) or sink (i.e. allowing only dispersal into the Congo Basin) region, or whether a combination of these processes has led to the build up of its diversity, has to our knowledge not been specifically tested.
Here we combine historical biogeography and diversification analyses, focusing on Afrotropical mastacembelid spiny eels that differ in their ecology to other African freshwater fishes investigated from a continental perspective (Day et al., 2013;Schwarzer et al., 2009), to offer further insights into the macroevolutionary processes that shaped freshwater biodiversity of the Afrotropics. We address how the fauna of the Congo Basin has been assembled, and test: Has the Congo Basin acted as a source or a sink? Diversification dynamics are also investigated in which we test the null model: Have rates remained constant during a continental-wide radiation?
Previous molecular studies have focused on regions with high levels of sympatric diversity (Alter, Brown, & Stiassny, 2015;Brown, R€ uber, Bills, & Day, 2010), however, continental scale biogeographical and diversification patterns and processes within the group remain to be investigated.
By reconstructing the most comprehensive molecular phylogeny of African mastacembelid spiny eels to date, we show repeated colonization of the Congo Basin, whereas conversely all other biogeographic areas are remarkably conserved phylogenetically indicating considerable geographic constraint. Furthermore, irrespective of the geological and climatic perturbations that occurred during the evolutionary history of this continental radiation, there is a lack of strong evidence for declining (density-dependent) diversification.

| Data collection
Five molecular markers were sequenced (5,297 bp) that included those selected in Brown et al. (2010): cytochrome b (Cytb), cytochrome c oxidase sub-unit (CO1), nuclear S7 introns 1 and 2, and two additional nuclear loci: recombination activating gene 1 (RAG1) and ectodermal neural cortex 1 (ENC1). DNA was extracted from white tissue/fin clip samples using DNeasy Blood and Tissue kit (Qiagen, UK). For PCR primers and amplification conditions, see Appendix S3. Cleaned PCR products were sequenced on an ABI 3730 sequencer (Applied Biosystems, UK). Molecular sequence data were aligned in GENEIOUS 5.6 (Kearse et al., 2012) using default settings and subsequently checked by eye for stop codons and reading frame shifts.

| Phylogenetic inference
A detailed version of the phylogenetic methods is available within Appendix S3. Analyses were performed on (1) single markers, (2) mitochondrial data, (3) nuclear data, (4) combined mitochondrial and nuclear data.

| Estimation of divergence times
In the absence of fossil Synbranchiformes, our phylogeny was dated using fossil Channidae (snakeheads), a related family (Betancur_R et al., 2013). We followed Brown et al. (2010) and selected the oldest African Channidae fossil (Murray, 2006) from the latest Eocene to earliest Oligocene (35-33 Ma, Kappleman, 1992) as a minimum age constraint on the Channidae crown group. Older fossils from the Middle Eocene that have been placed within this family (Roe, 1991) were applied to a recent study of the Channoidei (Adamson, Hurwood, & Mather, 2010). However, these fossils were not utilized here as a minimum age constraint since their taxonomic placement is ambiguous. Nevertheless we acknowledge that the channid fossil record could be older and therefore placed a lognormal prior (initial value = 33 Ma, mean = 1.8, SD = 0.5; 95% CPD = 46.77) on the calibration node assuming that the most recent common ancestor (MRCA) of the Channidae may be as old as 48 Ma (Roe, 1991).
We also performed further analyses applying a geological calibration based on the age of Lake Tanganyika (LT, 9-12 Ma, Cohen, Soreghan, & Scholz, 1993) as a hard maximum for the node pertaining to the endemic radiation using a normal prior (mean = 10.5 Ma, standard deviation = 0.7; 95% CPD = 11.65). While geological calibrations can lead to wide variability in their precision and reliability (Ho et al., 2015), it is plausible to assume these endemic taxa evolved in situ.
Although calibrations based on lake age does not allow for independent assessment of colonization and diversification scenarios of lake endemics, the use of the LT calibration allows us to examine the degree of congruence between fossil and geological calibrations.
A single representative per species (including undescribed taxa) was selected based on having all loci sequenced (see Appendix S1).
Several additional taxa were included where they had shown to be non-monophyletic (i.e. M. congicus, M. frenatus, see Appendix S3).
Separate analyses were run implementing linked and unlinked (nuDNA data only) uncorrelated lognormal relaxed molecular clocks (i.e. allowing different distributions of rates among branches depending on gene) selecting a Yule speciation prior. Optimal models and partitioning schemes for these datasets were selected using PARTITION-FINDER 1.1 (Lanfear, Calcott, Ho, & Guindon, 2012; see Appendix S3) in which one partition per gene was defined in order to account for over-parameterization. Each analysis was conducted three times (using the CIPRES Science Gateway server; Miller, Pfeiffer, & Schwartz, 2010) and Markov chain Monte Carlo (MCMC) chains were run for 50,000,000 generations. Burnin run convergence and tree visualization were assessed using the same methods as outlined in Appendix S3.

| Ancestral range estimation
Geographic range evolution was estimated using dispersal-extinctioncladogenesis (DEC; Ree, Moore, Webb, & Donoghue, 2005;Ree, Smith, & Baker, 2008) implemented in the R package BIOGEOBEARS (Matzke, 2013). To test between alternative biogeographic scenarios regarding the build up of fauna within the Congo Basin the following models were implemented: M0, (unconstrained) allowing dispersal to and from the Congo Basin; M1, allowing only dispersal out of the Congo Basin (i.e. Congo Basin acting as a source); M2, allowing only dispersal into the Congo Basin (i.e. Congo Basin acting as a sink). Models were evaluated based on Akaike's information criterion (AIC) scores and weights. We did not test founder-event speciation (DEC+J) as this model is preferred for island clades (Matzke, 2014).
Eight geographic areas were included in our analysis (see Appendix S2) based on the Afrotropical ichthyo-provinces (after Roberts, 1975) Forest (LGF), in which N-S, UGF and LGF were combined as the broad region West Africa (WA) to simplify analyses. Lake Tanganyika (LT) and Lake Malawi (LM) were also included as separate regions (following Day et al., 2013) based on the high levels of lake endemicity. All non-African mastacembelids are placed into a single Asian (A) grouping. Distributional data (see Appendix S1) was based on museum databases and FISHBASE (http://www.fishbase.org).
Models allowed for geographic ranges to include any combination of ichthyo-provinces.

| Lineage diversification
To test whether diversification rates had changed over time, the gamma (c) statistic was calculated (Pybus & Harvey, 2000) using the R package APE 3.4 (Paradis, Claude, & Strimmer, 2004), and the effects of missing species examined using the Monte Carlo constant rates test (Pybus & Harvey, 2000). The tempo of diversification was visualized by generating a lineage-through-time (LTT) plot from 100 Bayesian chains sampled randomly from the posterior distribution of the nuDNA dated tree using PHYTOOLS 0.5-10 (Revell, 2012) and APE 3.4 (Paradis et al., 2004). This tree included n = 41 taxa (36 valid species, and five possible undescribed species). A total of 20 missing species (five valid and five undescribed species [Vreven, 2001] and 10 cryptic species [we included a very liberal estimate based on results from the mtDNA tree]) were included at random, with equal probability along the branches, using code adapted from Day, Cotton, and Barraclough (2008).
As a likelihood approach is considered more powerful for testing between diversification scenarios (Morlon, 2014), we tested the null hypothesis that diversification rates remained temporally constant across the African radiation, and constituent subclade comprising Lake Tan  .
Three independent MCMC runs were each run for five million generations sampling every 1,000th and assessed using TRACER 1.6 to confirm convergence of replicates. The global sampling fraction was set to 0.88 (to account for missing species). Bayes factors (BF) were calculated for model (i.e. rate regime) comparison. Rate dynamics were further investigated by a rate-through-time (RTT) analysis. All analyses were carried out in R 3.3.3 (R Core Team 2015).  Table 1). Analyses of nuDNA data implementing the fossil calibration revealed greater variation of divergence estimation, although confidence intervals broadly overlapped. While unlinking nuDNA clocks gave similar estimates to the mtDNA data for older divergences, this analysis yielded the youngest estimates for more recent divergences, for example, the LT radiation (see Table 1).

| Phylogenetic inference
Generally linking nuDNA clocks gave more similar dates to the mtDNA data, with the exception of older divergences, for example, MRCA of the Mastacembelidae (Table 1) (Figure 1). This is sometime after the estimated age of the family at 28.9 Ma (95% HPD: 47.4-16.7), indicating that the colonization and subsequent diversification within Africa is relatively chronologically recent, but we note confidence intervals are broad. Divergence of the LT radiation at 5.5 Ma (95% HPD: 9.2-3.0) is younger than previous estimates generated from combined mt-and nuDNA data: 7.9 Ma (95% HPD: 10.6-5.5, Brown et al., 2010); 7.0 Ma (95% HPD: 9.9-4.7, Alter et al., 2015) although confidence intervals overlap with these studies. showed no further diversification (Day et al., 2013). Notably, the ancestor of clades D+E, reconstructed as Congo Basin in origin (Figure 1), seeded the East African rift lakes-Lake Tanganyika (containing a largely endemic radiation) and Lake Malawi (conversely containing a single endemic), as well as rivers within East Africa.

| Diversification dynamics and rates
Although a constant rate (pure birth [PB]) for the African radiation is best supported by ML analyses (Table 3), it is only 2.14 times more likely than the next best variable rate (exponential PB) model. This result is reflected by the pattern of diversification generated using BAMM that showed speciation rates gradually decreased through time (Figure 2), but for which zero core shifts (i.e. 100% of the posterior distribution was not assigned to a rate shift) received the best BF score (BF = 1). Visual inspection of LTT plots generated with  n.a n.a n.a n.a Lake Tanganyika radiation (Ma

| Multiple independent colonization events into the Congo Basin
We found strong evidence for repeated independent colonization into the Congo Basin, mostly likely due to availability of habitats and resources within this complex hydrological system. This has occurred throughout the evolutionary history of African Mastacembelus between different adjacent regions, in which several lineages have subsequently diversified, albeit to a limited extent. These inferred colonization events are consistent with dispersal being the primary cause of lineage divergence with respect to the Congo Basin. Based on our densely sampled tree, we however show that the original (unconstrained) model allowing dispersal into, as well as out of, the Congo Basin (Table 2) is strongly preferred over the Congo Basin acting purely as a macroevolutionary source (M1) or sink (M2) region.
Studies of other taxa including cichlid fishes and tigerfish (Hydrocynus) also highlight repeated colonization of the Congo Basin (e.g. Day, Santini, & Garcia-Moreno, 2007;Goodier et al., 2011;Schelly, Salzburger, Koblm€ uller, Duftner, & Sturmbauer, 2006;Schwarzer et al., 2009) but have not explicitly tested between alternative biogeographic models. The importance of dispersal in shaping broad-scale biodiversity patterns both temporally and spatially has recently been suggested (Pigot & Tobias, 2014) in which rates of transition from allopatry to sympatry were indicated to be faster in more strongly dispersive clades, although this has not been tested at a more local, that is, regional level, or in freshwater fishes.
While dispersal was likely to be an important process to the build up of the ichthyofauna of the Congo Basin, our findings otherwise show all other areas to be remarkably conserved phylogenetically indicating considerable geographic constraints. Aggregation of distinct biogeographic clades, including the Congo Basin, was also shown analytically for Syndontis (Day et al., 2013), in which these catfishes in contrast to Mastacembelus have diversified to a great extent in situ in this region.

Several of the recent independent colonizations of the Congo
Basin by Mastacembelus are clades composed of Lower Congo River T A B L E 3 Testing models of diversification. Models include: pure birth (PB); birth-death (BD); PB with an exponential rate (PBkexp); BD with a constant speciation rate and exponential extinction rate (BDkcst-lexp); BD with a exponential speciation rate and constant extinction rate (BDkexp-lcst); BD in which speciation and extinction rates are both exponential (BDkexp-lexp). Log-likelihood (AICc) and the difference in AICc with the best model (DAIC) are shown, along with Akaike Weights (AW). Parameters refer to the estimated rates at the tips and the corresponding time-variation parameter. The best model is indicated in grey. f = number of sampled taxa/total of that clade (see section on lineage diversification)  (Figure 1, denoted in bold text; see also Alter et al., 2015). While LT Mastacembelus are endemic, the LT species M. cunningtoni has also been recorded in the Lukuga River (Kullander & Roberts, 2011 | 2315 Congo River system. Such emigration has been documented in other fishes, for example, lamprologine cichlids in which several species are suggested to have colonized the Congo River from LT (Day et al., 2007;Schelly et al., 2006). Notably, the colonization of LT from the Congo Basin contrasts to other evolutionary radiations from this system that have conversely been colonized from East Africa (e.g. Daniels et al., 2015;Day et al., 2013).
Applying a similar analytical framework to other freshwater continental clades will bring important insights into the processes underlying the high levels of diversity within the Congo Basin.

| Diversification dynamics of a continent-wide radiation
During the time period of diversification suggested by our findings,  (Derryberry et al., 2011). That study and several others focusing on African continent-wide radiations (Day et al., 2013;Liedtke et al., 2016) have suggested that either these clades are too young to have reached their ecological limit, or that tropical continental diversification may not be as limited by ecological opportunities. The latter suggestion is particularly noteworthy considering that Africa, as the world's second large land mass, totals c. 30 million km 2 . In contrast to our findings for the continental radiation, the LT clade (including non-endemics) was, however, supported by a variable rate model. Time variable models support a general trend for declining speciation rate over time (see Morlon, 2014 and references therein). This trend is often interpreted as a density-dependent effect from the saturation of niche space following adaptive radiation (Morlon, 2014), which is a common pattern identified in insular systems (e.g. Day et al., 2008;Reddy et al., 2012). Ultimately, further studies focused at the continental scale are needed to determine if these patterns are supported across other tropical groups, in addition to understanding the factors responsible for their diversification.

| Molecular divergence dates
Although our study employed a single fossil calibration that may bias age estimates (see Ho et al., 2015 and references therein), its inclusion produced relatively similar divergence estimates across mtDNA and nuDNA analyses considering 95% confidence intervals (Table 1), although we acknowledge that these were broad. We note that the time frame of the LT radiation is congruent with studies of other non-cichlid LT fish groups (at a similar taxonomic level) that have used limited fossil calibrations (e.g. Day et al., 2013;Peart, Bills, Wilkinson, & Day, 2014), providing validation of the selected calibration. In contrast, although the geologic calibration, placed on a shallow node, produced similar estimates for the mtDNA analyses to those generated from the fossil calibration based on a deep node, its effect on the nuclear data was striking in consistently estimating older divergence dates across all nodes. Calibrations at the root or deeper nodes are generally preferred to shallower nodes as substitution rate estimates are mostly based on the branches between the calibrating nodes and the tips, and as such deeper calibrations capture a larger proportion of the overall genetic variation (Duchêne, Lanfear, & Ho, 2014). This explains why the nuclear derived estimates using the geologic calibration have had a greater impact on node ages, as these branches between the calibrating node and the tips contain considerably less genetic variation than those for the mtDNA data. Nuclear loci from more rapidly evolving markers, and additional fossil calibrations, would allow future researchers to potentially generate more accurate divergence dates.

ACKNOWLEDG EMENTS
We thank Heinz B€ uscher, J€ org Freyhof and Uli Schiewen for providing additional samples, and Simon Ho and Nick Matzke for advice with BEAST and BioGeoBEARS software respectively. Michael Dawson, Richard Ree and three anonymous referees are thanked for insightful comments on this manuscript.