Unravelling the evolution of Africa’s drainage basins through a widespread freshwater fish, the African sharptooth catfish Clarias gariepinus

The formation history of Africa's current river basins remains largely unknown. In order to date changes in landscape and climate, we studied the biogeography of the African freshwater fish with the largest natural distribution. We also validated biogeographical units.


| INTRODUC TI ON
Africa has a complex hydrology that is characterized by numerous inland deltas, palaeolakes, cataracts and elbows of river captures.
All are reminders of a highly dynamic past (Goudie, 2005). Even for the largest rivers on the continent, the question remains how and when they obtained their present-day form. Major large-scale geological events such as the formation of the central African shear zone (Fairhead, 1988), the East African rift (Stankiewicz & de Wit, 2006) and the Kalahari uplift (Cotteril & de Wit, 2011) were pivotal in shaping the boundaries of the current drainage patterns. Tracing the origin and the timing of associated changes in hydrology remains difficult when solely relying on the stratigraphic record (Watchman & Tweddle, 2002). Yet, the evolutionary histories of aquatic organisms can provide proxies to identify, quantify and date changes in hydrology (Burridge, Craw, Jack, King, & Waters, 2008;Skelton, 1994).
Cycling between dry and wet periods in the Pleistocene and Pliocene impacted the evolution of Africa's landscapes and fauna, including early hominids (deMenocal, 2014;Maslin et al., 2014). This climate cycling led to alternate expansions and contractions of savanna-and forest-like habitats, in parallel with alterations of deepand low-water stands in Africa's Great Lakes (Malinsky & Salzburger, 2016). In tropical rivers, such changes in climate triggered the instances of immigration, extinction and allopatric divergence (Tedesco, Oberdorff, Lasso, Zapata, & Hugueny, 2005), creating the current fish faunas. Using similarities between fish faunas allowed ichthyologists to divide Africa into ten ichthyofaunal provinces (Snoeks & Getahun, 2013). These are: the Congo, East Coast, Lower Guinea, Maghreb, Malagasy, Nilo-Sudan, Quanza, Southern, Upper Guinea and Zambezi provinces (Figure 1). When taking ecological similarity into account, a more fine-scale subdivision of the continent in 95 freshwater ecoregions has also been proposed .
Clarias gariepinus (Burchell, 1822) is the African freshwater fish with the largest (natural) distribution (Skelton, 2001). It is a true generalist that occurs in almost all freshwater systems throughout continental Africa, except for the extreme north-and south-west, and in the Levant and southern Anatolia, and is thus well suited to validate the boundaries of biogeographical entities using genetic data (Arndt et al., 2003;Giddelo, Arndt, & Volckaert, 2002;Rognon et al., 1998).
Clarias gariepinus has an omnivorous diet, is tolerant of a wide range of water temperatures and is often ecologically dominant. It endures harsh conditions and tolerates high turbidity, low levels of oxygen and desiccation. The species has remarkable drought adaptations such as a secondary suprabranchial organ that allows it to take up atmospheric oxygen. It can move overland to escape the driest conditions and is frequently the last or only species inhabiting diminishing pools, poorly oxygenated swamps or drying rivers (Skelton, 2001). The species is rendered paraphyletic by both C. anguillaris (Linnaeus, 1759), which has a similar ecology as C. gariepinus, and by the nine species of Bathyclarias Jackson, 1959(Angnèse & Teugels, 2001Rognon et al., 1998), which became adapted to deep water conditions. All of these are fully sympatric with C. gariepinus, with the former being widespread in western and northern Africa, and the latter nine being endemic to Lake Malawi. Although it cannot be excluded that C. gariepinus contains undescribed variation, for the phylogeographical study presented here, working with a monophyletic lineage is more important than with a taxonomic unit. Hence, we will refer to the assemblage of C. gariepinus, C. anguillaris and the species flock of Bathyclarias as C. gariepinus sensu lato (sl.). F I G U R E 2 TCS haplotype network of the cytb sequences of C. gariepinus sl. Seven clusters were identified in the network (letter codes), some of which were further divided into subclusters (numbers). Vertices were coloured based on the ichthyofaunal provinces from which the samples originate as indicated on the map to the right, their sizes denote the number of samples. Using the dominant origin of their haplotypes, clusters were labelled as: Nilo-Sudan (NS), Congo (C), Kivu (K), Great Lakes (GL), Southern (S), Tana (T) and Zambezi (Z) [Colour figure can be viewed at wileyonlinelibrary.com] While the evolutionary history of Hydrocynus, which are restricted to warm, well-oxygenated, large rivers and lakes, is useful to track changes in the topology of major river channels (Goodier et al., 2011), the history of C. gariepinus will also tell the story of swamplands, headwaters and relict populations. Additionally, while the isolation led to speciation within Hydrocynus, Synodontis and Mastacembelus, this was prevented in C. gariepinus by its niche width and dispersal abilities. This lack of speciation allowed signatures of subsequent biogeographical events to become embedded in the genomes of single populations.
We infer a continental mitochondrial phylogeography of C. gariepinus sl., and provide a regional microsatellite distribution pattern to test the biogeographical potential of C. gariepinus with two evolutionary hypotheses: (1) the current genetic structure of the species can be used to validate biogeographical entities and (2) its phylogenetic history can be used to date and reconstruct patterns of landscape evolution and climatic events.

| Data collection
We sequenced fish sampled at 97 sites across Africa (Table S1; Figure 1), including those analysed for other markers by Giddelo et al. (2002) and Arndt et al. (2003). Fin clips were preserved at collection sites in absolute ethanol or in salt-saturated dimethylsulphoxide. We extracted DNA using proteinase K and CTAB buffer or using the NucleoSpin Tissue kit (Macherey-Nagel) and amplified the mitochondrial cytochrome b (cytb) gene using the L14724/H15915 primer combination (Irwin, Kocher, & Wilson, 1991) and the following protocol: 94°C for 60 s, followed by 30 cycles of 94°C for 30 s, 56°C for 30 s, 72°C for 40 s and final extension at 72°C for 5 s.
We added all sequences on NCBI GenBank for which data on fieldbased collection were available (AF126823-24; c [seven sequences,
We grouped specimens according to origin, that is, ichthyofaunal provinces (Snoeks & Getahun, 2013; Table S1). We identified clusters and subclusters in the network, based on a combination of geographical and genetic distinctness (Table S2).

| Genetic landscape shape interpolation
We investigated the spatial patterns of genetic differentiation by genetic landscape shape interpolation analysis implemented in Alleles in Space v1.0 (Miller, 2005). This analysis is based on a Delaunay triangulation connectivity network in which residual genetic distances are derived from a linear regression of genetic versus geographic distances (recommended for datasets with substantial variation in distances between sites; Manni, Guerard, & Heyer, 2004). We set grid size to 1 × 1 degrees, and used a distance weighting parameter α = 1. We obtained qualitatively similar results with different grid sizes and parameters (not shown).

| Molecular diversity indices
We calculated the number of unique haplotypes, the number of polymorphic sites, the nucleotide diversity and the mean number of pairwise distances in Arlequin v3.5.2.2 (Excoffier & Lischer, 2010). With the same software, we assessed departures from the mutation-drift equilibrium, by calculating Tajima's D (Tajima, 1989) and Fu's F S (Fu, 1997). We ran 1,000 coalescent simulations to assess significance of the parameters. We calculated all indices and statistics for C. gariepinus sl. for the entire continent, for each ichthyofaunal province, and for the main clusters identified in the haplotype network (Table S2).
We assessed nodal support for the ML tree with 1,000 bootstrap replicates. For BI, Metropolis-coupled Markov chain Monte Carlo (MCMC) simulations (two independent runs, 2 million generations, eight chains, 25% burn-in) provided the posterior trees and model parameters. Split deviation frequencies were <0.01, indicating that MrBayes runs were run long enough. We assessed stationarity of chains and convergence of parameter values in Tracer v1.6 (http:// tree.bio.ed.ac.uk/softw are/tracer) prior to constructing a 50% majority rule consensus tree. The post-burn-in effective sample sizes (ESS) for all parameters were >200, indicating that sampled parameter values accurately represented the posterior distribution (Kuhner, 2009).
We inferred a chronogram with BEAST v1.8.0 (Drummond & Rambaut, 2007), employing a Bayesian skyline tree prior and a strict clock model. We selected this model as we mainly dealt with intraspecific data and as preliminary analyses in TREE-PUZZLE v5.3 (Schmidt, Strimmer, Vingron, & von Haeseler, 2002) indicated that a clock-like evolution could not be rejected at a 0.05 significance level. We ran two independent MCMC chains for 25 million generations, and sampled model parameters every 1,000 generations.
We used LogCombiner (BEAST package) to combine the two chains, after having discarded the first 20% of generations as burn-in. We assessed the stationarity and convergence of parameter values in Tracer v1.6. Pooled post-burn-in ESS were >200. We computed a maximum-clade-credibility tree in TreeAnnotator (BEAST package), which we visualized with FigTree v1.4.1 (Rambaut, 2014).
We calculated divergence times as mean node heights of the 95% highest posterior density (HPD) intervals. To calculate absolute divergence times, we assumed substitution rates of 0.75 and 2.2% per MY, a range typically observed and applied for cytb in fish (e.g. Bermingham, McCafferty, & Martin, 1997;Doadrio & Carmona, 2004;Zhao et al., 2009).
We performed ancestral area reconstructions in RASP v3.2 (Yu, Harris, Blair, & He, 2015) with both the S-DIVA (Yu, Harris, & He, 2010) and S-DEC model (Beaulieu, Tank, & Donoghue, 2013;Ree & Smith, 2008). To take topological uncertainty into account, we used 1,000 randomly sampled post-burn-in trees from both the MrBayes and the BEAST analyses as input. The five ichthyofaunal provinces from which fish were sampled were used as area definitions: East Coast, Congo, Nilo-Sudan, Zambezian and the Southern province.
As samples from 'Lake Albert' originated both from the Lake proper as from above the escarpment, they were, for this analysis, included in the East Coast province. We defined an ancestral range to include no more than two adjacent areas, as this is also the current maximum observed range of haplotypes. We visualized the results of the RASP analyses on a map, taking the current distribution of terminal nodes into account.
We amplified all loci using the QIAGEN Multiplex PCR kit (QIAGEN), following the conditions in Table S3, and ran multiplex PCR products with an internal size standard (500LIZ, Applied Biosystems), on an ABI 3130 automated capillary DNA sequencer (Applied Biosystems).
We conducted fragment analysis in GENEMAPPER v4.0 (Applied Biosystems) following the recommendations of Larmuseau, Raeymaekers, Hellemans, Van Houdt, and Volckaert (2010 2005). Scoring errors were detected at loci Cga09 and Cba19, which were excluded from further analyses. We ran 10%-30% of the samples twice for all markers to verify reproducibility.

| Haplotype network
We identified seven clusters in the TCS network (  Table S2). The K cluster occupied a central position in the network, but was closest to the genetically diverse C cluster (3 vs. 7-9 mutations respectively). Although most samples from the Congo province had C cluster haplotypes, specimens from the basin's northern, southern and eastern boundaries belonged to the NS, S and GL clusters respectively. A more complex pattern was observed in the Zambezi province. Here, all specimens from the western sector of the province, which includes the Kunene, the Okavango and the Upper Zambezi, carried S cluster haplotypes. Specimens from the province's eastern sector: the Lower Zambezi and Lake Malawi, all bore C cluster haplotypes. In the central sector: the Middle Zambezi and the Limpopo basin, specimens mostly had Z cluster haplotypes. All specimens from the Southern province bore S cluster haplotypes except for some from coastal streams near the border with the Zambezi province that had Z cluster haplotypes. Specimens from the East Coast province had haplotypes belonging to the K, GL and T clusters. In the Nilo-Sudan province, all specimens bore NS haplotypes, except those from Lake Albert, which had GL cluster haplotypes.
As we noticed additional structuring in the NS, Z, S and C clusters, we divided them into two, three, four and six subclusters respectively (Table S2; Figure 2). Within the NS cluster, one subcluster consisted of sequences of C. anguillaris from West Africa, and C. gariepinus from the lower Nile and the Ubangi (Congo basin), whereas the other was restricted to C. gariepinus specimens from the lower Nile. Two of the three Z subclusters consisted of haplotypes that mostly occurred in sympatry in the central zone of the Zambezi province, although the range of one of them also extended into the Southern province. The third Z subcluster was restricted to populations from the Pongola River, which also harboured specimens with S cluster haplotypes.
Three of the four S subclusters were mostly confined to populations from the Southern province. Specimens of the fourth S subcluster, which forms the link with the remainder of the network, stem from the Upper Lualaba and Upper Kasai (Congo basin), and from the Cunene, Okavango and Upper and Middle Zambezi (Zambezi province). Three of the six subclusters of the genetically diverse C cluster were restricted to the Congo basin. Here, haplotypes of one of these occurred throughout the basin whereas those of the other two were restricted to the Lufira River in the basin's south. Another C subcluster was shared between C. gariepinus from the Bangweulu-Mweru ecoregion (south-eastern Congo) and Bathyclarias from Lake Malawi. The last two C subclusters were restricted to specimens from Lake Malawi and the Lower Zambezi respectively.

| Genetic landscape shape interpolation
The genetic landscape analysis revealed two large zones of low genetic differentiation in C. gariepinus sl., one at the northern and one at the south-western part of the continent (Figure 3). We also observed low genetic differentiation along the borders of the Congo basin. Yet, sharp differentiation was present within the basin itself, especially in its northern and southern parts. Genetic differentiation was lower in the east, which is in line with the central position of Kivu specimens in the network. The East African Great Lakes (except Lake Malawi) also formed a region with low genetic differentiation.
Sharp differentiation was present within the Zambezi province, reflecting the distinctness of its eastern, central and western sectors.
On a smaller geographical scale, the analysis revealed the distinction between populations from the upper and the lower reaches of the easternmost rivers of the Southern province.

| Molecular diversity indices
Diversity indices are summarized in diversity, whereas the Congo and Nilo-Sudan provinces harbour the most diverse C. gariepinus assemblages. In none of the provinces did Tajima's D deviate significantly from zero, whereas Fu's F S was significantly negative for the whole continent and for the Zambezi province. We also calculated these parameters for the seven TCS clusters with more than 10 haplotypes. This revealed high diversity in the C and NS clusters, and lower diversity in the Z and S clusters.
The sample size of the NS cluster (N = 10) could, however, already have been too small to make reliable estimates. Tajima's D did not significantly deviate from zero in any of the clusters whereas Fu's F S was negative for the Z and T clusters, indicating recent population expansion.

| Phylogenetic reconstruction, molecular dating and ancestral area reconstruction
PhyML (ML) and MrBayes (BI) yielded highly similar phylogenetic trees ( Figure S4a,b), but the BEAST-derived tree deviated in topology ( Figure S5). Although all trees contained the same major clades, we observed large differences in their branching order. As these exclusively concerned nodes with low statistical support, the biogeographical hypotheses presented here should be interpreted with care (Figure 4d,e). Using 2.2% as the rate of molecular evolution, the BEAST-derived approach revealed that the split between C.
We used the MrBayes-and BEAST-derived trees as input files for RASP. The results for the S-DIVA (Figure 4a,b) and the S-DEC ( Figure S6a,b) models were, overall, highly similar. However, the latter suggested more alternative scenarios and did not resolve the ancestral area for several nodes of the MrBayes tree. As this suggests that the S-DEC model could be less suitable to explain our data, we choose to focus on the area reconstructions provided by S-DIVA.
This model suggested, for both input trees, that the common ances-   (2013)

| Microsatellite genotyping
We investigated the population structure of C. gariepinus in the  (Clarke, 1993). STRUCTURE revealed a separation into the same hypothetical clusters. Although this analysis also assigned specimens from Mwaba (LUT) to the third cluster, total population assignment was only 68% ( Figure 5;

| Validating biogeographical entities
We hypothesized that the genetic structure of C. gariepinus sl.
can be used to validate biogeographical entities such as proposed ichthyofaunal provinces. Although strong geographical patterns were evident from the haplotype network and the genetic landscape shape analysis (Figures 1-3), these did not fully match with the ichthyofaunal provinces. Most samples from the Nilo-Sudanic province grouped into a single cluster, confirming the homogeneity found in C. gariepinus and C. anguillaris of the region (Rognon et al., 1998), and that of their monogenean parasites (Barson, Přikrylová, Vanhove, & Huyse, 2010). This can be explained by previous connections between the region's main rivers during wetter phases of the late Quaternary (Drake & Bristow, 2006). The NS cluster also included the sample from the Ubangi (Congo basin), which supports the treatment of the Ubangi basin (save the Uele) as a distinct freshwater ecoregion: the Sudanic Congo. This ecoregion harbours many aquatic species of Nilo-Sudanic origin (Peck & Thieme, 2005) and is considered transitional between the Congo and the Nilo-Sudanic provinces . Samples from Lake Albert did not cluster with other Nilo-Sudanic samples, but rather with those from the Great Lakes. This is remarkable as Snoeks and Getahun (2013) assigned the lake to the Nilo-Sudanic province, based on its ichthyofaunal similarity with the Nile.
Populations from the East Coast province were highly differentiated as samples from three distinct locations, Lake Kivu, the Tana River and lakes Edward and Victoria, occurred in different clusters, confirming Giddelo et al. (2002). Samples from Lake Kivu were ancestral to the C lineage, which spread throughout the Congo province. This seems in line with current drainage patterns as Lake Kivu drains, via Lake Tanganyika, into the Congo. However, Lake Kivu was not assigned to this province as its original fauna experienced extinction phases caused by the basin's turbulent tectonic history (Snoeks, De Vos, & Thys van den Audenaerde, 1997). We hypothesize that, due to its high physiological tolerance, C. gariepinus might have survived these adverse conditions. Hence, the current population might be one of the few surviving relicts of the original Kivu fauna. Additionally, the samples of C. gariepinus from Lake Tanganyika bore haplotypes that were identical to those found in Lake Victoria. An affinity between Tanganyikan and Victorian faunas is also known for the lakes' cichlid species flocks. Here, the ancestors of the haplochromines of lakes Kivu, Edward and Victoria can be traced to Lake Tanganyika (Verheyen, Salzburger, Snoeks, & Meyer, 2003 (Klett & Meyer, 2002 Skelton (1994), who noticed that present-day watersheds divide rather than circumscribe the northern boundaries of the Zambezian ichthyofauna. Parasitological data derived from monogeneans of cichlids (Vanhove et al., 2013) and clariids (Barson et al., 2010)  were shown to be sister to an endemic Congo lineage, pointing at a very old connection (Schultheiss et al., 2014). Some of the Lake's three species of Opsaridium Peters, 1854 (Cyprinidae) might also have their origin in the Congo basin (Sungani et al., 2017). Finally, a parallel can be drawn between the evolutionary mechanisms that gave rise to the radiations of Bathyclarias and haplochromine cichlids in Lake Malawi. In both cases, the presumed parent species of these radiations, C. gariepinus and Astatotilapia calliptera (Günther,   Table 2, colours denote the three microsatellite-based clusters derived using STRUCTURE with I: blue, II: green and III: red [Colour figure can be viewed at wileyonlinelibrary.com] 1894), respectively, occur in sympatry with the 'offspring species'.
Phylogenies of these two radiations contain members of the parent species and the members of the two species flocks form clades within the diversity of the parent species (Agnèse & Teugels, 2001;Malinsky et al., 2018). Strangely, a widespread haplotype from the central part of the basin was also found in the Upper Lufira, where it occurred in sympatry with a highly divergent C2 haplotype. Although it cannot be excluded that the former stems from an anthropogenic introduction, this seems unlikely as microsatellite data did not reveal any evidence of introgression. The deviant haplotype, which is most closely related to one found in Lake Malawi  (Barson et al., 2010). The western zone, which is known to have a different ichthyofauna than the rest of the province (Skelton, 1994), included the western part of the Zambezi, and the entire Southern province.
Originally, both the western and the central part of the Zambezi system drained to the Indian Ocean via the Limpopo (Stankiewicz & de Wit, 2006). Uplift severed this connection, and these rivers became connected with the Middle and Lower Zambezi, creating the Victoria Falls. Yet, it has been hypothesized that during the mid-Pleistocene, this connection was disrupted again, giving rise to a vast lake, Palaeo Makgadikgadi (Joyce et al., 2005). As this lake underwent several cycles of expansion and retraction, its formation has been difficult to date geologically (Derricourt, 1976 2000). This scenario also explains why the ecologically tolerant C.
gariepinus was able to disperse southward, whereas more specialized taxa, such as species belonging to Hydrocynus, Synodontis or Mastacembelus, failed to do so.
The eastern border between the Southern and the Zambezi ichthyofaunal provinces is not well defined, and many coastal streams have a fish fauna that is a mixture of both provinces (Skelton, 1994).
The genetic structure of the C. gariepinus populations of these basins revealed how these mixed faunas emerged. The lower reaches are dominated by typical 'central Zambezian' Z cluster haplotypes, whereas in the upper reaches, specimens are found with haplotypes related to those from the rest of the Southern province. This suggests multiple colonizations of these rivers by river capture events in the highlands, in parallel with invasions via the coastal plain.
Of interest is that the distribution of S-and Z-type haplotypes matches the distinction between two freshwater ecoregions, whose boundaries do not follow ichthyofaunal provinces: the 'Zambezian lowveld', downstream, and the 'southern temperate highveld', upstream . The separation between the up and downstream populations of C. gariepinus highlights the importance of waterfalls in maintaining the isolation of the ichthyofaunal communities of both ecoregions.

| Reconstructing landscape evolution and climatic events
Clarias gariepinus is unique because of its almost continuous distri- Pleistocene fluctuations in climate also shaped the distribution of terrestrial taxa. The phylogeography of large savanna mammals typically reveals three major clades that match with refugia in the continent's western, eastern and southern savanna zones (Hewitt, 2004). These zones are mirrored in the phylogeographical patterns of C. gariepinus. The MrBayes-derived phylogeography (Figure 4a) of C. gariepinus can be interpreted as branching in a forest and a savanna clade, with the latter consisting of subclades occurring in these three savanna zones. The BEAST-derived phylogeography reveals a similar pattern, although here, the 'forest' clade clusters within the eastern group (Figure 4b). The genetic landscape analysis revealed two large zones of low genetic differentiation: one at the northern and one at the south-western part of the continent.
These regions currently contain some of the driest regions of Africa: the Sahara and the Sahel in the north, and the Kalahari, Namib and Karoo in the south. At first sight, this is counterintuitive as one would expect these regions to represent barriers to fish migration.
Yet, the occurrence of very closely related haplotypes within and close to these current arid zones reflects the presence of a permeable landscape matrix for C. gariepinus in the relatively recent past. These conditions would have been present during the African wet period that followed the last glacial maximum. In the north of the continent, the now extinct megalakes of the Sahara, of which Palaeolake Megachad was the largest, provided the opportunity for faunal connections between Nile, Niger and Ubangi (Drake & Bristow, 2006). In the south, Palaeolake Makgadikgadi might have played a similar role (Joyce et al., 2005). Interestingly, this landscape might not have been equally permeable for other African freshwater fish. In the Nilo-Sudanic province, conspecific populations of Synodontis were significantly divergent in mitochondrial markers, whereas this clade failed to spread to the Southern ichthyofaunal province (Day et al., 2013).
The ecological characteristics and distribution of C. gariepinus make it an ideal choice to study successive stages of landscape evolution. The sole caveat, however, is that since the species became well-established in aquaculture, human-induced translocations have left signatures in its genome. Due to translocation, the species has become part of the non-indigenous fish community in many tropical and subtropical regions (Radhakrishnan, Lan, Zhao, Qing, & Huang, 2011;Vitule, Umbria, & Aranha, 2006). For the current dataset, this is, of course, the case for the feral population from the Western Cape (Cambray, 2003).