Contrasting geographic structure in evolutionarily divergent Lake Tanganyika catfishes

Abstract Geographic isolation is suggested to be among the most important processes in the generation of cichlid fish diversity in East Africa's Great Lakes, both through isolation by distance and fluctuating connectivity caused by changing lake levels. However, even broad scale phylogeographic patterns are currently unknown in many non‐cichlid littoral taxa from these systems. To begin to address this, we generated restriction‐site‐associated DNA sequence (RADseq) data to investigate phylogeographic structure throughout Lake Tanganyika (LT) in two broadly sympatric rocky shore catfish species from independent evolutionary radiations with differing behaviors: the mouthbrooding claroteine, Lophiobagrus cyclurus, and the brood‐parasite mochokid, Synodontis multipunctatus. Our results indicated contrasting patterns between these species, with strong lake‐wide phylogeographic signal in L. cyclurus including a deep divergence between the northern and southern lake basins. Further structuring of these populations was observed across a heterogeneous habitat over much smaller distances. Strong population growth was observed in L. cyclurus sampled from shallow shorelines, suggesting population growth associated with the colonization of new habitats following lake‐level rises. Conversely, S. multipunctatus, which occupies a broader depth range, showed little phylogeographic structure and lower rates of population growth. Our findings suggest that isolation by distance and/or habitat barriers may play a role in the divergence of non‐cichlid fishes in LT, but this effect varies by species.

Lake Tanganyika catfishes (~34 species), as with cichlids from this lake, comprise multiple independent radiations (Day, Bills, & Friel, 2009;Day & Wilkinson, 2006;Peart, Bills, Wilkinson, & Day, 2014;Wright & Bailey, 2012) albeit on a smaller scale. They also share a similar habitat range, from deep to littoral waters, with littoral species vulnerable to displacement by lake-level changes. Here, we focus on two species with lake-wide distributions selected from independent evolutionary radiations, the claroteine (15 described LT species), Lophiobagrus cyclurus, and the mochokid (≈11 LT species), Synodontis multipunctatus. L. cyclurus is a mouthbrooder (Ochi, Rossiter, & Yanagisawa, 2002), a trait shared with many cichlid species, which may reduce dispersal ability through extended parental care, whereas S. multipunctatus is a known brood parasite of multiple cichlid species (Sato, 1986). Both species occur over rocky shores, although while L. cyclurus is thought to be restricted to the littoral zone (Bailey & Stewart, 1984), S. multipunctatus can also tolerate a greater depth range (sampled up to 170 m, Coulter, 1991). Fossil-calibrated molecular dating indicates that both L. cyclurus and S. multipunctatus diverged from their sister species before the onset of climatic fluctuations in the Late Pleistocene (Day et al., 2013;Peart et al., 2014), suggesting that the present-day geographic distributions of these species do not reflect the location of a recent origin.
The high sample number required for the techniques previously employed to investigate populations of LT taxa, for example, mitochondrial sequences, microsatellites (Koblmüller et al., 2015;Wagner & McCune, 2009), and AFLPs (Egger et al., 2007), has been a barrier to undertaking intraspecific studies on poorly sampled non-cichlid taxa.
Here, we use restriction-site-associated DNA sequencing (RADseq) to generate high numbers of loci per individual, allowing robust estimates of differentiation to be calculated using small numbers of individuals (e.g., Willing, Dreyer, & van Oosterhout, 2012). RAD loci were used to compare population structure in the two focal species to address the following questions: (1) Is there any evidence of lake-wide phylogeographic structure? (2) Is there evidence of population growth at each site and is this more pronounced at the more recent shallower shorelines? (3) Is there localized population structure in L. cyclurus in areas with potential habitat barriers?

| METHODS
A total of 57 individuals from the focal species (33 L. cyclurus and 24 S. multipunctatus specimens), sampled from four main locations around LT (Figure 1, Table S1), were included in this study. Samples were collected from one steeper shoreline (Kigoma) and three shallower shorelines (Bujumbura Rural, Mpulungu, Sumbu) ( Figure 1 in Scholz et al., 2007). In addition, as the relationship between L. cyclurus and its hypothesized sister species, the paternal mouthbrooder L. aquilus (Ochi et al., 2002), has not previously been resolved (Peart et al., 2014), seven L. aquilus specimens from Zambia (Table S1) were also included. RADseq libraries were constructed following a protocol modified from Baird et al. (2008). Briefly, SbfI was used to digest the samples and each sample was individually barcoded. Samples were pooled into four libraries, each containing 16 samples, and size selection was then performed to select fragments 300-700 bp. The libraries were amplified using 17 PCR cycles and sequenced across two lanes of 100-bp paired-end Illumina Hi-Seq2500 (two libraries per lane).

| Bioinformatic processing
Bioinformatic processing was conducted as in Hoffman et al. (2014). In brief, the Stacks pipeline v1.08 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) was used for demultiplexing and run using the forward reads following a superparent approach. The paired-end reads for each locus were then assembled using Velvet (Zerbino & Birney, 2008) and used to generate a reference with the first read connected to the paired-end by a string of Ns. All tags with more than one assembled contig were eliminated to avoid including instances where several tags had been collapsed into one tag in the clustering process, which ignores the second read. The Lophiobagrus reference consisted of 33,833 contigs with lengths ranging from 595 bp to 1,140 bp. The S. multipunctatus reference consisted of 26,472 contigs with lengths ranging from 592 bp to 1,133 bp. The Burrows-Wheeler algorithm was used to map each sample using BWA-MEM (Li, 2013), duplicates were removed using picard-tools 1.102 (http://picard.sourceforge.net), and SNPs were called using GATK UnifiedGenotyper (McKenna et al., 2010). The resultant vcf file was filtered to include only high-quality calls with the following parameters; only biallelic SNPs, SNP quality score of 30, genotype quality score of 20, mapping quality score of 20, sites with coverage under five were excluded. In addition, in an attempt to avoid including repeated transposon regions in all datasets, sites with excessive coverage (the upper 5% of the coverage distribution) were excluded. Sites that were present in at least two individuals were included as it has been shown that stringent exclusion of missing data can bias the dataset (Huateng & Knowles, 2016). This resulted in very few called sites for the individual C137 from the Lophiobagrus dataset, and so, this individual was excluded from further analysis. The final Lophiobagrus dataset included 299,633 SNPs F I G U R E 1 Maximum-likelihood trees based on 2,628,515-bp and 3,356,174-bp alignments (bootstrap support: black circles 100%, gray circles >90%, white circles >80%) and structure plots, based on 23,739 SNPs and 16,443 SNPs for Lophiobagrus and Synodontis multipunctatus, respectively. Colors in the phylogeny depict collection locality. Green in the S. multipunctatus plot comprises sites from the northern basin. Sampling localities: BR -Bujumbura Rural, KG-Kigoma SM-Sumbu, MP-Mpulungu. Major rivers are shown on the map, the inflow of the (smaller) Lufubu river is denoted with an "L" of which 266,434 SNPs were variable in L. cyclurus. The S. multipunctatus dataset consisted of 107,321 SNPs.

| Population structure
Maximum-likelihood (ML) trees were calculated with the GTR+GAMMA model using RAxML 7.7.8 (Stamatakis, 2006) with 1,000 nonparametric rapid bootstraps. These analyses were performed on alignments containing both variant and nonvariant sites: 2,628,515 sites for the Lophiobagrus dataset and 3,356,174 sites for the S. multipunctatus dataset. This analysis was also repeated using alignments with no missing data across individuals leading to 350,347 sites for the Lophiobagrus dataset and 1,479,271 sites for the S. multipunctatus dataset.
Population structure was investigated using Bayesian clustering in structure 2.3 (Pritchard, Stephens, & Donnelly, 2000) using a matrix of variable sites. Structure uses unlinked markers, so only one SNP per contig was retained, leading to datasets of 23,739 sites for Lophiobagrus and 16,443 for S. multipunctatus. Five replicates per K value (K 1-8) were run using 100,000 generations as burn-in before sampling 100,000 generations using a model of admixture with correlated allele frequencies (Falush, Stephens, & Pritchard, 2003). Runs were collated using STRUCTURE HARVESTER (Earl & VonHoldt, 2011). Independent runs for the suggested K values were combined and averaged in CLUMPP (Jakobsson & Rosenberg, 2007) and the output visualized in distruct (Rosenberg, 2004). An additional analysis was repeated using datasets of variable sites present in all individuals using only one SNP per contig (1,938 Lophiobagrus and 5,116 S. multipunctatus). The package fineRADstructure (v. 0.2) (Malinsky, Trucchi, Lawson, & Falush, 2016) was also used to investigate structure in both the Lophiobagrus and S. multipunctatus datasets. These analyses were performed using loci with one to four SNPs in the forward read (using loci defined by Stacks), with no missing data, leading to datasets of 10,380 sites for Lophiobagrus and 13,028 for S. multipunctatus. Samples were assigned to populations using 100,000 iterations as burn-in before sampling 100,000 iterations. The trees were built using 10,000 iterations and the output visualized using the fineradstructureplot.r and finestructurelibrary.r R scripts (http://cichlid.gurdon.cam.ac.uk/fin-eRADstructure.html).
Structure within L. cyclurus and S. multipunctatus datasets was further investigated using principal component analysis (PCA) in the R packages "adegenet" (Jombart, Devillard, Dufour, & Pontier, 2008) and ade4 (Dray & Dufour, 2007). PCAs were conducted on datasets of variable sites present in all individuals using only one SNP per contig (2,065 L. cyclurus and 5,116 S. multipunctatus). F ST values were calculated using the index of Reich, Thangaraj, Patterson, Price, and Singh (2009) as this method performed well on small sample sizes in a recent simulation study (Willing et al., 2012). Analyses were completed using one SNP per contig with code modified from Rheindt, Fujita, Wilton, and Edwards (2013). 95% confidence intervals for these F ST values were calculated by jackknifing.
Admixture between populations of L. cyclurus was investigated with D-Statistics (Durand, Patterson, Reich, & Slatkin, 2011;Green et al., 2010) using each supported four-taxon tree (L. aquilus as outgroup). The significance of the D-statistic (from 0) is usually calculated by computing the standard error of the D-statistic using a block jackknife approach over linkage groups. This approach was not possible in this case, as in the absence of a reference genome, the order of RAD contigs is unknown, so reliability was assessed by randomly subsampling the dataset 1,000 times including 99%, 95%, 90%, 80%, and 70% of the data and calculating the mean and standard error of the D-statistic. These analyses were conducted using a modified version of a python script from Rheindt et al. (2013).
The statistical significance of the relationship between geographic distance and genetic distance was assessed using Mantel tests with 999 permutations in the R package ade4 (Dray & Dufour, 2007).
Straight-line distances between collection localities were calculated using the haversine method for calculating distances on the surface of a sphere. Distances along the lakeshore were calculated using ERSI ArcMap 10's Network Analyst package using the Detailed Water Bodies layer. Lakeshore distances were only calculated for L. cyclurus as this taxon was sampled from the littoral zone (<15 m). The analysis was performed using average genetic distance between collection localities as within location samples are pseudoreplicates. The uncorrected P-distances were calculated using the R package ape (Paradis, Claude, & Strimmer, 2004).

| Demography
Population trends in each population were estimated using a subset of the most variable loci by constructing Extended Bayesian Skyline Plots (EBSPs) in BEAST (Drummond, Suchard, Xie, & Rambaut, 2012).
We followed the protocol of Trucchi et al. (2014) to avoid overparameterization when analyzing RADseq datasets. For each population, this analysis was performed on three independent subsamples of 50 loci with four SNPs in the forward read (using loci defined by Stacks). Due to a shortage of four SNP loci found in all individuals within a population, loci found in at least seven individuals were used for S. multipunctatus samples from Mpulungu, and loci found in at least six individuals were used for analyses on L. cyclurus samples from Kigoma and S. multipunctatus samples from Bujumbura Rural and Kigoma. Each analysis was run until convergence, with Tracer (Rambaut & Drummond, 2009) used to visualize convergence and effective population sizes. Sample C171 was excluded from this analysis due to its placement in the phylogeny and population structure analyses (Figure 1). In the absence of any rapidly evolving markers with a known mutation rate to calibrate this analysis, it was not possible to date demographic events.

| Population structure
Lophiobagrus cyclurus is supported as monophyletic in the maximumlikelihood tree (100%, Figure 1 and Figure S1), which was not observed in previous analyses (Peart et al., 2014), although the structure and FineRADstructure analysis (K = 5, Figures 1, S2 and S3) highlight an admixed sample, C228, which is derived almost equally from the L. aquilus and Sumbu clusters. All analyses show a clear separation between the northern and southern basins, as well as between Kigoma and Bujumbura Rural within the northern basin. In the southern basin, the two maximally supported clades correspond to collection locality, with a single exception (C171). This sample, from Sumbu, nests within the Mpulungu clade, and in the structure, analysis is comprised entirely of a cluster found in all Mpulungu samples (blue in Figure 1 and Figure   S2). The cluster most strongly associated with Sumbu (red) is also seen in the remainder of the Mpulungu samples. The same patterns within L. cyclurus were observed in the structure analysis at K = 4 ( Figure S4). This placement of C171 as close to but distinct from the Mpulungu samples is also supported by the FineRADstructure analysis ( Figure   S3). The samples from the southern basin are not supported as monophyletic in the maximum-likelihood tree constructed with no missing data ( Figure S1).
The population divisions in L. cyclurus are further supported by the PCA results ( Figure 2) that show three distinct clusters relating to Kigoma, Bujumbura Rural, and the Zambian sites on the first two axes (19.3% and 13.7% of the variation, respectively). The third PCA axis also shows separation between the Zambian sites ( Figure S5).  (Table 1) with greater gene flow between Kigoma and the Zambian sites than between Bujumbura and the Zambian sites, consistent with the larger distance between the latter sites. Subsampling of the data yielded similar values of the D-statistic with standard errors that do not include zero. The D-statistics increased in value when the analysis was repeated without the possible confounding sample C228 (Table S2).
In S. multipunctatus population structure was weaker. The maximum-likelihood tree strongly supports the separation of populations from the northern and southern basins (100%) but finds no support for further structure within the northern basin. Support for the separation of the northern and southern basins is lower in the phylogeny built with no missing data ( Figure S6). The structure analysis indicates a single population (K = 1); however, at K = 2, the northern and southern basins cluster separately (Figure 1 and Figure   S7). The fineRADstructure analysis shows a clear separation of the northern and southern basins and also weaker structure within the northern basin ( Figure S8). Similarly, the first two PCA axes represent 8.7% and 5.6% of the variation ( Figure S9) and show three clusters corresponding to the sampled populations, although these are less tightly clustered than in L. cyclurus (Figure 2). The F ST values were below zero, indicating a lack of population differentiation.

| Demography
In general, mixing in the EBSP analyses was poor, requiring long  Figure S10). All S. multipunctatus EBSP analyses show F I G U R E 2 PCA plots using 2,065 and 5,116 SNPs for Lophiobagrus cyclurus and Synodontis multipunctatus, respectively. Colors depict collection locality shallow population growth (Figure 4 and Figure S11). In Bujumbura Rural and Mpulungu, population growth is steeper in L. cyclurus than in S. multipunctatus, whereas both species show similar population trajectories in Kigoma.

| DISCUSSION
Biodiversity hotspots contain a disproportionate amount of the world's diversity and as a result have received considerable attention regarding the patterns and processes underlying the generation and maintenance of species diversity. For the East African Great Lakes however, the vast majority of studies have focused on the hyperdiverse cichlid fishes, which may not exemplify the patterns of evolution seen in other fish groups, and it is not known whether the same factors influence other taxa that have diversified in these water bodies. To begin to remedy this knowledge gap, we focused on non-cichlid taxa and show contrasting spatial patterns of phylogeographic structure in two broadly sympatric rocky shore catfish species from independent evolutionary radiations with different parental care strategies. We report strong lake-wide population structure in L. cyclurus, including within basin differentiation. Conversely, population structure in S. multipunctatus is weak, and although there is some support for population differentiation between the northern and southern basins from the phylogenetic and structure analyses, this pattern is not reflected in the F ST values suggesting that it is driven by a small number of SNPs. Such weak structure may be due to its dispersal ability at depth (Coulter, 1991). The only other LT species found at equivalent depths for which population structure has been investigated is the cichlid Boulengerochromis microlepis (Koblmüller et al., 2015) that showed similarly weak lake-wide phylogeographic structure. The extent to which brood parasitism in S. multipunctatus (Sato, 1986) facilitates dispersal is unclear, because this species parasitizes both stenotypic and less geographically restricted species, for example, Tropheus moorii, ; Simochromis diagramma (Wagner & McCune, 2009).
At a smaller spatial scale, a high degree of genetic structure is observed between L. cyclurus populations in Sumbu and Mpulungu, which are only ~78 km apart. Previous cichlid studies have shown that river inflows and associated deltas can constitute barriers to gene flow (Wagner & McCune, 2009), suggesting a role for allopatry in increasing genetic diversity. This may also be a factor here because the Lufubu river inflow separates the Sumbu and Mpulungu sites. Microallopatry has been suggested as an important mechanism in generating diversity in cichlid fishes in African Great Lakes (e.g., Van Oppen et al., 1997)   there are more missing data in the reconstructions from Kigoma and we cannot discount the possibility that the weak or no population growth seen in L. cyclurus might be related to a lack of resolution to accurately reconstruct the demographic history. Without an absolute calibration of the EBSPs, it is not possible to date the reconstructions; however, the signatures of population growth at both Zambian sites in L. cyclurus are consistent with results from several cichlid species where populations grew following the most recent low stand as new habitat became available (Koblmüller et al., 2011;Winkelmann et al., 2017). S. multipunctatus showed weaker population growth than L. cyclurus, suggesting that because of its greater depth range, its demographic history has been less influenced by past lake-level fluctuations.
The placement in both the phylogenetic and structure analyses of L. aquilus-C228 suggested possible interbreeding between this species and L. cyclurus. Notably, this specimen was collected in Mpulungu, despite its partial assignment to a genetic cluster most common in the Sumbu L. cyclurus population and was identified as L. aquilus based on the key of Bailey and Stewart (1984). Within L. cyclurus, there was admixture between all populations with the overall patterns reflecting geographic structure. Population subdivision in ancestral species can also influence gene trees and the relative proportion of ABBA/BABA SNPs (Eriksson & Manica, 2012). It is possible that this affected our results as population structure with L. aquilus has not been investigated and this species was sampled only from Zambia.
F I G U R E 3 EBSPs for Lophiobagrus cyclurus. The y-axis shows effective population size scaled by mutation rate (N e μ). Gray area represents 95% confidence interval F I G U R E 4 EBSPs for Synodontis multipunctatus. The y-axis shows effective population size scaled by mutation rate (N e μ). Gray area represents 95% confidence interval

| Conclusions and future directions
Investigating diversity patterns both within and between species in biodiversity hotpots allow conclusions to be drawn as to which mechanisms are responsible for such elevated diversity. Here, we report contrasting geographic structure at a broad spatial scale in evolutionary divergent LT catfishes using genomic data. Our study suggests that lake-level fluctuations have a role in structuring their diversity, although the genomic consequences of these differing histories remain to be studied. We recommend the study of additional species from non-cichlid radiations with different ecologies to provide a more comprehensive understanding of evolutionary patterns and processes in these systems.
Within LT catfishes restricted to the littoral zone, and in L. cyclurus in particular, fine-scale population structure warrants further investigation with additional sampling localities required. This study is limited by low sample sizes, and further sampling would allow the timing and extent of gene flow between populations to be investigated plus the identification of habitat barriers for this species. Additional sampling localities in the southern basin of LT may allow the placement of sample C171 to be further understood in a broader geographic context. Further L. aquilus samples from these localities would also allow estimates of the extent of gene flow between these species to be investigated.

ACKNOWLEDGMENTS
For facilitating fieldwork, we are grateful to Yohana Budeba, Ben We thank Roger Bills and George Kazumbe for their invaluable field assistance and Saskia Marijnissen for facilitating collecting in Burundi. We thank anonymous reviewers for improving earlier versions of this work. under COSTECH permit: no. 2010-03-NA-2009.

DATA ACCESSIBILITY
Sequence data are accessible on NCBI under SRP126168. Datasets used in the analyses are available on Dryad http://doi.org/10.5061/ dryad.671bs