POPULATION GENOMIC TESTS OF MODELS OF ADAPTIVE RADIATION IN LAKE VICTORIA REGION CICHLID FISH

Authors


Current address: Department of Biology, Reed College, 3203 SE Woodstock Blvd, Portland Oregon 97202.

Abstract

Adaptive radiation is usually thought to be associated with speciation, but the evolution of intraspecific polymorphisms without speciation is also possible. The radiation of cichlid fish in Lake Victoria (LV) is perhaps the most impressive example of a recent rapid adaptive radiation, with 600+ very young species. Key questions about its origin remain unresolved, such as the importance of speciation versus polymorphism, whether species persist on evolutionary time scales, and whether speciation happens more commonly in small isolated or in large connected populations. We used 320 individuals from 105 putative species from Lakes Victoria, Edward, Kivu, Albert, Nabugabo and Saka, in a radiation-wide amplified fragment length polymorphism (AFLP) genome scan to address some of these questions. We demonstrate pervasive signatures of speciation supporting the classical model of adaptive radiation associated with speciation. A positive relationship between the age of lakes and the average genomic differentiation of their species, and a significant fraction of molecular variance explained by above-species level taxonomy, suggest the persistence of species on evolutionary time scales, with radiation through sequential speciation rather than a single starburst. Finally, the large gene diversity retained from colonization to individual species in every radiation suggests large effective population sizes and makes speciation in small geographical isolates unlikely.

The study of adaptive radiation, the rapid evolution of species and ecological diversity from a common ancestor, has received considerable attention in recent years (Schluter 2000; Gavrilets and Losos 2009). One of the most impressive examples is the radiation of 600 or more endemic species of cichlid fish that evolved in East Africa's Lake Victoria (LV) and nearby lakes within less than 250,000 years. Yet, key questions about the origins of this adaptive radiation remain unanswered. Work on individual pairs of phenotypically defined taxa has demonstrated divergence ranging from intraspecific phenotypic polymorphism with no neutral genetic differentiation (Seehausen et al. 2008; Magalhaes et al. 2010) to reproductively isolated sister species with strong phenotypic and significant neutral genetic differentiation (Seehausen et al. 2008; Magalhaes et al. 2009; Mzighani et al. 2010). Such data have been interpreted as evidence for the continuous nature of ecological speciation (Seehausen et al. 2008; Seehausen 2009; Seehausen and Magalhaes 2010). Other authors, however, have reported complete lack of genetic differentiation between phenotypically distinct species, and interpreted this as evidence for intraspecific adaptive radiation through phenotypic polymorphism without speciation (Samonte et al. 2007).

Rapid recent selection-driven speciation is expected to be associated with heterogeneous genomic differentiation due to the interactions between gene flow, divergent selection, and genetic hitchhiking against a background of very recent divergence (Wu 2001; Smadja et al. 2008; Via and West 2008; Nosil et al. 2009; Via 2009). In young adaptive radiations, species are therefore expected to share ancestral polymorphisms at most genetic loci, and this is the case in radiations of haplochromine cichlids (Moran and Kornfield 1993; Parker and Kornfield 1997; Nagl et al. 1998; Klein et al. 2007). After the completion of speciation, neutral polymorphisms are expected to be driven to fixation by drift in 4Ne generations (Kimura and Ohta 1969) on average with large variance across loci. Given the recent geological history of LV, which was dry for several thousand years at the end of the Pleistocene (Johnson et al. 1996; Stager and Johnson 2008), and a generation time of one year, fixation of many neutral polymorphisms would indeed be unexpected. This would only be likely if most of the cichlid species of LV originated as small populations instantaneously with immediate complete cessation of gene flow when the lake filled 14,600 years ago (i.e. requiring an effective population size of Ne << 3500 throughout the radiation period).
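The effective-size threshold quoted above follows from simple arithmetic, which the short sketch below reproduces. It assumes, as the text does, a mean neutral fixation time of 4Ne generations and a generation time of one year; the function name is ours and purely illustrative. With 14,600 years available the bound comes out near 3650, the same order as the rounder figure of 3500 given in the text.

```python
def max_ne_for_expected_fixation(years_available, generation_time_years=1.0):
    """Largest effective size Ne for which the mean neutral fixation time,
    taken as 4 * Ne generations, still fits within the time available."""
    generations = years_available / generation_time_years
    return generations / 4.0


# Lake refilling 14,600 years ago, one generation per year (values from the text).
print(max_ne_for_expected_fixation(14_600))  # 3650.0
```

Because the variance of fixation times across loci is large, this is only a rough upper bound; widespread fixation would require Ne well below it.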

Adaptive radiations are often characterized by episodes of interspecific hybridization even well after onset of speciation (Grant and Grant 2008), and even between distantly related species (Salzburger et al. 2002; Schliewen and Klee 2004). Such occasional hybridization may not only slow down the fixation of neutral genetic differentiation but may sustain a fast rate of adaptive evolution (Rieseberg et al. 1999; Seehausen 2004; Mallet 2007; Grant and Grant 2008). Rapid ecological speciation relies on standing genetic variation (Barrett and Schluter 2008) but is thought to be associated with strong divergent natural and sexual selection, which quickly erode heritable variation for key traits (Dieckmann and Doebeli 1999; Kondrashov and Kondrashov 1999). Hence, to the extent that consecutive ecological speciation events involve divergent selection on the same key phenotypic traits, one event probably constrains opportunities for a second one in either of the descendent species, and further speciation may require waiting time for the recovery of genetic variation through mutation. Even though the radiation of LV cichlid fish involves many dimensions of ecology and phenotype, the same dimensions of divergence occur repeatedly among sister species, specifically red versus blue male nuptial coloration, bicuspid versus unicuspid tooth shape, numbers of tooth rows, and body shape. Such constraints may be alleviated by temporally or spatially contained episodes of gene flow between species, the syngameon hypothesis of adaptive radiation (Seehausen 2004).

Hence, we can formulate two models for alternative courses of adaptive radiation: (1) speciation and adaptation happen around the same time (Givnish 1997; Schluter 2000) but occasional episodes of hybridization may or may not exist between species; (2) adaptive radiation as essentially an intraspecific phenomenon in which speciation happens only well after adaptation, if at all (Givnish 1997). We refer to these alternative courses as the “ecological speciation model” and the “intraspecific polymorphism model” for adaptive radiation. Note that we use the term “ecological speciation” in the wider sense that includes speciation by natural selection, sexual selection, and their interaction. Note also that these models are not exclusive but constitute the two end points of a potential continuum. We are asking if cichlid fish radiations are closer to one of these end points than to the other. In the ecological speciation model, genetic differentiation is rapid and the polymorphism state transient and short lived. In the intraspecific polymorphism model, phenotypic differentiation precedes speciation by a long time, if speciation happens at all. Individual cases may exist among LV cichlids that would fit either of these alternatives. Thus, to distinguish between their prevalence, population genetic fixation indices (Wright 1951) would ideally be estimated simultaneously between very many of the taxa in a radiation, using information from many loci distributed across their genome. In a first attempt to do this, we genotyped more than 100 putative cichlid species from Lakes Victoria, Edward, Kivu, Albert, Nabugabo, Saka, and ancestral riverine species at 1200 polymorphic AFLP loci and estimated fixation indices for each putative species and all species pairs. We use these data to test predictions of the two alternative courses of adaptive radiation.

In so doing, we first seek an answer to the key question: are phenotypically defined putative species generally genomically differentiated? Because we do find significant genetic differentiation of most species, we further seek answers to three questions that address the dynamics of adaptive radiation: (1) Does the average genomic differentiation of species increase with time since the start of an adaptive radiation, as predicted if, and only if, lineages persist through time? (2) Is taxonomic structure above the species level reflected in genetic structure, as predicted by lineage persistence with sequential evolutionary branching but not by a simultaneous starburst? (3) Is the initial colonization event associated with loss of genetic variation, or are subsequent speciation events associated with such loss, as predicted by speciation in small populations, that is, in isolated satellite lakes?

Materials and Methods

THE SYSTEM

The endemic haplochromine cichlid fish of the LV region, the LV Region Superflock (LVRS), are considered the fastest vertebrate adaptive radiation, with more than 600 species evolved within the past 250,000 years (Nagl et al. 2000; Verheyen et al. 2003; Seehausen 2006). Diversification has occurred in each of the major lakes of the LV region (i.e. lakes Victoria, Edward, Albert, and Kivu) and also in several smaller lakes within the system. The six lakes that we sampled vary in age over two orders of magnitude, ranging from 4000 (Lakes Nabugabo and Saka) to 200,000 years (Lake Edward, LE) (Table 1), due to geologically different origins (i.e. old deep Rift lakes, crater lakes of more recent volcanic origin or shallow flood-plain-like lakes (Beadle 1981)), and major differences in depth, which determine their vulnerability to climatic variability and desiccation. The LV region experienced a rather dramatic Pleistocene climate history with several episodes of major drought and flooding. Paleolimnological and geological evidence suggests the latest complete desiccation of LV ended around 14,600 years ago (Johnson et al. 1996; Stager and Johnson 2008). Other lakes in the region also experienced major climate-driven oscillations. Lakes Albert and Turkana, for instance, had most likely dried up at around the same time as LV (Beuning et al. 1997). Furthermore, Lake Kivu (LK) experienced a major disruption due to volcanism between 3000 and 5000 years ago, which led to the extinction of most phytoplankton taxa and is thought to have had major impact on higher lake fauna (Haberyan and Hecky 1987).

Table 1.  Characteristics of the lakes of the Lake Victoria region and their cichlid radiations. A complete list of samples analyzed is given in Table S1.
| Lake radiation | Age of the radiation (years) | Size of the lake (km²) | Depth (m) | Endemic species | Sampled genera | Sampled species | Sampled individuals (n) | References |
|---|---|---|---|---|---|---|---|---|
| LVRS | 100–200,000 | | | | 21 | 104 | 317 | |
| LV | 15,000² | 68,000 | 83 | 447 | 20 | 72 | 230 | Johnson et al. (2000); Stager and Johnson (2008) |
| LN | 5000¹ | 24 | 5 | 7 | 3 | 5 | 15 | Johnson et al. (2000) |
| LS | 2–4000¹ | 1.4 | 12 | 4 | 1 | 4 | 10 | Russell et al. (2007) |
| LE | 200,000¹ | 2325 | 112 | 62 | 7 | 13 | 34 | Johnson et al. (2000); Laerdal and Talbot (2002); Russell et al. (2003) |
| LA | 15,000² | 5300 | 58 | 8 | 2 | 3 | 8 | Beuning et al. (1997); Johnson et al. (2000) |
| LK | 3–11,000³ | 2700 | 480 | 16 | 6 | 7 | 20 | Haberyan and Hecky (1987); Snoeks (1994) |
| Riv | – | – | – | – | 1 | 5 | 11 | |
| Total | | | | | 21 | 109 | 328 | |

Age of the cichlid radiation is defined as time since geological origin (¹), last desiccation (²), or geological disruption (³), whichever is more recent. The table gives the known numbers of endemic species and the numbers of genera, species, and individuals (n) sampled.

LV = Lake Victoria; LE = Lake Edward; LK = Lake Kivu; LA = Lake Albert; LN = Lake Nabugabo; LS = Lake Saka; Riv = rivers outside the Lake Victoria region.

SAMPLES

We sampled the largest possible number of lakes, species, and markers and as much of the phenotypic diversity as possible, at the expense of the number of individuals per species. Previous investigations dealt with small numbers of selected species and limited numbers of markers. Although that approach is appropriate to investigate individual cases of speciation, our sampling strategy is more appropriate for investigating more general trends of genetic differentiation in these radiations. We analyzed samples from six different lake radiations; lakes Victoria (LV), Albert (LA), Edward (LE), Kivu (LK), Nabugabo (LN), and Saka (LS); as well as samples from five riverine species of the genus Astatotilapia, potential ancestor lineages of the LVRS (Fig. 1). The riverine species are not endemic to individual rivers but exist in several adjacent rivers, are strictly allopatric from each other, and none of them has undergone speciation within any of the rivers or lakes.

Figure 1.

Map of the sampling localities, with focus on the lakes of the Lake Victoria region. For each of the six haplochromine lake radiations studied (in bold), number of species analyzed and sampling localities (open rectangles) are indicated. The sampling localities for the riverine Astatotilapia species are indicated by open triangles in the small insert map.

We genotyped between two and eight individuals per species (median n = 3) for a total of 328 individuals from 109 different phenotypically defined putative species, and 21 genera (Table 1). We obtained 72 species belonging to 20 genera from LV, five species from three genera from LN, four color morphs from LS, 13 species belonging to seven genera from LE, seven species belonging to six genera from LK, and three species belonging to two genera from LA. All specimens were identified by OS. This sampling ensured that a comparable fraction of all known species was sampled from all larger lakes (Table 1). For each lake, all samples analyzed here were collected at a single sampling locality, except for lakes Edward and Victoria (see Fig. 1 and Table S1).

MOLECULAR ANALYSIS

Genomic DNA was extracted from fin clips stored in ethanol using a Qiagen BioSprint 96 (Qiagen Science, Germantown, MD) robot and blood extraction kit, after a standard proteinase-K digestion step.

The original AFLP protocol (Vos et al. 1995) was followed, except that the restriction and ligation steps of the procedure were combined. The restriction enzymes used were EcoRI and MseI. Primer sequences for preselective polymerase chain reaction (PCR) were GACTGCGTACCAATTCA and GATGAGTCCTGAGTAAC. Selective PCRs were performed with 12 different primer pairs. The EcoRI primers were labeled with one of three fluorescent dyes (i.e. Alexa AX647-blue (B), AX750-black (N) and IRD IR700-green (G)) to allow the multiplexing of the selective-amplification PCR products for fragment separation on Beckman–Coulter CEQ-8000 capillary sequencers. The 12 primer pairs were then analyzed in four multiplex sets (S-1: CTT_ACA-B, CTT_AAG-G and CTT_AGC-N; S-2: CTC_ACA-B, CTC_AAG-G and CTC_AGC-N; S-3: CTA_ATT-B, CTA_AGG-G and CTA_ATC-N; S-4: CAT_ACA-B, CAT_AAG-G and CAT_AGC-N).

Electropherograms were exported from the CEQ-8000 Genetic Analysis System (Beckman–Coulter) to be analyzed with GeneMarker v 1.75 (SoftGenetics, State College, PA). The scoring of the AFLP fragments was done independently for each of the 12 primer pairs amplified, considering fragments between 60 bp and 380 bp in length. The scoring panel of AFLP loci was defined manually by setting bins individually for each AFLP fragment, according to (1) the overall fragment phenotype across the individual-sample traces generated, and (2) the reproducibility of the fragment phenotype among the panel of repeated individuals (see below). For each individual trace, the amplified fragments corresponding to the selected panel of locus bins were scored for presence/absence by an automatic scoring algorithm, followed by manual inspection of the scoring. The reproducibility of the entire process was quantified using 32 samples (10% of the total number of analyzed individuals) randomly selected across the entire sample set. The entire molecular protocol and scoring process for these 32 fish were performed independently twice, to estimate the rate of mismatch between pairs of repeated traces (Bonin et al. 2007). Over all the 12 genotyped primer pairs, a reproducibility of 94.9 ± 1.2% was obtained. In total, 1475 AFLP loci (123 ± 15.2 per primer pair) were considered for analyses.
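The mismatch-based reproducibility measure can be illustrated with a minimal sketch. It assumes each replicated fish is represented by two presence/absence (1/0) profiles over the same loci, and that reproducibility is the mean percentage of loci scored identically; the function names and the toy profiles are hypothetical, and the exact averaging used in the study (per primer pair) may differ.

```python
def mismatch_rate(replicate_a, replicate_b):
    """Fraction of loci scored differently in two replicate 0/1 profiles
    of the same individual."""
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicate profiles must cover the same loci")
    differences = sum(a != b for a, b in zip(replicate_a, replicate_b))
    return differences / len(replicate_a)


def reproducibility(replicate_pairs):
    """Mean percent agreement across all replicated individuals."""
    rates = [mismatch_rate(a, b) for a, b in replicate_pairs]
    return 100.0 * (1.0 - sum(rates) / len(rates))


# Toy data (not from the study): two replicated individuals, eight loci each.
pairs = [
    ([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 0, 1, 0]),  # identical profiles
    ([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0, 1, 0]),  # one locus differs
]
print(reproducibility(pairs))  # 93.75
```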

STATISTICAL ANALYSIS

Population genetic analyses

Number and percentage of polymorphic loci, and expected heterozygosity (Hj) were estimated using AFLPsurv (Vekemans 2002) and Arlequin 3.11 (Excoffier et al. 2005). Wright's fixation index, Fst, was estimated as (1) pairwise Fst between any two species and as (2) average species-specific Fst (among species within each lake radiation) using Arlequin 3.11. To estimate the accuracy of the Fst estimates derived from few but data-rich genotypes (i.e. n ∼ 3 individuals per species, each with 1475 genetic loci), we subsampled a dataset of 10 species for which we had genotyped larger numbers of individuals (i.e. n ∼ 30 individuals each) with the same AFLP protocol (Bezault et al., unpubl. data). The relationship between mean Fst estimated from population subsamples of three individuals and that estimated from the full population samples of ∼ 30 individuals is highly significant (r² = 0.973; P < 2 × 10⁻¹⁶) with a slope of m = 0.926 and a y-intercept close to zero (0.005; Fig. 2). Despite large scatter in the empirical Fst estimates based on the small sample sizes, the relationship between estimates for the same species pairs from our small (n = 2–8) and from our larger population samples is very similar to the one between the latter and random subsampling of these (slope of 0.892 and intercept of 0.001, even though the relationship between random subsampling and the actual small population samples is not significant; P = 0.547). The similarity of the regression slopes, and the fact that the empirical estimates from small sample sizes are all contained within the range observed by randomly subsampling a larger dataset, and mostly within the 90% CI estimated from random subsampling, allow us to conclude that our sampling strategy, although causing noise in our Fst estimates, does not induce bias, and certainly not bias toward higher Fst estimates.

Figure 2.

Effects of sample size on FST estimates. Relationship between pairwise species differentiation (FST) estimated from population-size samples (n ≈ 30) of 19 Lake Victoria (LV) species and populations (Bezault et al. unpubl. data) and the same FST estimated from small samples (n = 3) generated by random subsampling of the population-size data sets (30 times per species). For each pairwise comparison, the distribution of the FST estimated from subsampling is represented by its mean (dark gray diamond), the 90% range interval (continuous gray line), and its total dispersion (dashed gray line). We then plotted the empirical pairwise FST estimates (open circles) between 13 of these populations, obtained from independent small samples (n = 2–8; i.e. the present study), also against the FST estimated from the larger samples. Linear regressions were fitted to the 13 observed datapoints (continuous black line) and to the resampled data (dashed black line). The majority of the empirical small-sample FST estimates (10 out of 13) fall within the 90% range interval of the distribution of the corresponding FST estimates obtained from subsampling larger samples. Furthermore, the linear relationships, with slopes close to m = 1 and intercepts close to y = 0, between FST estimated from population-size samples and both random subsampling and independent small samples from the same species pairs suggest the absence of bias in FST estimates from our small sample sizes.
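The subsampling check described above can be sketched as follows. This is a minimal illustration, not the procedure as run in Arlequin: `statistic` stands for any two-sample differentiation estimator, and the `band_frequency_gap` function used here is a deliberately simple placeholder rather than FST. The data are simulated, and all names and parameter values are assumptions made for the example.

```python
import random


def band_frequency_gap(pop_a, pop_b):
    """Placeholder statistic (not Fst): mean absolute difference in band
    frequency between two populations, averaged over loci."""
    n_loci = len(pop_a[0])
    gap = 0.0
    for locus in range(n_loci):
        freq_a = sum(ind[locus] for ind in pop_a) / len(pop_a)
        freq_b = sum(ind[locus] for ind in pop_b) / len(pop_b)
        gap += abs(freq_a - freq_b)
    return gap / n_loci


def subsample_distribution(pop_a, pop_b, statistic, n_sub=3, n_rep=30, seed=0):
    """Recompute `statistic` on n_rep random subsamples of n_sub individuals
    per population; return the mean, an approximate central 90% range, and
    the full range of the subsample estimates."""
    rng = random.Random(seed)
    values = sorted(
        statistic(rng.sample(pop_a, n_sub), rng.sample(pop_b, n_sub))
        for _ in range(n_rep)
    )
    mean = sum(values) / n_rep
    low = values[int(0.05 * (n_rep - 1))]          # approximate 5th percentile
    high = values[int(round(0.95 * (n_rep - 1)))]  # approximate 95th percentile
    return mean, (low, high), (values[0], values[-1])


# Simulated data: 30 individuals x 50 loci, band frequencies 0.6 versus 0.4.
rng = random.Random(1)
pop_a = [[int(rng.random() < 0.6) for _ in range(50)] for _ in range(30)]
pop_b = [[int(rng.random() < 0.4) for _ in range(50)] for _ in range(30)]

full_estimate = band_frequency_gap(pop_a, pop_b)
mean, central_90, full_range = subsample_distribution(pop_a, pop_b, band_frequency_gap)
print(full_estimate, mean, central_90, full_range)
```

Comparing `full_estimate` with the subsample mean across many population pairs, for instance by regression, is what reveals bias. The placeholder statistic is itself inflated by sampling noise at n = 3, which is the kind of effect such a check is designed to detect.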

The partitioning of the genetic variation was analyzed by analysis of molecular variance (AMOVA; Excoffier et al. 1992). The original model based on the variance decomposition from a hierarchical model with two nesting levels into three variance components (Fct, Fsc, and Fst) has been generalized to models with any number of nesting levels, and corresponding F-components (Yang 1998). This generalized model has been implemented in HierFstat (Goudet 2005), allowing the estimation of the corresponding variance components, and testing their significance using randomization tests. These algorithms are based on the partitioning of the variance in allele frequencies of codominant markers. The use of the AMOVA procedure to test for the presence of hierarchical genetic structure can be extended to the analysis of dominant markers under the assumption of similar within-population mating patterns in all populations, as implemented in Arlequin and Famd (Schluter and Harris 2006). Using dominant markers, the AMOVA is partitioning the genotypic variance from a phenotype data matrix (i.e. presence/absence of fragment), which provides a meaningful estimation of the proportion of variance that is explained by the different structuring levels in the model and its significance (Huff et al. 1993; Stewart and Excoffier 1996). We verified the accuracy of variance partitioning based on dominant data by the generalized AMOVA procedure implemented in HierFstat by comparing our results for several two-level hierarchy models, with results obtained for the same models using Arlequin (data not shown).
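For readers unfamiliar with the variance decomposition, the sketch below works through a single-level AMOVA (individuals within species) on presence/absence profiles, following the sums-of-squares formulation of Excoffier et al. (1992) with squared Euclidean distances between profiles. It is an illustration under those assumptions only: it handles one nesting level, omits the randomization test, and is not the HierFstat or Arlequin implementation. Function names are ours.

```python
def squared_distance(profile_a, profile_b):
    """Squared Euclidean distance; for 0/1 profiles, the number of band differences."""
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


def amova_one_level(groups):
    """groups: list of populations, each a list of 0/1 band profiles.
    Returns (phi_st, variance_among, variance_within)."""
    labelled = [(g, profile) for g, pop in enumerate(groups) for profile in pop]
    n_total, n_groups = len(labelled), len(groups)

    total_sum = 0.0
    within_sums = [0.0] * n_groups
    for i in range(n_total):
        for j in range(i + 1, n_total):
            d2 = squared_distance(labelled[i][1], labelled[j][1])
            total_sum += d2
            if labelled[i][0] == labelled[j][0]:
                within_sums[labelled[i][0]] += d2

    ss_total = total_sum / n_total
    ss_within = sum(s / len(pop) for s, pop in zip(within_sums, groups))
    ss_among = ss_total - ss_within

    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Weighted average group size, correcting for unequal sample sizes.
    n0 = (n_total - sum(len(pop) ** 2 for pop in groups) / n_total) / (n_groups - 1)

    variance_among = (ms_among - ms_within) / n0
    variance_within = ms_within
    total_variance = variance_among + variance_within
    phi_st = variance_among / total_variance if total_variance != 0 else 0.0
    return phi_st, variance_among, variance_within


# Toy data: two fully differentiated groups, then two identical groups.
fixed = [[[1, 1, 0, 0], [1, 1, 0, 0]], [[0, 0, 1, 1], [0, 0, 1, 1]]]
mixed = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
print(amova_one_level(fixed)[0])  # 1.0
print(amova_one_level(mixed)[0])  # -1.0
```

The second case shows why estimated variance components can be negative when groups are no more different than expected by chance, as for some entries in Table 2; such values are conventionally read as zero structure.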

Evolutionary hypotheses

Testing the ecological speciation versus the intraspecific polymorphism models for rapid adaptive radiation: For each of the six lake radiations we tested for the prevalence of speciation, first, using the AMOVA models, asking if a significant part of the genomic variation was segregating between phenotypically defined putative species; and second, by calculating the frequency distributions of Fst for each lake radiation separately and comparing these against a mean of zero. When sympatric populations of divergent phenotypes are compared, geographical population structure can be excluded, and deviation of Fst from zero is consistent with speciation. AMOVA models have been computed separately for the three main sampling regions in LV: northern Mwanza Gulf, Makobe Island, and western Speke Gulf, and for the main sampling location, Katunguru, in LE.
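Comparing an Fst distribution against a mean of zero amounts to a one-sample t-test, sketched here with the standard library only. The sketch returns the t statistic and degrees of freedom and leaves the P-value lookup to a table or statistics package. Like any plain t-test it treats the values as independent observations, an assumption that pairwise estimates sharing a species do not strictly meet. The input values are invented for illustration.

```python
import math
import statistics


def t_against_zero(fst_values):
    """One-sample t statistic for H0: mean Fst = 0.
    Returns (t, degrees_of_freedom)."""
    n = len(fst_values)
    if n < 2:
        raise ValueError("need at least two Fst values")
    mean = statistics.mean(fst_values)
    sd = statistics.stdev(fst_values)  # sample SD (n - 1 denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# Invented pairwise Fst values for one lake (not data from the study).
t_stat, df = t_against_zero([0.10, 0.20, 0.30])
print(round(t_stat, 3), df)
```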

Does genomic differentiation of species increase with the age of a radiation? Genomic differentiation of species is predicted to increase with the age of radiation if, and only if, species or lineages tend to persist through time. We compared the distribution of Fst between radiations in lakes of different age. Pairwise FST and population-specific FST were calculated for all putative species within each lake using the AMOVA-based approach implemented in Arlequin. Hierarchically nested AMOVA models were computed separately for LE, LV, LK, LA, and LN. Additionally we computed a hierarchical AMOVA model for the entire LVRS, including as grouping factors lakes and species. The extent of species differentiation was compared between radiations by comparing the distributions of pairwise FST using the Mann–Whitney U-test. We calculated both nonparametric Spearman rank correlation and linear regression to test if there was a relationship between average species differentiation and age of radiation. Age of radiation data was obtained from geological literature as the maximum age of the lake hosting a radiation, or the time since its last complete desiccation if the latter was more recent (see details and references in Table 1).
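The rank correlation between radiation age and mean species differentiation can be sketched as below: Spearman's coefficient computed as the Pearson correlation of average ranks, which handles ties such as two lakes of equal age. Only the standard library is used and no P-value is computed. The lake ages and Fst values in the example are placeholders chosen for illustration, not the study's estimates.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only: radiation ages (years) and mean Fst for six lakes.
ages = [3000, 5000, 7000, 15000, 15000, 200000]
mean_fst = [0.01, 0.05, 0.06, 0.10, 0.15, 0.25]
print(spearman(ages, mean_fst))
```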

The lakes also exhibit large heterogeneity in size and depth, ranging from very small crater or satellite lakes (1.4 km² for LS) to LV (68,000 km²), and from very shallow lakes (5 m max depth in LN) to very deep (480 m depth of LK; Table 1). Because persistence of lakes, and hence age of lake, are affected by lake size and depth, and lake size and depth may influence genetic species differentiation through habitat heterogeneity and opportunity for isolation by distance, we further estimated the relationship between species differentiation and lake size (total surface area and maximum depth of the lake), and also estimated the relationships among lake age, surface, and maximum depth using regression analysis.

Starburst versus sequential speciation—is there phylogenetic structure above-species level? Alternative hierarchical AMOVA models were calculated to quantify the genetic variance components explained by species, genus, and geography. We calculated models for each of the lake radiations separately. Additionally, we calculated two models with three hierarchical levels each: species, genus [defined within lake], and lake. First, species was nested in genus and genus in lake, assuming that radiations took place independently in each of the lakes and that the same nominal genera in different lakes were due to homoplastic origins. Second, species was nested in genus independently of lake (i.e. species, genus within lake, genus across lakes), assuming that genera were monophyletic and lake radiations paraphyletic or polyphyletic (Greenwood 1980). Finally, for the two largest lake radiations (i.e. LV and LE) in which species were sampled from several geographical regions within the lake (Fig. 1 and Table S1), the region was considered as another putative above-species level structuring factor, alternative to genus.

Are colonization and rapid lineage splitting through adaptive radiation associated with loss of genetic variation? The expected heterozygosity (Hj) corresponding to Nei's gene diversity was computed as implemented in AFLPsurv for four hierarchical levels: species, genus [defined within lake], lake, and the entire LVRS. Additionally, Hj was calculated for species and genus of riverine Astatotilapia spp., representing lineages closely related to the ancestor of the LVRS, but currently geographically isolated from the LV basin and from each other. We used both nonparametric correlation and linear regression to test for a trend in gene diversity from the entire LV region superflock to the colonization of individual lakes within the region and the origins of individual species within each lake. We tested the hypothesis of independence using Student's t-test (i.e. comparing the slope of the regression line to H0 that β= 0). Tests were conducted for each lake, for the entire LVRS, and separately for the riverine Astatotilapia lineage.
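As an illustration of how gene diversity can be obtained from dominant markers, the sketch below uses the simple square-root method: under Hardy–Weinberg equilibrium the frequency q of the null (band-absent) allele is estimated as the square root of the proportion of individuals lacking the band, and expected heterozygosity at a locus is 2q(1 − q). This is a simplification chosen for clarity. The estimator actually applied through AFLPsurv is not specified in the text and may be a more refined one, and no small-sample correction is included here. The function name and toy data are ours.

```python
import math


def expected_heterozygosity(profiles):
    """Mean expected heterozygosity over loci for one group of 0/1 band
    profiles, using the square-root estimate of the null-allele frequency."""
    n_individuals = len(profiles)
    n_loci = len(profiles[0])
    total = 0.0
    for locus in range(n_loci):
        absent = sum(1 for ind in profiles if ind[locus] == 0)
        q = math.sqrt(absent / n_individuals)  # null-allele frequency under HWE
        total += 2.0 * q * (1.0 - q)
    return total / n_loci


# Toy data: four individuals, two loci; the band is absent once at locus 1
# and always present at locus 2.
group = [[1, 1], [1, 1], [1, 1], [0, 1]]
print(expected_heterozygosity(group))  # 0.25
```

Computing this quantity at each hierarchical level (species, genus, lake, superflock) and regressing it on level gives the trend test described above.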

Results

GENETIC POLYMORPHISM AND BIOGEOGRAPHICAL DIFFERENTIATION OF THE LAKE VICTORIA REGION SUPERFLOCK

Of the 1475 AFLP loci analyzed, 1282 loci (87%) were polymorphic at the level of our entire set of samples (i.e. LVRS and riverine taxa), and 1144 loci (78%) were polymorphic within the LVRS itself. At the level of the LVRS, the average species differentiation was FST = 0.235, compared to an average species differentiation of FST = 0.735 among the riverine Astatotilapia species, representing an old group of entirely allopatric species. AMOVA revealed strong genomic differentiation between the members of the LVRS radiation and the riverine Astatotilapia species, with 34.5% of the genetic variance explained by the model (Table 2, model 1a). When considering the LVRS radiation and the riverine Astatotilapia species together (Table 2, model 1a), 7.3% of the total genetic variance was segregating between the lakes. When considering only the LVRS radiation (Table 2, model 2a), 11.3% of the total genetic variance was segregating between the lakes.

Table 2.  Genetic structure according to alternative biogeographic and taxonomic models, estimated by analyses of molecular variance (AMOVAs) at different hierarchical levels: (1) entire Lake Victoria Region superflock (LVRS) and rivers of East and North Africa; (2) LVRS with species nested in genus within lake (2a), and species nested in genus defined by lake within same genus defined across lakes (2b); (3) each lake separately with species either nested in genus (first row) or not (second row); and for each of the two lakes from which sampling has been carried out in more than one locality (4) lake flock analysis with species nested in sampling region; and (5) analysis of species differentiation within sampling regions (level of significance estimated by resampling 10,000 times); levels of significance: *P<0.05; **P<0.01; ***P<0.001; and ns, non-significant. Entries give the percentage of variance explained by each hierarchical structuring factor and its significance.

Species grouped by genus within lake and radiation

| Model | Radiation | Lake | Genus | Species | Individuals |
|---|---|---|---|---|---|
| (1a) LVRS and riverine species | 34.5%*** | 7.3%*** | 0.1%*** | 12.5%*** | 45.7% |

Species grouped by genus and lake within LVR superflock

| Model | Lake | Genus | Species | Individuals |
|---|---|---|---|---|
| (2a) Genus within lake | 11.3%*** | 1.7%*** | 15.1%*** | 71.9% |

| Model | Genus across lakes | Genus | Species | Individuals |
|---|---|---|---|---|
| (2b) Genus across lakes | −3.9% ns | 11.4%*** | 17.5%*** | 75.0% |

Species grouped by genus within lake

| Model | Genus | Species | Individuals |
|---|---|---|---|
| (3a) Lake Edward, species nested in genus | 2.2%*** | 22.3%*** | 75.5% |
| (3a) Lake Edward, species only | - | 24.4%*** | 75.6% |
| (3b) Lake Victoria, species nested in genus | 2.3%*** | 17.5%*** | 80.1% |
| (3b) Lake Victoria, species only | - | 19.0%*** | 81.0% |
| (3c) Lake Kivu, species nested in genus | 1.8% ns | 9.0% ns | 89.2% |
| (3c) Lake Kivu, species only | - | 10.7%*** | 89.3% |
| (3d) Lake Albert, species nested in genus | 9.9% ns | 6.5% ns | 83.6% |
| (3d) Lake Albert, species only | - | 13.9%** | 86.1% |
| (3e) Lake Nabugabo, species nested in genus | 2.8% ns | 6.8%* | 90.5% |
| (3e) Lake Nabugabo, species only | - | 9.0%*** | 91.0% |
| (3f) Lake Saka, species only | - | −1.3% ns | 101.3% |

Species grouped by region within lake

| Model | Region | Species | Individuals |
|---|---|---|---|
| (4a) Lake Edward | −0.3% ns | 24.6%*** | 75.7% |
| (4b) Lake Victoria | 1.3%** | 19.0%*** | 79.7% |

Species within regional communities

| Model | Species | Individuals |
|---|---|---|
| (5a) Lake Edward, Katunguru | 23.6%*** | 76.4% |
| (5b) Lake Victoria, Mwanza | 21.3%*** | 78.7% |
| (5c) Lake Victoria, Makobe | 14.0%*** | 86.0% |
| (5d) Lake Victoria, Speke | 23.5%*** | 76.5% |

EVOLUTIONARY HYPOTHESES

Is rapid adaptive radiation achieved primarily through speciation or through the evolution of intraspecific polymorphism?

The analysis conducted at LVRS level, controlling for biogeography (i.e. lake explaining 11.3% of the molecular variance; Table 2, model 2a), revealed that species explained more than 15% of the molecular variance within lakes (P < 0.001). Next we investigated each lake radiation separately. In non-nested AMOVA models, the grouping factor “species” accounts for a very highly significant part of the genetic variance in all lake radiations, ranging from 24.4% in LE to 9% in LN (models 3a–e), with the exception of LS, in which species level does not account for any significant amount of the molecular variance that we measured (model 3f). After controlling for geographical structure of sampling within the two largest lake radiations, LE and LV, the percentage of variance explained by species did not decrease (24.6% and 19%, respectively; models 4a and 4b). Additionally, we calculated AMOVA for each of the main sampling regions separately in Lakes Victoria and Edward. All geographically controlled models for five of the six lakes (i.e. except LS) reveal highly significant effects of species on the partitioning of genetic variance in the absence of geographic isolation (Table 2, models 5a–5d).

Frequency distributions of pairwise FST estimates were very significantly offset from zero for Lakes Edward, Victoria, Kivu, and Nabugabo (Table 3; Fig. 3). Sample sizes were too small for Lakes Albert and Saka to test this. Frequency distributions for Lakes Edward, Albert, and Victoria contained few or no comparisons in the category of smallest FST (0.0–0.025). This was true for distributions generated from all species sampled from a lake as well as for distributions generated just from species sampled within each of the main sampling regions within a lake, suggesting that essentially every putative species sampled was genetically differentiated from every other putative species that we sampled. In contrast, Lakes Kivu and Saka each contained several pairwise comparisons in the category of near-zero FST, and LN had several with very low FST, suggesting that not every pair of putative species was significantly differentiated in these lakes.

Table 3.  Levels of genetic differentiation among species in the lakes of the LVRS, and in sampling regions within lakes. Mean, standard deviation, range, and significance against zero (with t-test) of the distribution of (1) species-specific FST or (2) species pairwise FST; for each lake and regional community within Lake Victoria and Edward. Pairwise comparisons of the distribution of species differentiation between lakes were done with Mann–Whitney U-test. The estimates of differentiation among allopatric riverine species of Astatotilapia (Riv) are provided for comparison (abbreviations as in Table 1); levels of significance: *P<0.05; **P<0.01; ***P<0.001; and ns, non-significant.
Figure 3.

Distribution of the pairwise FST estimates between species within each lake radiation, left column from top: (1) Lake Edward, (2) Lake Victoria, (3) Lake Kivu; right column from top: (4) Lake Albert, (5) Lake Nabugabo, and (6) Lake Saka. For Lake Edward and Lake Victoria, the distribution of the FST between species within sampling regions are also shown (i.e. Katunguru in dark gray for Lake Edward, Mwanza in white, Makobe Island in gray, and Speke Gulf in dark gray for Lake Victoria); see Table 3 for statistical comparisons of distributions.

Does the average genomic differentiation of species increase with the age of a radiation?

The six lakes studied vary in age by two orders of magnitude, ranging from less than 4000 years (LS) to 200,000 years (LE) (Table 1). The average genetic differentiation of species also varied between lakes but only by one order of magnitude, from FST= 0.012 in LS, to FST= 0.251 in LE (Table 3). The distribution of species differentiation differed significantly between most lakes (Table 3). Exceptions were the comparisons of lakes Kivu and Nabugabo, as well as comparisons involving LA for which we had only three species sampled. The between-lake variation in the distribution of genetic species differentiation is also clearly apparent in plots of the FST distributions (Fig. 3). LE not only shows a distribution centred on the highest mean (FST= 0.25), but there is a tendency to bimodality (with modes on FST= 0.20 and 0.30). LK, on the other hand, shows a distribution with a right skew.

In addition to age, the six studied lakes vary widely in surface area and maximum depth, two variables that can potentially affect species differentiation. Whereas area and depth were significantly correlated in a rank correlation test but not in a parametric test, age was not related to either area or depth in either test. We regressed average species differentiation against the three lake variables in univariate and multivariate analyses (Table 4). The association between the age of a lake and the average genomic differentiation of its species was statistically significant (Table 4 and Fig. 4a; analysis of variance (ANOVA), P = 0.022 and Spearman rank correlation, P < 0.001). Visual inspection of the plot may suggest that the relationship follows a decelerating function, but there is no statistical support for this. The evidence for larger genetic differentiation among species in older radiations suggests that initially rapid speciation is followed by persistence of lineages.
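
With only six lakes, rank-based and parametric measures of association can disagree, as they do for area and depth. The sketch below shows one way to compute both from scratch. It is an illustration under stated assumptions: the helper names (`ranks`, `pearson`, `spearman`) and the six value pairs are hypothetical and are not the lake ages or FST means analysed here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical log10(age) and mean FST for six lakes (assumption).
log_age = [3.5, 3.7, 4.2, 4.6, 5.0, 5.3]
mean_fst = [0.01, 0.09, 0.11, 0.12, 0.19, 0.25]
print(round(pearson(log_age, mean_fst), 3), round(spearman(log_age, mean_fst), 3))
```

In this made-up example the relationship is perfectly monotonic but not linear, so the rank correlation is at its maximum while the parametric coefficient is lower.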

Table 4.  Relationships between the age of lakes (Age), surface area of lakes (Area), and maximum depth of lakes (Depth) and between these lake variables and the extent of genetic differentiation among species (i.e. the average of the species-specific FST within each lake; n=6 datapoints). Spearman rank correlation, linear least squares regression, and multivariate categorical regression models of Fst against the log10-transformed lake variables, as well as the residual of Fst controlling for the two other lake variables; levels of significance: *P<0.05; **P<0.01; ***P<0.001; and ns, nonsignificant.
Models       Variables                Model summary   ANOVA                Standardized coefficients               Spearman correlation
                                      R2      SE      F        Sig.        Beta     Partial   t        Sig.        Rho      Sig.

Relationships among lake variables
Age          Area                     0.299   0.598   1.708    0.261 ns    0.547              1.307    0.261 ns    0.638    0.173 ns
Age          Depth                    0.209   0.635   1.054    0.363 ns    0.457              1.027    0.363 ns    0.580    0.228 ns
Area         Depth                    0.517   1.335   4.284    0.107 ns    0.719              2.070    0.107 ns    0.986    0.000***

Relationships between lake variables and species differentiation
Univariate linear regressions
FST          Age                      0.767   0.050   13.171   0.022*      0.876              3.629    0.022*      0.986    0.000***
FST          Area                     0.708   0.056   9.718    0.036*      0.842              3.117    0.036*      0.657    0.156 ns
FST          Depth                    0.283   0.087   1.577    0.278 ns    0.532              1.256    0.278 ns    0.600    0.208 ns
Multivariate linear regression
FST          Age, Area, and Depth     0.980   0.020   33.485   0.029*      -        -         -        -           -        -
Part. Age                                                                           0.614     5.170    0.035*
Part. Area                                                                          0.673     4.429    0.047*
Part. Depth                                                                         −0.233    −1.627   0.245 ns
Univariate linear regressions based on residual of Fst controlling for the other lake variables
Age          Res_Fst (Area/Depth)     0.644   0.426   7.244    0.055 ∼ns   0.803              2.691    0.055 ∼ns   0.029    0.957 ns
Area         Res_Fst (Age/Depth)      0.383   1.509   2.487    0.190 ns    0.619              1.577    0.190 ns    0.657    0.156 ns
Depth        Res_Fst (Age/Area)       0.272   0.678   1.493    0.289 ns    −0.521             −1.222   0.289 ns    −0.543   0.266 ns

ANOVA = analysis of variance; SE = standard error.

Figure 4.

Relationships between species differentiation within each of six lakes and lake variables in the Lake Victoria Region: (A–C) all species-specific FST within each lake and (A) the lake age, (B) lake surface area, and (C) maximum depth of the lake; allopatric riverine Astatotilapia spp. (Riv) are presented in panel (A) for comparison only; (D–F) relationship between average species-specific differentiation for each lake and each of the three lake variables; (G–I) relationship between each of the lake variables and the residual of the average species-specific FST controlled for the two other lake variables.

There is also a significant but weaker (and less consistent) association between lake surface area and average genomic species differentiation (Table 4 and Fig. 4b; ANOVA, P = 0.036 but Spearman rank correlation, P = 0.156). Finally, no association was detected between lake depth and the average genomic species differentiation (Table 4 and Fig. 4c; ANOVA, P = 0.278 and Spearman rank correlation, P = 0.208). The results of a multivariate categorical regression including all three lake parameters were consistent with the above: the overall model was significant, with significant partial associations with lake age and surface area but not with maximum lake depth (Table 4; P < 0.05). Univariate regressions using residuals of the average species differentiation controlling for two of the three lake variables revealed a marginally nonsignificant effect of lake age (Table 4; P = 0.055), but no effects of lake surface area or depth on species differentiation.
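
The univariate entries of Table 4 can be cross-checked against one another, because in a one-predictor least-squares regression the standardized coefficient equals Pearson's r, so that R2 = Beta^2, t = Beta * sqrt((n - 2) / (1 - Beta^2)), and F = t^2. The sketch below applies these standard identities to the Beta values reported in Table 4 with n = 6 lakes; the function name `simple_regression_stats` is a hypothetical helper, and this is a consistency check rather than a re-analysis of the data.

```python
import math


def simple_regression_stats(beta, n):
    """Derive R2, t and F for a one-predictor least-squares regression
    from its standardized coefficient (equal to Pearson's r) and the
    number of data points n."""
    r2 = beta ** 2
    t = beta * math.sqrt((n - 2) / (1.0 - r2))
    return r2, t, t ** 2


# Standardized coefficients of FST on each lake variable, as reported
# in Table 4 (n = 6 lakes).
reported_beta = {"Age": 0.876, "Area": 0.842, "Depth": 0.532}
for name, beta in reported_beta.items():
    r2, t, f = simple_regression_stats(beta, 6)
    print(f"{name}: R2={r2:.3f} t={t:.2f} F={f:.2f}")
```

The derived values agree with the R2, t, and F columns of Table 4 to within the rounding of the reported coefficients.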

Starburst versus sequential speciation

The hierarchical AMOVA, using lake, genus, and species as nesting levels (Table 2, model 2a), revealed a highly significant effect of genus (defined within lake), which accounted for 1.7% of the total genomic variance. This suggests some genetic support for the above-species level taxonomy. In the alternative model, with species nested in genus across lakes, genus did not explain any significant amount of genomic variance (model 2b). We repeated this nested AMOVA with species grouped by genus, separately for each lake radiation (except for LS, in which only one genus is present). We found highly significant effects of genus in lakes Victoria and Edward (models 3a and 3b; 2.2% and 2.3%, respectively), in addition to highly significant effects of species. In lakes Kivu, Albert, and Nabugabo, we found weaker and/or nonsignificant effects of genus (models 3c–e), most likely because few genera, and few species within each, were sampled.

For LV, the proportion of genomic variance explained by the geographical structure in our sampling was 1.3% of the total variance (model 4b). In contrast, sampling region did not explain any significant amount of genetic variation in our LE samples, although the regions were the Kazinga channel versus the open lake shore (model 4a).

Are colonization and rapid speciation associated with loss of genetic variation?

The species from the LV region show a relatively high level of expected heterozygosity, considerably higher than observed for the riverine Astatotilapia species (Fig. 5). The group of riverine Astatotilapia, however, shows a very high expected heterozygosity when its different species are pooled (Hj = 0.216), reflecting deep genetic divergence between these old and geographically isolated riverine species. When all its species are pooled, the LVRS radiation shows an expected heterozygosity (Hj = 0.139) above that of individual riverine species, but smaller than the pooled riverine Astatotilapia species group. The latter is consistent with a more recent origin of the radiation that is phylogenetically nested within the riverine Astatotilapia. However, that the expected heterozygosity was higher in the LVRS radiation than in any one of the riverine species might suggest an origin of the LVRS radiation from an unusually diverse riverine species, or from more than one riverine species.

Figure 5.

Expected heterozygosity (Hj) at four hierarchical levels of species flock formation in the cichlids of the Lake Victoria region: (1) Lake Victoria region “superflock” (LVRS); (2) individual lake radiations; (3) genus (within lake), (4) individual species (no species is found in >1 lake); as compared to the expected heterozygosity observed in individual species of riverine Astatotilapia (the ghost point refers to Hj of pooled Astatotilapia species). The different lakes are represented by different symbols and shading of gray: Lake Victoria, LV; Lake Nabugabo, LN; Lake Saka, LS; Lake Edward, LE; Lake Albert, LA; Lake Kivu, LK; and the riverine Astatotilapia spp., Riv.

Within the species flock, we observed close to no decay of expected heterozygosity (Hj) with successively shallower hierarchical nesting level, starting with the entire LV region superflock, through individual lake radiations, to genera [defined within lake], and finally species (Student's t-tests β= 0: −0.028, P = 0.455; Fig. 5).
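
For readers unfamiliar with gene diversity estimates from dominant markers such as AFLPs, the sketch below shows one simple way to obtain an expected heterozygosity from band presence/absence scores. It is an illustration under loudly stated assumptions and not necessarily the estimator behind the Hj values reported here: loci are treated as diploid and biallelic, populations as being in Hardy-Weinberg equilibrium, and the null-allele frequency is taken as the square root of the band-absent fraction, without any small-sample correction. The function name `expected_heterozygosity` and the example matrix are hypothetical.

```python
import math


def expected_heterozygosity(band_matrix):
    """Mean expected heterozygosity across dominant biallelic loci.

    band_matrix is a list of individuals, each a list of 0/1 band
    scores (1 = band present). Assuming Hardy-Weinberg equilibrium,
    the null-allele frequency at a locus is q = sqrt(fraction of
    band-absent individuals), and the locus contributes 2 * q * (1 - q).
    """
    n_ind = len(band_matrix)
    n_loci = len(band_matrix[0])
    h_per_locus = []
    for locus in range(n_loci):
        absent = sum(1 for ind in band_matrix if ind[locus] == 0)
        q = math.sqrt(absent / n_ind)
        h_per_locus.append(2.0 * q * (1.0 - q))
    return sum(h_per_locus) / n_loci


# Hypothetical scores for four individuals at three loci (assumption).
scores = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
h = expected_heterozygosity(scores)
print(round(h, 4))
```

Applying the same function to pooled individuals at successively shallower nesting levels (region, lake, genus, species) would give the kind of comparison summarized in Fig. 5.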

Discussion

We have used 1475 genomic AFLP loci, 1282 of which were polymorphic, to conduct a genome scan of a very young yet phenotypically highly diverse and putatively species rich adaptive radiation, the cichlid fish in the LV region of East Africa. Our study includes two to eight individuals of each of more than 100 phenotypically defined taxa, putative species, from LV and five other lakes in the LV region. We sampled every genus in the radiation, except those that are most likely extinct (Witte et al. 1992; Seehausen et al. 1997), and multiple species within each non-monotypic genus, for a total of 22 genera. Our data reveal strong population genomic signatures of species differentiation and significant although weaker signatures of above-species level genus differentiation. At the same time, we find a high level of genetic polymorphism within the superflock, higher than in ancestral riverine species, and no measurable decay of gene diversity from the level of the entire region (six lakes and 105 species included) to individual lakes and individual species within each lake. Comparing radiations in lakes of different age and size, we find that genomic differentiation between species increases with both age and size of lakes. Our data suggest (1) that colonization of the LV region was associated with a gain of genetic variation as compared to ancestral riverine populations; (2) that adaptive diversification and speciation happened rapidly and synchronously at least on the time scale that we can resolve, clearly supporting the speciation model over the polymorphism model for the course of this adaptive radiation; (3) that speciation happened against a background of large effective population sizes, and (4) that lineages have persisted through significant evolutionary time. We will discuss each of these conclusions in turn.

ADAPTIVE RADIATION THROUGH ECOLOGICAL SPECIATION

The relative importance of phenotypic plasticity, intraspecific polymorphism and speciation in explaining the extraordinary diversity of cichlid fish in LV has been debated since the first application of molecular genetic methods to the problem of cichlid diversity (Basasibwaki 1975; Sage and Selander 1975). Early investigators, using first allozymes and later mitochondrial DNA and nuclear gene sequences in a phylogenetic context, observed a general lack of measurable genetic species differentiation (Basasibwaki 1975; Verheyen et al. 1984; Van Rompaey et al. 1988; Meyer et al. 1990; Nagl et al. 1998). In hindsight, this may not be all that surprising: either these markers were not very informative at the recent time scale of divergence, or standing variation in them was inherited from a common ancestor of all the species, with too little time for lineage sorting, so that detecting differentiation would have required population genetic approaches that compare allele frequencies between species. Recent investigations studying population samples of sibling species within an explicit ecological and evolutionary context routinely demonstrated genetic species differentiation in allele frequencies at neutral markers, using either microsatellites or mitochondrial DNA sequences (Seehausen et al. 2008; Magalhaes et al. 2009; Mzighani et al. 2010). At the same time, however, other studies continued to demonstrate lack of genetic differentiation between other taxonomically distinct species at the same classes of neutral markers (Samonte et al. 2007; Elmer et al. 2009). Hence, adaptive radiation through the evolution of intraspecific functional polymorphism without speciation (Samonte et al. 2007), or well in advance of it (Klein et al. 2007), remained distinct possibilities to explain the rapid phenotypic diversification of LV cichlids.

With regard to addressing prevalence of speciation versus intraspecific polymorphism, previous investigations were constrained by the number of molecular markers, the number of taxa sampled, and the ways these were chosen from the large number of species in the system. We attempted to overcome all three limitations in this study. Our analysis demonstrates strong and dominant genetic structure between species within LV (19.0% of the overall molecular variance) and LE (24.4%). Fixation indices suggest that almost any two phenotypically defined taxa of cichlids in LV and LE are significantly genetically differentiated. The distribution of pairwise FST, both within narrowly defined sampling regions in LV, and across the lake, has a distinct mode between 0.15 and 0.25, and of 2556 pairwise FST estimates fewer than 12 (0.5%) are near zero. Finally, differentiation was significant for almost any two of the species for which our sample size permitted the assessment of significance (43 of 45 pairwise comparisons (96%) with ≥ 4 individuals sampled of each taxon), and of the 72 species-specific FST estimates, none was smaller than 0.070 (and only one was smaller than 0.100). Similarly, in LE none of the pairwise FST estimates were near zero, and the smallest species-specific FST estimate was 0.157. Strong indications of species differentiation were also found in lakes Albert, Kivu, and Nabugabo, even though the partitions of the total genetic variance explained by species were smaller (i.e. 13.9%, 10.7%, and 9.0%, respectively), and a fraction of values in the pairwise FST distributions of the two latter lakes were near zero.

We conclude that phenotypic differentiation is routinely coupled to the evolution of some reproductive isolation in this exceptionally fast radiation, supporting the classical model of adaptive radiation in which phenotypic adaptation is associated with speciation (Simpson 1944; Schluter 2000). We do not see much support in our data for the alternative hypothesis of adaptive radiation through the evolution of intraspecific phenotypic polymorphism without, or significantly preceding, speciation. Were the latter frequent, we would have expected to find many cases of near-zero FST between phenotypically distinct taxa. Note that our putative species are differentiated in more than one phenotypic trait and do not include putatively conspecific color morphs, except in the case of LS, in which only color morphs with very small or no other morphological divergence are known. That we do find a majority of zero FST in LS demonstrates that the near absence of zero FST in the other lakes is not artifactual, and that low-dimensional phenotypic diversification can, in some situations, occur without speciation or at least without measurable neutral marker differentiation.

SPECIES PERSIST THROUGH EVOLUTIONARY TIME

In the classical view (Simpson 1944), adaptive radiation starts with the colonization of an underutilized adaptive landscape. During the course of selection-driven diversification, incipient species will successively come to occupy the available adaptive peaks (Schluter 2000; Harmon et al. 2003), and the rate of diversification decreases through time as available niches fill up and opportunity for further ecological speciation diminishes (Schluter 2000). A decreasing rate of diversification from younger to older radiations is consistent with this prediction, but in the absence of species age data, can alternatively be explained by rising extinction and species-turnover rates against a background of unchanged speciation rates (Rabosky and Lovette 2008; Rabosky 2009). This would be predicted if speciation was driven by niche-independent mechanisms but species coexistence depended on niche partitioning. With niche-independent speciation mechanisms, the average genetic differentiation of species would be expected to be, at best, loosely related to the age of a radiation. Niche-dependent speciation models on the other hand predict that older radiations have more strongly differentiated species, and this is indeed apparent when the cichlid fish radiations of the LV region are compared to the much older Lake Tanganyika cichlid fish radiation, but this is a crude comparison possibly confounded by other differences between these radiations (Seehausen 2006).

Our study system allows testing such hypotheses more directly. Lakes within the LV region and their extant species assemblages vary in age. Whereas some Ugandan crater lakes and satellite lakes are only a few thousand years old, the modern assemblage of LV originated about 15,000 years ago (even though some populations most likely survived the Pleistocene desiccation in headwaters (Stager and Johnson 2008)), and that of LE may be as old as 200,000 years. The radiations in the different lakes of the region can therefore be considered replicate radiations that we sampled at different times after onset of radiation. We find that these radiations do indeed have significantly different distributions of species genetic differentiation. Mean species differentiation is significantly positively related to the age of lakes, ranging from FST= 0.012 in the youngest lake (LS) to FST= 0.25 in the oldest (LE). We conclude that species lineages tend to persist as genetically differentiated entities for at least tens of thousands of years.

Beyond the genetic differentiation of species, we also find a significant albeit small fraction (∼ 2%) of molecular variance explained by the genera defined by Greenwood (1980). Even though this genus effect is small, it explains more genetic variation above the species level than does sampling region within the large lakes. This implies that the historical genetic signal is stronger than any signal of possible gene flow among sympatric species. We conclude from this that adaptive radiation proceeded by successive events of speciation as opposed to one single starburst in each lake.

EFFECTS OF LAKE SIZE AND DEPTH

Variation in lake surface area and depth can be expected to influence the dynamics of speciation and genetic differentiation through several independent mechanisms: (1) habitat heterogeneity, and thereby the opportunity for adaptive differentiation, increases with lake area and depth; (2) the opportunity for intralacustrine geographical isolation increases with surface area; (3) finally, as total population size increases with lake area, average species duration times may also increase. A strong positive relationship between lake surface area and species richness of African cichlid radiations has previously been shown (Seehausen 2006). Here, we do not detect any relation between lake depth and average genetic differentiation of species; however, we find some mild evidence that average genetic species differentiation correlates positively with lake surface area. This effect of area is weak: it is significant only in parametric regressions, not in rank correlation tests, and not when age and depth are controlled for. We do not think that this mild effect is likely due to opportunity for intralacustrine geographical isolation because our sampling design controlled for geographical isolation. Hence, we suggest that the effect is more likely due either to the predicted relationship between lake size and habitat heterogeneity or to that between lake size and species duration times.

COLONIZATION WITHOUT LOSS OF GENETIC VARIATION

Paleoecological and geological data suggest several complete desiccation events in the history of LV (Stager and Johnson 2008). Yet, neither the present data nor previous studies (Nagl et al. 1998; Mzighani et al. 2010) detect the genetic signature of the expected drastic reduction in Ne. To the contrary, we found gene diversity in each of the older riverine cichlid species to be lower than in the average endemic LV region cichlid species, suggesting that diversity increased rather than decreased during colonization of the LV region. The signature of a bottleneck or founder event on genetic diversity can be minimized if the episode of reduced Ne was brief, and directly followed by a long and/or intense phase of demographic expansion. An alternative to this hypothesis that is consistent with the increased gene diversity relative to riverine cichlids, and has been documented in several other adaptive radiations, is that colonization involved more than one genetically differentiated species that hybridized (Barrier et al. 1999; Seehausen 2004; Herder et al. 2008; Hudson et al. 2011; Joyce et al. 2011).

ECOLOGICAL SPECIATION WITHOUT LOSS OF GENETIC VARIATION

We found no difference between the mean gene diversity within individual species, entire lakes, and the entire LV region. Hence, much gene diversity was retained through the many speciation events. Ecological speciation may be driven by strong divergent selection on a small number of genomic regions. Such a mechanism is facilitated by large standing genetic variation in the founder population (Barrett and Schluter 2008), but is expected to quickly erode genetic variation in key regions of the genome (Dieckmann and Doebeli 1999; Kondrashov and Kondrashov 1999). On the other hand, the colonization of new areas could be associated with demographic founder events, expected to erode genetic diversity widely across the genome. The latter is a prediction of the allopatric island-speciation model that has been proposed in the context of intralacustrine cichlid fish radiations (Mayr 1984; Salzburger and Meyer 2004), sometimes referred to as “microallopatric speciation” (though the term is perhaps inappropriate; Mallet et al. 2009). Our data suggest that speciation in the LV region cichlid radiation was not associated with founder events or bottlenecks in any of the lakes of the region. The retention of genetic polymorphism through many speciation events would have required that the effective population size remained large throughout the entire period of radiation. In the absence of any signature of founder events or bottlenecks, the strong genomic differentiation of tens and hundreds of coexisting species that arose within just thousands of years is perhaps most readily explained by speciation driven by divergent selection, genomic hitchhiking, and isolation-by-adaptation (Nosil et al. 2009).

Speciation caused by divergent selection without geographical isolation generates highly heterogeneous genomic divergence with islands of differentiation hitchhiking around loci under selection, within a sea of low genomic divergence (Smadja et al. 2008; Via and West 2008; Nosil et al. 2009). Such a pattern has recently also been shown in Helianthus (Yatabe et al. 2007) and Littorina (Wood et al. 2008), and is congruent with the fact that some studies of LV cichlids that used a limited number of neutral genetic markers (Nagl et al. 1998; Nagl et al. 2000; Elmer et al. 2009) failed to uncover genetic differentiation between sympatric species. It seems likely that rare hybridization between species in each lake radiation happens for a sustained period of time, slowing down broad-scale genomic species divergence without altogether cancelling the signal of time and phylogeny. By increasing the effective population size of species and restoring trans-species polymorphism, such rare gene flow may in fact facilitate a sustained high rate of adaptive diversification, a hypothesis that requires further testing in the future.


Associate Editor: C. Peichel

ACKNOWLEDGMENTS

We would like to thank the Tanzania Commission for Science & Technology for research permissions and the team of the Tanzania Fisheries Research Institute in Mwanza for their hospitality during the 2005 and earlier sampling periods. We thank H. Mrosso, M. Kayeba, and M. Haluna for assistance during fieldwork on Lake Victoria, Sylvester Wandera for sampling LA, Lauren and Colin Chapman for sampling LN and for collaboration and hospitality during sampling on Lakes Edward and Saka, Pascal Isumbisho, Yves Fermon, Sylvain Piry, Richard Velay and the association Haplochromis.org for providing samples from Lakes Kivu, Victoria system and A. desfontainii, Martin Genner for samples from Lake Chilwa, and Sigal Balshine for A. flavijosephi samples. We are grateful to M. Maan, K. Wagner, C. Melian, J. Goudet, D. Schlüter, two anonymous referees, and the editor, K. Peichel, for their valuable comments on the manuscript and to the members of the Fish Ecology & Evolution Lab (Eawag and University of Bern) for discussions. This work was part of Swiss National Science Foundation grant 3100A0-106573 to OS.
