Dr Filip Volckaert, Katholieke Universiteit Leuven, Laboratory of Aquatic Ecology, Ch. de Bériotstraat 32, B-3000 Leuven, Belgium. Tel.: +32 16 32 39 66; fax: +32 16 32 45 75; e-mail: firstname.lastname@example.org
Abstract Pleistocene genetic structure of the bullhead, Cottus gobio, was evaluated across the western Palearctic using a 771-bp fragment of the mitochondrial control region in 123 individuals collected at 35 sites (data set I). In total, 59 haplotypes that differed at 72 positions (9.3%) were detected. Data analysis also included sequences from Englbrecht et al. (2000), thus increasing the sampling to a more comprehensive data set of 529 fish and 63 control region sequences of 482 bp (data set II). A minimum spanning tree and a phylogenetic tree identified a seventh clade (Brittany–Loire) in addition to the six previously identified clades. The geographical range of the North Sea and Lower Rhine clades was considerably larger than previously thought. Haplotype diversity was generally low, and the total fixation index high (FST = 0.49). Among-group differentiation accounted for 41.7% (data set I) of the variation. Contiguous range expansions and restricted gene flow combined with isolation by distance, interspersed with past fragmentation, characterize bullhead across its range. A new finding is that dated interglacial periods correlated with population expansions; river captures, proglacial lake systems and sea level played a significant role in dispersal and expansion in either a northern or a southern direction. Hence it became possible to identify and date the colonization routes and putative palaeorefugia, most of which were located in Central and North-west Europe. Glacial periods resulted in distinct fragmentation events and lineage sorting.
The evolutionary legacy of the Cenozoic, incorporating such events as vicariance and palaeoclimate, can be traced back in the genome of a wide variety of terrestrial and aquatic organisms (Avise, 2000). It resulted in an identifiable impact on species richness, abundance, genetic diversity and structure. The phylogeographical approach (Avise et al., 1987) provides insights into the determinants of current distribution and abundance, and a powerful method for investigating the nature and efficacy of evolutionary processes. For example, phylogeographical analyses have shown that terrestrial animals responded across Europe in three general ways to the glacial recession over the past 20 000 years (Hewitt, 2000). Species belonging to the ‘hedgehog’ pattern colonized Central and Western Europe from three refugia (the Balkan, Iberian and Italian peninsulas), those following the ‘bear’ pattern did so from the Iberian and Caucasian refugia, and those following the ‘grasshopper’ pattern colonized from a single refugium in the Balkans. In addition to the vast amount of information provided on Europe's terrestrial animals from phylogeographical studies, there is a steadily increasing knowledge on the Pleistocene history of European freshwater ichthyofauna from genetic studies (Nesbø et al., 1999; Bernatchez, 2001; Brunner et al., 2001). The dispersal of aquatic organisms is constrained by the availability of suitable river and lake habitats. Hence aquatic organisms might show a phylogeographical pattern that may or may not be distinct from that of their terrestrial counterparts. River captures and connections throughout geological time might provide the necessary migration routes both for colonization and for retreat during deteriorating environmental conditions. Most fish, for example, were well established in Europe by the Late Miocene (Banarescu, 1990, 1992), including bullhead (Heizmann, 1992).
Their distribution, however, became subject to the continuous remodelling of the river drainage (Gibbard, 1988) and to climatic oscillations, of which the glacial/interglacials of the Late Pleistocene played a crucial role (Banarescu, 1990, 1992). Periods of above or below average temperatures and precipitation levels, as well as periods with strong climate fluctuations, have affected the viability of populations as well as their chances for expansion or reduction.
Here, the population structure and Pleistocene history of a small secondary freshwater fish living in the western Palearctic is analysed. Despite a benthic lifestyle with paternal brood care (Marconato & Bisazza, 1988) and limited individual dispersal ability (Downhower & Brown, 1979; Waterstraat, 1992), the colonization potential of the bullhead, Cottus gobio L. (Cottidae, Teleostei), is considerable, as evidenced by its wide distribution. Bullhead have, for example, recolonized Scandinavia across an approximately 1500 km range within a 10 000-year period (Kontula & Väinölä, 2001). Its current range fits between the maximum July isotherm of 20 °C and the minimum January isotherm of −20 °C. In addition, it inhabits rheophilic well-oxygenated upland rivers and the surf zone of lakes of the western Palearctic (Lelek, 1980; Wanzenböck et al., 2000).
Previous allozyme studies have shown that C. gobio has experienced considerable population fragmentation during the Pleistocene (Riffel & Schreiber, 1995; Hänfling & Brandl, 1998a,b; Eppe et al., 1999). In a recent study based on mtDNA sequences, Englbrecht et al. (2000) identified six major clades or haplotype groups across the European range of C. gobio. They proposed that the founder population of C. gobio inhabited the Paratethys Sea some 10 ma BP, invaded Europe during the Late Pliocene, and colonized Central and Northern Europe along the Danubian – Upper Rhenian – Elbe – Baltic axis (see also Kontula & Väinölä, 2001). During the Late Pliocene a Central European lineage must have reached the north-eastern North Sea along a northern route (Dnjepr and Baltic plain) and colonized the East-Atlantic region south-west to northern Spain. However, because of the restricted spatial sampling of the previous studies, it remained unclear whether all bullhead clades had been identified to the west of the rivers Vistula and Danube, and what the full distribution of the clades was. Additional contentious issues included the timing of the population expansions and reductions, caused by the uncertain calibration of the molecular clock, and the identification of the various colonization routes in the western Palearctic during the Late Pleistocene.
A phylogeographical analysis (as measured at a 771-bp and a smaller 482 bp fragment of the mitochondrial control region) was performed here on a regional scale to address the colonization, radiation and extinction of C. gobio using inferences in a multidisciplinary historic perspective.
Materials and methods
Collection of samples
Juvenile and adult bullhead were collected across Western Europe (UK, Belgium, France and the Netherlands) and Central Europe (Slovakia and Germany) at representative sites with electrofishing (Table 1 and Fig. 1). Previous personal research had indicated these areas to harbour phylogeographically informative populations. In total, 35 (labelled V1–V52) rivers and lakes were sampled between 4°W and 30°E and 47–62°N (data set I). Biopsies (fin clips) were collected and preserved in either 100% ethanol or a solution of salt (NaCl) saturated dimethylsulphoxide.
Table 1. Sample code, catchment, river, geographical location of sampling sites and control region (771 bp) haplotypes as identified in each population of Cottus gobio (data set I). Between brackets are given the number of individuals sharing a given haplotype.
The tissue was incubated overnight in cetyltrimethylammonium bromide (CTAB)-buffer [2% CTAB, 1.4 M NaCl, 0.2% β-mercaptoethanol, 20 mM ethylenediaminetetraacetic acid (EDTA) and 100 mM Tris pH 8.0] at 55 °C (Winnepenninckx et al., 1993). DNA, extracted with phenol-chloroform, was precipitated with ethanol, followed by resuspension in 50 μL distilled water.
Sequencing of the control region
An mtDNA control region fragment of 1.0 kb was amplified using the HN20 (Bernatchez & Danzmann, 1993) and the Pro19 primers (Bernatchez et al., 1995), located in the phenylalanine tRNA gene and the proline tRNA gene, respectively. The reaction product was purified with the ‘GFX PCR DNA and Gel Band Purification’ kit (Amersham Biosciences, Little Chalfont, UK) and eluted in 50 µL 10 mM Tris-Cl (pH 8.0). Samples were cloned into the pCR2.1-TOPO vector (Invitrogen, Paisley, UK) and sequenced with standard M13 primers using the ‘SequiTherm EXCEL II’ kit (Epicentre Technologies, Madison, WI, USA) according to the manufacturer's protocol. The reaction products (771 bp) were analysed on an automatic sequencer ‘LI-COR GeneReadIR DNA system’ with the AlignIR software (LI-COR).
Analysis of haplotypes
Alignment of sequences was undertaken using the ClustalW software, version 1.7 (Thompson et al., 1994). Two data sets were analysed: a full length control region sequence of 771 bp from the newly collected samples (data set I – labelled HV) and a shorter control region sequence (482 bp) from a comprehensive data set whose nucleotides are shared between the published data of Englbrecht et al. (2000) (523 bp – labelled HE) and this study (data set II). The haplotypes of the short fragments (482 bp – data set II) were relabelled and named according to the most common haplotype found by Englbrecht et al. (2000) or in this study (in total 63 HE and HV haplotypes). Mitochondrial DNA polymorphism was estimated as nucleotide diversity (π; Nei, 1987) and haplotype diversity (h; Nei & Tajima, 1981), within drainage and among post hoc defined haplotype groups, using the program Arlequin, version 2.0 (Schneider et al., 2000).
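The two diversity measures can be illustrated with a minimal Python sketch. It is not the Arlequin implementation: the function names and the toy alignment are hypothetical, haplotype diversity is taken as the unbiased estimator h = n/(n − 1) · (1 − Σ p_i²), and nucleotide diversity as the mean proportion of differing sites over all pairs of sampled sequences, with gapped or ambiguous positions skipped (an assumption of this sketch).

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Unbiased haplotype diversity: h = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def p_distance(a, b):
    """Proportion of differing sites; positions with a gap or ambiguity are skipped."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N?" or y in "-N?":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def nucleotide_diversity(sequences):
    """Mean pairwise proportion of differences over all pairs of sampled sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (four individuals, three haplotypes).
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
print(round(haplotype_diversity(sample), 3))
print(round(nucleotide_diversity(sample), 3))
```

In the toy sample, two of the four individuals share a haplotype, so h is below 1, and π stays small because the haplotypes differ at only one or two of ten sites.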
In a first step we used Modeltest 1.05 to select the model of DNA evolution that best fits data set I (771 bp fragment including as outgroup C. cognatus) based on log likelihood scores (Posada & Crandall, 1998). The Hasegawa–Kishino–Yano model (Hasegawa et al., 1985), with a gamma distribution shape parameter of 0.816, a proportion of invariable sites of 0.745 and base frequencies of A: 0.313, C: 0.217, G: 0.179 and T: 0.291 proved to give the best fit. We also compared the base composition for all sequences using a 5% χ² test on the average composition (PUZZLE; Strimmer & von Haeseler, 1996). The molecular clock hypothesis was tested assuming the TrN model (Tamura & Nei, 1993) with γ-distributed rates across sites, with the likelihood ratio test for the clock hypothesis implemented in PUZZLE. Gaps were treated as missing data.
To infer a phylogeny of the 771 bp fragment, we used maximum parsimony (MP), maximum likelihood (ML) and distance-based methods. For MP the following settings were used: weighted parsimony with all characters unordered, gaps treated as missing data, a transition : transversion (ti/tv) ratio of 1.84 : 1.00, and transversion parsimony (ti/tv weight 0 : 1). The ML analysis was performed using the parameters estimated under the best-fit model (quartet method). Trees were searched exhaustively, and node support was bootstrapped with the branch-and-bound algorithm for MP (n = 1000) and the heuristic algorithm for ML (n = 100). A neighbour joining (NJ) phenogram of pair-wise distances (Kimura 2-parameter) between haplotypes was also prepared. Analyses were performed with PAUP* version 4.0b4a (Swofford, 1998), PUZZLE version 4.0.2 (Strimmer & von Haeseler, 1996) and MEGA version 2.1 (Kumar et al., 2001). The slimy sculpin, C. cognatus, was employed for outgroup comparison (0.049–0.065 p-distance) (Kiril'chik & Slobodyanyuk 1997), using bootstrap values (1000 replicates) to assess the node support. In order to test for the presence of saturation in the sequences, we compared the saturation index expected when assuming full saturation with the observed saturation index (DAMBE 4.0.75; Xia & Xie, 2001). A t-test with infinite degrees of freedom was used to assess statistical significance.
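As an illustration of the distance measure behind the NJ phenogram, the following Python sketch computes the uncorrected p-distance and the Kimura 2-parameter distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences. The function name and the example sequences are hypothetical, and skipping gapped or ambiguous sites is an assumption of this sketch rather than a description of the MEGA settings used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Return (p_distance, kimura_2_parameter_distance) for two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    P = transitions / n
    Q = transversions / n
    d = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return P + Q, d


# Hypothetical 20-bp example with one transition and one transversion.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "GCGTACGTACGTACGTACGA"
p, d = k2p_distance(seq_a, seq_b)
print(round(p, 3), round(d, 4))
```

The corrected distance is always at least as large as the p-distance, because it allows for multiple substitutions at the same site.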
As the phylogenetic analysis detailed only some evolutionary aspects, an analysis of hierarchical genetic structure as adapted to molecular data (AMOVA as implemented in Arlequin version 2.0; Schneider et al., 2000) was undertaken. The significance of variance components and F-statistic analogues was tested by multiple permutation of the original data set. The AMOVA was run using the full 771 bp and the shorter 482 bp control region fragment, which integrated the present data with that from Englbrecht et al. (2000).
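The logic of the variance partitioning and of the permutation test can be sketched in Python. This is a simplified one-level AMOVA (populations only, without the groups level used in the study), not the Arlequin implementation; the function names, the toy distance matrix and the number of permutations are assumptions made for illustration.

```python
import random


def phi_st(dist, groups):
    """One-level AMOVA fixation index.

    dist[i][j] is the squared distance between individuals i and j (for
    example the number of pairwise differences); groups[i] is the
    population label of individual i.
    """
    n = len(groups)
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    p = len(members)

    def ssd(idx):
        # Sum of squared deviations from pairwise distances within idx.
        total = sum(dist[i][j] for k, i in enumerate(idx) for j in idx[k + 1:])
        return total / len(idx)

    ss_within = sum(ssd(m) for m in members.values())
    ss_among = ssd(list(range(n))) - ss_within
    ms_within = ss_within / (n - p)
    ms_among = ss_among / (p - 1)
    n0 = (n - sum(len(m) ** 2 for m in members.values()) / n) / (p - 1)
    var_among = (ms_among - ms_within) / n0
    return var_among / (var_among + ms_within)


def permutation_test(dist, groups, n_perm=999, seed=1):
    """Permute individuals among populations; return (observed, p_value)."""
    rng = random.Random(seed)
    observed = phi_st(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two populations fixed for haplotypes four mutations apart.
haps = [0, 0, 0, 1, 1, 1]
toy_dist = [[0 if a == b else 4 for b in haps] for a in haps]
fixed = ["A", "A", "A", "B", "B", "B"]
mixed = ["A", "B", "A", "B", "A", "B"]
print(phi_st(toy_dist, fixed), round(phi_st(toy_dist, mixed), 3))
```

When each population is fixed for its own haplotype all variance lies among populations and the index equals 1; when haplotypes are spread evenly over populations the estimate drops to zero or slightly below.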
We also tested whether C. gobio had experienced population expansions and contractions in two ways. (1) A frequency distribution of the pair-wise number of mutational differences for the whole data set and subsets (clades I, III and IV) was constructed as implemented in DnaSP version 3.53 (Rozas & Rozas, 1999). A population that has recently experienced growth and decline shows a smooth, unimodal and Poisson-like distribution; significance was tested by coalescence. The age expansion number (τ = μ t with μ the mutation rate and t the expansion time in generations) was used to compare the timing of population expansions among bullhead clades. (2) A nested clade analysis (NCA) (Templeton, 1998) was performed on an unrooted minimum spanning network (MSP) drawn by hand from the MSP output of Arlequin version 2.0 (Schneider et al., 2000), with probabilities assessed as implemented in the software program ParsProb version 1.1 (available at http://bioag.byu.edu/zoology/crandall_lab/programs.htm). The null hypothesis tested was that there were no associations between the haplotypes and geographical locations at different genealogical levels. The nesting design was constructed following the rules described in Templeton et al. (1992). The program GeoDis version 2.0 (Posada et al., 2000) was used for implementing the calculations of the distance measures and their statistical significance. The distance matrix consisted of the shortest river distances as measured manually on maps. Within drainage, distances between populations were measured along present river courses. Distances between populations from different drainages were measured using the information on historic connections such as river captures and common ancestral drainage systems. If more than one connection between two populations was possible, the shortest distance was incorporated. The historic connections used are indicated in Fig. 1.
Problems arose regarding the historic interpretation of the connections among some river basins. In such cases the Euclidean distance was used (straight lines in Fig. 1). All UK samples were measured relative to a central point in the North Sea (51°52′00″ N; 2°37′00″ E), which is based on the assumption that the Thames was connected to the Scheldt, Maas and Rhine during periods of low sea level (Gibbard, 1988). As the matrix of river distances was constructed by hand, errors were checked by conducting a cluster analysis of the matrix. The hydrographical distances estimated from the data were Dc(x), Dn(x), I–Tc(x) and I–Tn(x) (Templeton, 1998). Dc(x) is the average distance of all individuals of clade x from their geographical centre, indicating how widespread a clade is. Dn(x) is the average distance of all members of clade x from the geographical centre of nesting clade y, estimating how far individuals of clade x haplotypes are from individuals bearing clade y haplotypes. I–Tc(x) and I–Tn(x) are the average Dc and Dn values of all the interior clades within nesting clade y minus the average Dc and Dn values for all tip clades within nesting clade y. They give an estimate of the distribution of old vs. young clades. The distribution of all distance measures was determined by recalculating all distances over 10 000 random permutations of clades against sampling locality. Interpretation of the contingency tests followed the updated (24.10.01) version of the inference key of Templeton (1998) (http://bioag.byu.edu/zoology/crandall_lab/geodis.htm).
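The distance measures defined above can be made concrete with a small Python sketch. It is not GeoDis: it assumes great-circle distances between coordinates instead of the hand-measured river distances used here, takes the geographical centre as the plain mean of individual coordinates, averages interior and tip clades without weighting, and omits the permutation test. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(p, q):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centre(points):
    """Simplified geographical centre: mean latitude and longitude."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def mean_distance(points, ref):
    return sum(great_circle_km(p, ref) for p in points) / len(points)


def nested_clade_distances(clades, interior):
    """clades maps clade name -> list of (lat, lon), one entry per individual;
    all clades belong to the same nesting clade. interior is the set of
    interior clade names; the remaining clades are treated as tips."""
    nest_centre = centre([p for pts in clades.values() for p in pts])
    dc = {c: mean_distance(pts, centre(pts)) for c, pts in clades.items()}
    dn = {c: mean_distance(pts, nest_centre) for c, pts in clades.items()}
    ints = [c for c in clades if c in interior]
    tips = [c for c in clades if c not in interior]

    def avg(values, names):
        return sum(values[c] for c in names) / len(names)

    it_c = avg(dc, ints) - avg(dc, tips)
    it_n = avg(dn, ints) - avg(dn, tips)
    return dc, dn, it_c, it_n


# Hypothetical nesting clade: a widespread interior clade and a local tip clade.
example = {"1-1": [(50.0, 4.0), (50.0, 6.0)], "1-2": [(50.0, 5.0), (50.0, 5.0)]}
dc, dn, it_c, it_n = nested_clade_distances(example, interior={"1-1"})
print(round(dc["1-1"], 1), dc["1-2"], round(it_c, 1))
```

A positive I–T value, as in this toy case, indicates that the older interior clade is more widespread than the younger tip clade.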
Alignment of sequences (GenBank accession numbers AF381189-AF381217) revealed a total of 59 haplotypes among the 123 bullheads from the 35 sites analysed at the 771 bp control region sequence (data set I) (Appendix 1). Seventy-two positions (9.3%) were polymorphic, of which 45 positions (5.9%) were parsimony informative, including three indels. No evidence for the presence of nuclear copies was found (e.g. intraindividual allelic variation). When reducing the data to the 482 bp control region sequence, which is the fragment in common with the data of Englbrecht et al. (2000), 40 haplotypes were found (Appendix 1). All haplotypes were drainage-specific (Danube, Rhine, Elbe, North Sea and Gulf of Biscay). The number of haplotypes and nucleotide diversity (π) within drainage (based on the 482 bp fragment and including the data of Englbrecht et al., 2000) was lowest in the Elbe, Upper Danube, Upper Rhine, Poland, Seine, Adour and Brittany (Table 2), higher in the Baltic (π = 0.00644), Lower Danube (0.00700) and Lower Rhine (0.00599), and highest in the Netherlands (0.015535) because of the presence of haplotypes of lineages I, III and IV. Although sample sizes were often small, most catchments indeed harboured very few haplotypes [E06 (Englbrecht et al., 2000), V23, V28, V29 (Table 1) and sites mentioned in Knapen et al., 2002; but see E05, E43 (Englbrecht et al., 2000), V20, V21 and V34]. Additional individuals were genotyped at those sites where we noticed considerable diversity in a small number of fish.
Table 2. Number of fish collected ( n ), haplotype number ( nh ) and type, and nucleotide diversity within drainage (data set II, 482 bp) of the control region of bullhead.
The base composition for all sequences was not significantly different (5% χ² test on the average composition) (PUZZLE). The molecular clock was not enforced since PUZZLE rejected the molecular clock hypothesis. There was no indication of saturation, either in the full data set of 771 bp, or in clade I, III or IV (DAMBE). Between-clade sequence differences of the control region (data set II) point to weak differences between clades I and II, and strong differences between clades I and IV, and between clade II and clades IV, V and VII (Table 3).
Table 3. Matrix of between clade sequence difference (p-distances including standard error) of the mitochondrial control region of Cottus gobio (data set II, 482 bp).
Clade   I               II              III             IV              V               VI
II      0.006 ± 0.003
III     0.010 ± 0.004   0.018 ± 0.006
IV      0.031 ± 0.007   0.039 ± 0.009   0.021 ± 0.006
V       0.028 ± 0.007   0.031 ± 0.008   0.023 ± 0.007   0.014 ± 0.005
VI      0.023 ± 0.006   0.023 ± 0.007   0.024 ± 0.006   0.028 ± 0.007   0.023 ± 0.007
VII     0.027 ± 0.007   0.031 ± 0.008   0.022 ± 0.006   0.020 ± 0.006   0.021 ± 0.007   0.023 ± 0.007
Maximum likelihood, MP and distance-based phylogenetic analysis based on neighbour joining of Kimura 2-parameter distances were performed on the 771 bp fragment (Figs 1 and 2). The five major clades were congruent among the three methods, although on average bootstrap support was low because of the small number of phylogenetically informative sites. The root of the bullhead haplotypes was positioned within clade I, indicating that this clade contains the most ancestral haplotypes. Clade IV was located at the tip of the phenogram. Clade I showed a deeper branching profile than clades III and IV; haplotypes from the Danube – Tatra mountains (HV1 and HV2) and the Elbe (HV8 to HV11) are distinct. Unexpectedly, the haplotypes from the Scheldt sites V28 and V29 (HV33 and HV34) clustered within the Central European clade I.
A minimum spanning tree was constructed from the squared distance matrix among haplotypes of the 482 bp fragment (data set II) to understand the phylogenetic relationships between the haplotypes and to infer the evolutionary history (Fig. 3). It yielded the major groupings (clades) as defined by Englbrecht et al. (2000), with the addition of a new clade (Scorff, VII). The tree was used to assign clade levels for an NCA (see below). The parsimony criterion was set at nine mutations, which excluded the Adour samples from the NCA.
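How a minimum spanning tree follows from a haplotype distance matrix can be shown with a short Python sketch based on Prim's algorithm. This is an illustration only, not the Arlequin routine; the function names and the four toy haplotypes are hypothetical, and the distance used is the plain number of pairwise differences.

```python
def pairwise_differences(haplotypes):
    """Symmetric matrix with the number of differing sites between haplotypes."""
    return [[sum(x != y for x, y in zip(a, b)) for b in haplotypes]
            for a in haplotypes]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of edges (i, j, weight) connecting all nodes with the
    smallest possible total weight. Ties are broken by index order, so
    equally short alternative trees are not reported.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        weight, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, weight))
        in_tree.add(j)
    return edges


# Hypothetical haplotypes forming a simple chain of mutational steps.
toy_haplotypes = ["AAAA", "AAAT", "AATT", "TTTT"]
tree = minimum_spanning_tree(pairwise_differences(toy_haplotypes))
print(tree)
print(sum(w for _, _, w in tree))
```

Because ties between equally distant haplotypes are common in real data, several equally short trees may exist; network representations display such alternative links, which this sketch does not attempt.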
The pair-wise number of mutational differences in haplotype sequence (771 bp – data set I) deviated significantly from the expected distribution under a population growth–decline model (Fig. 4). Frequency peaks were observed at 2, 13, 18 and 24–27 mutations, the latter peak representing differences between the major clades (except clade II). Mismatch distributions differed considerably among clades, with the mean number of differences being the lowest in clade III (average number of mutational differences ± standard deviation: 1.8 ± 1.3) and IV (2.7 ± 1.6), and the highest in clade I (14.5 ± 9.3). In the case of clade I three frequency peaks were observed. The mismatch distribution within clade IV fitted the model of population growth–decline well. The observed values of the age expansion parameter (τ) differed considerably among clades (I: 3392, III: 5000 and IV: 1787 generations).
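A mismatch distribution of this kind is simply the frequency of each number of pairwise differences over all pairs of sequences. The Python sketch below shows the calculation on a hypothetical toy alignment; it does not reproduce the DnaSP model fitting or the coalescent test, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Relative frequency of each number of pairwise differences."""
    counts = Counter(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(sequences, 2)
    )
    n_pairs = sum(counts.values())
    return {k: counts[k] / n_pairs for k in range(max(counts) + 1)}


def mean_and_sd(distribution):
    """Mean and standard deviation of the number of pairwise differences."""
    mean = sum(k * f for k, f in distribution.items())
    variance = sum(f * (k - mean) ** 2 for k, f in distribution.items())
    return mean, variance ** 0.5


# Hypothetical toy alignment (four individuals).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
mismatch = mismatch_distribution(toy)
mean_diff, sd_diff = mean_and_sd(mismatch)
print(mismatch)
print(round(mean_diff, 3), round(sd_diff, 3))
```

A single smooth peak in such a distribution is the signature expected after a recent expansion, whereas several separate peaks, as found for the whole data set and for clade I, point to older, deeper subdivisions.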
A population-based analysis was undertaken using a hierarchical analysis of molecular variance (AMOVA) (Schneider et al., 2000) and an NCA (Templeton et al., 1992). The first method tracks population dynamics by taking into account the sequence information and haplotype frequency at various arbitrarily defined hierarchical levels. The second method takes into account haplotype frequencies, their phylogenetic hierarchy and geographical distances.
The hierarchical AMOVA, based on eight post hoc defined groups (UK, Scheldt, Maas, the Netherlands, Brittany, Main, Elbe and Danube), showed a high variance among groups and among populations within groups in both the 771 bp (data set I) and 482 bp fragment data (data set II) (Table 4). Variance within populations was small, reflecting low diversity. Thus, differences between the eight groups and between all 33 samples are higher than any variance observed within each population. Indeed, only the Dutch samples showed high haplotype diversity, whilst most samples had only one or two haplotypes.
Table 4. Hierarchical analysis of genetic structure (analysis of molecular variance, AMOVA) in bullhead for the eight partitions of samples at the full 771 bp control region sequence and a shortened 482 bp control region sequence at 33 sites (data set I). The eight groups have been defined post hoc (see text).
Source of variation    D-loop (771 bp)    D-loop (482 bp)
Statistical significance is shown at the 1% level and at the 0.01% level (***). Samples included in the analysis are the same as in Table 1 and Fig. 1; d.f., degrees of freedom.
The second approach, NCA, differentiated between historic and recurrent events by incorporating distance information, frequency and mutational information. Of the 40 contingency tests at six hierarchical levels, 15 hypotheses could not be assessed, eight represented restricted gene flow with isolation by distance, seven represented a contiguous range expansion, four past fragmentation events and six were inconclusive (Appendix 2). When adjusting for spurious results (Bonferroni correction) the hypotheses of clades 2–14, 3–4 and 4–2 could no longer be tested; the hypothesis of clade 1–30 changed from restricted gene flow to a contiguous range expansion. At the highest level (6-level clade) the hypothesis could not be tested. Of the 5-level clades, only clade 5–2 (clades III to VII) differentiated significantly by contiguous range expansion. Of the 4-level clades, 4–1 was not significant, 4–2 and 4–3 had an inconclusive outcome and 4–4 a contiguous range expansion. Contiguous range expansions and fragmentation events dominate the eastern range (sample V1 – Turiec River; sample E15 (HE11) – Kolpa River; samples E36–40 – Vistula River) whereas fragmentation (sample E53 – Seine River; samples E54–56 – Adour River; sample V12 – Scorff River) and restricted gene flow dominate the western range (Appendix 2). Of interest in clade IV are the gene flow between the Scheldt and the Neet basin (1–30) and the range expansion to the Humber (2–14). Thus, the texture of the pattern differs between the western and eastern group; the former is dominated by a single clade which shows smaller internal expansions and fragmentations, whereas the latter includes a set of clades which are the result of successive expansions and fragmentations (Fig. 5).
Spatial differentiation west of the rivers Vistula and Danube
Most haplotypes found in this study were assigned to one of six clades defined by Englbrecht et al. (2000) and confirmed in Northern Europe by Kontula & Väinölä (2001), excluding the regions of clades V (Seine basin) and VI (Adour basin) that were not sampled. The Breton population was identified as a new phylogenetically distinct clade (clade VII). Bullhead from the Loire basin is distinct (Eppe et al., 1999) and might either belong to the Breton clade (which is likely given the local geography), or represent an eighth clade. Similar uncertainty exists for the Garonne basin (Eppe et al., 1999), which possibly encompasses the Adour clade, or may represent a new clade. The sampling range is now sufficiently comprehensive to describe in detail the geographical extent of clades I, III and IV (Fig. 5). The range of clade I extends over the catchments of Danube and Rhine (Oberrhein and Hochrhein, but in two cases also in the Lower Rhine), Weser, Elbe and the Baltic region (Kontula & Väinölä, 2001). Hybridization between an eastern group and a western group is evident in northern Scandinavia (Kontula & Väinölä, 2001). Unexpectedly, one sample in central Belgium belongs to clade I, which may represent the first documented introduction of bullhead, a species considered less susceptible to anthropogenic transfers. Clade II extends from Poland into the eastern Baltic (Kontula & Väinölä, 2001). Clade III haplotypes were observed in the Lower Rhine and the Maas drainage, one population of the Scheldt and in the Middle Rhine. The range of clade IV haplotypes extends from the Lower Rhine into the Middle Rhine, the Scheldt and the United Kingdom. The range of bullhead to the east (including the Dnjepr and Volga catchments) includes clade I and possibly another clade. Clades V (Seine), VI (Adour), VII (Brittany – Loire?), and possibly an eighth clade (Garonne) stretch along the French Atlantic coast.
Temporal differentiation of the clades
An important but contentious issue is the timing of the events that led to the differentiation of the various distinct clades. Three sources of information are considered here. First, although molecular clock rates are not necessarily consistent across taxa and loci, there are some widely accepted generalizations for applying molecular clocks. Noncoding regions, such as the mitochondrial control region, are expected to evolve faster than coding regions because of the lack of mutational constraints. This has been demonstrated for many groups of animals including cichlid fishes (Meyer et al., 1990; 6–7% divergence per million years), Arctic charr (Brunner et al., 2001; 5–10% per million years) and butterfly fishes (McMillan et al., 1999), with as much as 33–100% per million years at selected hotspots. As Cichlidae and Cottidae belong to the order of the Percomorpha, one option is to adopt the calibration from cichlid fishes (6%) to estimate divergence times in C. gobio (Table 5).
Table 5. Calibration of the control region of bullhead according to three rates by means of pair-wise comparisons among clades. Ages are given in million years (ma).
Secondly, molecular clock estimates of the control region of bullhead itself have been made by Englbrecht et al. (2000) (1% per million years) and Kontula & Väinölä (2001) (2.4–8.4% per million years). Two values (1 and 6% per million years) were used to estimate the divergence times among the major clades of haplotypes (Table 5).
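The conversion from sequence divergence to age under these calibrations is simple arithmetic, sketched below in Python. The sketch assumes that each rate is a pairwise divergence rate (percentage sequence difference between two lineages per million years) and that the clock is constant; the function name is hypothetical, and the two p-distances are merely the smallest and largest between-clade values reported in Table 3, used as illustrative inputs.

```python
def divergence_time_ma(p_distance, rate_percent_per_ma):
    """Age in million years of a pairwise divergence under a constant clock.

    p_distance is a proportion (0.031 means 3.1% sequence difference);
    rate_percent_per_ma is the pairwise divergence rate in % per million years.
    """
    if rate_percent_per_ma <= 0:
        raise ValueError("rate must be positive")
    return p_distance * 100.0 / rate_percent_per_ma


RATES = (1.0, 6.0)            # % per million years, the two calibrations used
P_DISTANCES = (0.006, 0.039)  # smallest and largest values in Table 3

for d in P_DISTANCES:
    ages = ", ".join(
        f"{rate:g}%/ma: {divergence_time_ma(d, rate):.2f} ma" for rate in RATES
    )
    print(f"p-distance {d:.3f} -> {ages}")
```

The sixfold difference between the two calibrations carries over directly to the age estimates, which is why the choice of rate dominates the uncertainty in dating the clades.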
Thirdly, it might be possible to identify an association between major mutation events and postulated population expansions (related to glaciations) by incorporating information from the pair-wise haplotype divergence estimates at the control region. Thienemann (1950) identified in Central Europe a faunal group of ‘southern ice cap marginal species’ (including bullhead) in response to glaciation. They inhabited the Alps and Central Europe during preglacial times and moved to the south of the unglaciated belt of Southern Europe during the Ice Ages. Rapid expansions of bullhead can be envisaged at two times during the glacial/interglacial phases. At the beginning of the interglacials, when the climate warms, fish either follow the receding ice margin northwards as vast stretches of habitat become available, or move upstream into mountain refugia, which induces major discontinuities or even southerly extinctions. At the end of the interglacials, when rapid climatic cooling forces the (sub) boreal zone southwards, bullhead descend from the refugia and expand southwards. The northern range becomes covered with ice sheets, and thus inaccessible for bullhead.
The four major phases of mutations appear to coincide with the population expansions at the end of the four interglacials southwards and within lowlands, and at the beginning of the interglacials, northwards. By using the assumptions for a molecular clock of the control region, the four peaks in pair-wise sequence differences (0.4, 1.7, 2.6 and 3.5%) would correspond to approximately 0–11, 110–130, 210–230 and 310–330 ka BP, respectively, which translates into a molecular clock of about 9% per million years (Table 5).
Molecular clock rates should be interpreted with caution as lineage-specific rate variation has been demonstrated in many groups (Rambaut & Bromham, 1998). We have assumed a constant rate, although a molecular clock likelihood ratio test rejects the clocklike behaviour. Small and isolated populations of bullhead might suffer from drift, whereas large populations are more likely to be under selective pressure. In addition, local extinction of intermediary lineages seems to be a recurrent feature in bullhead.
A scenario for the colonization history of bullhead
Thus, during warm interglacial epochs, such as today, the bullhead must have survived in refugia at higher altitudes; the distribution was probably patchy and not unlike that of today. The putative palaeorefugia of clade I are rather complex, as the Danube catchment is bordered by numerous high altitude regions: Alps (4000 m), Tatra mountains (2600 m), Carpathians (2300 m) and Dinarides (2400 m) (Fig. 5). Notice also that the populations from Borovo (Carpathians), Kolpa (Dinarides), Nera (Carpathians) (Englbrecht et al., 2000) and Turiec (Tatra) in the Middle and Lower Danube are well differentiated, and thus have been isolated for some time. Clade II bullhead probably retreated into the Tatra Mountains (2600 m). The remaining putative refugia are located along the south-western Palearctic. Clade III most recently (Würm) retreated into the Vosges (1000 m) during a warm period or maybe into a limestone area with a milder microclimate during a cold period (Stewart & Lister, 2001). A more ancient (during the Günz and Mindel) northern location cannot be excluded. Clade IV probably survived in the Ardennes (B–F) (700 m) and possibly during the last Glacial in the limestone region of Dartmoor (600 m) (Stewart & Lister, 2001). The Langres/Morvan region (900 m) (clade V), Massif Central (1800 m) (clade VII) and Western Pyrenees (3300 m) (clade VI) complete the string of western refugia. The upper temperature limit to maintain a stable population of bullhead probably approaches 20 °C in summer as judged from its current distribution range, which overlaps largely with that of brown trout Salmo trutta (Lelek, 1980).
From the molecular clock data and the location of the palaeorefugia, the colonization route and timing of the colonizations of the western Palearctic by bullhead can be completed as follows (Fig. 6). From the early Pleistocene onwards, bullhead occupied the Lower Danube as temperatures remained mild across the western Palearctic (Lowe & Walker, 1997). Although a dispersal wave before the first glacial is not visible in the sequence variation, at the beginning of the first glacial (Günz about 430 ka BP) warming resulted in a first north-western dispersal wave along a route following most likely the Dnjepr River, Pripjat Swamps, Vistula River and Baltic Plain catchments (Englbrecht et al., 2000; Nagel, 2000). Most populations of this old Günz colonization have disappeared because of more recent glacials (Bernatchez & Wilson, 1998), but clade III survived at the western edge of the Baltic Plain. At the end of the Günz glacial (325 ka BP) bullhead expanded from clade III further west (clade IV). The Mindel interglacial resulted in a contraction of clades III and IV, and a southward expansion of clades VI and VII. At the beginning of the Riss interglacial, clades III, IV, VI and VII retreated into refugia, and group I expanded northwards (subclades I east and I west). At the end of the Riss interglacial, group IV expanded along the English Channel into the Seine Basin (clade V), and several Danubian subclades developed in the Dinarides, Carpathians and Tatra mountains. Clade II expanded from clade I during the Würm glacial at about 70 ka BP, just before a major glacial cooling event (Lowe & Walker, 1997) and possibly because of river capture in the Tatra mountains. During the current Würm interglacial, clades I and II have expanded north, whereas clades III, IV and probably V, VI and VII have locally expanded after retreating from their respective interglacial refugia.
Clades I, II, III and IV show the impact of a recent population expansion. Phylogeny of the control region suggests a simultaneous radiation from a few ancestral haplotypes within each clade (clades III and IV). Control region and microsatellite population data (Hänfling et al., 2002) confirm the discreteness of most populations within each drainage basin: bullheads are slow dispersers on a short time scale. The most recent population expansion probably dates to the beginning of the last interglacial (20 ka BP) and might harbour elements from the penultimate interglacial (150 ka BP) (see above). For example, British bullhead (clade IV) seems to occur in three groups (Fig. 2): Cornwall (the oldest), Humber (most recently colonized from the Upper Scheldt) and Humber, Wales and Cornwall (colonized from the Middle Scheldt). As the UK was covered by ice as far south as 50°N during the Riss and Würm glacials, the colonization source(s) must have been to the south-east. Hence, unlike the conclusions of Englbrecht et al. (2000), our analyses point towards the strong impact of glacial periodicity on bullhead population dynamics.
Morphometric, allozyme, microsatellite and mitochondrial data point to the discreteness of most populations (Riffel & Schreiber, 1995; Hänfling & Brandl, 1998a,b; Eppe et al., 1999; Knapen et al., 2002). Limited gene flow for a certain amount of time was sufficient to cause drift among semi-isolated populations. However, there are at least two regions where this is not the case with bullhead. An interpretation within a coalescent theory framework would mean that the populations of the Lower Rhine represent a recent secondary contact. Many haplotypes belonging to clades III and IV are found in the same drainage. There is even a haplotype of clade I present in the Ketelmeer (NL) and several haplotypes in the Middle Rhine (Englbrecht et al., 2000; Nolte, pers. comm.). Such features explain the suggestion of de Nie (1996) that the Netherlands harbours both a ‘lowland’ bullhead population inhabiting the lakes and slow flowing major rivers (the mixed clade I, III and IV identified in this study) and an ‘upland’ (clade III) bullhead population inhabiting the small fast flowing mid-altitude rivers. A similar suture zone is observed in the northern Baltic where two morphs of clade I hybridize (Kontula & Väinölä, 2001). Initially both groups were morphologically and ecologically identified as two species. Again the regional habitat has only recently become vacant based on the timing of isostatic changes in northern Scandinavia during the Holocene.
The Late Pleistocene structure of most (sub)boreal and temperate zone freshwater fishes across Europe [Atlantic salmon (Nilsson et al., 2001), brown trout (Bernatchez, 2001), Arctic charr (Brunner et al., 2001), chub (Durand et al., 1999), grayling (Koskinen et al., 2000), perch (Nesbø et al., 1999) and whitefish (Bernatchez & Dodson, 1994; Hansen et al., 1999)] typically shows a Danubian clade, an Atlantic (western) clade and an eastern clade. No species other than bullhead shows such high and ancient clade diversity at the western edge of the Palearctic. In addition, many fish taxa show a number of regional clades in the northern Mediterranean which, however, rarely have given rise to widespread populations as in terrestrial animals (Hewitt, 2000). Palaeorefugia of freshwater fish are species-specific, as their individual life-style, ecophysiology and life-history traits determine their location. Postglacial colonization by most fishes seems initially to have happened along an east–west axis along the Danube catchment, whereas more recent population expansions have occurred on a north–south axis. This fits the arguments of Bernatchez & Wilson (1998) who showed that latitude correlates with a breakpoint in nucleotide diversity and tree branching between nonglaciated and glaciated areas (at 46°N), with evolutionary rates of dispersal of northern species being 10 times larger than those of southern species.
Phylogenetic, nested clade and hierarchical analyses of molecular variance show that C. gobio has one of the most diverse and fine-grained genetic patterns of the terrestrial and aquatic animals analysed so far in the western Palearctic (Bernatchez & Wilson, 1998; Hewitt, 2000). Colonization happened along two major routes and in several waves, which can be dated with great precision. The distinction between clades is unambiguous and points to periods of colonization caused by habitat expansion followed by population fragmentation caused by habitat loss, and again followed by population expansion. The species has thus responded actively and regularly to the fluctuations in its natural range. A remaining issue is the population structure in the Russian Plain, where some of the most ancestral populations might be found. A study in this area therefore has priority.
This project has been funded by the Ministry of the Flemish Community (AMINAL/TWOL and VLINA/97/01) and an EU Marie-Curie fellowship to B. Hänfling. A. Davey, T. De Mol, A. Eklöv, P. Gerard, E. Hartgers, J. Holčík, M. Holl, D. Hopkins, R. Klupp, S. Mcguinty, J.-C. Philippart, P. Seeuws, M. Todd, S. Tomms, B. Trigg, H. Verreycken and many fishery owners collected samples or helped during the sampling process. T. Kontula provided Finnish reference samples and A. Goto tissue of C. cognatus. C. Englbrecht kindly made sequences available and J. Freyhof provided the coordinates of these samples. D. Posada kindly helped with the NCA. We benefited greatly from discussions with C. Englbrecht, A. Gomez, T. Huyse and J. Van Houdt, and the comments from two anonymous reviewers. This paper has been written in memory of Huw Griffith.