Natural selection in the water: freshwater invasion and adaptation by water colour in the Amazonian pufferfish


  • G. M. COOKE,

    1. Molecular Ecology Laboratory, Department of Biological Sciences, Macquarie University, Sydney, NSW, Australia
  • N. L. CHAO,

    1. Departamento de Ciências Pesqueiras, Universidade Federal do Amazonas, Manaus, Brazil
  • L. B. BEHEREGARAY

    1. Molecular Ecology Laboratory, Department of Biological Sciences, Macquarie University, Sydney, NSW, Australia
    2. Molecular Ecology Laboratory, School of Biological Sciences, Flinders University, Adelaide, SA, Australia

Luciano B. Beheregaray, Molecular Ecology Laboratory, School of Biological Sciences, Flinders University, Adelaide, SA 5001, Australia.
Tel.: +61 882 015 243; fax: +61 882 013 015; e-mail:

Abstract

Natural selection and ecological adaptation are ultimately responsible for much of the origin of biodiversity. Yet, the identification of divergent natural selection has been hindered by the spatial complexity of natural systems, the difficulty in identifying genes under selection and their relationship to environment, and the confounding genomic effects of time. Here, we employed genome scans, population genetics and sequence-based phylogeographic methods to identify divergent natural selection on population boundaries in a freshwater invader, the Amazonian pufferfish, Colomesus asellus. We sampled extensively across markedly different hydrochemical settings in the Amazon Basin and used ‘water colour’ to test for ecological isolation. We distinguish the relative contribution of natural selection across hydrochemical gradients from biogeographic history in the origin and maintenance of population boundaries within a single species and across a complex ecosystem. We show that spatially distinct population structure generated by multiple forces (i.e. water colour and vicariant biogeographic history) can be identified if the confounding effects of genetic drift have not accumulated between selective populations. Our findings have repercussions for studies aimed at identifying engines of biodiversity and assessing their temporal progression in understudied and ecologically complex tropical ecosystems.

Introduction

Natural selection and its role in ecological adaptation are central to understanding the origin of biodiversity. Yet the role of ecological adaptation and divergence in the face of homogenizing gene flow remains controversial (Coyne & Orr, 2004; Bolnick & Fitzpatrick, 2007; Coyne, 2007; Nosil, 2008; Schluter, 2009). One reason for such controversy is the difficulty in disentangling ecological divergence from vicariant biogeographic history. This distinction usually requires a system in which the existence of an allopatric phase is very unlikely in the context of evolutionary history (Endler, 1982; Coyne, 2007; Nosil, 2008). In fact, spatially defined speciation processes have dominated discussions until recently and allopatry has generally been used as the null hypothesis (Coyne & Orr, 2004). Yet, many of the global centres of biodiversity are highly dynamic and poorly studied tropical ecosystems for which even large-scale vicariant events are underdocumented (Beheregaray, 2008). Although the role of ecological divergence in the accumulation of tropical diversity is becoming increasingly recognized (e.g. Smith et al., 1997, 2001; Schneider et al., 1999; Garcia-Paris et al., 2000; Ogden & Thorpe, 2002; Cooke et al., 2012a), the engines of biodiversity in the tropics remain controversial and largely identified within the context of allopatric isolation (for review see Moritz et al., 2000; Hoorn et al., 2010). Fortunately, recent advances in population genomics are enabling scientists to incorporate information from adaptive variation into ecological and evolutionary studies of ecologically important nonmodel species (Travers et al., 2007; Ellegren, 2008a,b; Mardis, 2008). This has opened up research opportunities for surveys of ecological divergence in understudied, species-rich ecosystems.

One such advance is the ‘genome scan’ whereby loci associated with selected genomic regions either directly or via hitchhiking are identified (Jensen et al., 2007). This method is based on the understanding that genetic drift, inbreeding and migration have genome wide effects, whereas selection leaves signatures only at those loci that are adaptive to a particular condition (Luikart et al., 2003), a phenomenon known as ‘heterogeneous genomic divergence’ (reviewed in Nosil et al., 2009). Here, differentiation will accumulate in regions under selection, whereas in other regions, genetic drift will require extended periods of time to accumulate (Wu, 2001; Gavrilets & Vose, 2005; Nosil et al., 2009). Thus, examining heterogeneous genetic divergence via a ‘genome scan’ can be particularly informative in populations at an early stage of divergence. In such cases, loci associated with selected genomic regions (genomic islands) still reflect the evolutionary history of adaptive divergence (Via & West, 2008; Via, 2009). An FST-based genome scan (sensu Beaumont & Balding, 2004) can be used to assess the role of selection in the origin and maintenance of diversity across ecological gradients or divides (Nosil et al., 2009). Simultaneously, loci identified as putatively neutral can be employed to address more classical questions of population genetics (Luikart et al., 2003; Nosil et al., 2009).

Freshwater fishes within Amazonia offer an interesting model for hypothesis testing across the speciation spectrum. Fish diversity in the Neotropics is unrivalled, with around 18% of the Earth’s known fish species concentrated in < 0.003% of the Earth’s water (Vari & Malabarba, 1998; Albert et al., 2011). Within this system, fish dispersal depends on the complex formation and fusion of anabranching riverine environments. Because of this, tracing the chronology of palaeogeographic events has improved our understanding of how allopatric scenarios have shaped Amazonian fish diversity (e.g. Hubert & Renno, 2006; Ready et al., 2006; Hubert et al., 2007; Willis et al., 2007; Cooke et al., 2009, 2012b; Sistrom et al., 2009; Piggott et al., 2011). Yet, the Amazonian aquatic environment also sustains marked hydrochemical and environmental discontinuities that impose physiological constraints upon its aquatic communities (Junk et al., 1983; Henderson & Crampton, 1997; Rodriguez & Lewis, 1997; Saint-Paul et al., 2000; Petry et al., 2003; Crampton, 2011). Based largely on optical and sedimentary characteristics, three water types within the basin have been identified (Sioli, 1984): (i) white water (Andean origin), which is turbid in nature and characterized by large amounts of dissolved solids; (ii) clear water (drains Brazil’s Precambrian shields), which is comparatively transparent containing low content of dissolved solids and (iii) black water, which is transparent yet stained by tannins and humic acids leached from vegetation, differing most dramatically from the latter by its low pH (Fig. 1; Table 1). Considering that ecological gradients or divides can influence population structure via the action of divergent natural selection (Endler, 1977; Beheregaray & Sunnucks, 2001; Hendry, 2004; Nosil et al., 2005; Crispo & Chapman, 2008), hydrochemical discontinuities may therefore be a powerful force in the accumulation of Amazonian fish diversity.

Figure 1.

 Sampling localities of Colomesus asellus in the Amazon Basin. Each site is distinguished by a unique colour and site label. Inset (i) shows the meeting of the black waters of the Negro River with the white waters of the Amazon River, (ii) shows the meeting of the Amazon River with the clear waters of the Tapajós River and (iii) shows the sampling area within northern South America.

Table 1.   Water ‘colour’, sampling locations, sample size, and average hydrochemical variables collected from 2005 to 2008 (turbidity, cm; temperature, °C; pH; dissolved oxygen (mg/L), OD; oxygen saturation, O2%).
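| River | Site | Water colour | Coordinates | n | Turbidity (cm) | Temperature (°C) | pH | OD (mg/L) | O2 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Negro | N1 | Black | 3°4′44.00″ S/60°14′44.00″ W | 3 | 76. | | | | |
| Madeira | M1 | White | 3°28′14.00″ S/58°52′5.00″ W | 28 | 5.5 | 29.7 | 7.1 | 5.7 | 82.4 |
| Amazon | A1 | White | 3°20′40.00″ S/60°7′10.00″ W | 31 | 12.3 | 28.8 | 7.2 | 6.7 | 86.3 |
| Amazon | A2 | White | 3°6′56.00″ S/59°32′19.00″ W | 28 | 18.8 | 29.6 | 7.1 | 6.3 | 85.2 |
| Amazon | A3 | White | 3°4′39.00″ S/58°13′13.00″ W | 43 | 18.3 | 28.7 | 7.1 | 4.8 | 84.0 |
| Amazon | A4 | White | 2°33′7.00″ S/57°1′59.00″ W | 40 | 10.5 | 29.2 | 7.2 | 6.4 | 85.6 |
| Amazon | A5 | White | 2°10′21.00″ S/54°58′21.00″ W | 45 | 12.5 | 29.0 | 7.2 | 6.3 | 82.0 |
| Amazon | A6 | White | 2°28′10.00″ S/54°30′5.00″ W | 52 | 15 | 29.7 | 7.2 | 6.6 | 87.9 |
| Rio Tapajós | T1 | Clear | 2°52′17.00″ S/55°9′38.00″ W | 21 | 118 | 29. | | | |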

Here, we employ a combination of genome scans, population genetics and sequence-based phylogeographic methods to assess the relative roles of divergent natural selection and vicariant biogeographic history in a freshwater invader, the Amazonian pufferfish, Colomesus asellus (Tyler, 1964). The Amazonian puffer is distributed throughout the central Amazon Basin and is relatively abundant in the shallow habitat of the white and clear water rivers; however, it has also been reported in the brackish waters of the Amazon estuary (Araujo-Lima, 1994; Monks, 2006). The sister species, Colomesus psittacus, is distributed around the north of South America in brackish and coastal waters. Colomesus asellus is thought to have split from C. psittacus and invaded South America during the Miocene marine transgressions in northern Venezuela associated with the northward-draining palaeolake, Lake Pebas [25–11 million years ago (Ma)] (Lovejoy et al., 1998; Monsch, 1998; Hoorn et al., 2010; Yamanoue et al., 2011). Our study system consists of four large river systems (Amazon, Negro, Tapajós and Madeira Rivers), three major hydrochemical settings (white, black and clear water), two ecotones (where black meets white and white meets clear water) and a ‘control’ (Madeira River) in which two river systems of the same water type meet (Fig. 1i, ii). Further, the white waters of the Madeira River intersect the white waters of the Amazon River mid-transect.

On the basis of the contemporary patterns of hydrochemistry within our study system, in concert with the putative biogeographic history of the puffer and the geomorphological history of the Amazon Basin, we make two predictions. Firstly, if the ancestor of C. asellus colonized South America via the northward-flowing Lake Pebas, we predict a west-to-east trajectory of colonization with signals of demographic and range expansion events associated with the modern formation of the Amazon River. Secondly, if water colour represents a strong mechanism of divergent natural selection, signatures of selection, identified by outlier loci, and overall population divergence should be greater between selective environments than within them. Here, we show evidence for both ecological divergence due to water colour and historical vicariance powering population diversification in Amazonia. Using both genome scans and phylogeographic approaches, we identify the contribution of divergent natural selection and genetic drift in generating population boundaries within C. asellus over different time scales. We show that our research framework can address fundamental evolutionary questions about biodiversity in Amazonia, a highly complex and poorly studied ecosystem. Findings from this study also have implications for how we identify the engines of biodiversity in other species-rich, complex ecosystems.

Materials and methods


Field expeditions in the Amazon Basin took place between January and February in 2005 and in 2008. We explored an area spanning approximately 2200 km of riverine distance that encompasses four major river systems and the three hydrochemical settings of the Amazon Basin. These included the Amazon River (white water), Negro River (black water), Madeira River (white water) and Tapajós River (clear water). A total of 290 puffers were sampled at nine sites (Fig. 1; Table 1), and from most populations, we obtained up to 25 individuals in each sampling year.

Fish were caught in sandy and shallow beaches using a seine net, euthanized, and muscle tissue was preserved in 95% ethanol. Geographic coordinates of sampling sites were recorded using a global positioning system, and temperature, pH, visibility, dissolved oxygen and oxygen saturation were measured at each site (Table 1). We also included geographically distant samples from Peru and Colombia for improving our phylogeographic analysis of the direction and timing of freshwater invasion and colonization of central Amazonas. Peruvian samples were collected from the western end of the species distribution (Amazon River, near Iquitos – an area representing the palaeolake Lake Pebas). Colombian samples came from the Orinoco Basin (Cano Macera), the northern end of the species distribution.

Laboratory procedures

DNA was extracted using a salting-out method (Sunnucks & Hales, 1996), and data were obtained from the mitochondrial (mtDNA) and nuclear genomes. The mtDNA adenosine triphosphatase subunits 6 and 8 (ATPase 6 and 8 genes) were amplified via polymerase chain reaction (PCR) using primers and conditions specified in Corrigan et al. (2008). PCR products were sequenced using BigDye™ terminator conditions in an AB3730xl (Applied Biosystems, Foster City, CA, USA) sequencer.

Our nuclear data set was based on 460 polymorphic loci. These were amplified fragment length polymorphisms (AFLP) obtained using a modified protocol after Zenger et al. (2006). Here, restriction fragments were amplified in a single selective PCR using fluorescently terminally labelled EcoRI selective primer carrying three selective nucleotides (5′-GAC TGC GTA CCA ATT C + AGT or AAC-3′) and unlabelled MseI selective primers carrying four selective nucleotides (5′-GAT GAG TCC TGA GTA A + CAAC, CTGC, CAGC or CTTC-3′). Fragments were denatured and separated on an AB3130xl sequencer using Liz500 labelled 500 bp size standard.

The presence and absence of AFLP fragments at 1-bp intervals were determined using Genemapper 4.0 (Applied Biosystems, Foster City, CA, USA), within a fragment of 50–500 bp. Peaks with < 50 relative fluorescent units were not scored, and every bin fragment position identified using Genemapper was checked manually. AFLPScore 1.4 (Whitlock et al., 2008) was employed to normalize and score our data within the predefined fragment locations as well as calculate mismatch error rates for each primer combination. AFLPScore determines optimal thresholds that both minimize genotyping error and maximize the numbers of retained loci using a mismatch error rate analysis (Whitlock et al., 2008).

Phylogeography and demographic history

The mtDNA sequence data were aligned using Sequencher 4.1 (Gene Codes Corporation, Ann Arbor, MI, USA), and genealogical relationships within and among sampled populations were investigated by constructing a haplotype network in Tcs (Templeton et al., 1992; Clement et al., 2000). Demographic history was assessed by computing mismatch distributions in Arlequin 3.01 (Excoffier et al., 2005) for the entire ATPase 6 and 8 data set, and for three geographic regions: (i) populations before and including the confluence of the Negro River (A1, N1 and A2), (ii) populations after the confluence of the Negro River and before the confluence of the Tapajós River (M1, A3, A4 and A5), and (iii) populations after and including the Tapajós River (T1 and A6). Mismatch analysis tests agreement of the data set with a model of demographic growth (Rogers & Harpending, 1992; Excoffier et al., 2005), by calculating the estimator of time to expansion (τ) and the mutation parameter (θ) (Schneider & Excoffier, 1999). From this, we used t = τ/(2μ) to estimate the timing of possible population expansions, where t is given in generations and μ is the mutation rate for ATPase 6 and 8. We used an ATPase 6 and 8 mutation rate of 1.4% per million years proposed for geminate fishes across the Panama Isthmus (Bermingham et al., 1997). Additionally, Fu’s (1997) test of selective neutrality was employed to assess the signal of expansion in the data set. In the event of demographic expansion, large negative FS values are generally observed (Fu, 1997).
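The conversion from τ to a time since expansion can be illustrated with a minimal sketch. It applies t = τ/(2μ) with μ expressed per sequence per generation. The τ value and the one-year generation time below are hypothetical inputs chosen for illustration, not estimates from this study, and the function name is our own. Whether the 1.4% per million years figure is best treated as a per-lineage or a pairwise divergence rate is not stated here, so the rate is left as an explicit parameter.

```python
def expansion_time_years(tau, rate_per_site_per_myr, seq_len_bp, gen_time_years=1.0):
    """Convert a mismatch-distribution tau into years since expansion.

    Applies t = tau / (2 * mu), where mu is the mutation rate per sequence
    per generation and t is in generations, then rescales t to years.
    """
    # Rate per site per year (input is a proportion per million years).
    rate_per_site_per_year = rate_per_site_per_myr / 1e6
    # mu: rate for the whole sequence, per generation.
    mu = rate_per_site_per_year * seq_len_bp * gen_time_years
    t_generations = tau / (2.0 * mu)
    return t_generations * gen_time_years


# Hypothetical inputs: tau = 10.0 and a one-year generation time are
# assumptions for illustration only. If the published rate were a pairwise
# divergence rate, the per-lineage rate passed here would be half of it.
years = expansion_time_years(tau=10.0, rate_per_site_per_myr=0.014, seq_len_bp=797)
print(f"Time since expansion: {years / 1e6:.2f} My")
```

Because μ is scaled by the generation time and t is scaled back to years, the generation time cancels out of the final estimate in years; it only matters if t is reported in generations.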

Detecting natural selection

We used both a frequentist method, Dfdist (Beaumont & Nichols, 1996), and a Bayesian method, BayeScan (Foll & Gaggiotti, 2008), to detect selection in the AFLP data. First, BayeScan was used to directly estimate the probability of each locus being subject to selection for the whole data set. ‘Outlier loci’ were assessed based on the posterior distribution of αi, where a positive value is indicative of positive selection and a negative value indicates stabilizing selection. The significance of αi for each locus is estimated in BayeScan using a reversible-jump MCMC algorithm, and loci were classified according to Jeffreys’ (1961) scale of evidence.

Secondly, we used Dfdist following the approach of Beaumont & Balding (2004) to detect ‘outlier loci’ in pairwise population comparisons. This methodology is based on the premise that loci under divergent selection will display significantly higher FST values than the majority of neutral loci in a sample. For each analysis, a critical frequency for the most common alleles of 0.98 was chosen and those loci with allele frequencies higher than 0.98 were excluded from the empirical distribution. Next, using a trimmed mean FST of 30%, a null distribution of FST was generated using 50 000 realizations. This was compared to the empirical distribution and outlier loci were identified at the upper 95th quantile using a smoothing parameter of 0.04. Using Dfdist, we searched for (positive) directional selection between nine adjacent population comparisons (see Fig. 3i). Pairwise comparisons were made between sites A1 and A2 (N1 samples were pooled with A2 samples because N1 comprised only three fish, collected near the Negro River outflow into the Amazon River), A2 and M1, M1 and A3, A2 and A3, A3 and A4, A4 and A5, A5 and A6, A5 and T1, and A6 and T1.

Loci detected as positive outliers were divided into two groups: (i) nonrepeat outliers that were detected in just one pairwise comparison and (ii) repeat outliers that were detected in multiple comparisons. The remaining loci that were not positive outliers in any comparison were classified as putatively neutral. To minimize the chances of type I errors, nonrepeat outliers were considered potential false positives irrespective of their P-values. Thus, only repeat-outlier loci and loci detected using both methods were classified as loci under selection and consequently removed from the total data set for subsequent data analyses.
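The grouping of loci described above can be sketched as a simple count of how many pairwise comparisons flag each locus. This is a minimal illustration only: the function name, the locus labels and the toy outlier lists are hypothetical, and the additional criterion of overlap with BayeScan is not modelled.

```python
from collections import Counter


def classify_outliers(outliers_by_comparison, all_loci):
    """Split loci into repeat outliers, nonrepeat outliers and the rest.

    outliers_by_comparison maps a comparison label to the loci flagged as
    positive outliers in that comparison.
    """
    # Count the number of comparisons in which each locus is an outlier.
    counts = Counter(
        locus
        for loci in outliers_by_comparison.values()
        for locus in set(loci)
    )
    repeat = {locus for locus, c in counts.items() if c >= 2}
    nonrepeat = {locus for locus, c in counts.items() if c == 1}
    # Loci never flagged as positive outliers are putatively neutral.
    neutral = set(all_loci) - repeat - nonrepeat
    return repeat, nonrepeat, neutral


# Hypothetical toy input, not data from the study.
toy = {
    "A4 vs. A5": ["L1", "L2", "L3"],
    "A5 vs. A6": ["L2", "L3", "L4"],
    "A5 vs. T1": ["L3"],
}
loci = [f"L{i}" for i in range(1, 7)]
repeat, nonrepeat, neutral = classify_outliers(toy, loci)
print(sorted(repeat), sorted(nonrepeat), sorted(neutral))
```

Treating nonrepeat outliers as potential false positives is the conservative choice: a locus flagged in only one comparison is more likely to reflect chance than a locus flagged independently in several.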

Summary statistics of population diversity and genetic structure

Sequence (mtDNA) diversity was estimated as haplotypic diversity and nucleotide diversity (Nei, 1987) per population in Arlequin 3.11 (Excoffier et al., 2005). AFLP allele frequencies were estimated in AFLP-Surv (Vekemans et al., 2002) using the Bayesian approach of Zhivotovsky (1999) for dominant markers with a nonuniform prior and assuming Hardy–Weinberg equilibrium. The number and proportion of polymorphic loci, expected heterozygosity and average gene diversity were calculated, with the unbiased estimator of Lynch & Milligan (1994).

Genetic structure between adjacent populations was calculated for mtDNA (ΦST: Excoffier et al., 1992) in Arlequin, and for both the total and the neutral AFLP (FST: Lynch & Milligan, 1994) data sets in AFLP-Surv (Vekemans et al., 2002). Due to the dominant nature of AFLPs, the inbreeding coefficient (f) cannot be calculated directly. Therefore, an alternative Bayesian approach was employed to estimate an FST analogue (ΘB) for dominant markers that accounts for uncertainty in f (Holsinger et al., 2002). Using the program Hickory 1.1 (Holsinger & Lewis, 2005), models were fitted to the data under three hypotheses: (i) both f and ΘB are unknown and ≥ 0 (full model); (ii) f = 0 but ΘB is unknown (f = 0 model); (iii) f is unknown and there is no genetic structure (ΘB = 0 model). Depending on the model, f, ΘB or both are estimated. A uniform prior distribution of f was assumed for both the ‘full’ and ‘ΘB = 0’ models. Default Hickory values for sampling and chain length parameters were used (burn-in = 50 000, sample = 250 000, thin = 50), and model selection was based on the deviance information criterion (DIC; see Holsinger & Wallace, 2004).

The probability of connectivity between populations is likely to be influenced by the unidirectional river flow and geographic distance (McGlashen et al., 2001). Therefore, we tested for the association between genetic (ΦST) and geographic distance (isolation by distance, IBD; Wright, 1943) in the mtDNA and AFLP data sets for (i) all sampled populations and (ii) sampled populations along the Amazon River (A1–A6). The significance of the association was tested using a Mantel permutation test (Mantel, 1967) in Arlequin 3.1 (Excoffier et al., 2005) for mtDNA and GenAlEx 6.1 (Peakall & Smouse, 2006) for AFLPs.
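The logic of a Mantel permutation test can be sketched in a few lines. This is a generic one-sided implementation written for illustration, not the procedure coded in Arlequin or GenAlEx; the function names, the site positions and the distance matrices are hypothetical. The genetic matrix is permuted by relabelling populations (rows and columns together), and the observed correlation is compared with the permuted ones.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling populations by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test for a positive matrix correlation."""
    n = len(gen)
    identity = list(range(n))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(gen, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix
        if pearson(upper_triangle(gen, order), geo_vec) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical river positions (km) and a perfectly distance-driven
# genetic matrix; neither is taken from the study.
positions = [0, 150, 400, 900, 1500]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0002 * d for d in row] for row in geo]
r, p = mantel_test(gen, geo)
print(f"r = {r:.3f}, P = {p:.3f}")
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the non-independence of pairwise distances that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test.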

The degree of population structure was further explored with a hierarchical analysis of molecular variance (amova) using Arlequin for mtDNA (Excoffier et al., 1992) and GenAlEx for neutral AFLP (Peakall & Smouse, 2006). Two amovas were run for each data set. The first tested for structure among three regions associated with the ecotones: (i) populations before and including the confluence of the Negro River (A1, N1 and A2), (ii) populations after the confluence of the Negro River and before the confluence of the Tapajós River (M1, A3, A4 and A5), and (iii) populations after and including the Tapajós River (T1 and A6). The second amova was conducted to control for the confluence of a river system into the Amazon River by looking for structure associated with the white water Madeira River. Using only the populations situated along the Amazon River between the Negro and Tapajós, we created two white water groups: (i) the Madeira and (ii) Amazonas sites A2–A5.

Genetic subdivision was also investigated using the nonparametric, model-based clustering method implemented in Structure v.2.3.1, which accommodates AFLP data (Falush et al., 2003, 2007). Using the admixture model, we determined the number of populations, K, by comparing log-likelihood ratios across three independent runs by varying the assumed number of K for the total AFLP and neutral AFLP data sets (Evanno et al., 2005). Each run consisted of a burn-in phase of 100 000 iterations, followed by 1 000 000 iterations.
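The ΔK statistic of Evanno et al. (2005) can be illustrated with a short sketch. It uses one common way of computing the statistic: the absolute second-order difference of the mean ln Pr(X | K) across runs, divided by the standard deviation of ln Pr(X | K) across runs. The function name and the log-likelihood values below are hypothetical and are not output from the Structure runs in this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for each K that has both K - 1 and K + 1 available.

    lnp_by_k maps K to a list of ln Pr(X | K) values, one per run.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical values for three independent runs per K.
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -904, -896],
    3: [-700, -702, -698],
    4: [-690, -700, -680],
    5: [-685, -688, -682],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK peaks where the log-likelihood stops improving sharply, so it cannot be evaluated for the smallest or largest K tested, and it requires more than one run per K so that the standard deviation is defined.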

Results

We generated 797 bp of mtDNA ATPase 6 and 8 genes for 238 samples including Colombian and Peruvian ‘outgroups’. Of these, 88 unique haplotypes were identified (GenBank accession numbers JQ818261-JQ818350). These were composed of 90 variable characters, of which 42 were parsimony informative. AFLP profiles were resolved for 290 individuals, with a total of 460 polymorphic loci (Dryad repository doi:10.5061/dryad.m6k89833). Mismatch error rate calculated by AFLPScore was on average 5% (± 2.9%) per primer combination. This is at the lower end of the acceptable range of 5–10% AFLP error rates (Bonin et al., 2007).

Phylogeography and demographic history

A strong phylogeographic signal of haplotypes clustered in an easterly (upstream to downstream) pattern was generally observed in the mtDNA haplotype network (Fig. 2). Indeed, within the network there appeared to be three predominant groups. The first consists largely of westernmost sampled haplotypes that predominantly originate west of the confluence of the Negro and Amazon River. This group also includes the westernmost sample from Iquitos (Peru) and the north-western sample from the Orinoco Basin (Colombia). The second grouping is arranged in a star shape with many recently derived haplotypes. It is composed mostly of fish from the central sites in the Amazon and Madeira Rivers. Moving downstream, the final grouping is dominated by fish from the Tapajós River and the two easternmost Amazon River sites.

Figure 2.

 Statistical parsimony network for mtDNA ATPase 6 and 8 haplotypes. Relationships among haplotypes are estimated using the parsimony method of Templeton et al. (1992). Each circle denotes a unique haplotype and the area of the circle is proportional to its frequency. The colour/s of the circle represents the sampling locality as in Fig. 1 and the lines along the bottom of the network refer to ‘haplogroups’ 1–3.

Results from mismatch analysis and Fu’s test of neutrality provided insights into the demographic history of C. asellus (Table 2) consistent with the west-to-east trajectory of colonization inferred from the haplotype network (Fig. 2). Mismatch analysis of our westernmost group did not fit the model of demographic expansion, a result typical of older populations (Rogers & Harpending, 1992). Further, Fu’s test of neutrality gave a positive FS value. In marked contrast, mismatch analysis of our central and eastern haplogroups did not deviate from a model expected under demographic expansion, a result corroborated by large negative FS values. While date estimates calculated using this method must be interpreted with caution, the estimated τ values suggest the demographic expansion occurred first in the central Amazonian sites (approximately 2.5 Ma) and later in the most easterly ones (approximately 1.5 Ma). For the entire data set, there was a nonsignificant deviation from a model expected under demographic expansion, and a large negative FS value was found. Estimated τ values suggest that this demographic expansion occurred approximately 3 Ma.

Table 2.   Demographic analysis for the total data set and for (i) populations before and including the confluence of the Negro River (A1, N1 and A2), (ii) populations after the confluence of the Negro River and before the confluence of the Tapajós River (M1, A3, A4, A5) and (iii) populations after and including the Tapajós River (T1 and A6). Results are summarized by the Sum of Squared Deviations (SSD), Raggedness Index (R), and Fu’s test of neutrality (FS). Estimates of time since expansion are shown for analyses exhibiting evidence of demographic expansion.
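| Data set | SSD | P | R | P | FS | P | Time since expansion | Range (α = 0.05) |
|---|---|---|---|---|---|---|---|---|
| All Data | 0.007 | 0.54 | 0.012 | 0.681 | −24.898 | 0.000 | ∼3 Ma | 1 to 4.3 My |
| (ii) | 0.017 | 0.349 | 0.028 | 0.476 | −25.617 | 0.000 | ∼2.5 Ma | 500 ka to 4 My |
| (iii) | 0.005 | 0.708 | 0.018 | 0.795 | −10.243 | 0.001 | ∼1.5 Ma | 500 ka to 2.8 My |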

Detecting natural selection and population structure

A total of 54 outlier AFLP loci (11.7% of the total) were detected across eight of the nine pairwise comparisons at the 99% quantile (Fig. 3i; Table 5). Of these, 16 (3.5%) were repeat-outlier loci, and the majority of these were located in the more easterly populations from A4 to A6 and T1 (Fig. 3), near the confluence of the clear waters of the Tapajós and the white waters of the Amazon River. Fifteen of these repeat loci were detected in population comparisons including A6. In contrast, only four repeat loci were found west of A3, and none were found in A1.

Figure 3.

 Results from Dfdist analysis: (i) the empirical distributions of FST values for amplified fragment length polymorphisms (AFLP) loci for each population pairwise comparison overlain on a schematic representation of the sampling localities. The x-axes are heterozygosity and the y-axes are FST. The solid line represents the 99th quantile, and dots exceeding the 99th quantile are ‘outlier loci’, (ii) a schematic representation of the distribution of repeat-outlier loci between sample sites. Each bar represents the area shared by n loci.

The hierarchical Bayesian approach of BayeScan performed on all data identified 166 loci with positive αi values. Applying Jeffreys’ (1961) scale of evidence, six of these loci were supported with log10 (Bayes Factor) > 2 (‘decisive’ evidence for selection), two with log10 (Bayes Factor) > 1.5 (‘very strong’ evidence), three with log10 (Bayes Factor) > 1 (‘strong’ evidence) and five with log10 (Bayes Factor) > 0.5 (‘substantial’ evidence). Of the 54 outliers detected using Dfdist, 20 were also detected using BayeScan. After removing repeat outliers, our data set consisted of 444 putatively neutral polymorphic loci.
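The thresholds of Jeffreys’ scale used above can be written as a small lookup. This is a sketch for illustration: the function name and the label returned for values at or below 0.5 are our own assumptions, and the example values are hypothetical rather than BayeScan output.

```python
def jeffreys_category(log10_bf):
    """Map a log10 Bayes factor to the evidence categories used in the text."""
    if log10_bf > 2:
        return "decisive"
    if log10_bf > 1.5:
        return "very strong"
    if log10_bf > 1:
        return "strong"
    if log10_bf > 0.5:
        return "substantial"
    # Label for weaker support is an assumption; the text does not name it.
    return "below substantial"


# Hypothetical log10 Bayes factors for five loci.
for value in (2.3, 1.7, 1.2, 0.7, 0.2):
    print(value, jeffreys_category(value))
```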

For the mtDNA sequence data, nucleotide diversity ranged from 0.003 to 0.007 and haplotypic diversity from 0.451 to 0.967 (excluding N1 for which there were only 3 samples). From our putatively neutral AFLP loci, expected heterozygosity ranged between 0.192 and 0.269 (also excluding N1; Table 3).

Table 3.   Genetic diversity at mtDNA and putatively neutral amplified fragment length polymorphisms (AFLP) markers. Mitochondrial diversity is summarized by the number of haplotypes (N), haplotypic diversity (h) and nucleotide diversity (π). AFLP diversity is summarized by the number of polymorphic loci per population (NPL) and expected heterozygosity (He).
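| Site | mtDNA n | N | h | π | AFLP n | NPL | He |
|---|---|---|---|---|---|---|---|
| N1 | 3 | 3 | 1 (0.272) | 0.008 (0.007) | 3 | 51 | 0.136 |
| A1 | 23 | 5 | 0.451 (0.121) | 0.003 (0.002) | 31 | 267 | 0.269 |
| A2 | 27 | 11 | 0.692 (0.097) | 0.005 (0.003) | 28 | 223 | 0.230 |
| M1 | 28 | 11 | 0.855 (0.046) | 0.003 (0.002) | 27 | 252 | 0.240 |
| A3 | 33 | 18 | 0.934 (0.027) | 0.005 (0.003) | 43 | 290 | 0.257 |
| A4 | 29 | 12 | 0.921 (0.026) | 0.005 (0.003) | 40 | 292 | 0.261 |
| A5 | 35 | 23 | 0.961 (0.018) | 0.006 (0.003) | 45 | 313 | 0.264 |
| A6 | 35 | 15 | 0.918 (0.025) | 0.007 (0.004) | 52 | 244 | 0.218 |
| T1 | 21 | 17 | 0.967 (0.030) | 0.007 (0.004) | 21 | 186 | 0.192 |
| Total | 238 | 88 | | | 290 | 444 | |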

High global AFLP population structure was observed across puffer populations (ΘB = 0.439055, Table 4). The full model, which assumes both population structure and f ≥ 0, fitted our data best, having the lowest DIC value (DIC = 12202.4). A strong signal of pairwise population genetic structure was observed for both FST and ΦST comparisons, indicating relatively low levels of dispersal between populations. Pairwise comparisons that fall at the confluence of either the Negro or Tapajós Rivers into the Amazon River had higher FST and ΦST values (e.g. A2 vs. M1; ΦST = 0.406, FST = 0.041; A5 vs. T1, ΦST = 0.270, FST = 0.068). In contrast, pairwise comparisons between white water sites, not interrupted by the confluence of the Negro or Tapajós Rivers, generally had lower values (e.g. A3 vs. A4 ΦST = 0.016, FST = 0.007; Table 5). In general, neutral pairwise FST values were lower than total FST values, most notably in easterly population comparisons.

Table 4.   Amplified fragment length polymorphisms data fitted to models of population structure using the Bayesian method in Hickory. Parameters estimated are the inbreeding coefficient, f, and the FST analogue ΘB. Dbar is a measure of how well the model fits the data, pD is the number of parameters being estimated, and deviance information criterion (DIC) is the model selection criterion.
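| Model | f, mean (SD) | f, 95% CI | ΘB, mean (SD) | ΘB, 95% CI | Dbar | pD | DIC |
|---|---|---|---|---|---|---|---|
| Full* | 0.135 (0.023) | 0.092–0.183 | 0.439 (0.013) | 0.414–0.464 | 10185.3 | 2017.06 | 12202.4 |
| f = 0 | | | 0.413 (0.012) | 0.389–0.438 | 10152.6 | 2127.72 | 12280.3 |
| ΘB = 0 | 0.057 (0.019) | 0.022–0.097 | 0.370 (0.014) | 0.343–0.398 | 19160.3 | 424.459 | 19584.8 |

*The optimal model selected according to this criterion.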
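The model comparison in Table 4 follows from the standard definition DIC = Dbar + pD, with the lowest DIC indicating the preferred model. The short sketch below recomputes DIC from the tabulated Dbar and pD values; the dictionary keys and variable names are our own labels.

```python
# Dbar and pD for each Hickory model, as reported in Table 4.
models = {
    "full": (10185.3, 2017.06),
    "f = 0": (10152.6, 2127.72),
    "theta_B = 0": (19160.3, 424.459),
}


def dic(dbar, p_d):
    """Deviance information criterion: model fit plus a complexity penalty."""
    return dbar + p_d


dic_values = {name: dic(dbar, p_d) for name, (dbar, p_d) in models.items()}
best_model = min(dic_values, key=dic_values.get)

for name, value in dic_values.items():
    print(f"{name}: DIC = {value:.1f}")
print("Preferred model:", best_model)
```

The f = 0 model has a slightly lower Dbar than the full model, yet its larger pD penalty gives it the higher DIC, which is why the full model is preferred.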
Table 5.   Number of outlier and repeat-outlier amplified fragment length polymorphisms (AFLP) loci found in each pairwise comparison, the total and neutral AFLP FST values, and the mtDNA ΦST value.
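| Comparison | No. 99% outliers | No. repeat outliers | Total FST | Neutral FST | mtDNA ΦST |
|---|---|---|---|---|---|
| A1 vs. A2 | 2 | 0 | 0.056* | 0.056* | 0.047* |
| A2 vs. A3 | 6 | 4 | 0.054* | 0.049* | 0.232* |
| A2 vs. M1 | 0 | 0 | 0.049* | 0.041* | 0.406* |
| M1 vs. A3 | 9 | 3 | 0.061* | 0.047* | 0.082* |
| A3 vs. A4 | 1 | 1 | 0.008 | 0.007 | 0.016 |
| A4 vs. A5 | 18 | 9 | 0.074* | 0.059* | 0.010 |
| A5 vs. A6 | 24 | 8 | 0.064* | 0.055* | 0.211* |
| A5 vs. T1 | 5 | 4 | 0.080* | 0.069* | 0.270* |
| T1 vs. A6 | 9 | 7 | 0.069* | 0.059* | 0.064* |

*P < 0.05.
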

Mantel tests revealed a positive correlation between genetic differentiation and geographic distance across all sampled populations for both mtDNA (r = 0.357, P = 0.03) and neutral AFLP data (r = 0.295, P = 0.007; Fig. 4). Further, Mantel tests of populations along the Amazon River only (excluding tributary samples) showed a positive correlation for AFLP data (r = 0.505, P = 0.001). The correlation was marginally nonsignificant for mtDNA data (r = 0.361, P = 0.07), a result potentially influenced by the small number of sample comparisons.

Figure 4.

 Results from the Mantel test for both amplified fragment length polymorphisms (AFLP) data and mtDNA ATPase 6 and 8, based on pairwise population genetic differentiation (ΦST) and geographic distance (km).

The AMOVAs for both mtDNA and AFLP identified consistent patterns of regional population structure associated with water colour (Table 6). AMOVA 1, which tested for genetic structure associated with ecotones of the Negro and Tapajós Rivers, was significant between regions in both cases (mtDNA: ΦCT = 0.3350, P = 0.000, AFLP: ΦRT = 0.03, P < 0.001). In contrast, AMOVA 2, which tested for genetic structure associated with the confluence of the Madeira River into the Amazon River (i.e. white vs. white water), was not significant for either DNA data set (mtDNA: ΦCT = −0.014, P = 0.398, AFLP: ΦRT = 0.4, P = 0.235).

Table 6.   Analysis of molecular variance (AMOVA) for mtDNA and amplified fragment length polymorphisms (AFLP) data. In AMOVA 1, regions include: (i) populations before and including the confluence of the Negro River (A1, RN and A2), (ii) populations after the confluence of the Negro River and before the confluence of the Tapajós River (M1, A3, A4, A5) and (iii) populations after and including the Tapajós River (T1 and A6). In AMOVA 2, regions include: (i) the Madeira River and (ii) white water sites A2–A5. FI stands for Fixation Index.

                                    AMOVA 1                                AMOVA 2
Data type   Source of variation     % Variation   FI            P          % Variation   FI            P
mtDNA       Among regions           33.50         ΦCT: 0.3350   0.000      0.00          ΦCT: −0.014   0.398
            Among populations       3.29          ΦSC: 0.0494   0.002      15.36         ΦSC: 0.1514   0.000
            Among individuals       63.21         ΦST: 0.3679   0.000      86.09         ΦST: 0.1391   0.000
AFLP        Among regions           3.00          ΦRT: 0.030    0.000      0.00          ΦRT: 0.4      0.235
            Among populations       7.00          ΦPR: 0.067    0.000      8.00          ΦPR: 0.081    0.000
            Among individuals       91.00         ΦPT: 0.095    0.000      92.00         ΦPT: 0.081    0.000
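The fixation indices in Table 6 follow from the variance percentages under the standard hierarchical AMOVA definitions. The sketch below applies those definitions to the mtDNA values for the first analysis. The function name `phi_statistics` is hypothetical, and the code shows the arithmetic only, not the permutation tests behind the P values.

```python
def phi_statistics(pct_regions, pct_populations, pct_within):
    """Hierarchical AMOVA fixation indices from variance components.

    Inputs are the percentages of variation among regions, among
    populations within regions, and within populations.
    """
    total = pct_regions + pct_populations + pct_within
    phi_ct = pct_regions / total                       # regions vs. total
    phi_sc = pct_populations / (pct_populations + pct_within)  # within regions
    phi_st = (pct_regions + pct_populations) / total   # populations vs. total
    return phi_ct, phi_sc, phi_st


# mtDNA, first analysis in Table 6: 33.50, 3.29 and 63.21 per cent.
phi_ct, phi_sc, phi_st = phi_statistics(33.50, 3.29, 63.21)
print(f"PhiCT = {phi_ct:.4f}, PhiSC = {phi_sc:.4f}, PhiST = {phi_st:.4f}")
```

The three values reproduce the tabulated ΦCT, ΦSC and ΦST to within rounding. The same relationship explains why the second analysis can report a slightly negative ΦCT alongside 0.00% variation among regions: a negative variance component is conventionally reported as zero.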

Structure analysis for the AFLP data sets (n = 460, includes neutral and outlier loci) identified a prominent phylogeographic break between A2 and sites downstream, coinciding with the confluence of the Negro River into the Amazon River (Fig. 5i, ii). Using the ad hoc statistic ΔK, this Structure analysis estimated that the most likely number of populations was six [ln Pr (X | K = 6) = −27827.7], yet this did not resolve any further clusters from K = 4 [ln Pr (X | K = 3) = −29243.2]. Thus, we chose K = 4 as the most likely number of populations for our total AFLP data set (Fig. 5i), which coincides with our inferred phylogeographic breaks. Additionally, the break between A5 and T1, which coincided with the confluence of the Tapajós River, was also supported. An additional phylogeographic break was observed between white water sites A4 and A5. Such a break was not unexpected given the marked topographic feature that exists between these two sites, normally demarcated by the Obidos County. Here, the Amazon River channel is at its narrowest point in Brazilian territory (approximately 2.4 km), the current flows at high velocity and the river channel is very deep (approximately 19 m; Mertes et al., 1996). Because of this, the area between A4 and A5 is dominated by rocky beds instead of the preferred puffer habitat (i.e. sandy and shallow beaches), which is more typically found along margins of the Amazon River.

Figure 5.

 Structure results of total amplified fragment length polymorphisms (AFLP) data set (n = 460) and the putatively neutral AFLP data set (n = 444). Individuals are grouped by sampling location, and each individual is represented by one vertical column. K = the number of inferred genetic populations for each data set.

The Structure analysis based on putatively neutral AFLP loci (n = 444) and ΔK estimated that the most likely number of populations was six [ln Pr (X | K = 6) = −27049.8], yet this did not resolve any further clusters from K = 3 [ln Pr (X | K = 3) = −28309.7]. Thus, we chose K = 3 as the most likely number of populations for our putatively neutral loci. Again, the strong break downstream to the meeting of the Negro and Amazon (A2) was present. Removal of outlier loci appeared to have no effect on the presence of this barrier within our data set. Similar to our total AFLP data set (Fig. 5i), there was a strong phylogeographic break between A4 and A5. This barrier was also unaffected by the removal of outlier loci. Most interestingly however, the break between A5 and T1 was no longer present in our analysis when outlier loci were excluded from our data set. This suggests that any population structure generated in this region was a product of divergent selection acting on those loci. Finally, for both the total and the neutral AFLP data sets, our white water control sample (Madeira River) grouped with other typical white water samples from the Amazon River.
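The ΔK statistic used above compares the second-order change in ln Pr (X | K) between successive values of K with the spread of replicate runs at that K. The sketch below shows one common way to compute it from replicate runs. The log-likelihood values are hypothetical toy numbers, not results from this study, and `delta_k` is a name chosen here.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Delta K for each interior K.

    lnp_runs maps K to a list of ln Pr(X | K) values from replicate runs.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using run means for L.
    """
    avg = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(lnp_runs[k])
    return result


# Hypothetical replicate runs (three per K), for illustration only.
toy_runs = {
    1: [-3000.0, -3001.0, -2999.0],
    2: [-2500.0, -2502.0, -2498.0],
    3: [-2400.0, -2410.0, -2390.0],
    4: [-2380.0, -2400.0, -2360.0],
}

dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
for k, value in dk.items():
    print(f"K = {k}: delta K = {value:.1f}")
print("K with the largest delta K:", best_k)
```

Because ΔK needs both neighbours, it is undefined for the smallest and largest K tested. A peak in ΔK also need not coincide with the K that gives the most interpretable clusters, which is the situation described above.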


This study provides evidence for the effect of divergent natural selection associated with water colour entwined in the biogeographic history of a freshwater invader, the Amazonian Puffer C. asellus. Using genome scans and phylogeographic approaches, we identified two major genetic breaks in C. asellus populations within the Amazon Basin. The first break occurs at the confluence of the black waters of the Negro River and the white waters of the Amazon River, whereas the second break occurs at the confluence of the Amazon River and the clear waters of the Tapajós River. Consistent with the prediction of a west-to-east trajectory of colonization associated with the last stages of formation of the Amazon River, we detected a strong signal of IBD with demographic expansion downstream. Evidence for divergent natural selection was detected most strongly around the confluence of the Tapajós into the Amazon River, yet there was reduced evidence for selection around the confluence of the Negro River. No evidence for selection was obtained when comparing populations from the same selective environment. Below, we discuss the relative contribution of vicariant biogeographic history and then of natural selection to the origin and maintenance of population divergence.

A freshwater invasion and colonization

Although the precise timing of the Neogene marine incursions into South America is controversial, it has become generally accepted that a connection existed between the Upper Amazon and the Caribbean (for reviews see Lundberg et al., 1998; Hoorn et al., 2010). A growing body of multidisciplinary studies has provided strong evidence for such a connection. This includes studies on fossilized marine fishes (Monsch, 1998), isotope data that confirm the presence of saline waters in the upper Amazon (Hoorn, 1993, 1994; Monsch, 1998; Vonhof et al., 2003), and biogeographic and phylogenetic evidence for multiple freshwater invasions into Amazonia from marine ancestral lineages (Lovejoy et al., 1998, 2006, 2010; Cooke et al., 2012b). Of particular importance to the transition of marine to freshwater taxa has been the documentation of Lake Pebas, a semi-permanent (24–11 Ma) wetland complex in the upper Amazon. Lake Pebas was connected to the Caribbean and is thought to have behaved as a complex series of interconnected lakes of inconstant salinity that enabled the progression of taxa from marine to freshwater (Wesselingh et al., 2002; Hoorn et al., 2010).

The colonization of C. asellus into the freshwaters of South America via Lake Pebas during the Miocene is consistent not only with fossil data (Monsch, 1998), but also with our phylogeographic reconstruction. Our results strongly indicate a west-to-east trajectory of colonization along the modern Amazon River, a pattern supported by the genealogical distributions of matrilines, statistics of demographic expansion and Mantel tests (Table 2, Figs 2 and 4). For example, mtDNA haplotypes generally cluster in a downstream pattern (west to east). In addition, there is evidence for sequential demographic expansions, one after the confluence of the Negro River approximately 2.5 Ma, and later, after the confluence of the Tapajós River approximately 1.5 Ma. Further, the more westerly (upstream) samples showed the genealogical architecture expected of older populations (Rogers & Harpending, 1992).

Although it is generally accepted that the modern drainage system of the Amazon River and its largest tributaries were established in the late Miocene (Hoorn et al., 1995; Lundberg et al., 1998), palaeogeographic work by Campbell et al. (2006) has dated the full establishment of the modern Amazon River drainage system to the late Pliocene, approximately 2.5 Ma. Accordingly, the modern Amazon River may have formed as a compound response to the breaching of the eastern rim of the sedimentary basin known as the Içá Formation and erosion of the proto-Amazon River (Rossetti et al., 2005; Campbell et al., 2006). This is proposed to have occurred at the onset of the Plio-Pleistocene glacial climate cycle during the lowest sea level stand of the middle Miocene with frequent geomorphological changes that continued well into the Holocene (Latrubesse & Franzinelli, 2005; Rossetti et al., 2005; Campbell et al., 2006). Interestingly, our data show a strong phylogeographic break that parallels this region of geomorphological activity (between sites A2 and M1; Fig. 6iii). This is most evident in Structure analyses from the AFLP data (inclusive and exclusive of outlier loci; Fig. 5). Further, demographic expansion within the haplogroup consisting predominantly of samples east of the Negro River, and west of the Tapajós River, has been dated to approximately 2.5 Ma.

Figure 6.

 Phylogeographic scenario depicting the invasion and colonization route of Colomesus asellus into South America: (i) Entry point into South America via the northward-flowing connection between Lake Pebas, the upper Amazon and the Caribbean, (ii) the evolution of freshwater tolerance encouraged by the dynamic saline/freshwater environment of Lake Pebas, (iii) termination of the marine incursions and the erosion of Içá Formation and colonization of western freshwater habitat (iv) the development of the modern transcontinental west-to-east flow and subsequent colonization of the Amazon River (v) contemporary population subdivision of C. asellus and the modern drainage systems of South America including adaptive divergence at the junction of the Amazon and Tapajós Rivers.

On the basis of our results and of the large body of literature on palaeogeography mentioned above, we propose a convincing phylogeographic scenario for the freshwater invasion and colonization of C. asellus into the Amazon Basin (Fig. 6). Initiated by Miocene marine incursions, the puffer ancestor entered South America via the Caribbean and colonized the dynamic saline environment of the upper Amazon wetland system. The varying salinity levels and intermittent connectivity of Lake Pebas to the Caribbean would have encouraged the evolution of freshwater tolerance, isolating an incipient freshwater lineage from sister marine lineages (Wesselingh et al., 2002). Continuing until the late Miocene, C. asellus underwent range expansions, colonizing the available shallow freshwater habitats throughout western Amazonia and the early Andean foreland basin (west of the Purus Arch). These events likely extended into the Palaeo-Orinoco system in central Colombia via the northward-flowing Palaeo-Amazon (Hoorn et al., 1995). Evidence for this early colonization can be seen in the relationship of the Orinoco samples with our older westerly haplotypes (Fig. 2). Thus, the colonization and expansions of puffers in the Palaeo-Amazon and Orinoco systems would have continued until the late Miocene, after which the proto-Amazon River breached the Purus Arch. At that time, approximately 2.5 Ma, the modern west-to-east transcontinental flow of the Amazon River was established and puffers subsequently colonized downstream.

Adaptation through natural selection

Following the establishment of the west-to-east-flowing Amazon River, the subsequent west-to-east IBD (Fig. 4) and genetic structure along the Amazon River and its tributaries could be explained by certain migratory constraints of C. asellus including its overall limited mobility and low larval dispersal (Araujo-Lima, 1994). Further, the complex anabranching Amazonian aquatic environment could have created numerous micro- and macro-allopatric events allowing genetic drift to accumulate between populations (e.g. between sites A4 and A5). However, our data, both the mitochondrial and nuclear DNA, consistently show significant phylogeographic breaks at the confluence of the black waters of the Negro River into the white waters of the Amazon River and after the confluence of the clear waters of the Tapajós into the Amazon River (Table 6, Fig. 5). Yet, we see no significant structure associated with the confluence of the white waters of the Madeira River into the white waters of the Amazon River. This finding is important because it rules out the possibility of population structure generated by the confluence of a tributary alone (e.g. Fernandes et al., 2004). Indeed, hydrochemical variables such as oxygen availability have been identified as key factors influencing the distribution of C. asellus larvae (Junk et al., 1983; Araujo-Lima, 1994), and sensory adaptations such as optical sensitivity (Appleby & Muntz, 1979) and prey detection have been shown to play a major role in the determination of Amazonian fish community structure (Tejerina-Garro et al., 1998). Thus, the striking hydrochemical discontinuities of the Amazonian aquatic environment likely drive adaptive divergence across these selective environments in C. asellus.

Here, using genome scans and phylogeographic analyses, we have identified outlier loci geographically clustered around the junction of ‘white’ and ‘clear’ waters (Fig. 3) and higher levels of genetic differentiation between selective environments than within (Table 6). Most interestingly however, we have also identified the contribution of these outlier loci to population structure (Fig. 5i, ii).

The Dfdist outlier analysis revealed that approximately 3.5% of AFLP loci repeatedly deviated from neutral expectations, signifying them as likely candidates for markers affected by divergent natural selection. Because these loci were repeatedly identified as positive outliers, they are less likely to be false positives (Cooper, 2000; Campbell & Bernatchez, 2004). Due to the anonymous nature of AFLP loci detection, the outlier loci represent regions of the genome under selection but probably not the precise mutations under selection themselves (Jensen et al., 2007). Nonetheless, their presence enabled us to test the role of selection in the maintenance of population boundaries across hydrochemical discontinuities in the Amazon Basin. We found that the majority of positive outliers were identified in population pairwise comparisons involving the easterly populations around the junction of the white waters of the Amazon River and the clear waters of the Tapajós River (Fig. 3), whereas few were found west of A4. Most striking was that after the removal of all repeat-outlier loci, population structure associated with the confluence of the Tapajós River was lost from our AFLP data (e.g. when comparing Structure results between total and putatively neutral data sets; Fig. 5i, ii). In contrast however, population structure associated with the confluence of the Negro River was unaffected after the removal of outlier loci (Table 4; Fig. 5i, ii). Thus, our data suggest strong ecologically driven divergent selection on multiple traits perhaps associated with resource or habitat use coinciding with hydrochemical differences of white and clear water.
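The figure of approximately 3.5% can be cross-checked against the sizes of the two AFLP data sets reported with the Structure results. The short sketch below does so, assuming that n = 460 and n = 444 count loci and that the difference between them corresponds to the repeat-outlier loci that were removed.

```python
# Data set sizes from the Structure results; assumed here to count loci.
total_loci = 460
neutral_loci = 444

# Loci excluded to form the putatively neutral data set.
removed = total_loci - neutral_loci
fraction = removed / total_loci

print(f"{removed} loci removed ({fraction:.1%} of the total)")
```

Under these assumptions, 16 of 460 loci were excluded, which matches the proportion quoted above.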

There is a chance, however, that the easterly pattern of outlier loci observed may have been a product of ‘allele surfing’ (Edmonds et al., 2004; Klopfstein et al., 2006; Excoffier & Ray, 2008). Under this scenario, the downstream range expansion would have led to the spread of rare alleles that, over time, reached high frequencies in the populations located on the edge of the expansion (A5, T1, A6). Although allele surfing may have contributed in part to inflated estimates of positive selection between clear and white water habitats, it remains parsimonious that adaptive divergence is occurring across the ecotone. Indeed, a similar signature of divergent natural selection has been found in the unrelated yet co-distributed species Triportheus albus (Cooke et al., 2012a) and Plagioscion squamosissimus (Cooke et al., 2012b). Furthermore, allele surfing is more likely to occur in small and fast growing populations (Klopfstein et al., 2006), and based on diversity estimates and personal observation, this is unlikely to be the case in the Amazonian pufferfish.

When previously interbreeding populations are isolated from the homogenizing effects of gene flow, neutral DNA will diverge via genetic drift. In the same way, neutral gene flow will decrease and genetic drift will increase as adaptive divergence progresses through time (Nosil et al., 2008). Therefore, we would also expect a neutral signature of structure associated with the junction of the Amazon and Tapajós Rivers if divergent natural selection has persisted long enough for drift to accumulate. Given the loss of nuclear population structure at this aquatic interface after removal of outlier loci, we can conclude that divergent natural selection in this region is a powerful, yet relatively recent phenomenon. Thus, niche preference, niche adaptation and assortative mating are in their infancy across this ecotone. Because of this, pinpointing ecologically driven divergent selection as the mechanism for population subdivision here appears convincing. Under this scenario, genetic changes resulting from selection between spatially and ecologically explicit young lineages have not yet become confounded further by genetic differences that accumulate after isolation (Schluter, 2000; Via, 2009).

In striking contrast, the phylogeographic barrier between the Amazon River and the Tapajós River is well represented in our mtDNA data (Fig. 2). Due to the lower effective population size of the mtDNA molecule and its independence from the nuclear genome, the incipient population subdivision observed in the mtDNA data is in essence ‘ecologically allopatric’. Assuming it is unassociated with regions of adaptive divergence or divergence hitchhiking, it has responded independently between the parapatric lineages via either genetic drift or selection within each ecotone (Via, 2009). In this way, ‘ecological allopatry’ can leave a signature of IBD and restricted gene flow in which divergence increases with adaptation, in the same way that divergence increases with geographic distance (Nosil et al., 2008).

There are however two plausible but not necessarily exclusive explanations for the maintenance of population structure after the confluence of the black waters of Negro River into the white waters of the Amazon River, despite the removal of loci under selection. Firstly, the westerly populations appear older, and barrier-induced divergence of neutral loci via genetic drift will accumulate through time once natural selection has initiated (ecological allopatry; Nosil et al., 2008; Via, 2009). Therefore, even after the removal of outlier loci, we would expect to see neutral population genetic structure here if natural selection was indeed occurring as a product of hydrochemistry. On the contrary however, few outlier loci were actually identified around the junction of the black and white waters, and C. asellus are not typically found in the Negro River (Araujo-Lima, 1994). Therefore, it is more likely that population structure here was induced allopatrically resulting from geomorphological changes in the region since 2.5 Ma. Subsequently, moderate isolation has been maintained via the dynamic geomorphological activity that has persisted at the confluence of the Negro River and Amazon River into the Holocene (Latrubesse & Franzinelli, 2005; Rossetti et al., 2005).

Our phylogeographic scenario now has an additional dimension that includes divergent natural selection resulting from hydrochemistry (Fig. 6). Following the final erosion of the western portion of the Amazon River, C. asellus colonized eastward downstream. The consequent migration would have been constrained by limited mobility and larval dispersal (Araujo-Lima, 1994). Most recently however, divergent natural selection has become established at the interface of the white waters of the Amazon River and the clear waters of the Tapajós River.


By distinguishing the contribution of vicariant biogeographic history from the power of divergent natural selection in C. asellus, we have expanded upon a classical phylogeographic approach. In doing so, we have demonstrated that diversity and spatially distinct population structure can be generated and maintained via multiple forces within a single species across a complex environment. Importantly however, we have shown that time may be the critical factor disabling the identification of natural selection in the wild. Because ecological divergence between white and clear water environments was relatively recent in C. asellus populations, the confounding effects of ecological allopatry (genetic drift) have not accumulated within the nuclear genome. This provided us with a unique opportunity to assess the contribution of ecologically mediated divergence versus allopatry.

Because drift will accumulate with time between ecotypes, disguising natural selection as allopatric divergence within the genome (Via, 2009), it is likely that the contribution of natural selection and ecological speciation to biodiversity has been largely underestimated within the literature. This has very important implications for the identification of engines of biodiversity and consequently the conservation of evolutionary processes (Schneider et al., 1999), especially within species-rich ecosystems.

Although there is no certainty that ecologically diverged lineages will eventuate as reproductively isolated species (Futuyma, 1987; Coyne & Orr, 2004), future work on divergent selection should focus on population-level interactions of genetics and ecology, rather than interspecific studies in which the confounding genetic differences have accumulated (Schluter, 2001; Via, 2009). We are currently entering an exciting era of evolutionary biology in which complementary research approaches combining gene mapping, population genomics and the identification of functionally important genetic variation promise to define the interface between environment and selection. In this way, studies endeavouring to understand the precise mechanisms and interactions that couple genotype and environment are nurturing a renewed support for Darwin’s theory of evolution by natural selection.


The authors thank Nathan R. Lovejoy for providing samples from the Orinoco Basin, Colombia and Iquitos, Peru and C. Moritz and A. Hendry for their helpful comments on the manuscript. This study was funded by the Discovery Program of the Australian Research Council (ARC grant DP0556496 to L. Beheregaray), and by Macquarie University through a postgraduate travel grant and research award to G. Cooke. Logistics and local arrangements were supported in part through the Brazilian National Council of Research and Technology CNPq-SEAP No. 408782/2006-4. Collection permit is under IBAMA No. 1920550, and ethical approval was received from Macquarie University, Approval number: 2007/033 under G. Cooke.

Data deposited at Dryad: doi: 10.5061/dryad.m6k89833