Can AFLP genome scans detect small islands of differentiation? The case of shell sculpture variation in the periwinkle Echinolittorina hawaiiensis


Correspondence and present address: Kimberly Tice, Kalaupapa National Historical Park, PO Box 2222, Kalaupapa, HI 96742, USA. Tel.: 808 567 6802 x1510; fax: 808 567 6682; e-mail:


Genome scans have identified candidate regions of the genome undergoing selection in a wide variety of organisms, yet have rarely been applied to broadly dispersing marine organisms experiencing divergent selection pressures, where high recombination rates can reduce the extent of linkage disequilibrium (LD) and the ability to detect genomic regions under selection. The broadly dispersing periwinkle Echinolittorina hawaiiensis exhibits a heritable shell sculpture polymorphism that is correlated with environmental variation. To elucidate the genetic basis of phenotypic variation, a genome scan using over 1000 AFLP loci was conducted on smooth and sculptured snails from divergent habitats at four replicate sites. Approximately 5% of loci were identified as outliers with Dfdist, whereas no outliers were identified by BayeScan. Closer examination of the Dfdist outliers supported the conclusion that these loci were false positives. These results highlight the importance of controlling for Type I error using multiple outlier detection approaches, multitest corrections and replicate population comparisons. Assuming shell phenotypes have a genetic basis, our failure to detect outliers suggests that the life history of the target species needs to be considered when designing a genome scan.


Emerging molecular techniques make it possible to detect the signal of natural selection within the genomes of nonmodel species (Luikart et al., 2003; Beaumont, 2005; Nielsen et al., 2009). Genome scans are based on the idea, proposed by Lewontin & Krakauer (1973), that loci experiencing divergent selection (outlier loci) will show higher levels of genetic differentiation (FST) than neutral loci. When analysis of such interlocus divergence is coupled with the ability to score thousands of sequence polymorphisms by amplified fragment length polymorphism (AFLP), genome scans provide a practical solution to identifying candidate genes under selection (Beaumont, 2005). Further, when the appropriate population comparisons are made, genome scans can identify links between environmental gradients and genes under selection (Storz, 2005). However, when using anonymous markers such as AFLPs, it is highly unlikely that markers identified as outliers are themselves the target of divergent selection (Stinchcombe & Hoekstra, 2008). Rather, the genome scan approach relies on the genetic hitchhiking effect, in which selection also affects gene frequencies at loci linked to those directly experiencing selection (Maynard Smith & Haigh, 1974; Via, 2009). Thus, the efficacy of the genome scan approach depends on the strength of this hitchhiking effect, which is largely determined by the strength of selection and the recombination rate within genomic regions under selection (Storz, 2005; Nosil et al., 2009).

Genome scans have identified candidate regions of the genome undergoing selection in a wide variety of organisms, ranging from walking stick insects (Nosil et al., 2008) to lake whitefish (Campbell & Bernatchez, 2004). In poorly dispersing marine species, such as the direct developing periwinkle Littorina saxatilis, genome scans have consistently identified outlier loci between ecotypes that inhabit different intertidal shore levels and habitats (Wilding et al., 2001; Galindo et al., 2009). Yet genome scans have rarely been applied to broadly dispersing marine organisms experiencing divergent selection pressures (but see Murray & Hare, 2006), where high recombination rates can reduce the extent of LD and the ability to detect regions of the genome under selection.

Echinolittorina hawaiiensis (formerly Littorina picta and Nodilittorina hawaiiensis, Rosewater & Kadolsky, 1981) is a littorinid snail endemic to the Hawaiian Islands that has a pelagic larval duration of 3–4 weeks (Struhsaker & Costlow, 1968). Like many intertidal snail species that experience steep environmental gradients (Boulding, 1990), E. hawaiiensis demonstrates striking intraspecific variation in shell form (Fig. 1). Shell variation in this species is continuous, but smooth and sculptured forms predominate and are most commonly found in different habitat types (Struhsaker, 1968; Reid, 2007; K. Tice, personal observation). Sculptured snails tend to live on high angle benches with steep frontal slopes, in dry areas exposed to sea spray but no direct wave force, whereas smooth snails are found on moist, low angle benches with shallow frontal slopes, where they experience direct water flow. In a common garden experiment that involved culturing larvae from known parents, Struhsaker (1968) found a positive relationship between shell sculpturing in parents and their larvae, as well as differences in larval growth rates and larval survivorship between smooth and sculptured snails. Further, this study also demonstrated that smooth and sculptured adult phenotypes had differential survivorship between habitats, supporting the hypothesis that natural selection maintains genetically based polymorphism in shell morphology.

Figure 1.

 Sites on the main Hawaiian Islands of Kaua’i (Kealia, Port Allen) and O’ahu (Kewalo, Pua’ena) from which sculptured (left) and smooth (right) morphotypes of Echinolittorina hawaiiensis were sampled.

Here, we used a genome scan design that minimizes the effects of population history and false positives to detect the signature of selection in the genome of E. hawaiiensis. To identify true outliers and reduce false positives caused by demographic history or population substructure (Excoffier et al., 2009), we explicitly tested for substructure within the data and used two outlier detection approaches that make different demographic assumptions, Dfdist (Beaumont & Nichols, 1996) and BayeScan (Foll & Gaggiotti, 2008). Further, smooth and sculptured snails were compared in four replicate populations with similar ranges of environmental variation. We also implemented multitest corrections to determine how Type I error may have influenced the identification of outlier loci. Finally, we examined loci identified as outliers to search for patterns indicative of selection, including parallel trends in divergence and structuring of populations by morphotype.

Materials and methods

Study sites and sampling

When analysing hundreds or thousands of loci in a genome scan, the potential for falsely identifying neutral loci as outliers due to chance is substantial (Storz, 2005; Nielsen et al., 2009; Butlin, 2010). We therefore used an experimental design that minimizes Type I error by sampling multiple populations that occur independently across the environmental gradient of interest (Luikart et al., 2003; Nosil et al., 2009). Specifically, we collected snails from four independent environmental gradients in the Main Hawaiian Islands where the habitats of the smooth and sculptured morphotypes are found in close proximity: Kealia, Port Allen, Kewalo and Pua’ena (Fig. 1). Within each of these sites, 60 snails of each morphotype were collected from their respective habitats. Kealia (22°05′N, 159°18′W) and Port Allen (21°53′N, 159°35′W) are man-made jetties composed of large, basalt boulders. Sculptured morphotypes were located in the splash zone on the seaward sides of the jetties, whereas smooth snails were found closer to the water on the protected sides. The Kewalo site (21°17′N, 157°51′W) is a seawall also composed of large, basalt boulders. This seawall wraps around a peninsula, resulting in divergent habitat types similar to those found at Kealia and Port Allen. Sculptured morphotypes were found in the splash zone on the section of the seawall exposed to the open sea, whereas smooth morphotypes were located on a section of the seawall in a harbour channel. In contrast to these three sites, Pua’ena (21°36′N, 158°06′W), described and studied by Struhsaker (1968), is a rugose, reef limestone bench, and here the distributions of the smooth and sculptured snails overlapped. Both morphotypes were collected from the same section of the limestone bench.

DNA isolation and AFLP genotyping

Snails were flash-frozen in liquid nitrogen, and tissues were stored at −80 °C until DNA isolation. Genomic DNA was isolated from foot tissue using E.Z.N.A. Mollusc DNA Isolation kits (Omega Bio-tek, Inc., Norcross, GA, USA) following the manufacturer’s protocols. Genomic DNA was quantified using spectrophotometry (NanoDrop; Thermo Scientific, Inc., Waltham, MA, USA) and diluted to 100 ng μL−1. DNA from a total of 233 snails (28–31 snails per morphotype per site) was successfully amplified for AFLP analysis using a modified version of the protocol of Vos et al. (1995). For each individual, 500 ng of DNA was digested at 37 °C for 2 h in a 25-μL reaction consisting of 5 U EcoRI, 3 U MseI, 1× NEBuffer 2 and 1× BSA (New England Biolabs, Ipswich, MA, USA). The restriction enzymes were then inactivated by incubating at 70 °C for 15 min. Adapters were ligated to the restriction enzyme cut sites by bringing the 25 μL of digested DNA to 50 μL with a solution containing 1× ligase buffer (Roche, Indianapolis, IN, USA), 0.2 μM EcoRI adapter, 2 μM MseI adapter and 0.4 U T4 DNA ligase (Roche) and incubating at 16 °C for 16 h. Preselective PCRs were then performed in 20-μL volumes containing 4 μL of diluted (1 : 10) ligation product, 1× NH4 PCR buffer (Bioline, Taunton, MA, USA), 2.5 mM MgCl2, 187.5 μM of each dNTP, 1 μM of each preselective primer (Table 1) and 1 U Biolase DNA polymerase (Bioline). PCR conditions were 72 °C for 2 min, 20 cycles of 94 °C for 20 s, 56 °C for 30 s and 72 °C for 2 min, and a final step of 60 °C for 30 min. Selective PCRs were performed in 20-μL volumes containing 4 μL of diluted (1 : 10) preselective PCR product, 1× NH4 PCR buffer (Bioline), 2.5 mM MgCl2, 187.5 μM of each dNTP, 0.25 μM of one Eco+AXX selective primer, 0.5 μM of one Mse+CXX selective primer and 1 U Biolase DNA polymerase (Bioline). 
PCR conditions were 94 °C for 2 min; 10 cycles of 94 °C for 20 s, 66 °C (decreasing by 1 °C each cycle) for 30 s and 72 °C for 2 min; 20 cycles of 94 °C for 20 s, 56 °C for 30 s and 72 °C for 2 min; and a final step of 60 °C for 30 min. We tested 42 different selective primer combinations and chose eight (Table 1) that amplified consistently, were polymorphic, had a high signal-to-noise ratio and produced fragments distributed throughout the available 150–500 bp size range (Meudt & Clarke, 2007). Selective PCR products from primer combinations A–D and E–H were pooled separately in 1 : 1 : 1 : 2 ratios and genotyped using an Applied Biosystems (ABI, Carlsbad, CA, USA) 3730XL automated capillary sequencer. Electropherograms were analysed using GeneMapper version 3.7 software (Applied Biosystems). The height (in relative fluorescence units, RFU) and size (in base pairs, bp) of all fragments > 200 RFU and between 150 and 500 bp were recorded. Fragments < 150 bp were excluded to reduce the impact of homoplasy, which is greatest for small fragments (Vekemans et al., 2002; Caballero et al., 2008).

Table 1.   Primer sequences and combinations used in the AFLP analysis. The number of loci for each combination that were more than 95% repeatable is given in parentheses.
Primer | Sequence (5′–3′)
Preselective primers
Selective primers
Primer combinations (number of loci) | Selective primer pair
 A (170) | Eco+ACT / Mse+CCT
 B (125) | Eco+AAG / Mse+CGA
 C (93) | Eco+AGG / Mse+CGA
 D (123) | Eco+AGC / Mse+CCT
 E (174) | Eco+ACT / Mse+CAT
 F (167) | Eco+AAG / Mse+CAA
 G (117) | Eco+AGG / Mse+CAA
 H (104) | Eco+AGC / Mse+CAA

To ensure repeatability of AFLP profiles, all samples were genotyped twice for every primer combination, and replicate samples were run on separate gels. Each gel included samples from both morphotypes and all four sites. To reduce the error and subjectivity that result from scoring AFLP profiles manually (Bonin et al., 2004), scoring was semi-automated using the software programs Peakmatcher (DeHaan et al., 2002) and AFLPScore (Whitlock et al., 2008). Peakmatcher automatically creates marker categories based on the repeatability of markers across replicates. Minimum repeatability was set to 95% with bin sizes ranging from 0.4 to 1 bp; all other settings were kept at the default values. Marker categories generated by Peakmatcher were then input back into GeneMapper, and peak heights for the specified marker categories were obtained for each sample. AFLPScore then used these peak heights to determine the optimal thresholds that minimize genotyping error while maximizing the number of retained markers. Markers likely to contribute high error rates to the data set are excluded, and a binary presence/absence table of the retained markers is generated. The locus threshold (the minimum average peak height for a marker category to be retained) was 400 RFU for all primer pairs. The phenotype threshold (the peak height above which a marker is scored as present) ranged from 300 to 400 RFU. AFLPScore then calculates the mismatch error rate, defined as the ratio between the observed number of phenotypic differences and the total number of phenotypic comparisons (the number of loci multiplied by the number of pairs of replicate profiles; Bonin et al., 2004; Whitlock et al., 2008).
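The mismatch error rate defined above can be sketched in a few lines of Python. This is a minimal illustration of the definition only, assuming each individual's two replicate runs are stored as 0/1 band-presence vectors over the same loci; it is not AFLPScore's own code, and the function name `mismatch_error_rate` is hypothetical.

```python
from typing import Sequence

# One replicate profile = a 0/1 band-presence vector over the same set of loci.
Profile = Sequence[int]


def mismatch_error_rate(rep_a: Sequence[Profile], rep_b: Sequence[Profile]) -> float:
    """Observed phenotypic differences / (number of loci x number of replicate pairs)."""
    if len(rep_a) != len(rep_b) or not rep_a:
        raise ValueError("need one non-empty pair of replicate profiles per individual")
    n_loci = len(rep_a[0])
    mismatches = 0
    for prof_a, prof_b in zip(rep_a, rep_b):
        if len(prof_a) != n_loci or len(prof_b) != n_loci:
            raise ValueError("all profiles must score the same loci")
        # Count loci scored present in one replicate and absent in the other.
        mismatches += sum(a != b for a, b in zip(prof_a, prof_b))
    return mismatches / (n_loci * len(rep_a))


# Hypothetical data: two individuals x four loci, one discordant call in 8 comparisons.
first_run = [[1, 0, 1, 1], [0, 0, 1, 0]]
second_run = [[1, 0, 0, 1], [0, 0, 1, 0]]
print(mismatch_error_rate(first_run, second_run))  # 0.125
```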

The mean fragment size of all AFLP loci studied, the mean number of AFLP fragments per individual, the mean percentage of loci polymorphic at the 5% level in each population and the expected heterozygosity (HE) of each population, assuming Hardy–Weinberg equilibrium, were calculated using the program aflp-surv version 1 (Vekemans, 2002).
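Because AFLPs are dominant, HE must be computed from estimated allele frequencies rather than observed genotypes. The sketch below assumes Hardy–Weinberg equilibrium and the simple square-root estimator, in which the null-allele frequency is q = √(proportion of individuals lacking the band). aflp-surv offers more refined estimators, and this is not a claim about the settings used here; the function names are hypothetical.

```python
import math
from typing import Sequence


def band_allele_freq(band_presence: Sequence[int]) -> float:
    """Square-root estimate of the band (dominant) allele frequency under HWE.

    Individuals lacking the band are treated as null homozygotes, so the
    null-allele frequency is q = sqrt(fraction of individuals without the band).
    """
    if not band_presence:
        raise ValueError("no individuals scored")
    absent = sum(1 for x in band_presence if x == 0)
    q = math.sqrt(absent / len(band_presence))
    return 1.0 - q


def expected_heterozygosity(band_presence: Sequence[int]) -> float:
    p = band_allele_freq(band_presence)
    return 2.0 * p * (1.0 - p)  # HE = 2pq


def mean_expected_heterozygosity(matrix: Sequence[Sequence[int]]) -> float:
    """Average HE over loci; rows are individuals, columns are loci."""
    n_loci = len(matrix[0])
    per_locus = [expected_heterozygosity([row[j] for row in matrix]) for j in range(n_loci)]
    return sum(per_locus) / n_loci


# A locus whose band is absent in 25 of 100 individuals: q = 0.5, so HE = 0.5.
print(expected_heterozygosity([1] * 75 + [0] * 25))  # 0.5
```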

Outlier detection

We made two types of comparisons: between morphotypes and between sites. Between-morphotype comparisons contrast sculptured and smooth snails within each site, to test whether divergent selection between habitats has produced genetic differentiation. There were four between-morphotype comparisons (one for each site studied). Between-site comparisons contrast snails of the same morphotype from different sites. Four of these comparisons were also made (smooth snails between the two Kaua’i sites, sculptured snails between the two Kaua’i sites and the same two comparisons between the O’ahu sites). If divergent selection acts primarily between the habitats of smooth and sculptured snails, we would expect to find fewer true outlier loci in the between-site comparisons than in the between-morphotype comparisons.
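The eight pairwise comparisons can be written out explicitly. This sketch is illustrative only: the dictionary layout and function names are hypothetical, and it simply enumerates the four within-site morphotype contrasts and the four same-morphotype, same-island site contrasts described above.

```python
# Hypothetical encoding of the sampling design: island -> its two sites.
SITES_BY_ISLAND = {
    "Kaua'i": ("Kealia", "Port Allen"),
    "O'ahu": ("Kewalo", "Pua'ena"),
}
MORPHOTYPES = ("smooth", "sculptured")


def between_morphotype_comparisons():
    """Smooth vs. sculptured snails within each of the four sites."""
    return [
        ((site, "smooth"), (site, "sculptured"))
        for sites in SITES_BY_ISLAND.values()
        for site in sites
    ]


def between_site_comparisons():
    """Same morphotype, the two sites on the same island."""
    return [
        ((site_a, morph), (site_b, morph))
        for site_a, site_b in SITES_BY_ISLAND.values()
        for morph in MORPHOTYPES
    ]


for pair in between_morphotype_comparisons() + between_site_comparisons():
    print(pair)
```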

We used two approaches to identify outlier loci. The first approach was developed by Beaumont & Nichols (1996) and is implemented in the software package Dfdist. In this software package, Weir & Cockerham’s (1984) estimator of FST, θ, is first calculated for each locus. The software then performs coalescent simulations to generate data sets with a distribution of θ close to the empirical distribution. The simulations were carried out by generating 50 000 loci under a two-population model. Caballero et al. (2008) recommend using the trimmed mean of the empirical FST values as the target FST, as this removes the loci most likely to be influenced by selection (Beaumont & Balding, 2004) and provides a theoretically neutral baseline against which potential outlier loci can be compared. The trimmed mean was calculated from the empirical data set by removing the highest 30% and lowest 30% of FST values and taking the mean of the remaining loci, but this value was negative for all comparisons. Because negative FST values are theoretically impossible, we followed Galindo et al. (2009) and instead calculated the mean of the lowest positive FST values for the between-morphotype (0.0006) and between-site comparisons (0.001). Because the between-site comparisons made within morphotypes serve as a null expectation for distributions of FST, 0.001 was used as the target FST for all comparisons. Simulations were robust to changes in the value of the scaled mutation parameter (4Nμ), and a value of 1.4 (calculated from analyses of the mitochondrial gene COI; K. Tice, unpublished data) was used. All remaining parameters were kept at the default values. We used the 95th and 99th quantiles of the simulated FST distribution as thresholds for outlier loci. Loci that fell above these thresholds have unusually high values of FST and are potentially under selection.
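The choice of target FST can be sketched as follows, assuming a symmetric 30% trim from each tail and a fallback to the mean of the lowest positive FST values. `n_lowest_positive` is a hypothetical parameter, because the text does not say how many values were averaged. The `upper_quantile` helper is a simplification: Dfdist evaluates quantiles conditional on heterozygosity (Fig. 2), which this unconditional nearest-rank version does not do. All function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def trimmed_mean(values: Sequence[float], trim: float = 0.30) -> float:
    """Mean after dropping the lowest and the highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # values removed from each tail
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trim fraction removes every value")
    return mean(kept)


def target_fst(fst_values: Sequence[float], trim: float = 0.30,
               n_lowest_positive: int = 10) -> float:
    """Trimmed-mean target FST, falling back to the mean of the lowest positive FSTs.

    n_lowest_positive is hypothetical: the number of values averaged is not stated.
    """
    t = trimmed_mean(fst_values, trim)
    if t > 0:
        return t
    positives = sorted(v for v in fst_values if v > 0)
    if not positives:
        raise ValueError("no positive FST values to fall back on")
    return mean(positives[:n_lowest_positive])


def upper_quantile(simulated_fst: Sequence[float], q: float) -> float:
    """Nearest-rank q-th quantile of simulated FST (ignores heterozygosity)."""
    ordered = sorted(simulated_fst)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


fst = [-0.05, -0.04, -0.03, -0.02, -0.01, 0.0, 0.0005, 0.0007, 0.02, 0.05]  # hypothetical
print(trimmed_mean(fst))                     # negative, so the fallback is used
print(target_fst(fst, n_lowest_positive=2))  # mean of 0.0005 and 0.0007
```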

Two problems exist with the outlier detection method implemented in the Dfdist package. First, the program uses a relatively simple finite island model, in which all populations are assumed to have equal sizes and to exchange migrants at the same rate. A violation of this model can lead to a high false-positive rate (Foll & Gaggiotti, 2008). Second, the expected FST distribution is generated from simulations based on the empirical data set. However, the potential presence of selected loci in the data set can bias the estimation of this distribution, and the trimmed mean approach for establishing a neutral baseline is fairly subjective. We therefore used a second approach, implemented in the software package BayeScan (Foll & Gaggiotti, 2008), which estimates the probability that each locus is subject to selection using a Bayesian method. BayeScan defines two alternative models: one that includes the effect of selection and another that excludes it. Rather than relying on P-values, model choice is based on Bayes factors, which in this case are simply the ratio of posterior model probabilities. Evidence for selection is assessed on Jeffreys’ scale of evidence for Bayes factors, as described in the program manual.
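Translating a BayeScan log10 Bayes factor into Jeffreys’ categories is a simple lookup. The sketch below assumes the cutoffs 0.5, 1, 1.5 and 2 on log10(BF) for substantial, strong, very strong and decisive evidence, as we read the BayeScan manual; only the 0.5 cutoff is used in this study. The posterior-probability helper additionally assumes equal prior odds for the two models. All names are hypothetical.

```python
# Assumed cutoffs on log10(BF), checked from the strongest category downwards.
JEFFREYS_SCALE = (
    (2.0, "decisive"),
    (1.5, "very strong"),
    (1.0, "strong"),
    (0.5, "substantial"),
)


def jeffreys_category(log10_bf: float) -> str:
    for threshold, label in JEFFREYS_SCALE:
        if log10_bf > threshold:
            return label
    return "not substantial"


def posterior_probability_of_selection(log10_bf: float) -> float:
    """Posterior probability of the selection model, assuming equal prior odds."""
    odds = 10.0 ** log10_bf
    return odds / (1.0 + odds)


# 0.14 and 0.26 bound the between-morphotype range reported below; 0.6 is hypothetical.
for value in (0.14, 0.26, 0.6):
    print(value, jeffreys_category(value), round(posterior_probability_of_selection(value), 2))
```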

Finally, whereas some simulations have shown that these two outlier detection methods are relatively robust to departures from demographic models (Beaumont & Nichols, 1996; Beaumont & Balding, 2004), a recent simulation study (Excoffier et al., 2009) has shown that population substructure considerably increases the number of false positives. To determine whether population substructure could be inflating the number of false positives, we explicitly tested for the presence of substructure within smooth and sculptured samples using a hierarchical amova. See Test of parallel divergence and population structure below for details.

Classifying outliers and multitest correction

No outliers were identified using BayeScan, so all further analyses were conducted using results from Dfdist (see Outlier detection for criteria). Due to the large number of loci studied, the risk of Type I error is high. However, it is much less likely that the same locus would appear as an outlier in more than one comparison. Therefore, outliers were classified as nonrepeated and repeated. Nonrepeated outliers are loci that were identified as outliers in only one of eight pairwise comparisons made, whereas repeated outliers were identified in more than one pairwise comparison. Repeated outliers are much more likely to be true outlier loci, rather than the result of Type I error. Further, if selection is primarily acting between the habitats of smooth and sculptured snails, we would expect outliers to be found primarily in the between-morphotype comparisons. Therefore, we classified the nonrepeated and repeated outliers based on the types of comparisons in which they were found: between-morphotype comparisons, between-site comparisons or both.
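The repeated/nonrepeated classification amounts to counting, for each locus, how many comparisons flagged it and of which type. A minimal sketch with hypothetical locus and comparison labels (not the study's data); the function name `classify_outliers` is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_outliers(
    outliers_by_comparison: Dict[str, Iterable[str]],
    comparison_type: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Label each outlier locus as (nonrepeated|repeated, comparison category)."""
    hits = defaultdict(list)  # locus -> types of the comparisons that flagged it
    for comparison, loci in outliers_by_comparison.items():
        for locus in loci:
            hits[locus].append(comparison_type[comparison])
    labels = {}
    for locus, kinds_seen in hits.items():
        status = "repeated" if len(kinds_seen) > 1 else "nonrepeated"
        kinds = set(kinds_seen)
        category = next(iter(kinds)) if len(kinds) == 1 else "both"
        labels[locus] = (status, category)
    return labels


# Hypothetical example: three comparisons, three loci.
outliers = {
    "Kealia smooth-sculptured": {"L1", "L2"},
    "Kewalo smooth-sculptured": {"L1"},
    "Kaua'i smooth Kealia-Port Allen": {"L2", "L3"},
}
comparison_types = {
    "Kealia smooth-sculptured": "between morphotype",
    "Kewalo smooth-sculptured": "between morphotype",
    "Kaua'i smooth Kealia-Port Allen": "between site",
}
print(classify_outliers(outliers, comparison_types))
```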

To further evaluate the role of Type I error and its effect on outlier detection, false discovery rate (FDR) and sequential goodness-of-fit (Sgof) multitest corrections were performed on the outliers identified by Dfdist, and Q values were calculated using the program SGoF+ (Carvajal-Rodriguez et al., 2009). Unlike other methods, the Sgof multitest correction gains statistical power as the number of tests increases, which makes it well suited to exploratory studies with a large number of tests.
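As an illustration of the FDR side of the multitest correction, the Benjamini–Hochberg step-up adjustment can be sketched as below. This is a swapped-in textbook procedure, not a reproduction of SGoF+: it does not implement the Sgof method, and the Q values reported by SGoF+ may be computed differently. The P-values and function names are hypothetical.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (FDR q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P downwards so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    return [q <= alpha for q in bh_qvalues(pvalues)]


p = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-locus P-values
print(bh_qvalues(p))
print(significant(p))  # [True, False, False, False]
```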

Test of parallel divergence and population structure

If similar selective forces are acting at all study sites, we predict parallel trends in divergence in the same loci that meet our criteria as outliers (Campbell & Bernatchez, 2004; Mealor & Hild, 2006). Such a pattern is likely when dispersal is high, because a beneficial mutation will spread rapidly throughout the species’ range. We classified a locus as showing a parallel trend in divergence if the frequency of band presence in one morphotype was at least 5% greater than in the other morphotype at all four sites. Loci that did not meet this criterion were classified as not showing a parallel trend. A chi-squared test was then used to determine whether parallel trends in divergence occurred more or less frequently than expected in outlier and nonoutlier loci.
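The parallel-trend criterion and the 2 × 2 chi-squared test can be sketched together. Two assumptions are ours: “at least 5% greater” is read as a difference of at least 0.05 in band frequency, and the excess must lie in the same morphotype at all four sites. The Yates-corrected statistic is computed directly, and the P-value uses the 1-d.f. identity P = erfc(√(χ²/2)). Function names and the example counts are hypothetical.

```python
import math
from typing import Sequence


def shows_parallel_trend(band_freq_a: Sequence[float], band_freq_b: Sequence[float],
                         min_diff: float = 0.05) -> bool:
    """True if morphotype A's band frequency exceeds B's by >= min_diff at every
    site, or B's exceeds A's by >= min_diff at every site."""
    diffs = [a - b for a, b in zip(band_freq_a, band_freq_b)]
    return all(d >= min_diff for d in diffs) or all(d <= -min_diff for d in diffs)


def yates_chi2_2x2(table: Sequence[Sequence[int]]) -> float:
    """Yates-corrected chi-squared statistic for a 2 x 2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink |O - E| by 0.5, without letting it go below zero.
            diff = max(0.0, abs(table[i][j] - expected) - 0.5)
            chi2 += diff * diff / expected
    return chi2


def chi2_pvalue_df1(chi2: float) -> float:
    """Upper-tail P for chi-squared with 1 d.f., i.e. P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical band frequencies at four sites (smooth vs. sculptured).
print(shows_parallel_trend([0.6, 0.5, 0.7, 0.4], [0.5, 0.4, 0.6, 0.3]))
# Hypothetical 2 x 2 counts: rows = locus class, columns = parallel trend yes/no.
stat = yates_chi2_2x2([[10, 20], [30, 40]])
print(stat, chi2_pvalue_df1(stat))
```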

Further, if selection is acting on outlier loci but not on nonoutlier loci, we would predict that outlier loci would exhibit a population structure reflecting morphotype, whereas nonoutlier loci might be structured by site or show no structure, as E. hawaiiensis has a high dispersal potential. Therefore, we partitioned loci into nonoutliers and outliers according to the Dfdist results, and genetic structure for each class was determined by a nested analysis of molecular variance (amova) using Arlequin version 3.11 (Excoffier et al., 2005). Snails were grouped according to morphotype, with each of the four study sites nested within each morphotype. Although Arlequin is frequently used in outlier studies with AFLP loci (Oetjen & Reusch, 2007; Galindo et al., 2009), it is not designed for dominant markers and requires treating the multilocus AFLP phenotype as a haplotype and using similarity or distance indices in the amova (Holsinger et al., 2002). Therefore, population structure was also analysed using the Bayesian program Hickory version 1.1 (Holsinger et al., 2002), which allows direct estimation of FST from dominant markers. Because population structure was so minimal for nonoutlier loci, Hickory was used only to analyse population structure in outlier loci. Hickory cannot perform nested analyses, so three separate analyses were performed: population structure between morphotypes, between sites within the smooth morphotype and between sites within the sculptured morphotype. For each analysis, the default parameters were used and all four models were compared. The full model allows for inbreeding, the f = 0 model implies no inbreeding, the θII = 0 model implies no differentiation between populations, and the f free model decouples the estimates of f and θII. 
Model choice was based on the Deviance Information Criterion (DIC), which is similar to the Akaike Information Criterion in that it balances how well the model fits the data (Dbar) against the effective number of parameters being estimated (pD), as recommended in the program manual.
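DIC-based model choice can be made concrete with the smooth-snail rows of Table 5. In this sketch we read the first two numeric columns of Table 5 as the posterior mean deviance (Dbar) and the deviance at the posterior mean (Dhat); that reading is our inference from the arithmetic (pD = Dbar − Dhat and DIC = Dbar + pD reproduce the remaining columns), not something stated in the table. Function names are hypothetical.

```python
from typing import Dict, Tuple

# (Dbar, Dhat) for the four Hickory models, "Between sites within smooth snails"
# rows of Table 5, under our inferred reading of its columns.
SMOOTH_SITE_MODELS: Dict[str, Tuple[float, float]] = {
    "full": (805.165, 697.974),
    "f = 0": (804.808, 691.223),
    "theta_II = 0": (951.433, 900.91),
    "f free": (810.232, 659.235),
}


def dic(dbar: float, dhat: float) -> Tuple[float, float]:
    """Return (DIC, pD), with pD = Dbar - Dhat and DIC = Dbar + pD."""
    p_d = dbar - dhat
    return dbar + p_d, p_d


def preferred_model(models: Dict[str, Tuple[float, float]]) -> str:
    """Name of the model with the lowest DIC."""
    return min(models, key=lambda name: dic(*models[name])[0])


for name, (dbar, dhat) in SMOOTH_SITE_MODELS.items():
    score, p_d = dic(dbar, dhat)
    print(f"{name:>13}  pD = {p_d:8.3f}  DIC = {score:9.3f}")
print("preferred:", preferred_model(SMOOTH_SITE_MODELS))  # preferred: full
```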

We visualized the patterns of divergence in outliers and nonoutliers by constructing neighbour-joining trees. Genetic differentiation (FST) among populations was calculated using 1000 bootstraps and the default parameters in aflp-surv version 1.0 (Vekemans, 2002). aflp-surv was then used to generate 1000 bootstrapped Nei’s genetic distance matrices for each class of loci. These matrices were used to construct neighbour-joining trees with the program neighbor, which were then input into the program consense to create unrooted 50% majority rule consensus trees. neighbor and consense are programs within phylip 3.68 (Felsenstein, 2008).
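The distance matrices behind the neighbour-joining trees use Nei’s standard genetic distance, D = −ln(Jxy/√(JxJy)). A minimal sketch for biallelic dominant loci, assuming band-allele frequencies have already been estimated for each population (for example with a square-root estimator under Hardy–Weinberg equilibrium); it does not reproduce aflp-surv’s frequency estimation or bootstrapping, and the function name is hypothetical.

```python
import math
from typing import Sequence


def nei_distance(p_x: Sequence[float], p_y: Sequence[float]) -> float:
    """Nei's standard genetic distance for biallelic loci.

    p_x, p_y: band-allele frequency at each locus in populations X and Y;
    the null-allele frequency is 1 - p.
    """
    if len(p_x) != len(p_y) or not p_x:
        raise ValueError("need matching, non-empty frequency vectors")
    jx = jy = jxy = 0.0
    for px, py in zip(p_x, p_y):
        qx, qy = 1.0 - px, 1.0 - py
        jx += px * px + qx * qx   # identity within X
        jy += py * py + qy * qy   # identity within Y
        jxy += px * py + qx * qy  # identity between X and Y
    # Averaging over loci cancels in the ratio, so the sums are used directly.
    identity = jxy / math.sqrt(jx * jy)
    if identity <= 0.0:
        raise ValueError("populations share no alleles; distance is infinite")
    return -math.log(identity)


# Hypothetical band-allele frequencies at two loci in two populations.
print(nei_distance([0.9, 0.5], [0.1, 0.5]))
```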

Results

AFLP repeatability

Duplicate samples of all specimens were scored at 1073 AFLP loci, 1067 of which were segregating (i.e. neither present in nor absent from all individuals). Across the eight primer combinations, the mean mismatch error rate was 1.44% (range 0.96–2.05%). The mean number of AFLP fragments identified per specimen was 92.2, and the mean fragment size was 358.5 bp (SD 93.9). Averaged over all eight populations, the mean percentage of polymorphic loci at the 5% level was 23.8% (SD 2.68) and the mean expected heterozygosity (HE) was 0.07 (SD 0.0009).

Outlier detection and multitest correction

In the Dfdist analyses, the average FST generated by the simulations was greater than the target FST: the target FST was 0.001 for all eight comparisons, but the average simulated FST ranged from 0.0054 to 0.0062. Across the eight comparisons, an average of only 34% of loci (range 30–44%) had P-values that placed them above the 50th quantile, whereas 66% fell below it. The simulated FST distribution therefore appears to have been shifted upwards relative to the empirical data, making the detection of outliers slightly conservative. Despite this conservatism, outliers were detected at the 95th quantile in all eight comparisons and at the 99th quantile in all four between-morphotype comparisons and in one between-site comparison (Fig. 2). In the between-morphotype comparisons, 34 loci (3.17%) were identified as outliers at the 95th quantile and seven (0.65%) at the 99th quantile. In the between-site comparisons, 32 loci (2.98%) were identified as outliers at the 95th quantile and two (0.19%) at the 99th quantile (Table 2). Table 2 further classifies outlier loci as nonrepeated or repeated outliers. Of the 55 different outliers identified at the 95th quantile, only 13 were repeated outliers. Only two of these loci were repeated in two independent between-morphotype comparisons; the remaining 11 were each identified in one between-morphotype comparison and one between-site comparison, and because those comparisons almost always shared a population, they were not truly independent. Of the eight outliers at the 99th quantile, only one locus was a repeated outlier, identified in both a between-morphotype and a between-site comparison.

Figure 2.

 Outlier detection results from Dfdist analyses. Plots show FST values, conditional on heterozygosity, of the 1073 AFLP loci studied. Plots in the upper two rows show between-morphotype comparisons, whereas those in the lower two rows show comparisons within morphotypes but between sites. The thin and thick lines in each plot represent the 95th and 99th quantiles, respectively, of the simulated FST values predicted under neutrality obtained with Dfdist. Nonoutlier loci are represented by white dots, outlier loci that exceed the 95th quantile are shown as grey dots, and outliers that exceed the 99th quantile are depicted as black triangles.

Table 2.   Categorization of the 1073 AFLP loci by the software package Dfdist.
Locus type* | Type of comparison† | Number of loci at 95th (99th) quantiles
Nonoutlier | – | 1018 (1065)
Nonrepeated outlier | Between morphotype | 21 (6)
Nonrepeated outlier | Between site | 21 (1)
Repeated outlier | Between morphotype | 2 (0)
Repeated outlier | Between site | 0 (0)
Repeated outlier | Both | 11 (1)

*Nonoutliers: loci that fall below the 95th quantile for FST values predicted for neutral loci; nonrepeated outliers: loci exceeding the 95th (99th) quantile in one comparison; repeated outliers: loci exceeding the 95th (99th) quantile in two comparisons.
†Between-morphotype comparisons: comparisons of smooth and sculptured snails found within each site; between-site comparisons: comparisons of snails of the same morphotype from different sites.

In contrast to the Dfdist results, no outlier loci were identified using BayeScan. On Jeffreys’ scale of evidence, there is substantial evidence that a locus is under selection when log10(BF) > 0.5 (see program manual). However, no locus had log10(BF) > 0.26. For the between-morphotype comparisons, log10(BF) ranged from 0.14 to 0.26, with an average of 0.22. For the between-site comparisons, log10(BF) ranged from 0.13 to 0.23, with an average of 0.18.

To determine whether Type I error might be contributing to the contrasting results found using Dfdist and BayeScan, we applied multitest corrections to the Dfdist results. After either FDR or sequential goodness-of-fit (Sgof) multitest corrections at alpha = 0.05, no loci were identified as outliers in the Dfdist analyses. For the eight comparisons, Q values ranged from 0.26 to 0.78.

Parallel divergence and population structure

The observed number of loci showing parallel trends did not differ significantly from that expected for either nonoutlier or outlier loci (Table 3; χ² = 0.054, P = 0.816).

Table 3.   Chi-square (χ²) test to determine whether parallel trends in divergence occurred more or less frequently than expected in nonoutlier and outlier loci. A Yates correction was applied because one expected value was < 5; this did not alter the significance of the result.
Locus type | Parallel trend? | Observed | Expected | χ²
P value: 0.816

Population genetic structure analyses suggest that the outlier loci identified do not differentiate between morphotypes. In analyses of molecular variance (amova; Table 4), morphotype explained < 1% of the variation for both nonoutlier and outlier loci, and over 97% of the variation was found within sites. For the nonoutlier loci, there was no significant differentiation between sites within morphotypes (FSC = −0.00361, d.f. = 6, P > 0.999) or between morphotypes (FCT = −0.00075, d.f. = 1, P = 0.94). For the outlier loci, differentiation between sites within morphotypes was small but significant (FSC = 0.026, d.f. = 6, P < 0.001), whereas there was no significant differentiation between morphotypes (FCT = −0.00014, d.f. = 1, P = 0.55). Similarly, the Hickory analyses (Table 5) reveal that the little genetic structure that does exist between populations is greater between sites (θII = 0.025 for smooth snails and 0.022 for sculptured snails) than between morphotypes (θII = 0.00079). In all three comparisons, we chose the full model, which always had the lowest DIC, although each model would lead to a similar conclusion (Table 5). Neighbour-joining trees constructed from outlier and nonoutlier loci showed slightly different topologies, although neither was well resolved (Fig. 3). The exception was one node in the outlier tree, with a bootstrap value of 73%, which united the smooth snails from Kewalo and Pua’ena.

Table 4.   Analysis of molecular variance of 1073 AFLP markers for populations of smooth and sculptured Echinolittorina hawaiiensis from four sites in the main Hawaiian Islands. AFLP loci were grouped into nonoutlier and outlier loci and analysed separately.
Source | d.f. | Sum of squares | Variance | % of total | F-statistic | Significance
Nonoutlier loci
 Between morphotypes | 1 | 40.776 | −0.038 | 0.07 | FCT = −0.00075 | P = 0.94
 Between sites within morphotypes | 6 | 270.987 | −0.188 | −0.36 | FSC = −0.00361 | P > 0.999
 Within sites | 225 | 11351.491 | 50.451 | 100.44 | FST = −0.00436 | P > 0.999
Outlier loci
 Between morphotypes | 1 | 10.359 | −0.001 | −0.01 | FCT = −0.00014 | P = 0.55
 Between sites within morphotypes | 6 | 62.710 | 0.158 | 2.64 | FSC = 0.02639 | P < 0.001
 Within sites | 225 | 1314.244 | 5.841 | 97.37 | FST = 0.02626 | P < 0.001
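The F-statistics in Table 4 follow from the variance components through the standard hierarchical amova relationships: FCT = σa/σT, FSC = σb/(σb + σc) and FST = (σa + σb)/σT, where σa, σb and σc are the components between morphotypes, between sites within morphotypes and within sites, and σT is their sum. A small sketch using the rounded outlier-locus components from Table 4; because those components are rounded to three decimals, the results agree with the reported F-statistics only to about three decimal places. The function name is hypothetical.

```python
from typing import Dict


def hierarchical_f_statistics(sigma_a: float, sigma_b: float, sigma_c: float) -> Dict[str, float]:
    """F-statistics from three-level amova variance components.

    sigma_a: among groups (here, morphotypes)
    sigma_b: among populations (sites) within groups
    sigma_c: within populations
    """
    total = sigma_a + sigma_b + sigma_c
    return {
        "FCT": sigma_a / total,
        "FSC": sigma_b / (sigma_b + sigma_c),
        "FST": (sigma_a + sigma_b) / total,
    }


# Rounded variance components for the outlier loci in Table 4.
print(hierarchical_f_statistics(-0.001, 0.158, 5.841))
```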
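Table 5.   Genetic structure analysis of outlier loci using the Bayesian method implemented in Hickory version 1.1. Genetic structure was analysed between morphotypes and between sites within each morphotype (smooth and sculptured). The preferred model for each comparison is shown in boldface. See text for explanations of the model selection criteria and the four models.

| Model | Dbar | Dhat | pD | DIC | f | θII |
|---|---|---|---|---|---|---|
| Between morphotypes | | | | | | |
| **Full model** | 574.196 | 501.257 | 73.659 | 648.575 | 0.738 | 0.00795 |
| f = 0 | 575.917 | 496.729 | 79.118 | 655.105 | – | 0.00489 |
| θII = 0 | 629.205 | 576.02 | 53.185 | 682.39 | 0.969 | – |
| f free | 626.884 | 479.766 | 147.119 | 774.003 | 0.501 | 0.04334 |
| Between sites within smooth snails | | | | | | |
| **Full model** | 805.165 | 697.974 | 107.191 | 912.356 | 0.590 | 0.02492 |
| f = 0 | 804.808 | 691.223 | 113.585 | 918.393 | – | 0.17369 |
| θII = 0 | 951.433 | 900.91 | 50.523 | 1001.96 | 0.968 | – |
| f free | 810.232 | 659.235 | 150.997 | 961.229 | 0.506 | 0.04074 |
| Between sites within sculptured snails | | | | | | |
| **Full model** | 793.36 | 692.68 | 100.68 | 894.04 | 0.623 | 0.02217 |
| f = 0 | 793.967 | 686.039 | 107.928 | 901.894 | – | 0.01489 |
| θII = 0 | 919.067 | 868.347 | 50.72 | 969.787 | 0.968 | – |
| f free | 796.538 | 650.123 | 146.415 | 942.953 | 0.496 | 0.03724 |

Dbar = posterior mean deviance; Dhat = deviance at the posterior means of the parameters; pD = effective number of parameters (Dbar − Dhat); DIC = deviance information criterion (Dbar + pD).

f = an estimate of FIS, or inbreeding within populations.

θII = amount of genetic differentiation between populations; comparable to Weir and Cockerham’s FST.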
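Model choice in Table 5 rests on comparing DIC values within each comparison, the lowest DIC indicating the best balance of fit and complexity. The short Python sketch below transcribes the DIC values from Table 5 and reports each model's difference from the best one. It assumes that the fourth numeric column of each row is DIC, a reading supported by that column equalling the first plus the third (Dbar + pD) in most rows; it reproduces the comparison only and does not rerun Hickory.

```python
# DIC values transcribed from Table 5 (lower is better).
DIC = {
    "Between morphotypes": {
        "Full": 648.575, "f = 0": 655.105, "thetaII = 0": 682.39, "f free": 774.003},
    "Sites within smooth snails": {
        "Full": 912.356, "f = 0": 918.393, "thetaII = 0": 1001.96, "f free": 961.229},
    "Sites within sculptured snails": {
        "Full": 894.04, "f = 0": 901.894, "thetaII = 0": 969.787, "f free": 942.953},
}

def rank_models(scores):
    """Return the lowest-DIC model and every model's DIC difference from it."""
    best = min(scores, key=scores.get)
    deltas = {model: round(value - scores[best], 3) for model, value in scores.items()}
    return best, deltas

for comparison, scores in DIC.items():
    best, deltas = rank_models(scores)
    print(f"{comparison}: preferred = {best}; delta DIC = {deltas}")
```

In all three comparisons the full model is preferred, and the next-best model (f = 0) lies about 6 to 8 DIC units higher, consistent with the choice described in the Results.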
Figure 3.

 Unrooted neighbour-joining 50% majority-rule consensus trees based on Nei’s genetic distances between populations, constructed from (a) nonoutlier loci and (b) outlier loci. Bootstrap values > 50% are indicated.
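The trees in Fig. 3 are built from Nei’s genetic distances between populations. As a minimal illustration, the Python sketch below computes Nei’s (1972) standard distance, D = −ln(Jxy / √(Jx·Jy)), for biallelic loci. How allele frequencies were obtained from the dominant AFLP bands is not stated in this section; the sketch assumes the simple square-root estimator under Hardy–Weinberg equilibrium, and the band scores are hypothetical. Tree building and bootstrapping are not shown.

```python
import math

def null_allele_freq(band_scores):
    """Band-absent allele frequency at one dominant (AFLP) locus.

    Assumes Hardy-Weinberg equilibrium, so the proportion of individuals
    lacking the band equals q**2 (simple square-root estimator).
    """
    absent = sum(1 for score in band_scores if score == 0) / len(band_scores)
    return math.sqrt(absent)

def nei_distance(q_x, q_y):
    """Nei's (1972) standard genetic distance for biallelic loci,
    given per-locus band-absent allele frequencies in populations x and y."""
    j_x = j_y = j_xy = 0.0
    for qx, qy in zip(q_x, q_y):
        px, py = 1.0 - qx, 1.0 - qy
        j_x += px * px + qx * qx    # gene identity within population x
        j_y += py * py + qy * qy    # gene identity within population y
        j_xy += px * py + qx * qy   # gene identity between populations
    # Averaging over loci cancels in the ratio, so sums are used directly.
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Hypothetical band scores (1 = present, 0 = absent), 8 individuals x 3 loci.
pop_a = [[1, 1, 0, 1, 1, 0, 1, 1], [0, 0, 1, 1, 0, 1, 0, 1], [1, 1, 1, 1, 1, 1, 0, 1]]
pop_b = [[1, 0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1, 0, 0], [1, 1, 1, 0, 1, 1, 0, 1]]

q_a = [null_allele_freq(locus) for locus in pop_a]
q_b = [null_allele_freq(locus) for locus in pop_b]
print(f"Nei's D = {nei_distance(q_a, q_b):.4f}")
```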


Detecting small islands of differentiation in the presence of high recombination

A genome scan using over 1000 AFLP markers and repeated comparisons between smooth and sculptured morphotypes failed to detect convincing regions of genomic differentiation in E. hawaiiensis. The ability to detect selection with a genome scan depends on an interaction between the genetic architecture of the trait, the genomic sampling density of the molecular markers and the effects of LD between the genomic region(s) under selection and marker loci. Our power to detect a single locus under selection may be quite low. We used 1073 AFLPs, but the genome size of the related Echinolittorina punctata (formerly Littorina punctata) is approximately 792 million bp (Vitturi et al., 1995). Assuming a similar size for the E. hawaiiensis genome, on average there are > 700 kb between our marker loci. Extensive LD will increase the ability of a moderate-sized genome scan to detect selection, but LD can vary widely across the genome and across species and will be affected by the recombination rate, the strength of selection, population history, mating system and the age of the selected allele (Storz, 2005; Vasemagi & Primmer, 2005; Stinchcombe & Hoekstra, 2008). In outcrossing species, LD often extends < 1 kb (Vasemagi & Primmer, 2005; Bonin, 2008). This is illustrated by quantitative trait loci (QTL) associated with wing patterning in Heliconius butterflies, where there is little LD between sites separated by only 500 bp (Counterman et al., 2010). Similarly, genomic regions identified as outliers in another intertidal snail, L. saxatilis, are small and independent, with differentiation extending only a few hundred bases (Wood et al., 2008). Whereas Via & West (2008) did find large ‘islands’ of differentiation extending 10 centimorgans (cM) on either side of QTL in pea aphids, simulations by De Kovel (2006) demonstrated that marker spacing should be about 0.5 cM to detect intermediate-strength selection on new mutations in a large population. Similarly, Teshima et al. (2006) found that although outlier approaches will identify several interesting candidate genes, they will also miss many, and in some cases most, loci of interest. In fact, in humans, Hirschhorn & Daly (2005) suggested that a million SNPs might be necessary to detect disease genes.
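The marker-spacing argument can be made explicit with simple arithmetic. The Python sketch below divides the assumed genome size (792 Mb, taken from E. punctata) by the 1073 AFLP loci and then asks what fraction of the genome would lie within LD range of any marker. It assumes, purely for illustration, that markers are evenly spaced and that each marker detects selection only within a fixed distance on either side; real AFLP fragments are unevenly distributed, and the 1 kb and 10 kb LD extents are illustrative values bracketing the < 1 kb figure cited above.

```python
import math

GENOME_BP = 792_000_000  # E. punctata genome size (Vitturi et al., 1995); assumed for E. hawaiiensis
N_MARKERS = 1073         # AFLP loci scored in this study

def mean_spacing(genome_bp, n_markers):
    """Average distance between markers if they were spread evenly."""
    return genome_bp / n_markers

def fraction_tagged(genome_bp, n_markers, ld_extent_bp):
    """Fraction of the genome within ld_extent_bp of a marker, assuming
    evenly spaced markers whose LD windows do not overlap."""
    return min(1.0, n_markers * 2 * ld_extent_bp / genome_bp)

def markers_needed(genome_bp, ld_extent_bp):
    """Evenly spaced markers required for every site to lie within ld_extent_bp of one."""
    return math.ceil(genome_bp / (2 * ld_extent_bp))

print(f"mean spacing: {mean_spacing(GENOME_BP, N_MARKERS) / 1000:.0f} kb")
for ld in (1_000, 10_000):
    pct = 100 * fraction_tagged(GENOME_BP, N_MARKERS, ld)
    print(f"LD extent {ld} bp: {pct:.2f}% of genome tagged; "
          f"{markers_needed(GENOME_BP, ld):,} markers needed for full coverage")
```

Under these assumptions, about 738 kb separates adjacent markers, and with LD extending 1 kb only about 0.3% of the genome is within reach of a marker; full coverage would require several hundred thousand markers, the same order of magnitude as the million SNPs suggested for humans.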

Outlier loci have been identified between ecotypes of the temperate snail L. saxatilis, a system with morphological parallels to E. hawaiiensis. In England, high-shore and low-shore L. saxatilis ecotypes were compared in three separate populations with an AFLP genome scan of 306 loci. In all three populations, the same 15 loci (5%) were identified as outliers at the stringent 99th quantile (Wilding et al., 2001). Further, L. saxatilis ecotypes were also studied at three sites in Spain, and after multitest corrections approximately 3% of 2356 loci were still classified as outliers at the 99th quantile (Galindo et al., 2009). These results clearly differ from those for E. hawaiiensis, yet they cannot be explained solely by differences in genome size between species or by increased marker density between studies. The genome size of L. saxatilis, at 1 billion bp (Vitturi et al., 1995), is likely to be larger than that of E. hawaiiensis. Although Galindo et al. (2009) used about twice as many markers as our study, orders-of-magnitude increases would be required to substantially increase genomic sampling density. Assuming that targets of selection are not located on chromosomal inversions in L. saxatilis, we suggest that differences in dispersal between these two study systems have fundamental effects on the extent of genomic hitchhiking between loci under selection and marker loci. Littorina saxatilis has direct development and ‘crawl away’ benthic juveniles, which strongly limits dispersal (Erlandsson et al., 1998). In contrast, E. hawaiiensis has an estimated 3–4-week planktonic larval duration that allows for broad dispersal (Struhsaker & Costlow, 1968). In subdivided populations such as L. saxatilis, LD will extend further along chromosomes than in panmictic populations such as E. hawaiiensis, because a reduced effective migration rate (due to decreases in migration, survival and successful interbreeding) reduces the effective recombination rate (Charlesworth et al., 1997; Via, 2009).
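Why a lower effective recombination rate lets LD extend further can be seen from the textbook two-locus result for a large, randomly mating population: disequilibrium decays as D_t = D_0(1 − r)^t, where r is the recombination fraction. The Python sketch below uses that formula to compute how many generations it takes for LD to halve. It deliberately simplifies: smaller values of r merely stand in for a reduced effective recombination rate, whereas Charlesworth et al. (1997) model population subdivision explicitly, which this sketch does not do.

```python
import math

def ld_after(d0, r, generations):
    """LD remaining after t generations of random mating: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations

def ld_half_life(r):
    """Generations needed for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - r)

# Illustrative recombination fractions; smaller values stand in for a
# reduced *effective* recombination rate between a selected site and a marker.
for r in (0.01, 0.001, 0.0001):
    print(f"r = {r}: LD halves in about {ld_half_life(r):,.0f} generations")
```

Each tenfold reduction in r lengthens the persistence of LD roughly tenfold, so a marker at a given physical distance from a selected site is far more likely to carry a hitchhiking signal when effective recombination is reduced.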

LD resulting from population subdivision may also explain why the vast majority of genome scans using moderate marker densities detect outlier loci (see review by Nosil et al., 2009). The common frog Rana temporaria has a genome size of over 4 billion bp (Vinogradov, 1998), and Bonin et al. (2006) found 8–14% of 392 AFLP loci were outliers. Similarly, the lake whitefish Coregonus clupeaformis has a genome size of over 2 billion bp (Hardie & Hebert, 2003), and Campbell & Bernatchez (2004) found approximately 3% of 440 AFLP loci were outliers. In contrast, we are aware of only one other genome scan in a broadly dispersing marine species, the oyster Crassostrea virginica (Murray & Hare, 2006). In that study, none of the 215 AFLP loci examined were significant outliers after multitest correction. In other well-studied marine systems with high dispersal, natural selection clearly plays a key role in maintaining polymorphism at single loci (Hilbish & Koehn, 1985; Schmidt & Rand, 2001). If shell sculpture in E. hawaiiensis is also controlled by a single locus or a few QTL with major effects, then the probability of detecting selection with a thousand AFLPs is quite low.
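The genome sizes and marker numbers quoted above allow a rough comparison of sampling density across studies. The Python sketch below computes kilobases per AFLP marker for each study; genome sizes given as "over" a value are used as lower bounds, the E. hawaiiensis figure uses the E. punctata genome size, and the oyster study is omitted because its genome size is not given here. These are back-of-envelope figures, not a reanalysis of any study.

```python
# (genome size in bp, number of AFLP loci), as quoted in the text.
STUDIES = {
    "E. hawaiiensis (this study)": (792e6, 1073),
    "L. saxatilis, England (Wilding et al., 2001)": (1e9, 306),
    "L. saxatilis, Spain (Galindo et al., 2009)": (1e9, 2356),
    "R. temporaria (Bonin et al., 2006)": (4e9, 392),
    "C. clupeaformis (Campbell & Bernatchez, 2004)": (2e9, 440),
}

def kb_per_marker(genome_bp, n_loci):
    """Average genomic distance (kb) per marker, assuming even spacing."""
    return genome_bp / n_loci / 1000

for study, (genome_bp, n_loci) in sorted(
        STUDIES.items(), key=lambda item: kb_per_marker(*item[1])):
    print(f"{study}: ~{kb_per_marker(genome_bp, n_loci):,.0f} kb per marker")
```

On these rough figures, only the Spanish L. saxatilis study sampled the genome more densely than ours (about 424 kb versus 738 kb per marker); the other studies detected outliers with markers roughly 4 to 14 times sparser, which is consistent with the suggestion that LD, rather than marker density alone, limits detection.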

An alternative explanation for our failure to detect outliers in this comprehensive genome scan is that variation in shell phenotypes is not a result of Mendelian or quantitative genetic variation. The common garden results of Struhsaker (1968) have been questioned by Reid (2007), who suggests that the variable shell morphologies observed in larvae were artefacts of larval culturing techniques. Further, Reid cites other studies of intertidal snails with broad dispersal, such as Echinolittorina australis (Yeap et al., 2001), which show that shell variation has large environmental components. Nonetheless, differences in other traits studied by Struhsaker, including larval growth rates and survivorship, are difficult to explain solely by experimental artefacts because cultures were maintained on similar algal diets.

Dfdist, BayeScan and Type I error

The two outlier detection approaches used provided slightly different results, emphasizing the need to control for Type I error and to use caution when interpreting genome scans of AFLP markers. First, using the program Dfdist (Beaumont & Nichols, 1996), 5% of loci were identified as outliers at the 95th quantile and 0.7% at the 99th quantile (Fig. 2, Table 2). Two loci were identified as outliers in two independent between-morphotype comparisons at the 95th quantile; these two loci represent the best candidates for further study and provide some suggestion that there may be differentiation between morphotypes. However, after applying a multitest correction to account for the large number of tests performed, the number of significant outliers fell to zero in all eight pairwise population comparisons. In agreement with these results, BayeScan also identified no significant outlier loci in any of the eight pairwise population comparisons. Few outlier studies have applied multitest corrections (but see Storz & Nachman, 2003; Murray & Hare, 2006; Galindo et al., 2009), and (as in this study) the percentage of loci identified as outliers is often no greater than the percentage expected from Type I error alone. For example, using the 95th quantile as the criterion for outlier status, the percentage of loci identified as outliers was 1.4–3.2% in the whitefish C. clupeaformis (Campbell & Bernatchez, 2004); 2.5–3.3% in Norway spruce, Picea abies (Acheré et al., 2005); 2.6% in the grass Hesperostipa comata (Mealor & Hild, 2006); and 3.2% in the bird Andropadus virens (Smith et al., 2008). For studies that did not perform further analyses on these outlier loci (Acheré et al., 2005; Smith et al., 2008), the conclusion that they are influenced by selection is dubious. However, some of these studies did analyse the outlier data further and found patterns suggesting that some loci may be experiencing divergent selection (Campbell & Bernatchez, 2004; Mealor & Hild, 2006). For example, in whitefish, this conclusion was reinforced by studies that demonstrated an association between outlier loci and QTL (Rogers & Bernatchez, 2005, 2007).
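The expected contribution of Type I error can be stated numerically. The Python sketch below treats each quantile criterion as a per-locus test at the corresponding nominal level and assumes the 1073 loci are independent; both are simplifications. Under those assumptions it gives the number of neutral loci expected beyond the 95th and 99th quantiles in a single comparison, the probability of at least one false positive, and the per-locus threshold under a Bonferroni correction. Bonferroni is shown as one common correction; this section does not specify which correction was applied.

```python
N_LOCI = 1073   # loci tested in each pairwise comparison
ALPHA = 0.05    # nominal level implied by the 95th-quantile criterion

# Neutral loci expected beyond the 95th and 99th quantiles by chance alone.
expected_false_95 = N_LOCI * ALPHA
expected_false_99 = N_LOCI * 0.01
# Probability of at least one false positive among N_LOCI independent neutral tests.
familywise_error = 1 - (1 - ALPHA) ** N_LOCI
# Per-locus threshold that holds the familywise error rate at ALPHA (Bonferroni).
bonferroni_threshold = ALPHA / N_LOCI

print(f"expected outliers by chance: {expected_false_95:.1f} (95th), "
      f"{expected_false_99:.1f} (99th)")
print(f"P(at least one false positive) = {familywise_error:.6f}")
print(f"Bonferroni per-locus threshold = {bonferroni_threshold:.2e}")
```

About 54 loci (5%) would exceed the 95th quantile by chance in each comparison, which matches the proportion of Dfdist outliers reported, and without correction at least one false positive is virtually certain.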

Given these precedents, we conducted further analyses to determine whether any patterns in our outlier data suggest that these loci may indeed be experiencing divergent selection. First, we found no excess of parallel trends in divergence in outlier loci (Table 3). Second, an analysis of molecular variance (amova) suggested that the outlier loci identified are not differentiating between morphotypes, as would be expected if divergent selection were acting between them (Table 4). Similarly, a Bayesian population genetic analysis of the outlier loci found greater structure between sites than between morphotypes (Table 5). Finally, neighbour-joining trees constructed using outlier and nonoutlier loci were not well resolved (Fig. 3). However, in the outlier tree, the smooth snails from the two O’ahu sites were united with a bootstrap value of 73%. Although this provides some suggestion that selection is causing parallel divergence at some locus or loci in these two populations, the evidence is scant in comparison with other studies. For example, in L. saxatilis populations in England, trees constructed using all loci grouped populations by morphotype with high bootstrap support; when outlier loci were removed, populations instead grouped by site (Wilding et al., 2001). Similarly, for host races of leaf beetles, trees constructed using different host-specific outliers supported host-associated monophyly, whereas a tree constructed with nonoutlier loci grouped sympatric pairs from different hosts (Egan et al., 2008). Thus, the patterns in the outlier loci detected in this study do not support the notion that these regions of the genome are differentiated by divergent selection on shell morphology between habitats.
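Whether two loci appearing as outliers in two between-morphotype comparisons constitutes an excess can be checked against a binomial expectation. The Python sketch below computes the probability that a neutral locus exceeds the 95th quantile in at least two of k independent comparisons and multiplies it by the 1073 loci. The number of independent between-morphotype comparisons, k, is a hypothetical parameter here (values of 2 and 4 are shown), and independence among comparisons and loci is an assumption that shared populations would violate.

```python
from math import comb

def p_repeated_outlier(min_hits, n_comparisons, alpha=0.05):
    """Probability that a neutral locus exceeds the outlier threshold in at
    least min_hits of n_comparisons independent comparisons (binomial tail)."""
    return sum(
        comb(n_comparisons, j) * alpha**j * (1 - alpha) ** (n_comparisons - j)
        for j in range(min_hits, n_comparisons + 1)
    )

N_LOCI = 1073
for k in (2, 4):  # hypothetical numbers of independent between-morphotype comparisons
    expected = N_LOCI * p_repeated_outlier(2, k)
    print(f"k = {k}: about {expected:.1f} loci expected as outliers "
          f"in >= 2 comparisons by chance")
```

Under these assumptions, chance alone predicts roughly 2.7 (k = 2) to 15 (k = 4) repeated outliers, so two repeated outliers does not indicate an excess of parallel divergence.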

Our results call for caution when interpreting outliers in genome scans (Butlin, 2010). Dfdist is the most widely used outlier detection method (Caballero et al., 2008; Galindo et al., 2009), and this study shows that using this method alone could result in neutral loci being identified as outliers because of Type I error. In fact, recent simulation studies have shown that, compared with Dfdist, BayeScan is more efficient at detecting outliers (Pérez-Figueroa et al., 2010) and has a lower Type I error rate (Pérez-Figueroa et al., 2010; Narum & Hess, 2011). To avoid spurious conclusions, using two or more outlier detection methods has been advocated (Luikart et al., 2003; Vasemagi & Primmer, 2005) but rarely applied (but see Bonin et al., 2006; Oetjen & Reusch, 2007; Eveno et al., 2008; Namroud et al., 2008). We also encourage performing multiple independent population comparisons across habitats and morphotypes (Wilding et al., 2001; Bonin et al., 2006; Mealor & Hild, 2006; Egan et al., 2008; Nosil et al., 2008; Williams & Oleksiak, 2008; Galindo et al., 2009), using a conservative significance level (Wilding et al., 2001; Jump et al., 2006; Murray & Hare, 2006; Egan et al., 2008; Nosil et al., 2008; Williams & Oleksiak, 2008; Galindo et al., 2009), applying a multitest correction (Storz & Nachman, 2003; Galindo et al., 2009) and looking for patterns in the data that are unlikely to appear by chance, including parallel trends in divergence or associations with environmental variables (Campbell & Bernatchez, 2004; Jump et al., 2006; Mealor & Hild, 2006).

Some of our suggestions may be interpreted as overly stringent. For example, some multiple test corrections have been criticized for being too conservative and reducing the power of outlier detection methods (Murray & Hare, 2006; Galindo et al., 2009). Further, requiring loci to be identified as outliers in multiple population comparisons may result in overlooking some selected loci of interest. This requirement assumes that the same loci are fixed in response to similar environmental conditions, although research suggests this may not be the case (Stinchcombe & Hoekstra, 2008). There may be multiple ways for a given phenotype to evolve, so that even if the same selective forces favour the same phenotype in multiple populations, the underlying genotypes may differ and will not be identified as repeated outliers (Campbell & Bernatchez, 2004; Nosil et al., 2008; Galindo et al., 2009). Ultimately, the measures taken to control the false discovery rate (FDR) depend on the goal of the study. If the identification of outliers is a first step in identifying putative candidate genes under selection, a high rate of false positives may be acceptable. However, if the results of genome scans themselves are used to draw conclusions about how natural selection shapes genomic variation, our study clearly indicates that the FDR needs to be controlled.
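The trade-off between stringency and power can be illustrated by comparing a familywise correction with an FDR procedure. The Python sketch below applies the Bonferroni correction and the Benjamini–Hochberg step-up procedure to the same set of p-values. Benjamini–Hochberg is used here as one standard FDR method, not necessarily the one applied in any study cited, and the p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 where p <= alpha / m; controls the familywise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the false discovery
    rate for independent (or positively dependent) tests."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected

# Hypothetical per-locus p-values from one outlier test.
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]
print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(pvals)))
```

With these values, Bonferroni retains two loci and Benjamini–Hochberg three. FDR control tolerates a bounded proportion of false positives among the loci it flags, which suits the candidate-gene goal described above, whereas familywise control is the more cautious choice when the scan itself is the evidence.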


We thank D.D. Kapan, R.H. Cowie and A.D. Taylor for suggestions and assistance, P. Conde-Padín and E. Rolán-Alvarez for an introduction to the AFLP technique and L.W. Smith for help with collections. The comments of two anonymous reviewers and A. Gardner improved an earlier version of the manuscript. This work was funded by the Ecology, Evolution and Conservation Biology program of the University of Hawai’i (NSF GK–12 grant DGE05–38550 to K.Y. Kaneshiro), a University of Hawai’i Arts and Sciences Advisory Council Award, a C.H. and M.B. Edmondson Research Grant, an E.A. Kay Endowed Scholarship, a Jessie D. Kay Memorial Fellowship and a Hawaiian Malacological Society Research Award to K.A. Tice.

Data deposited at Dryad: doi: 10.5061/dryad.q6t5g