3 Current address: Lawrence Berkeley National Lab, Building 84 Room 355, One Cyclotron Road, Berkeley California 94720
The colonization of novel habitats involves complex interactions between founder events, selection, and ongoing migration, and can lead to diverse evolutionary outcomes from local extinction to adaptation to speciation. Although there have been several studies of the demography of colonization of remote habitats, less is known about the demographic consequences of colonization of novel habitats within a continuous species range. Populations of the Eastern Fence Lizard, Sceloporus undulatus, are continuously distributed across two dramatic transitions in substrate color in southern New Mexico and have undergone rapid adaptation following colonization of these novel environments. Blanched forms inhabit the gypsum sand dunes of White Sands and melanic forms are found on the black basalt rocks of the Carrizozo lava flow. Each of these habitats formed within the last 10,000 years, allowing comparison of genetic signatures of population history for two independent colonizations from the same source population. We present evidence on phenotypic variation in lizard color, environmental variation in substrate color, and sequence variation for mitochondrial DNA and 19 independent nuclear loci. To confirm the influence of natural selection and gene flow in this system, we show that phenotypic variation is best explained by environmental variation and that neutral genetic variation is related to distance between populations, not partitioned by habitat. The historical demography of colonization was inferred using an Approximate Bayesian Computation (ABC) framework that incorporates known geological information and allows for ongoing migration with the source population. The inferences differed somewhat between mtDNA and nuclear markers, but overall provided strong evidence of historical size reductions in both white sand and black lava populations at the time of colonization. 
Populations in both novel habitats appear to have undergone partial but incomplete recovery from the initial bottleneck. Both ABC analyses and measures of mtDNA sequence diversity also suggested that population reductions were more severe in the black lava compared to the white sands habitat. Differences observed between habitats may be explained by differences in colonization time, habitat geometry, and strength or response to natural selection for substrate matching. Finally, effective population size reductions in this system appear to be more dramatic when colonization is accompanied by a change in selection regime. Our analyses are consistent with a demographic cost of adaptation to novel environments and show that it is possible to infer aspects of the historical demography of local adaptation even in the presence of ongoing gene flow.
How adaptive, demographic, and genetic processes interact during colonization of novel habitats is of central concern to studies of speciation and for the management of threatened or invasive taxa (e.g., Tufto 2001; Lenormand 2002; Nosil et al. 2005). Of particular interest is whether colonizing populations experience reduced effective population size as a consequence of natural selection and/or gene flow. Natural selection can suppress population growth in novel environments due to high mortality associated with initial maladaptation (e.g., Lande and Shannon 1996). Theory also suggests that colonizing populations may best persist, and eventually adapt, with intermediate migration rates. Too little migration can lead to extinction via demographic processes or loss of genetic variance, and too much migration can hinder the process of local adaptation, and thus population persistence (Lenormand 2002; Holt et al. 2004; Nosil and Crespi 2004; Alleaume-Benharira et al. 2006). However, predictions from these theoretical models depend on a number of complex and interacting factors (e.g., mating system, genetic architecture of traits, density dependence of fitness). Therefore, it is unclear whether natural populations that colonize novel habitats, and which experience both selection and gene flow, should exhibit a particular demographic signature associated with the early stages of adaptation.
Molecular data have been used with increasing sophistication to infer the demography of colonization of isolated habitats (e.g., Estoup et al. 2001; Gaggiotti et al. 2004) and to infer population reductions in isolated populations (e.g., Groombridge et al. 2000). Less attention has been given to estimating, from molecular data, historical changes in effective population size under strong selection, especially in the presence of ongoing migration (but see Hadly et al. 2004). Further, because colonization events are often unique, there is rarely an opportunity in natural systems to compare population responses to colonization under specified conditions.
Here we use molecular evidence to infer the historical demography of colonization for natural populations of lizards that have recently colonized and adapted to novel habitats within a continuous population. Populations of the Eastern Fence Lizard, Sceloporus undulatus (Sceloporus cowlesi, Southwestern Fence Lizard, sensu Leaché and Reeder 2002), are continuously distributed across two distinct ecological gradients in substrate color in the Tularosa Basin of south-central New Mexico. Blanched color morphs occupy the white gypsum dunes of white sands (Lowe and Norris 1956), melanic color morphs occupy the black basalt rocks of the Carrizozo lava flow (Lewis 1949), and “wildtype” color morphs inhabit the brown soils of the surrounding Chihuahuan desert scrublands (Fig. 1).
Despite the independent geological origins of the white sand and black lava formations, their colonization history is likely quite similar. The entire Tularosa Basin, ringed by regions of geological uplift on all sides, was engulfed by a large inland lake during the last glacial maximum (S. G. Fryberger, unpubl. data). The drying of Lake Otero approximately 12,000 years ago represents a maximum age of recolonization of lizards into the Tularosa Basin. The gypsum sands of white sands began to form approximately 10,000 years before present (ybp) with most of the deposition complete by 5000 ybp (Kocurek, in press; S. G. Fryberger, unpubl. data). The exposure age of the black rocks that characterize the Carrizozo lava flow is also estimated at approximately 5000 ybp.
The two novel habitats represent independent but analogous experiments in selection. First, as detailed above, the white sand and black lava formations are of comparable age. Second, the white sand and black rock formations currently encompass areas of approximately the same size (275 km2 and 320 km2, respectively). Third, because the southern edge of the Carrizozo lava flow and the northern edge of the white sands dune fields are less than 20 km apart, abiotic factors in the two habitats other than substrate color are extremely similar. Fourth, we can infer that initial colonization of the two novel habitats occurred from the same source population because S. undulatus in this region derive from a single lineage (i.e., Sceloporus cowlesi sensu Leaché and Reeder 2002). Therefore colonists of the two habitats would have shared traits likely to influence demographic response to colonization (e.g., innate dispersal ability, breeding structure). Finally, both novel habitats provided selection pressure specifically for substrate matching, albeit in opposing directions. The importance of crypsis for avoiding avian predators has been well documented for small diurnal lizards in southern New Mexico (Luke 1989), and previous research indicates that marked variation in dorsal coloration among habitats is not due to phenotypic plasticity (Rosenblum 2005).
Here, we examine patterns of molecular diversity to test for population reductions, either transient or sustained, associated with colonization of, and adaptation to, the novel habitats. We first provide evidence for the role of natural selection and gene flow in this system and then ask whether the demographic effects of colonization are detectable and consistent between the two geologically independent environments.
Ninety-one S. undulatus individuals were sampled from three habitat categories. (1) White sand habitat: 29 blanched lizards (11 females, 18 males) were collected from three localities on the gypsum dunes of White Sands National Monument and White Sands Missile Range, Otero County. (2) Black lava habitat: 25 melanic S. undulatus (11 females, 14 males) were collected from three localities on the basalt rocks of the Carrizozo lava flow, Lincoln County. (3) Dark soil habitat: 37 wildtype lizards (19 females, 18 males) were collected from four localities on the yucca scrublands and blue grama grasslands of Jornada Long-Term Ecological Research Station, White Sands Missile Range, and White Sands National Monument in Otero, Doña Ana, and Socorro Counties. Whenever possible, ten individuals were sampled per population, and comparable numbers of males and females were used. Representative substrate samples were also collected from each habitat type. Spatial locations of populations are illustrated in Figure 1.
Throughout, we refer to four categories of samples: (1) white sand (samples from the gypsum dunes of white sands), (2) black lava (samples from the Carrizozo lava flow), (3) dark soil (samples from localities with typical Chihuahuan desert substrate), and (4) Tularosa Basin dark soil (a subset of “dark soil” samples including only those populations found within the Tularosa Basin, populations G and H east of the San Andres Mountains in Fig. 1). We distinguish between “dark soil” and “Tularosa Basin dark soil” populations because the Tularosa Basin dark soil populations are of particular interest as a putative source for colonization of the novel habitats and as a colonization “control” (i.e., these populations colonized the Tularosa Basin but without a change in selection regime for substrate matching).
CHARACTERIZING GENETIC AND PHENOTYPIC VARIATION
Markers were developed for 19 unlinked, anonymous nuclear loci as detailed in Rosenblum et al. (2007). Briefly, we constructed genomic libraries for two S. undulatus individuals, sequenced approximately 200 random clones, and optimized PCR and sequencing primers for 19 anonymous nuclear loci (none exhibited significant similarity to existing sequences in GenBank). Primers, PCR chemistry, and locus-specific annealing temperatures are given for the 19 loci (sun_001 through sun_019) in Rosenblum et al. (2007). PCR products were sequenced directly using Big Dye 3.1 cycle sequencing chemistry and visualized on an ABI 3730 (Applied Biosystems, Foster City, CA). ABI's KB-basecalling software was used, but all sequences were checked by eye in Sequencher (ver. 4.2, Gene Codes Corporation, Ann Arbor, MI) to ensure that variable sites and heterozygotes were scored correctly. Sequence data were obtained in one direction and truncated at the first insertion/deletion polymorphism. No fixed heterozygotes were observed, so variation was not due to coamplification of duplicated regions. We resolved gametic phase computationally (PHASE, Stephens and Donnelly 2003), and found that results of analyses were robust to alternative phase calls at positions below the confidence probability threshold of 90%. A total of 191 variable sites was recorded in the 4732 basepairs sequenced from the 19 nuclear loci. Despite multiple attempts, not all 91 individuals were sequenced for all loci, so final sample sizes for the loci varied somewhat (mean 84, range: 61–91). We evaluated the minimum number of recombination events within loci (Hudson and Kaplan 1985) using DnaSP (ver. 4.00, Rozas et al. 2003) and detected recombination at 11 of 19 loci. Therefore, our model-based analyses include recombination (see below). No significant linkage disequilibrium (LD) was observed among loci after correcting for multiple comparisons using Arlequin (ver. 3.0.1, Excoffier et al. 2005b), indicating that the 19 loci were effectively unlinked. In addition to the nuclear dataset, 812 basepairs of the mitochondrial ND4 gene and associated tRNAs were sequenced for all but two individuals for which nuclear data were collected. Primers modified from Arevalo et al. (1994) were used to amplify and sequence this locus (ND4: 5′-CAC CTA TGA CTA CCA AAA GCT CAT GTA GAA GC-3′ and LEU: 5′-CAT TAC TTT TAC TTG GAT TTG CAC CA-3′), and 64 variable sites were recorded. Sequence data have been deposited in GenBank (nuclear data accession numbers: EF411269-EF412962; mitochondrial data accession numbers: EU045255-EU045304, DQ114057-DQ114065, DQ114067-DQ114076, DQ114097-DQ114098, DQ114102-DQ114105, DQ114108-DQ114121).
We quantified dorsal coloration for the majority of lizards collected (22 individuals from white sand habitat, 24 individuals from black lava habitat, and 23 individuals from dark soil habitat). We previously published quantitative measures of color from a subset of the dark soil and white sand samples (Rosenblum 2006), and data collection for the expanded set presented here follows the identical protocol. Briefly, color recordings were taken with an Ocean Optics USB 2000 spectrometer with a dual deuterium/tungsten halogen light source. The probe was oriented at 45 degrees, 1 cm away from the dorsal body surface. Lizard dorsal body coloration was characterized by averaging three readings along the dorsal midline: between the front limbs, at the center of the body, and between the hind limbs. All color recordings were obtained at approximately 30°C, and lizards were held on an intermediate substrate prior to making color measurements. Additionally, spectrometric readings were taken of substrate samples from each habitat type. Points along the spectra were averaged 10-fold into 3 nanometer (nm) bins. Spectral curves were therefore described by approximately 200 variables. Readings from 300 to 700 nm, the spectral range visible to squamates and their avian predators (Bennett and Cuthill 1994; Ellingson et al. 1995; Fleishman et al. 1997; Cuthill et al. 1999), were used for analysis.
Principal Components Analysis (PCA) was used to quantify variation in color of S. undulatus across habitats. Spectral data were analyzed with PCA and then principal component factor scores (FS1, FS2, FS3) were analyzed with a multivariate analysis of variance (MANOVA). All analyses were performed with individuals grouped by habitat (white sand, black lava, and dark soil). If a MANOVA was significant, univariate tests were performed for FS1, FS2, and FS3 to determine which aspect of color explained most of the observed differences among lizards. Empirical findings show that principal component 1 (PC1) corresponds to brightness (light transmission intensity) whereas PC2 and PC3 generally contain information about chroma (color purity) and hue (wavelength of maximum slope) (Grill and Rush 2000). If an ANOVA on FS1, FS2, or FS3 was significant, post hoc Tukey HSD tests were used to determine which groups occupied significantly different regions of color space. In the dataset presented here, PC1 explained over 90% of the variance in dorsal coloration among habitats. We corroborated that PC1 scores were an accurate quantification of the brightness aspect of color by comparing results from PC1 with a more direct estimation of brightness: area under the spectral curve (AUC). Because these analyses returned nearly identical results, we refer to PC1 scores as measures of brightness throughout.
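The PC1-versus-AUC check described above can be sketched as follows. This is a minimal illustration on synthetic spectra, not the authors' pipeline: the function names, the simulated brightness groups, and the noise level are all assumptions introduced here.

```python
import numpy as np

def spectral_pca(spectra, n_components=3):
    """PCA of reflectance spectra (rows = individuals, columns = wavelength bins)."""
    centered = spectra - spectra.mean(axis=0)
    # SVD of the centered data: u * s gives the component (factor) scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def area_under_curve(spectra, wavelengths):
    """Trapezoidal area under each spectrum, a direct proxy for brightness."""
    widths = np.diff(wavelengths)
    heights = (spectra[:, 1:] + spectra[:, :-1]) / 2.0
    return (heights * widths).sum(axis=1)

# Synthetic example (assumed values): three groups differing only in brightness.
rng = np.random.default_rng(0)
wavelengths = np.arange(300, 701, 3)                 # 3 nm bins from 300 to 699 nm
shape = np.linspace(0.5, 1.0, wavelengths.size)      # common spectral shape
brightness = np.repeat([10.0, 30.0, 60.0], 10)       # 10 individuals per group
spectra = brightness[:, None] * shape + rng.normal(0, 0.5, (30, wavelengths.size))

scores, explained = spectral_pca(spectra)
auc = area_under_curve(spectra, wavelengths)
pc1_auc_r = np.corrcoef(scores[:, 0], auc)[0, 1]
```

When spectra differ mainly in overall intensity, as in this toy dataset, PC1 absorbs nearly all the variance and tracks AUC almost perfectly (up to an arbitrary sign), which is the pattern the text reports for the lizard data.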
CORROBORATING THE ROLE OF SELECTION AND MIGRATION
To avoid making assumptions about the role of gene flow and natural selection in this system, we first conducted several analyses to better understand the partitioning of genetic and phenotypic variation across habitats. Unless otherwise specified, all sampled populations (n= 10) were included in analyses presented below.
We conducted an Analysis of Molecular Variance (AMOVA) and a test for Isolation by Distance (IBD) to determine whether genetic variation was partitioned by habitat (i.e., whether gene flow was restricted by habitat boundaries). Levels of population subdivision within species were computed using Tamura–Nei molecular distances (Tamura and Nei 1993) implemented in Arlequin (Schneider et al. 2000; Excoffier et al. 2005b). A hierarchical AMOVA was conducted with populations nested within the three habitat types. Confidence intervals for the global ΦST from AMOVA were estimated by permuting haplotypes among populations and among habitat groups. Confidence intervals for ΦSC were estimated by permuting haplotypes among populations within habitat groups, and those for ΦCT were estimated by permuting populations among habitat groups. Pairwise population comparisons were also conducted. For all ΦST analyses, 1000 permutations were performed to determine statistical significance. To determine whether genetic difference among populations could be explained solely by geography (i.e., IBD), a regression of FST/(1 −FST) against Log(geographic distance) was performed (Rousset 1997).
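The IBD regression can be illustrated with a short sketch. It assumes a symmetric matrix of pairwise FST values and a matching matrix of geographic distances, and it assumes natural logarithms; the function name and the toy data are hypothetical. Because pairwise comparisons are not independent, the significance of such a regression is normally assessed by permutation rather than from the ordinary regression P-value.

```python
import numpy as np

def ibd_regression(fst, dist):
    """Regress FST/(1 - FST) on log(distance) over unique population pairs."""
    fst = np.asarray(fst, dtype=float)
    dist = np.asarray(dist, dtype=float)
    iu = np.triu_indices_from(fst, k=1)      # upper triangle: each pair once
    y = fst[iu] / (1.0 - fst[iu])            # linearized FST (Rousset 1997)
    x = np.log(dist[iu])
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r2

# Toy data (assumed): four populations on a line, with linearized FST
# constructed to increase exactly linearly with log distance.
coords = np.array([0.0, 5.0, 20.0, 60.0])
dist = np.abs(coords[:, None] - coords[None, :])
iu = np.triu_indices(4, k=1)
lin = np.zeros((4, 4))
lin[iu] = 0.02 + 0.03 * np.log(dist[iu])
lin = lin + lin.T
fst = lin / (1.0 + lin)                      # invert the linearization

slope, intercept, r2 = ibd_regression(fst, dist)
```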
Following Rosenblum (2006) we used matrix correspondence tests (MCTs) (Smouse et al. 1986; Thorpe et al. 1996; Storz 2002) to ask whether phenotypic variation was better explained by selection (habitat variation) or drift and gene flow (neutral genetic variation). Three matrices were generated based on pairwise population comparisons. The first matrix described phenotypic variation in dorsal brightness and was generated using mean population PC1 scores for spectrophotometric data (absolute values of linear distances along PC1). The second matrix described variation in substrate color and was similarly based on absolute values of linear distances along PC1. The third matrix described neutral genetic variation among S. undulatus populations, and was comprised of pairwise estimates of linearized FST.
Both pairwise and partial MCTs were conducted. Pairwise MCTs were used to test for significant correspondence between pairwise combinations of the phenotypic, substrate, and genetic matrices. Partial MCTs were used to test for significant correspondence between the phenotypic matrix and the substrate matrix while controlling for neutral genetic divergence. This method is based on a partial regression (i.e., testing the correlation between two matrices while controlling for the effect of a third matrix) and effectively “removes” the component of population-level phenotypic divergence that would be expected due to observed levels of genetic subdivision. Because partial MCTs may be misleading when spatial autocorrelations of the dependent variables are important (Raufaste and Rousset 2001; Castellano and Balletto 2002; Rousset 2002)—a problem not encountered with pairwise MCTs—it is particularly informative to compare results of partial and pairwise tests. Statistical significance for all MCTs was assessed with permutation tests, and Bonferroni corrections were used to adjust significance levels for multiple comparisons conducted with pairwise MCTs.
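A pairwise and a partial matrix correspondence test can be sketched as below. This is a generic Mantel-style permutation test under stated assumptions (Pearson correlation on the upper triangles, rows and columns of the first matrix permuted together, one-sided P-value); it is not necessarily the exact procedure or software used in the study, and all names and toy values are illustrative.

```python
import numpy as np

def _upper(m):
    """Vector of upper-triangle entries of a square distance matrix."""
    m = np.asarray(m, dtype=float)
    return m[np.triu_indices_from(m, k=1)]

def _partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def mantel(a, b, c=None, n_perm=999, seed=0):
    """Pairwise (c is None) or partial test; returns (r, one-sided P)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    y = _upper(b)
    z = None if c is None else _upper(c)

    def stat(mat):
        x = _upper(mat)
        return np.corrcoef(x, y)[0, 1] if z is None else _partial_r(x, y, z)

    observed = stat(a)
    hits = 1                                  # count the observed arrangement
    for _ in range(n_perm):
        p = rng.permutation(a.shape[0])       # relabel populations in matrix a
        if stat(a[np.ix_(p, p)]) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

def dist(v):
    return np.abs(v[:, None] - v[None, :])

# Toy data (assumed): lizard brightness tracks substrate brightness,
# whereas the "genetic" distances follow unrelated spatial positions.
substrate = np.array([5.0, 6.0, 7.0, 50.0, 52.0, 49.0, 90.0, 88.0])
lizard = 0.8 * substrate + np.array([1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5])
position = np.array([10.0, 80.0, 35.0, 60.0, 5.0, 95.0, 40.0, 70.0])

r_pair, p_pair = mantel(dist(lizard), dist(substrate))
r_part, p_part = mantel(dist(lizard), dist(substrate), dist(position))
```

Comparing `r_pair` with `r_part` mirrors the logic in the text: if the phenotype-substrate correlation survives conditioning on the genetic matrix, neutral divergence alone is a poor explanation for the phenotypic pattern.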
RECONSTRUCTING THE DEMOGRAPHIC SIGNATURE OF COLONIZATION
We next conducted several analyses to determine whether we could detect population reductions in the novel habitats using multilocus sequence data. Two measures of nucleotide variability, π (Nei and Li 1979) and θ (Watterson 1975) were calculated using Arlequin (Schneider et al. 2000; Excoffier et al. 2005b). Nucleotide diversity, π, is based on the average number of nucleotide differences between two sequences randomly drawn from a sample, and θ is based on the proportion of segregating sites in a sample. Number of polymorphic sites, π, and θ were calculated for the entire pooled sample and also separately for all dark soil, Tularosa Basin dark soil, white sand, and black lava samples.
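For readers unfamiliar with these estimators, the two diversity measures can be computed from aligned sequences as in the following sketch. It assumes equal-length, gap-free haplotype strings; the function names and the four-sequence example are illustrative only. Dividing either quantity by the alignment length gives the per-site value.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """pi: mean number of differences between all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)

def watterson_theta(seqs):
    """theta_W: segregating sites scaled by a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1

# Toy alignment (assumed): four haplotypes, four sites.
seqs = ["AAAA", "AAAT", "AATT", "AAAA"]
S = segregating_sites(seqs)            # 2 variable columns
pi = nucleotide_diversity(seqs)        # 7 differences over 6 pairs
theta_w = watterson_theta(seqs)        # 2 / (1 + 1/2 + 1/3)
```

Under neutrality and constant population size, π and θW estimate the same quantity, so a difference between them (the numerator of Tajima's D) carries information about demographic change.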
We also used an approximate Bayesian computation (ABC) framework (Beaumont et al. 2002; Hickerson et al. 2006) to infer the size of populations inhabiting the novel habitats (i.e., white sand and black lava) relative to their putative parental populations (Tularosa Basin dark soil) and to estimate the magnitude of transient population reductions associated with initial colonization of the novel habitats. Genetic divergence among collecting localities within each novel habitat was relatively low (i.e., white sands ΦST= 0.04 and black lava ΦST= 0.03) and our desired scale of inference was at the habitat level; therefore we pooled collecting localities within habitats. The ABC approach uses a customized model of population splitting to incorporate complexity associated with the characteristics of the molecular markers and the population history. The model also uses prior geological knowledge about the time frame of the colonizations. Importantly, the uniform priors of the ABC model allowed for realistic levels of recombination within loci as well as migration between source and colonized populations subsequent to colonization. Inferring specific values for migration and recombination was not our objective; rather these variables were included in the model to support our goal of obtaining accurate estimates of effective population sizes.
The ABC model consisted of a source population colonizing a novel population τ generations in the past with subsequent migration (M) in both directions (Fig. 2). All prior distributions were uniform and are listed in Table 1. For the nuclear dataset, we allowed μ (the per gene per generation mutation rate) to vary across the 19 loci by drawing θ from a prior distribution (where θ = 4Ndμ and Nd is the diploid effective population size of the present-day source population). For the haploid mitochondrial dataset, θ = 2Nμ (N is female effective population size). Effective population sizes were free to vary independently among novel population (Nn), ancestral source population (Na), and population during the bottleneck-colonization phase (Nb). The size of Nd (the source population) remained constant subsequent to colonization and was drawn from the uniform prior (0.0, θmax/4μ); Nn, Nb, and Na are given as sizes relative to Nd. The current effective population size in the novel habitat (Nn) exponentially grew from the bottlenecked effective population of size Nb at approximately 250 generations subsequent to the colonization time, τ. Exponential growth models are most commonly characterized with the coalescent [e.g., LAMARC (Kuhner 2006); customized ABC models (Ramakrishnan et al. 2005)]. Alternative growth models (e.g., instantaneous, logistic) do have subtle effects on the coalescent but may only be distinguished by very large samples (Polanski et al. 1998). If the true growth model for our sampled populations of S. undulatus was not exponential, Nb may be overestimated. However, our inferences from traditional population genetics (π, θ), ABC, and IM analyses are consistent, suggesting that our conclusions are robust to particular model characteristics. Additionally, our study employs a large number of independent loci, thus avoiding the upward bias in estimates of growth rate that has been reported when only few loci are used (Kuhner et al. 1998).
Table 1. Parameters and their prior distributions for ABC analyses.
| Parameter | Description | Prior |
|---|---|---|
| τ | Colonization time | Fixed (2000–10,000 ybp) |
| | Start of population growth from bottleneck | Fixed (250 years after τ) |
| (θ)i, i = 1,…,Y | Within locus population diversity parameter where θ= 4Nμ (N is the present effective population size and μ is the per gene per generation mutation rate) | Uniform (0.01 θmax) |
| | Average across locus population mutation parameter | Calculated from (θ)i, i = 1,…,Y |
| Nn | The present population size within the novel habitat (relative to the present size of the source population, Nd) | Uniform (0.0, 1.0) |
| Nd | The present population size within the source population | |
| Na | The ancestral population size of the source population (relative to the present size of the source population, Nd) | Uniform (0.0, 2.0) |
| Nb | The relative population size within the novel environment at τ, the time of colonization (relative to the present size of the source population, Nd) | Uniform (0.0, Nn) |
| M | Migration between novel and source populations; M= 2Nm (m is the per generation probability of migration) | Uniform (0.0, 20.0) |
| | Intragenic recombination rate; 4Nr (r is the crossover rate per base pair per generation) | Uniform (0.0, 200.0) |
To constrain the parameter space in our ABC analyses, we fixed τ according to geological knowledge. Specifically we ran models with minimum and maximum estimates τ of 5000 and 10,000 ybp for white sands (Langford et al., unpubl. ms.; S. G. Fryberger, unpubl. data) and 2000 and 5000 ybp for the Carrizozo lava flow (Zimbelman and Johnston 2001). Although our primary focus was to infer population sizes in the novel habitats, we also ran a model for colonization of the Tularosa Basin following the drying of Lake Otero with minimum and maximum estimates τ of 10,000 and 20,000 ybp. Although population splitting times were narrowly constrained, we did not assume knowledge about migration or recombination, and accordingly incorporated uncertainty by drawing parameter values from uniform priors (Table 1). Although there are programs that coestimate migration and population splitting (such as IM; Hey and Nielsen 2004) or coestimate migration and recombination (LAMARC; Kuhner 2006), none incorporate all three processes at once. Further, these programs were unable to converge in a reasonable amount of time (several weeks) with this large multilocus dataset.
Under the general ABC framework, data are generated from a model determined by the parameter set Φ. These parameters have a prior distribution P(Φ), and the data are summarized in a summary statistic vector D. The posterior distribution is then f(Φ | D) ∝ P(Φ)P(D | Φ) (Gelman et al. 2004), which is the conditional density that can be calculated by first estimating the joint density P(D, Φ) and dividing by an estimate of the marginal density P(D) evaluated at the observed summary statistics, D = D*. Our method for generating random observations from the posterior f(Φi|Di) uses a rejection/acceptance algorithm (Fu and Li 1997; Weiss and von Haeseler 1998; Plagnol and Tavare 2002) followed by a weighted local regression step (Beaumont et al. 2002; Tallmon et al. 2004; Excoffier et al. 2005a). This is based on the idea that the parameter sets for which ∥Di–D*∥ is small comprise an approximate posterior random sample.
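The rejection step and a subsequent regression adjustment can be sketched on a toy problem. The example below is an illustration only: it replaces the coalescent simulator with a simple normal model (unknown mean, summaries = sample mean and standard deviation), and the function names, the Epanechnikov weighting, and the toy prior are assumptions made here in the spirit of Beaumont et al. (2002), not the msBayes implementation.

```python
import numpy as np

def abc_rejection(observed, prior_draws, simulated, accept_frac=0.002):
    """Keep the prior draws whose simulated summaries lie closest to the data."""
    scale = simulated.std(axis=0)                       # put summaries on one scale
    d = np.sqrt((((simulated - observed) / scale) ** 2).sum(axis=1))
    n_keep = max(1, int(round(accept_frac * len(d))))
    keep = np.argsort(d)[:n_keep]
    return prior_draws[keep], simulated[keep], d[keep]

def regression_adjust(params, summaries, d, observed):
    """Weighted local-linear adjustment of accepted draws toward D = D*."""
    w = 1.0 - (d / d.max()) ** 2                        # Epanechnikov-type weights
    X = np.column_stack([np.ones(len(d)), summaries - observed])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], params * sw, rcond=None)
    return params - (summaries - observed) @ beta[1:]

# Toy model (assumed): mu ~ Uniform(0, 10); data are 20 draws from Normal(mu, 1).
rng = np.random.default_rng(1)
K = 50_000
mu = rng.uniform(0.0, 10.0, K)
data = rng.normal(mu[:, None], 1.0, (K, 20))
simulated = np.column_stack([data.mean(axis=1), data.std(axis=1, ddof=1)])
observed = np.array([3.0, 1.0])                         # pretend "observed" summaries

accepted, acc_summ, acc_d = abc_rejection(observed, mu, simulated)
adjusted = regression_adjust(accepted, acc_summ, acc_d, observed)
```

The accepted draws approximate the posterior for the mean; the regression step corrects each accepted value for the residual mismatch between its simulated summaries and the observed ones.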
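The vector D is made up of a two-dimensional array in which the seven columns correspond to seven classes of summary statistics and the number of rows corresponds to the number of loci (Y= 19). We use these seven classes of summary statistics and collect these from each locus such that the summary statistic vector is

$$
D = \begin{pmatrix}
(\theta_W)_1 & (\theta_W)_{n,1} & (\pi_{net})_1 & \pi_1 & (\pi_n)_1 & \mathrm{var}(\pi-\theta_W)_1 & \mathrm{var}(\pi-\theta_W)_{n,1} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
(\theta_W)_Y & (\theta_W)_{n,Y} & (\pi_{net})_Y & \pi_Y & (\pi_n)_Y & \mathrm{var}(\pi-\theta_W)_Y & \mathrm{var}(\pi-\theta_W)_{n,Y}
\end{pmatrix}
$$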
and would include 7Y summary statistics. These include the total number of segregating sites of a locus normalized for sample size (Watterson 1975) and the corresponding statistic for the colonized novel population (θW)n. We also use πnet, the net pairwise nucleotide divergence between the source population and the novel population samples as well as π and πn, the average number of pairwise differences across the entire sample (π) and within the sample collected from the novel habitat (Tajima 1983; Takahata and Nei 1985). Additionally we use the denominator of Tajima's D (var(π–θW); Tajima 1989) from the entire sample as well as this summary statistic calculated from the colonized novel population (var(π–θW)n) samples. In sum these summary statistics make up the components of Tajima's D across the sample and additionally within the novel habitat. In calculating the vector D, we order rows 1 through Y within each column by the ascending values of Tajima's D collected from the novel population in each locus (Dn). The mitochondrial data-based ABC estimates were obtained separately using a single row version (Y= 1) of this summary statistic vector.
We were primarily concerned with using ABC to estimate the size of the novel population both at the current time (Nn) and at the time of colonization (Nb). For these two parameters we report the posterior densities, along with their means and modes. To compare models of demographic history we use Bayes factors B(λ1,λ2) that quantify the amount of posterior support in favor of at least a 10-fold reduction, Nb < 0.1 (Mλ1), versus Nb > 0.1 (Mλ2). In this case, the Bayes factor comparing these models for population reductions is the ratio of posterior odds to prior odds,

$$
B(\lambda_1,\lambda_2) = \frac{P(M_{\lambda_1} \mid D)\,/\,P(M_{\lambda_2} \mid D)}{P(M_{\lambda_1})\,/\,P(M_{\lambda_2})}.
$$

We do not report posterior densities for other parameters (e.g., migration) because the summary statistics chosen to comprise vector D were tailored to our objective of obtaining accurate estimates of effective population size.
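A Bayes factor of this kind, comparing Nb < 0.1 with Nb > 0.1, can be approximated from draws of Nb under the prior and accepted draws from the posterior. The sketch below assumes the standard definition (posterior odds divided by prior odds); the function name and the small arrays are hypothetical and are not values from the study.

```python
import numpy as np

def bayes_factor(prior_draws, posterior_draws, threshold=0.1):
    """Posterior odds of Nb < threshold versus Nb >= threshold, over prior odds."""
    def odds(x):
        p = np.mean(np.asarray(x, dtype=float) < threshold)
        return p / (1.0 - p)      # undefined if every draw falls below threshold
    return odds(posterior_draws) / odds(prior_draws)

# Hypothetical draws: 1 of 4 prior values and 3 of 4 posterior values
# fall below the 10-fold reduction threshold.
prior_nb = np.array([0.05, 0.2, 0.4, 0.8])
posterior_nb = np.array([0.01, 0.02, 0.05, 0.3])
bf = bayes_factor(prior_nb, posterior_nb)   # (3/1) / (1/3) = 9
```

Because the prior on Nb in Table 1 depends on Nn, the prior odds are most easily obtained from the simulated prior draws themselves rather than analytically.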
To estimate Nn, Na, and Nb, we generated K= 500,000 simulated datasets under a standard coalescent model using random draws from the prior distribution f(Φ) for parameters to simulate the data. The proportion of K values accepted to sample the joint posterior distribution f(Φ|D) was 0.002. Although we considered other proportions (see below), simulations showed that an acceptance rate of 0.002 yielded the most accurate estimates. Three C programs (msprior, msDQH, and msstatsvector) were glued together by a Perl shell (msBayes) to: (1) sample from the prior f(Φ); (2) generate the finite sites DNA sequence data under the coalescent model given sample sizes identical to the observed sample; and (3) calculate the summary statistic vector D. The parameter estimates and corresponding joint posterior surfaces for Nn, Nd, and Nb were calculated using the density estimation method implemented in the R statistical package (Loader 1996; R Development Core Team 2004) using scripts kindly provided by M. Beaumont.
Although the number of summary statistics that we use in D could hinder obtaining accurate ABC estimates from K= 500,000 draws from the prior, we determined the ABC conditions from which to obtain reliable estimates from this summary statistic vector D by way of simulations. To this end we obtained ABC estimates on 1000 pseudo-observed (simulated) datasets using: (1) different numbers of accepted draws from the prior (1000 vs. 5000); (2) two different transformations of the accepted draws from the prior (simple rejection sampling vs. local regression); and (3) two different summaries of the posterior (mean vs. mode). The 1000 pseudo-observed datasets were simulated by randomly drawing from the prior and the 1000 estimates were repeated under the different ABC conditions above. In all cases, the same K= 500,000 draws from the prior were used for every set of 1000 ABC estimates using sample sizes identical to the black lava/Tularosa Basin dark soil comparison. To evaluate estimates under these various conditions we plotted the estimates with their true values.
We were able to compare ABC results to IM results for the nonrecombining mitochondrial case to demonstrate that our conclusions regarding effective population size were robust to analysis method. We ran IM with 10 MCMC coupled chains and a burn-in time of 500,000 steps. We adjusted heating values so that the initial update rates were greater than 40% and swap rates between adjacent chains were at least 50%. Migration was constrained to be symmetric with a maximum value of 20. Divergence time was constrained to be 0.2 (in units of the per gene per generation mutation rate). IM default settings were used for all other parameter priors.
CORROBORATING THE ROLE OF SELECTION AND MIGRATION
As expected, the color of both lizards and substrates varied dramatically among habitats. The MANOVA based on PCA factor scores was highly significant for substrate color [F6,6= 59.60; P < 0.001] and for lizard color [F6,128= 59.28; P < 0.001]. Univariate ANOVA and post hoc Tukey tests for each principal component showed that both substrates and lizards from different habitats occupied different regions of color space, with lizard color variation corresponding to substrate color variation. Significant differences were observed among all habitats along PC1 for substrates [F2,5= 85.48; P= 0.001] and for lizards [F2,66= 172.29; P < 0.001], and the vast majority of variation in substrate and lizard coloration was explained by PC1, the brightness aspect of color (92% and 98%, respectively). Significant differences among all habitats were observed along PC2 for substrate [F2,5= 53.71; P <0.001], but this axis explained only 2% of observed variation. Significant differences along PC3 were observed for lizard coloration [F2,66= 24.82; P < 0.001], with dark soil animals appearing unique, but again, this axis explained only 1% of observed variation. No significant differences were observed along PC3 for substrates or PC2 for lizards.
In contrast to patterns of phenotypic variation, AMOVA and IBD results indicated that there was no strong substructuring of genetic variation based on habitat, providing evidence for ongoing gene flow among populations. As expected given more rapid coalescence for markers with smaller effective population size, mitochondrial data suggested higher overall levels of population structure than nuclear data, but the hierarchical distributions of genetic variation were nearly identical for nuclear and mitochondrial markers. For the nuclear dataset ΦST= 0.12 (P < 0.001), ΦSC= 0.09 (75% of ΦST, P < 0.001), and ΦCT= 0.03 (25% of ΦST, P < 0.005). For the mitochondrial dataset ΦST= 0.51 (P < 0.001), ΦSC= 0.46 (90% of ΦST, P < 0.001), and ΦCT= 0.10 (20% of ΦST, P > 0.05). Global estimates of ΦST and ΦSC for both mitochondrial and nuclear datasets indicated that individuals in different habitats did not exhibit higher levels of differentiation than individuals within habitats. Further, the relative effect of permuting populations among habitat groups (ΦCT), the permutation of greatest interest, was the smallest for both datasets (and statistically indistinguishable from zero in the mitochondrial dataset). AMOVA therefore indicated that most genetic variation was found within and among populations rather than among different habitat groups. IBD analyses showed that there was a statistically significant relationship between linearized FST [FST/(1 − FST)] and geography [log(geographic distance)] for both nuclear and mitochondrial data (nuclear data: r2= 0.33, P < 0.001; mtDNA data: r2= 0.16, P < 0.005). Overall, IBD analyses are consistent with a model of gene flow in which migrants are exchanged among geographically proximate populations, regardless of habitat characteristics.
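The isolation-by-distance relationship reported above regresses linearized FST, FST/(1 − FST), on the logarithm of geographic distance. The sketch below shows that calculation for a made-up set of four populations. The coordinates and FST values are hypothetical and chosen only so that differentiation increases with distance, and significance testing (for example by matrix permutation) is deliberately left out.

```python
import numpy as np

def ibd_r_squared(fst, dist):
    """r^2 between FST/(1 - FST) and log(distance) over all population pairs."""
    iu = np.triu_indices_from(fst, k=1)   # each pair counted once, diagonal excluded
    y = fst[iu] / (1.0 - fst[iu])         # linearized FST
    x = np.log(dist[iu])                  # log geographic distance
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

# Hypothetical population coordinates (arbitrary distance units).
coords = np.array([[0.0, 0.0], [10.0, 0.0], [40.0, 5.0], [90.0, 20.0]])
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))

# Hypothetical pairwise FST that grows with log distance.
iu = np.triu_indices(4, k=1)
fst = np.zeros_like(dist)
fst[iu] = 0.02 * np.log(dist[iu])
fst = fst + fst.T

r2 = ibd_r_squared(fst, dist)
print(f"r^2 = {r2:.3f}")
```

Because pairwise values are not independent observations, an r2 obtained this way describes fit only; its P-value has to come from a permutation procedure rather than from ordinary regression theory.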
Previous studies have shown that color variation is not due to phenotypic plasticity (Rosenblum 2005). To determine whether patterns of phenotypic divergence among habitats are due to drift, or whether divergent selection should be invoked, we used MCTs (Table 2). As predicted, the pairwise comparison between the phenotype and habitat matrices was highly significant. Pairwise comparisons between phenotype and genetic matrices and between habitat and genetic matrices were not significant for the mitochondrial dataset, indicating again that mitochondrial variation was not well correlated with habitat boundaries. For nuclear data, the pairwise comparisons between habitat and genetic matrices and between phenotype and genetic matrices were statistically significant. However, the salient result is that phenotypic variation (lizard color) was significantly correlated with habitat variation (substrate color) even when controlling for the effects of genetic variation (pairwise population FST) for both nuclear and mitochondrial datasets. In other words, phenotypic variation was better explained by environmental variation than by neutral genetic variation, as predicted under strong, recent selection.
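The logic of a partial matrix test (phenotype and habitat given genotype) can be sketched as follows. This is one common formulation, a partial correlation among the off-diagonal elements of three distance matrices with significance assessed by permuting the rows and columns of one matrix; it is offered as an illustration and is not necessarily the exact procedure behind Table 2. All data in the example are hypothetical: ten populations with invented substrate brightness values, phenotypes that track habitat with small noise, and genetic distances that depend only on random geographic positions.

```python
import numpy as np

def _upper(m):
    """Off-diagonal upper-triangle elements of a square matrix."""
    return m[np.triu_indices_from(m, k=1)]

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def partial_matrix_test(X, Y, Z, n_perm=999, seed=0):
    """Partial correlation of X and Y given Z, with a one-sided permutation P-value."""
    rng = np.random.default_rng(seed)
    y, z = _upper(Y), _upper(Z)
    observed = partial_r(_upper(X), y, z)
    n = X.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)            # relabel populations in X only
        if partial_r(_upper(X[np.ix_(p, p)]), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
# Hypothetical substrate brightness: 3 light, 3 dark-rock, 4 intermediate populations.
habitat = np.array([0.90, 0.85, 0.95, 0.10, 0.15, 0.05, 0.50, 0.45, 0.55, 0.50])
phenotype = habitat + rng.normal(0.0, 0.03, habitat.size)
coords = rng.uniform(0.0, 100.0, size=(habitat.size, 2))

pheno_d = np.abs(phenotype[:, None] - phenotype[None, :])
habitat_d = np.abs(habitat[:, None] - habitat[None, :])
genetic_d = 0.001 * np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(axis=-1))

r, p = partial_matrix_test(pheno_d, habitat_d, genetic_d)
print(f"partial r = {r:.3f}, P = {p:.3f}")
```

Permuting whole populations, rather than individual matrix cells, preserves the dependence among the pairwise values that share a population.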
Table 2. Results from pairwise and partial Matrix Correspondence Tests for one mitochondrial locus (mtDNA) and 19 nuclear loci (nucDNA). For each test, the correlation coefficient (r) and P-value (P) are given. Asterisks (*) indicate statistically significant results following Bonferroni correction for multiple comparisons.
Matrix Correspondence Test
Partial: phenotype and habitat given genotype
Pairwise: phenotype and habitat
Pairwise: phenotype and genotype
Pairwise: habitat and genotype
RECONSTRUCTING THE DEMOGRAPHIC SIGNATURE OF COLONIZATION
Summary metrics of nuclear and mitochondrial diversity are presented in Table 3 and provide complementary perspectives on the demographic history of populations inhabiting novel environments in the Tularosa Basin. For multilocus nuclear data, levels of nucleotide diversity and nucleotide polymorphism were surprisingly similar among habitats, and standard deviations of π and θ for all three habitat types were overlapping (Table 3, Fig. 3). A stronger signal of reduced diversity in novel habitats was recovered with mitochondrial DNA. Populations in white sand and black lava habitats did exhibit significant reductions in both nucleotide diversity and nucleotide polymorphism at ND4 relative to those in the Tularosa Basin dark soil and the combined dark soil samples (Table 3, Fig. 3). This is particularly informative for Watterson's θ, which is normalized for differences in sample sizes among habitats. The mitochondrial data also revealed lower values of π and θ for black lava populations than white sand populations, a pattern not seen with nuclear markers.
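For readers less familiar with the two diversity statistics compared above, the sketch below computes per-site nucleotide diversity (π) and Watterson's θ from a small alignment. It is an illustration under simplifying assumptions: the four sequences are invented, π is computed from uncorrected pairwise differences rather than with the Tamura and Nei correction used for Table 3, and gaps or missing data are not handled.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (uncorrected)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)

def watterson_theta(seqs):
    """Per-site Watterson's theta: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)

# Hypothetical alignment: 4 sequences, 10 sites, 2 segregating sites.
alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]

pi = nucleotide_diversity(alignment)
theta_w = watterson_theta(alignment)
print(f"pi = {pi:.4f}, theta_W = {theta_w:.4f}")
```

The divisor a_n grows with the number of sequences sampled, which is what makes Watterson's θ comparable among habitats with different sample sizes.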
Table 3. Summary statistics for the 19 nuclear loci (nucDNA) and the single mitochondrial locus (mtDNA). Number of individuals (No. Inds), number of variable sites (No. Var Sites), π (Tamura and Nei) and Watterson's θ are given for all combined samples, all combined dark soil samples (populations G-J in Fig. 1), non-Tularosa Basin dark soil subsample (populations I, J), Tularosa Basin dark soil subsample (populations G, H), white sand samples (populations A-C), and black lava samples (populations D-F).
mtDNA No. Inds
mtDNA No. Var. Sites
nucDNA No. Inds
nucDNA No. Var Sites
Dark Soil Combined
Non-Tularosa Basin Dark Soil
Tularosa Basin Dark Soil
The ABC simulation analysis of performance suggested that reliable estimates can be obtained with 1000 accepted draws from the prior (0.2% of K= 500,000 simulated draws from the prior); using the local regression algorithm, these estimates are moderately close to their true values (Fig. 4A). Only results from the local regression algorithm are presented because the simple rejection sampling method did not perform as well. Although the correlation between true values and corresponding estimates was more consistent with estimates based on the means, there was a notable upward bias if the true values were < 0.2 and downward bias if true values were > 0.2 (Fig. 4B). In contrast, the estimates based on the modes show the opposite pattern of bias (Fig. 4A). We chose to report both types of posterior summaries for the empirical data. We also explored the utility of using the means and variances (across loci) of our seven summary statistics instead of the vector D, but these simulation results showed that this summary of the data yields a much less reliable ABC estimator (not shown).
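The difference between simple rejection and local regression can be made concrete with a small sketch. The version below follows the widely used local-linear adjustment with Epanechnikov kernel weights; the study's own implementation details are not reproduced here. The simulator, prior, observed summaries, and acceptance fraction are all hypothetical, and a deliberately loose tolerance is used so that the effect of the adjustment is easy to see.

```python
import numpy as np

def regression_adjust(theta, stats, observed, n_accept):
    """Accept the n_accept closest draws, then shift them toward the observed summaries."""
    diff = stats - observed
    dist = np.sqrt((diff ** 2).sum(axis=1))
    keep = np.argsort(dist)[:n_accept]
    delta = dist[keep].max()                          # tolerance = largest accepted distance
    w = 1.0 - (dist[keep] / delta) ** 2               # Epanechnikov kernel weights
    X = np.column_stack([np.ones(n_accept), diff[keep]])
    sw = np.sqrt(w)
    # Weighted least squares via rescaling rows by sqrt(weight).
    beta, *_ = np.linalg.lstsq(X * sw[:, None], theta[keep] * sw, rcond=None)
    adjusted = theta[keep] - diff[keep] @ beta[1:]    # remove the fitted trend in the summaries
    return theta[keep], adjusted, w

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 1.0, 20_000)                 # assumed Uniform(0, 1) prior
stats = theta[:, None] + rng.normal(0.0, 0.05, size=(theta.size, 3))  # toy simulator
observed = np.array([0.5, 0.5, 0.5])                  # hypothetical observed summaries

raw, adjusted, w = regression_adjust(theta, stats, observed, n_accept=10_000)
post_mean = np.average(adjusted, weights=w)
print(f"rejection sd = {raw.std():.3f}, adjusted sd = {adjusted.std():.3f}, mean = {post_mean:.3f}")
```

In this toy setting the adjusted sample is much narrower than the raw accepted sample, which shows how regression can compensate for a tolerance that would otherwise be too loose.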
Although simple summary statistics only revealed strong evidence for reduced diversity in the novel habitats when calculated with mitochondrial data, the ABC analyses based on a wider array of summary statistics recovered signatures of small population size during the colonization phase for both novel habitats. This result was evident for both nuclear and mitochondrial datasets, although the strength of inference was more pronounced with the mtDNA data (Table 4, Fig. 5). A genetic signature of reduced effective population size was recovered for both white sand and black lava habitats. In each novel habitat, both founding effective population size (Nb) and current effective population size (Nn) were inferred to be small (relative to Nd), suggesting strong initial bottlenecks with partial but incomplete recovery. Again, we note that these inferences on population size allow for effects of post-colonization migration between populations in each novel habitat and the surrounding dark soil environment. Results from each novel environment are detailed below and presented in Table 4 and Figure 5.
Table 4. Point estimates and Bayes factors (BF) from ABC analyses. Results are given for two estimates of effective population size relative to the current population size in the dark soil habitat (Nd): current effective population size of the novel habitat (Nn) and historical effective population size of the novel habitat near the time of colonization (Nb). For each model, BFs are used to test the hypothesis of an extreme (more than 10-fold) population reduction. Finally, results are given for two geologically plausible population splitting times in each habitat (i.e., minimum and maximum ages of the formations): 5000 versus 10,000 ybp for the white sands formation and 2000 versus 5000 ybp for the black lava formation.
White sand mtDNA (5000 ybp)
White sand mtDNA (10,000 ybp)
White sand nucDNA (5000 ybp)
White sand nucDNA (10,000 ybp)
Black lava mtDNA (2000 ybp)
Black lava mtDNA (5000 ybp)
Black lava nucDNA (2000 ybp)
Black lava nucDNA (5000 ybp)
BF (10-fold reduction)
BF (10-fold reduction)
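A Bayes factor for a hypothesis such as "relative population size is below one-tenth" can be read as the ratio of posterior odds to prior odds. The sketch below shows that arithmetic on samples from a prior and a posterior. It is an assumed, generic formulation rather than the exact calculation behind Table 4, and both samples are invented so that the answer can be checked by hand.

```python
import numpy as np

def bayes_factor(posterior, prior, threshold=0.1):
    """Posterior odds over prior odds for the hypothesis that the ratio is below `threshold`."""
    post_p = float(np.mean(posterior < threshold))
    prior_p = float(np.mean(prior < threshold))
    if post_p == 1.0 or prior_p in (0.0, 1.0):
        raise ValueError("odds are undefined for these samples")
    post_odds = post_p / (1.0 - post_p)
    prior_odds = prior_p / (1.0 - prior_p)
    return post_odds / prior_odds

# Hypothetical prior sample: evenly spread over (0, 1), so 10% of values fall below 0.1.
prior_sample = np.linspace(0.005, 0.995, 100)
# Hypothetical posterior sample: half of the accepted values fall below 0.1.
posterior_sample = np.array([0.02, 0.04, 0.06, 0.08, 0.09, 0.12, 0.20, 0.30, 0.40, 0.50])

bf = bayes_factor(posterior_sample, prior_sample)
print(f"BF = {bf:.2f}")
```

Here the prior odds are 1:9 and the posterior odds are 1:1, so the data raise the odds of a 10-fold reduction ninefold. Values below 1 indicate that the data shift support away from the hypothesis.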
In the white sand habitat, current effective population size (Nn) was estimated from nuclear data to be approximately one-half the effective size of the current Tularosa Basin dark soil population (Nd), and was inferred with mitochondrial data to be between one-tenth and one-third of Nd. The differences in magnitude between nuclear and mtDNA estimates of Nn were corroborated by the higher Bayes factors given the mtDNA data (BF = 2.72–3.68 for mtDNA whereas BF = 0.36–0.72 for nucDNA). The stronger inference given mtDNA data was also found when estimating Nb. Mitochondrial data suggested a dramatic population reduction (with point estimates between 0.01 and 0.19) associated with colonization, and this inference was strongly supported by Bayes factors (BF = 7.6–12.1). Conversely, nuclear data provided weaker evidence for small historical effective population sizes in this habitat (BF = 0.02–2.88), particularly for the model employing the maximal age of white sand formation (10,000 ybp). Thus, there was evidence for an overall reduction in the size of the blanched population on white sands relative to the surrounding source population, but the signature of the initial bottleneck was less pronounced in the nuclear data.
In the black lava habitat, current effective population size (Nn) was estimated from nuclear data to be approximately one-half the size of the current Tularosa Basin dark soil population (Nd), and was inferred with mitochondrial data to be between one-tenth and one-half of Nd. Similar to patterns at white sands, Bayes factors provided only weak support for 10-fold reduced current effective population sizes (BF = 1.27–2.67 for mtDNA and BF = 0.30–1.24 for nucDNA). However, both nuclear and mitochondrial sequences provided strong support for dramatic historical reduction in effective population size (Nb) in the black lava habitat. Point estimates for Nb were small (0.09–0.22 for nuclear loci; 0.00–0.17 for mtDNA), and Bayes factors provided strong support for the hypothesis of a bottleneck associated with colonization (BF = 5.86–21.40). In sum, for the melanic population on black lava, evidence points to a very small colonizing effective population size with moderate subsequent growth to a present size that is, at most, one-half the size of Nd.
Our analysis of the post-glacial colonization of Tularosa Basin dark soil habitat from adjacent areas suggested that effective population size reductions associated with colonization were less severe when colonists did not experience a change in selection regime. For both mitochondrial and nuclear data, current Tularosa Basin dark soil effective population size (Nn) was estimated to be between one-tenth and one-half the size of the current dark soil population outside of the basin (Nd), with only weak support for a 10-fold difference in current effective population size (BF = 1.77–3.44). Most importantly, there was no consistent support for a bottleneck associated with colonizing the Tularosa Basin. Depending on the population splitting time used, point estimates for Nb ranged from 0.01 to 0.98 and Bayes factors from 0.02 to 5.50.
Finally, comparison between ABC and IM results for mitochondrial data suggests that our conclusions are robust to analysis method. Comparable to our ABC inference that current effective population sizes in the novel habitats (Nn) are smaller than that of the Tularosa Basin dark soil population (Nd), theta in the novel habitats (“q1”) was inferred by IM to be smaller than theta in the dark soil population (“q2”). In fact, IM point estimates for relative population sizes in the colonized habitats (“q1/q2” between 1/3 and 1/20) were quite similar to those obtained with ABC. Comparable to our ABC inference that historical effective population sizes (Nb) were small in the novel habitats, the proportion of the ancestral population that was inferred by IM to found the novel populations (“1 − s”) was extremely small (point estimates ranging from 0.005 to 0.0015). Finally, both ABC and IM results suggested greater reductions in population size associated with colonizing the novel habitats than with colonizing the Tularosa Basin itself.
Although genetic and historical data have been combined previously to infer colonization histories (e.g., Estoup et al. 2004; Chan et al. 2006), our study is novel in applying these methods to colonization of and adaptation to novel environments in the context of continuing gene flow from the source population. That natural selection has shaped among-habitat differences in color of S. undulatus in this system is evident from the results of MCTs, which showed that phenotypic variation was best explained by environmental variation. We substantiated the hypothesis that gene flow has been unhampered by habitat boundaries by the good fit of the neutral genetic data to an IBD model and by the relative lack of genetic differentiation among habitats recovered by an AMOVA. An indication that colonization of white sand and black lava habitats may have been associated with population reductions came from reduced diversity of mitochondrial DNA in novel environments, the black lava in particular. However, the strongest evidence for transient or sustained reductions in effective population size associated with the colonization of white sand and black lava habitats came from ABC analyses based on a model that was explicitly designed to incorporate biological and geological information about this system. The signatures of small effective population sizes in the novel habitats are particularly compelling given how difficult it is to detect historical reductions in population size in the presence of continuing migration (e.g., Bjorklund 2003).
Although signatures of population reductions were observed for both novel habitats in the Tularosa Basin, we also observed relatively high levels of genetic diversity. For example, π estimates calculated from nuclear markers for both novel habitats and for the overall sample were all close to 0.006, nearly an order of magnitude higher than that observed in humans (π= 0.00075, The International SNP Map Working Group 2001). There are a number of mechanisms that could account for the relatively high levels of current nucleotide variability while still remaining consistent with our inference that populations in white sand and black lava environments passed through a period of small effective population size: for example, initial colonization by a small but genetically diverse group of founders, multiple colonization events, post-colonization population expansion, and ongoing gene flow. One mechanism directly implied by our data is population expansion. The ABC inference of population reductions in both novel habitats looking backward in time is equivalent to a population expansion looking forward in time. Population expansion associated with colonization of the novel Tularosa Basin habitats is consistent with the observation that directional selection coupled with an opportunity for population growth accompanies many examples of rapid evolution (Reznick and Ghalambor 2001). Although no direct test of the ecological community assemblage at the time of colonization is possible, a less-diverse lizard community is found on the white sand and black lava formations compared to the surrounding dark soil habitat (E. Rosenblum, pers. obs.; Dixon 1967). If S. undulatus was one of only a few ecologically similar species able to establish in the novel environments for ecological or adaptive reasons, exploitation of resources in these habitats could have allowed for rapid population expansion following colonization.
It may also be that migration rates are sufficient to retain genetic diversity and population viability, but not so large as to overwhelm local adaptation (Lenormand 2002; Holt et al. 2004). Uncertainty about migration was incorporated in the ABC model presented here but migration rates were not explicitly estimated; the balance between selection and gene flow in this system is the subject of future work with different analytical methods (e.g., cline analysis).
The broad-brush demographic and adaptive patterns observed in the white sand and black lava habitats are remarkably similar, suggesting that the signature of colonization is somewhat predictable even in populations with complex histories. In both habitats dorsal coloration evolved to better match substrate coloration, in both habitats there is evidence for a period of small effective population size followed by moderate recovery, in both habitats mitochondrial markers recorded more extreme reductions in genetic diversity than did nuclear markers, and in both habitats high levels of nuclear diversity were largely maintained during the colonization process or recovered by subsequent immigration. However, we also observed some important differences in demographic history between habitats. Most notably, the signature from the nuclear data of a population bottleneck near the time of colonization was stronger for the black lava population than for the white sand population. Although evidence of reduced effective population size associated with colonization was recovered with mitochondrial data for the white sand population, nuclear data did not provide consistent support for a dramatic bottleneck in this habitat. Additionally, estimates of θ and π based on mitochondrial data were significantly smaller for the black lava population compared to the white sand population. Data therefore suggest that a more dramatic population reduction, possibly followed by a more rapid population recovery, was associated with colonization of or adaptation to the black lava habitat.
There are several possible explanations for why populations colonizing the black lava environment may have suffered a greater initial reduction in population size or more rapid subsequent population growth, or why we have a clearer ability to detect effective population size changes in this habitat. First, the greater potential age of the white sands system may mean that earlier demographic events have been obscured by subsequent migration from dark soil populations or mutation accumulation. Second, although the sizes of the white sand and black lava formations are comparable, the lava formation is fairly linear whereas the sand formation is more circular. Theoretical studies have demonstrated that genetic diversity and effective population size can be correlated with habitat geometry (Wilkins 2004) and proximity to habitat edge (Wilkins and Wakeley 2002). For example, the larger edge-to-area ratio of the black lava habitat could have resulted in higher post-colonization immigration rates, facilitating population recovery after an initial bottleneck (Holt et al. 2004; Alleaume-Benharira et al. 2006). Third, there may have been differences in the strength of natural selection between the black lava and the white sand environments that could explain the more dramatic historical bottleneck in the black lava habitat. For example, the geometry of the black lava formation provides more vegetated edge habitat for avian predators and could have led to higher predator-induced mortality rates. Finally, there may have been differences in the genetic architecture of color traits in the black lava and the white sand environments. For example, levels of standing genetic variation for color traits, the number of genes contributing to blanched versus melanic coloration, the tempo and effect size of substitutions at these genes, and allelic dominance patterns all could influence the speed with which populations approached local dorsal color optima.
Tests for association between novel phenotypes and a candidate locus, Mc1r, revealed a significant, but incomplete association between Mc1r genotype and the blanched phenotype but not the melanic phenotype, suggesting that the genetic basis of the two novel phenotypes may be different (Rosenblum et al. 2004; E. B. Rosenblum, unpubl. data).
Our study addresses the question of whether effective population sizes are reduced during colonization of, and adaptation to, novel environments. Half a century ago, Haldane formalized the idea that natural selection may extract a demographic “price” from a population experiencing a change in selection regime (Haldane 1957; see also Pease et al. 1989; Burger and Lynch 1995; Lande and Shannon 1996). Simply put, the fixation of alleles conferring fitness advantages in novel environments is not instantaneous, and the mortality associated with phenotypes that are poorly adapted can be high (depending on the genetic basis of the selected trait and whether fitness is density dependent). Thus, even populations that persist through environmental change likely experience a period of negative growth and may retain a demographic signature of natural selection. Similarly, gene flow from adjacent sources may retard local adaptation and lengthen the period a colonizing population is suffering the high demographic costs of maladaptation (Lenormand 2002). Here we have demonstrated evidence for reductions in two S. undulatus populations that have been subject to strong natural selection and ongoing gene flow. Of particular interest are the small effective population sizes observed in the novel habitats relative to that in the dark soil population, which colonized the Tularosa Basin in the last 12,000 years but did not experience a change in selection regime. Additionally, results suggest that colonization coupled with selection (i.e., dark soil ancestors moving into white sand and black lava habitats) led to more dramatic population reductions than colonization alone (i.e., dark soil ancestors moving into newly available dark soil habitat). 
Empirical analysis of changes in effective population size in other populations will begin to shed light on how selection, gene flow, and population demography interact during the establishment and persistence of populations in novel environments.
Associate Editor: A. Mooers
We thank M. Slatkin, N. Belfiore, A. Estoup, and J. Novembre for helpful discussion on SNP data collection and analysis. We thank M. Kiparsky, C. Colvin, and D. Betz for assistance in the field. Funding for this work was provided by a National Science Foundation Doctoral Dissertation Improvement Grant (to EBR, #DEB-0309327), the Museum of Vertebrate Zoology (to MJH), and income from the Walter and Virginia Gill Chair (to CM).