Supported by NSERC Canada and the Canadian Foundation for Innovation.
John R. Stinchcombe, Department of Ecology and Evolutionary Biology & Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON M5S3B2, Canada. Tel: 416-978-8514; Fax: 416-978-5878; E-mail: firstname.lastname@example.org
Introduced species frequently show geographic differentiation, and when differentiation mirrors the ancestral range, it is often taken as evidence of adaptive evolution. The mouse-ear cress (Arabidopsis thaliana) was introduced to North America from Eurasia 150–200 years ago, providing an opportunity to study parallel adaptation in a genetic model organism. Here, we test for clinal variation in flowering time using 199 North American (NA) accessions of A. thaliana, and evaluate the contributions of major flowering time genes FRI, FLC, and PHYC as well as potential ecological mechanisms underlying differentiation. We find evidence for substantial within population genetic variation in quantitative traits and flowering time, and putatively adaptive longitudinal differentiation, despite low levels of variation at FRI, FLC, and PHYC and genome-wide reductions in population structure relative to Eurasian (EA) samples. The observed longitudinal cline in flowering time in North America is parallel to an EA cline, robust to the effects of population structure, and associated with geographic variation in winter precipitation and temperature. We detected major effects of FRI on quantitative traits associated with reproductive fitness, although the haplotype associated with higher fitness remains rare in North America. Collectively, our results suggest the evolution of parallel flowering time clines through novel genetic mechanisms.
The relative role of contingent versus deterministic forces continues to be an unresolved issue in evolutionary biology (Gould 1985; Travisano et al. 1995; Losos et al. 1998; Fenster and Galloway 2000; Simões et al. 2008). Echoes of this debate can be found in diverse research areas, including (but not limited to) the role of drift and selection shaping patterns of nucleotide polymorphism (Kimura 1983; Gillespie 1991), adaptive versus nonadaptive explanations for organismal features (Gould and Lewontin 1979; Mayr 1983), and mechanisms producing geographic clines within species (Endler 1977; Vasemägi 2006). Evaluating the prevalence of convergent evolution (sensu Arendt and Reznick 2008) provides a clear approach for distinguishing these possibilities: it is unlikely that the same, seemingly adaptive features of organisms will evolve more than once by stochastic forces alone. Moreover, the expanding set of tools for investigating the molecular genetic basis of traits in model and nonmodel organisms offers the possibility of examining whether the same or independent genetic, developmental, and physiological mechanisms were used to produce similar phenotypes (Arendt and Reznick 2008). Here, we examine whether introduced populations of Arabidopsis thaliana have evolved geographic clines in flowering time and other life-history traits that are similar to those exhibited in the native range (Caicedo et al. 2004; Stinchcombe et al. 2004; Samis et al. 2008) and whether the same genes are associated with natural variation in flowering time.
Testing for parallel clines in phenotypic traits provides a straightforward way to examine the relative role of adaptive versus nonadaptive mechanisms in producing clinal variation. While systematic associations between phenotypes and environmental variables or geographic proxies (e.g., latitude, longitude, altitude) are traditionally interpreted as evidence for adaptive evolution, they can also be generated by nonadaptive processes (Endler 1977), especially for traits with a relatively simple genetic basis (Endler 1977; Vasemägi 2006). Consequently, the occurrence of independent clines is strong evidence of adaptive mechanisms, particularly in traits with a complex, quantitative genetic basis. For example, if putatively adaptive phenotypes change in the same direction with increasing latitude and altitude, this suggests adaptation to climatic gradients that are either similar or favor similar phenotypes (see, e.g., Stinchcombe et al. 2004; Montesinos-Navaro et al. 2011).
Table 1. Number and frequency of haplotypes at three flowering time loci genotyped in North American (NA) and Eurasian (EA) lines of Arabidopsis thaliana. The frequency of different haplotypes was compared with a χ2 test, first using the entire sample, and second, by randomly sampling one accession per latitude and longitude in each continent to equalize sampling regimes. For the latter, the 2.5th and 97.5th percentile of the χ2 distribution are shown from 1000 random samples, with the P-value for the 97.5th percentile of the χ2 distribution.
Specifically, we use North American (NA) populations of A. thaliana to ask: (1) What is the pattern of population genetic structure in NA Arabidopsis populations? We use these data to establish a null model as a point of comparison for phenotypic traits. (2) What are the allele frequencies at the flowering time loci FRI, FLC, and PHYC, and are they different from the ancestral range? (3) Is there evidence for the evolution of parallel phenotypic clines in the introduced range, and if so, what are the likely ecological mechanisms? (4) What is the relative contribution of population structure, candidate genes (FRI, FLC, and PHYC), and climatic variation to the pattern of quantitative genetic differentiation in the introduced range?
Material and Methods
We used natural accessions of A. thaliana from the introduced NA range that we obtained from the Arabidopsis stock center (Arabidopsis Biological Resource Center [ABRC]; N= 92 lines, 16 populations) and recent collections from natural populations (N= 108 lines, 20 populations), with one population including lines from ABRC and recent collections (Fig. 1; Table S1). Geographic sampling was opportunistic, but spanned 10° latitude and 16° longitude. All new lines described here will be deposited with the ABRC, or will be available directly from the Stinchcombe lab. For comparison, we also obtained a sample of native range, Eurasian (EA) lines (N= 20 lines, 19 localities) from ABRC. We used these parental plants for DNA analysis, and to generate seeds for common garden experiments. After seven days of wet stratification at 4°C to synchronize germination, we germinated and grew seeds from all collections in a growth chamber at the University of Toronto for one generation under the following standard conditions (Weigel and Glazebrook 2002): 23°C for 24 h with 19 h light at 200 µmol/m2/sec and 5 h dark, and without vernalization. All lines flowered and produced seed without vernalization, although a subset (∼6%) produced insufficient seeds in time to be included in the common garden.
We collected leaf material from parental lines and extracted DNA using the Qiagen DNeasy plant kit (Qiagen Inc., Toronto, Canada). We screened each line for variation at three functional loci, FRI, FLC, and PHYC. We designed primers to flank previously reported polymorphic regions in each locus (Table S2), and screened lines using PCR amplification (with Fermentas MBI reagents; Fermentas Canada Inc., Burlington, Canada). All loci were amplified using previously published, PCR-based genotyping protocols (FRI: Stinchcombe et al. 2004; FLC: Caicedo et al. 2004; PHYC: Samis et al. 2008). We sequenced a subset of amplification products from each locus (six to 15 lines per locus, 17 lines for more than one locus and 40 individuals overall) to verify sequence homology with published sequences from the native range. No additional variants were detected in the sequenced subset. However, because we focused on previously characterized variation at these loci, we caution that other, undetected variants may occur within them. All EA lines we used were FRI–wt (Table S1).
To evaluate genome-wide levels of polymorphism, we genotyped 188 NA lines (Table S1) using 149 SNP loci (Platt et al. 2010); these SNPs were developed for quantifying population diversification in A. thaliana (Platt et al. 2010). We cleaned the SNP data to remove poor quality loci (i.e., loci with ≥25% low-quality calls across 188 lines/locus) and lines (≥50% low-quality or missing calls of 149 loci/line). The final dataset included 136 SNP loci across 179 individuals. Additional SNP genotypes for EA lines were obtained from the dataset of Platt et al. (2010).
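The two quality thresholds above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used here: the data layout (a dictionary of per-line call lists), the use of `None` for a low-quality or missing call, and the choice to compute both rates on the unfiltered matrix are all assumptions, and `filter_snp_calls` is a hypothetical helper name.

```python
def filter_snp_calls(calls, max_locus_missing=0.25, max_line_missing=0.50):
    """Apply locus and line quality thresholds to a genotype call matrix.

    calls: dict mapping line id -> list of calls, one per locus
           (None marks a low-quality or missing call; assumed encoding).
    Returns (kept_line_ids, kept_locus_indices). Both missing-call rates
    are computed on the unfiltered matrix.
    """
    line_ids = list(calls)
    n_lines = len(line_ids)
    n_loci = len(calls[line_ids[0]])

    kept_loci = []
    for j in range(n_loci):
        missing = sum(1 for line in line_ids if calls[line][j] is None)
        # Drop loci with >= 25% low-quality calls across lines.
        if missing / n_lines < max_locus_missing:
            kept_loci.append(j)

    kept_lines = []
    for line in line_ids:
        missing = sum(1 for call in calls[line] if call is None)
        # Drop lines with >= 50% low-quality or missing calls across loci.
        if missing / n_loci < max_line_missing:
            kept_lines.append(line)

    return kept_lines, kept_loci


# Toy matrix: 4 lines x 4 loci (hypothetical data).
toy = {
    "line1": ["A", "T", None, "G"],
    "line2": ["A", None, None, "G"],
    "line3": [None, None, None, "C"],
    "line4": ["C", "T", None, "G"],
}
kept_lines, kept_loci = filter_snp_calls(toy)
```

Because the thresholds are inclusive (≥), a locus with exactly 25% poor calls or a line with exactly 50% poor calls is removed in this sketch.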
We planted F1 seed from 35 NA populations (median = 2 lines per population), and 19 EA populations (1 line per population for all except 1) into plug trays (1.9 × 4.4 cm cells) with germination mix. We stratified seeds at 5°C for seven days in the dark, and then allowed them to germinate in the University of Toronto glasshouse (at approximately 22°C). Prior to transplanting, we exposed seedlings to two days of glasshouse growth at 15°C, and a day of natural conditions beside the gardens, to initiate hardening. We next transplanted 1095 seedlings with their soil plugs at 10 cm intervals into raised beds (2 m wide × 3 m long × 30 cm deep, and filled with Premier Promix PGX; Premier Tech Horticulture, Rivière-du-Loup, Canada) on the roof of the EEB Department (Earth Sciences Centre) at the University of Toronto on 2 October and 3 October 2008, when seeds naturally germinate in many NA populations. We used outdoor plantings to more realistically simulate natural cues and variability in light, water, snow, temperature, and other abiotic and biotic forces. We weeded soil plugs to one seedling (range 1–12 days old) per position prior to planting, and again as necessary, keeping the oldest seedling. Our experiment had five spatial blocks spread across two raised beds, with each block containing one replicate per line (219 plants = 199 NA, 20 EA). We covered the raised beds with shade cloth for the first seven nights, and watered as needed for one month to promote establishment.
We recorded survival three times before snow cover in late November, and opportunistically with snow melts throughout winter until the following April. Starting in spring, we tracked plants on a daily basis to determine bolting date (i.e., visible differentiation and elongation of the apical inflorescence), and first flower opening from April until August. At bolting, we measured rosette diameter and counted rosette leaves. At senescence, we harvested plants individually, dried them at 55°C, and then counted the total number of fruits per plant to estimate plant fitness (Westerman and Lawrence 1970). The sample size of NA lines used in the common garden experiment (N= 199) is slightly reduced compared to the genotyping analysis (212), because not all genotyped plants produced sufficient seed in time for the start of the experiment.
Distribution and population structure
We used logistic regression to examine the geographic distribution of haplotypes at FRI, FLC, and PHYC, with latitude and longitude as predictors, where FRI was categorized as functional/null, FLC as A or B haplotype, and PHYC as Ler or Col haplotype. We used χ2 tests to compare the frequency of flowering time genotypes in North America to a similar-sized sample in the native range. To compare the allele frequency differences at flowering time loci to the “genome-wide” expectation, we also compared allele frequency changes at 136 SNPs. Because sampling regimes differed (with more within-population sampling in NA than EA), we randomly selected a single accession per geographic locality and ran the χ2 analyses again, equalizing within-population sampling per locality for both Europe and North America. We repeated random sampling and χ2 analysis 1000 times and present the range of χ2 values, and P-values from the 97.5th percentile of the χ2 statistic as an upper bound. EA flowering time genotypes were obtained from previous studies (Samis et al. 2008).
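The resampling scheme can be sketched as follows. This is an illustration under assumptions rather than the analysis code: haplotypes are reduced to two classes per locus, the test is a Pearson χ2 on a 2 × 2 region-by-haplotype table without continuity correction, percentiles use a nearest-rank rule, and all function names and the toy data are hypothetical.

```python
import math
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin: the table carries no information
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_1df(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def resampled_chi2(na, ea, n_reps=1000, seed=1):
    """Draw one accession per locality in each region and compute chi-square.

    na, ea: dict mapping locality -> list of haplotype labels (two classes).
    Returns the sorted chi-square statistics from n_reps random draws.
    """
    rng = random.Random(seed)
    classes = sorted({h for region in (na, ea)
                      for haps in region.values() for h in haps})
    if len(classes) != 2:
        raise ValueError("expected exactly two haplotype classes")
    first = classes[0]
    stats = []
    for _ in range(n_reps):
        counts = []
        for region in (na, ea):
            draw = [rng.choice(haps) for haps in region.values()]
            k = sum(1 for h in draw if h == first)
            counts.append((k, len(draw) - k))
        (a, b), (c, d) = counts
        stats.append(chi2_2x2(a, b, c, d))
    return sorted(stats)


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list (q in 0-100)."""
    idx = max(0, math.ceil(q / 100 * len(sorted_vals)) - 1)
    return sorted_vals[idx]


# Hypothetical toy data: locality -> haplotypes of the accessions sampled there.
na = {"siteA": ["wt", "wt", "null"], "siteB": ["wt"], "siteC": ["wt", "wt"]}
ea = {"siteX": ["wt"], "siteY": ["null"], "siteZ": ["null", "wt"]}
stats = resampled_chi2(na, ea, n_reps=1000)
lower, upper = percentile(stats, 2.5), percentile(stats, 97.5)
p_upper = chi2_pvalue_1df(upper)  # P-value attached to the 97.5th percentile
```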
We evaluated the degree of admixture or differentiation among lines and regions for genome-wide polymorphisms using STRUCTURE (V2.2), a model-based clustering program (Pritchard et al. 2000). Although STRUCTURE is based on an evolutionary model, our goal was not to model the actual population history of A. thaliana in North America based on its specifications, but instead to estimate aspects of population genetic structure that could lead to the spurious appearance of clines or gene–trait associations. It is important to note that alternative approaches, such as EIGENSTRAT, which is not based on an evolutionary model, typically recover similar levels of population genetic structure, also with clear geographic interpretations (see, e.g., Patterson et al. 2006; Price et al. 2006; Zhao et al. 2007). We performed three separate analyses: (1) all NA lines, (2) NA and EA lines without any geographic information, and (3) with EA lines as a reference sample for NA lines. The first and second analyses grouped samples based on genotype information alone, while the third analysis forced introduced NA genotypes to cluster with inferred ancestral EA clusters. We interpreted each observed cluster (K) to represent ancestrally related genotypes where each individual is proportionally assigned to clusters with a predicted probability (ancestry coefficient). Assignment to more than one cluster suggests an admixed genome. Each algorithm ran with a burn-in of 10,000 Markov Chain Monte Carlo (MCMC) iterations, and parameters were estimated over 300,000 iterations for five replicates at K= 1–8 using correlated allele frequencies. For the third analysis, we used the PFROMPOPFLAGONLY option of STRUCTURE to ensure that ancestry coefficients for NA lines were estimated from the EA distribution (Murgia et al. 2006). We estimated the number of clusters, K, following Evanno et al. (2005).
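For readers unfamiliar with the ΔK statistic of Evanno et al. (2005), the sketch below shows one common way to compute it from replicate log-likelihoods: the absolute second-order difference of the mean ln P(D) across K, divided by the standard deviation of ln P(D) among replicates at that K. The function name, the input layout, and the toy likelihood values are assumptions for illustration, not output from our runs.

```python
import statistics


def evanno_delta_k(lnp):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp: dict mapping K -> list of ln P(D) values from replicate runs.
    Returns dict mapping K -> delta K.
    """
    mean = {k: statistics.mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            sd = statistics.stdev(lnp[k])
            if sd == 0:
                continue  # delta K is undefined when replicates are identical
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd
    return delta


# Hypothetical replicate log-likelihoods for K = 1..4 (two replicates each).
toy_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
delta = evanno_delta_k(toy_lnp)
best_k = max(delta, key=delta.get)
```

Note that ΔK cannot be evaluated at the smallest or largest K tested, so with runs at K= 1–8 it is defined only for K= 2–7.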
For clustering analyses, we included 179 NA lines from this study, and 228 EA lines from 139 populations genotyped at the same 136 SNP loci (from Platt et al. 2010). For all analyses, we defined populations by latitude and longitude. We analyzed our data as if A. thaliana were haploid, which is consistent with the largely selfing mating system of A. thaliana and its high homozygosity.
Morphological and phenological traits
We assessed survival as the proportion of five replicates per line surviving across two intervals: from two weeks after planting until the following April (winter survival), and until reproduction (survive to reproduce) using mixed-model analysis of variance (ANOVA) including continent (fixed) and population (random). Variation in phenotypes (days to bolting and flowering, rosette leaf number and rosette diameter at bolting, and lifetime fruit number) was assessed using nested, mixed effects ANOVAs. For initial models, we included region as a fixed effect (NA, EA), and the random effects of block, population nested within continent, and line nested within population within continent.
We subsequently performed separate analyses, by region, for traits that showed significant differences between EA and NA. For Europe, we tested for variation among lines and among blocks, omitting the population effect (only 1 locality in EA was represented by >1 line), while for NA samples we tested for variation among populations, lines within populations, and blocks. Significance of random effects was determined by calculating the difference between reduced and full model –2log-likelihood values, using a one-tailed χ2 test with 1 degree of freedom (df). We evaluated covariation among phenotypes and fruit production using pairwise Pearson's correlations on inbred line means within each region separately; we compared correlation coefficients between regions with Z-tests.
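Both tests in this paragraph are short enough to sketch directly. The code below is illustrative only: halving the upper-tail χ2 probability is an assumed reading of "one-tailed" (it gives P ≈ 0.042 for χ2= 3, in line with the values in Table 2), the comparison of correlations is assumed to use Fisher's z-transformation for independent samples, and the function names are hypothetical.

```python
import math


def lrt_random_effect_p(neg2ll_reduced, neg2ll_full):
    """Likelihood ratio test for dropping one variance component (1 df).

    Returns (statistic, one-tailed P). The one-tailed P is taken as half
    the upper-tail chi-square probability (assumed convention).
    """
    stat = max(neg2ll_reduced - neg2ll_full, 0.0)
    upper_tail = math.erfc(math.sqrt(stat / 2.0))  # chi-square, 1 df
    return stat, upper_tail / 2.0


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for two independent Pearson correlations.

    Returns (z, two-sided P).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical -2 log-likelihoods differing by 3 units.
stat, p_one_tailed = lrt_random_effect_p(103.0, 100.0)

# Hypothetical correlations and sample sizes for two regions.
z, p_two_sided = compare_correlations(0.76, 20, 0.15, 196)
```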
Because none of the five replicates from one population (KYF) survived to reproduce, analyses of reproductive traits included 34 NA populations. We log10-transformed fruit number to reduce heterogeneity of variances. We analyzed data using SAS (V9.2) and JMP (V8.0.1, SAS Institute, Cary, NC, USA).
Geographic differentiation in flowering phenotypes and fitness
Based on previous observations that geographic clines in flowering time and correlated traits in the native range were associated with geographic variation in climate (Caicedo et al. 2004; Stinchcombe et al. 2004; Samis et al. 2008), we tested for signatures of geographic differentiation in our common garden experiment using three nested analysis of covariance (ANCOVA) models, analyzing NA and EA separately. In all analyses of geographic differentiation, we use inbred line mean phenotypes. (Using line means is akin to considering line a fixed effect to calculate an average, which avoids the attendant problems of analyzing BLUPs for random effects; see Hadfield et al. 2010.) First, we conducted a simple test by regressing traits on latitude and longitude (Model A; N= 196). Second, we tested for geographic clines after controlling for spurious associations due to population structure by including ancestry coefficients as a covariate in the first model (Model B; N= 176). We interpret geographic clines that remain significant with the inclusion of ancestry coefficients as robust to the effects of population structure—that is, they are less likely to be caused by spurious associations. Stochastic and demographic effects should be captured by the patterns inferred from a large number of presumably neutral SNPs (Keller and Taylor 2008; Keller et al. 2009). Ancestry coefficients, representing genome-wide patterns of differentiation inferred from the 136 SNPs, were derived from cluster analysis using only NA lines from this study (K= 2), and represent the proportional membership of each line to each inferred ancestral cluster. We transformed ancestry coefficients using log contrasts (log[p1/p2]; Samis et al. 2008). Third, we tested if geographic variation in plant traits above and beyond population structure was explained by the average climate at the site of origin (Model C, equation 1; N= 176):
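As a sketch only, the three nested models described above can be written as follows. The notation is assumed for illustration and is not taken from the original display equation: y_i is the inbred line mean of a trait, Q_i is the log-contrast ancestry coefficient log(p1/p2), PC_k are the retained principal components for a single climate variable, and ε_i is the residual.

```latex
\begin{aligned}
\text{Model A:}\quad y_i &= \mu + \beta_{1}\,\mathrm{latitude}_i + \beta_{2}\,\mathrm{longitude}_i + \varepsilon_i \\
\text{Model B:}\quad y_i &= \mu + \beta_{1}\,\mathrm{latitude}_i + \beta_{2}\,\mathrm{longitude}_i + \beta_{3}\,Q_i + \varepsilon_i \\
\text{Model C:}\quad y_i &= \mu + \beta_{1}\,\mathrm{latitude}_i + \beta_{2}\,\mathrm{longitude}_i + \beta_{3}\,Q_i + \sum_{k}\gamma_{k}\,\mathrm{PC}_{k,i} + \varepsilon_i
\end{aligned}
```

Each model adds one set of covariates to the previous one, so a cline (β1 or β2) that loses significance between Models B and C is one that the climate PCs can account for.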
Principal components (PCs) were used to summarize the climate at the site of origin, generated from independent analyses of three climatic variables previously used in studies of Arabidopsis climatic adaptation (Stinchcombe et al. 2004; Rutter and Fenster 2007; Baird et al. 2011): mean monthly temperature, mean total precipitation, and mean diurnal temperature range (year range 1961–1990 climate normals, 5′ intervals; Mitchell and Jones 2005) from October to April, that is, over the critical phases of germination, vernalization, and the initiation of flowering in A. thaliana. Climate data for each NA A. thaliana population were extracted from the CRU dataset in ArcGIS 9.3 (ESRI). PCs were generated from correlation matrices, and those with eigenvalues greater than one (explaining >15% of the variation in climate variables) were saved from each principal component analysis (PCA; Quinn and Keough 2002). The loss of a significant latitudinal or longitudinal effect in Model C suggests that variation in the focal climate variable (or a highly correlated factor) may drive geographic variation in the trait; retention of a significant cline suggests that the focal climate variable does not explain the cline. Supplemental models using either mean monthly climate variables, or absolute differences between monthly climate at the site of origin and the common garden (the calibrated common garden approach of Rutter and Fenster 2007) showed similar results, and as such we only present results based on PCs.
We tested for geographic clines among EA lines in the experiment using the reduced geographic model (Model A), as well as after accounting for population structure (Model B). A supplementary cluster analysis with EA lines using published SNP data (Platt et al. 2010) was conducted to generate EA ancestry coefficients (K= 5, data not shown).
Relative effects of geographic origin, climate, and flowering time loci for phenotypes
We used a modified association mapping approach to examine the relative influence of site of origin, climate, and flowering time genes on phenotypes. We analyzed inbred line means for flowering time phenotypes, fitness (N= 172), and survival (N= 173) using the following ANCOVA model (Model D, Equation 2):
where µ is the model intercept (fit by default), latitude and longitude are continuous covariates representing the site of origin, and FRI and FLC represent genotype classes (functional or null, and A or B, respectively) where the interaction term accounts for epistatic effects among alleles at each locus (Caicedo et al. 2004). Haplotypes at the PHYC locus did not contribute significantly to the model, possibly because of the exceptionally low number of lines with the rare allele, and PHYC was subsequently excluded. Ancestry coefficients were the same as those for geographic cline ANCOVAs. We did not assess genetic associations in EA lines because of the limited sample size, and because all EA lines were selected for being FRI–wt. Because of the imbalanced distribution of flowering time genotypes for FRI and FLC, we verified the statistical significance of the FRI, FLC, and FRI×FLC terms using permutation testing (see, e.g., Cassell 2002; Heath 2010); permutation-based P-values had identical patterns of significance and marginal significance, and as such we present traditional hypothesis tests based on F-tests.
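The logic of a permutation test with imbalanced genotype classes can be illustrated with a deliberately simplified case. The sketch below permutes genotype labels and compares a difference in group means for a single two-class locus; it does not reproduce the ANCOVA terms (FRI, FLC, FRI×FLC, and covariates) tested here, and the function name and toy data are hypothetical.

```python
import random
import statistics


def permutation_p(values, labels, n_perm=9999, seed=1):
    """Two-sided permutation P-value for a difference in group means.

    values: phenotype per line; labels: genotype class per line (two classes).
    Labels are shuffled, which keeps the (possibly imbalanced) class sizes fixed.
    """
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("expected exactly two genotype classes")

    def mean_diff(lab):
        a = [v for v, g in zip(values, lab) if g == groups[0]]
        b = [v for v, g in zip(values, lab) if g == groups[1]]
        return abs(statistics.mean(a) - statistics.mean(b))

    observed = mean_diff(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if mean_diff(shuffled) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so P is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical, strongly imbalanced example: 2 'null' lines among 20.
toy_values = [10.0, 10.0] + [0.0] * 18
toy_labels = ["null", "null"] + ["wt"] * 18
p_perm = permutation_p(toy_values, toy_labels)
```

Because the null distribution is built from the data themselves, the test does not rely on equal class sizes, which is the motivation for using it alongside F-tests when one genotype class is rare.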
We elected not to use mixed model approaches that take advantage of both population structure and relative kinship (e.g., TASSEL: Bradbury et al. 2007; EMMA: Kang et al. 2008), either because analyses including geographic covariates (latitude and longitude) would not converge (TASSEL) or because it is unclear how to include such covariates (EMMA). While our modified structured association mapping approach might have greater false positives (at the level of locus detection) due to population structure, it is the most straightforward method for testing our hypotheses about the relative influence of site of origin, climate, and flowering time genes on phenotypes. Past studies (Nordborg et al. 2005) have documented extensive haplotype sharing between pairs of NA A. thaliana accessions, suggesting that the resolution of our candidate gene mapping is likely to be reduced compared to EA samples.
Distribution and frequency of haplotypes
As suggested by previous investigations (see, e.g., Nordborg et al. 2005), our surveys of previously characterized allelic variation in NA lines revealed minimal variation in functional genes related to flowering time, with 85% of lines (180/212) exhibiting the common genotype, FRI–wt/FLC–A/PHYC–Ler. The frequencies of alternate genotypes were relatively low (<3.5% each), and no individuals exhibited one genotype: FRI–wt/FLC–B/PHYC–Col (Table S3). Two alleles previously detected in EA samples of A. thaliana (Caicedo et al. 2004; Balasubramanian et al. 2006) were not detected in this NA sample: the 16-bp FRI deletion C and the Col-type FLC insertion. The Ler-type FLC insertion was detected in only three lines, and was not considered further. In contrast to the frequency of alleles in the native range, where FRI, FLC, and PHYC haplotypes occurred at frequencies close to 50%, one allele at each locus was significantly rarer in NA, occurring at frequencies of <10% (all P < 0.0001; Table 1). When we randomly sampled a single accession per locality, we still detected significant differences in haplotype frequencies between NA and EA, suggesting that differences in frequencies detected in the global sample are not due to sampling multiple accessions per locality in NA compared to Europe (Table 1). However, when compared to allele frequency changes at the 136 SNPs, the flowering time genes do not show greater differentiation than expected based on the “genome-wide” patterns of allele frequency changes (Fig. S3). Finally, while it is worth considering that our genotyping approach may have missed new variants either within or linked to the characterized loci, it is clear that variation characterized in EA samples differs in frequency from that occurring in North America.
The NA lines we sampled encompass a range of approximately 10° latitude and 16° longitude (Table S1). Within this range, PHYC haplotypes did not exhibit significant geographic variation in their distribution (Col 39.7 ± 0.65°N, 77.7 ± 1.49°W; Ler 39.6 ± 0.23°N, 78.5 ± 0.35°W; logistic regression, model P= 0.79). In contrast, both FRI and FLC genotype classes were differentiated geographically (FRI model P= 0.004, likelihood ratio χ2 tests, latitude χ2= 4.2, P= 0.041, longitude χ2= 3.2, P= 0.072; FLC model P < 0.0001, latitude χ2= 4.1, P= 0.043, longitude χ2= 12.0, P= 0.0005). Rarer genotype classes (FRI–null and FLC–B) exhibited significantly more northern and eastern distributions than the more common haplotypes at each locus, and in comparison to the mean of all lines (Fig. 2).
Analysis of NA SNP genotypes alone revealed that lines cluster into two ancestral groups with no clear geographic structure (Fig. 3). Some populations exhibited little admixture (e.g., South Carolina, Maryland, and New York), while others exhibited variable levels of admixture (e.g., Pennsylvania, Long Island, and mid-western states). Inclusion of presumably ancestral lines from EA increased the degree of clustering (K= 3 without geographic information supplied), and also suggested higher admixture in North America than within Eurasia, and relatively less population structure within the native range (Fig. S2). The low number of clusters detected is unlikely to reflect a general lack of variation at the SNPs, which were generally variable (Fig. S3). One striking finding of the structure analysis is that EA samples showed little to no structure and belonged almost entirely to a single cluster (K2), while membership in the remaining clusters (K1, K3) was restricted to North America. Both the number of clusters (K= 5) and the degree of admixture increased when NA lines were forced to cluster with EA lines as a learning set, but differences between regions remained (Fig. S1). When forced to cluster with EA lines, the dominant cluster in North America (K5) is also the dominant cluster in admixed accessions from France, Sweden, and Switzerland. These results are qualitatively similar to those of Nordborg et al. (2005) and Ostrowski et al. (2006), who found that NA lines cluster with those from Western Europe.
Variation and covariation in traits
The majority of plants survived at least two weeks after transplanting (84%), and the majority of those plants survived until spring (728 out of 925 plants, 79%), and became reproductive (78%). Because there was little difference between winter survival and survival to reproduction, we focus on survival to reproduction. EA plants displayed higher overall survival than NA plants (mean proportion of replicates per line surviving ± SE: EA 0.79 ± 0.054; NA 0.65 ± 0.023; P= 0.017), although we note that these differences were based on five replicates per line. Within regions, significant variation in survival was detected among populations in North America (P= 0.03). All plants bolted over a relatively short timeframe in early spring, and the majority of plants (n= 634) completed flowering within 39 days of first flower (mean ± SD 26.7 ± 3.3 days).
Despite high overlap between regions, EA plants flowered significantly earlier (lsmeans ± SE: days to bolting EA 196.3 ± 0.72, NA 197.7 ± 0.58; P= 0.01, days to flowering EA 207.0 ± 0.64, NA 208.0 ± 0.54; P= 0.019), and were more productive (log10 (fruit) back-transformed lsmeans ± SE range: EA 428.5 + 92.55 – 76.11, NA 331.2 + 63.77 – 53.47; P= 0.012) than plants from North America. The sample of EA lines showed statistically significant variation among lines for days to bolting and days to flowering (Table 2), despite the limited power due to low sample size. Plants in both regions bolted at similar rosette leaf numbers (EA 29.2 ± 1.34, NA 29.2 ± 0.95; P= 0.99) and rosette diameters (EA 3.3 ± 0.18, NA 3.1 ± 0.15; P= 0.18). For rosette diameter and leaf number, there was significant genetic variation among lines within populations (P < 0.0001). Within North America, significant variation among lines within populations was detected for all traits (Table 2B). Significant among population variation was present for days to bolting and fruit number, and was marginally significant for days to flowering (P= 0.061; Table 2). These data indicate substantial quantitative genetic variation exists in North America, despite reduced variation at FRI, FLC, and PHYC. Although the block effects are appreciable, the presence of significant genetic variation, in the form of lines within populations, indicates that testing for geographic trends in inbred line mean phenotypes is possible.
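The asymmetric error ranges reported for fruit number follow from back-transforming a symmetric interval on the log10 scale. The sketch below illustrates the arithmetic. The log-scale standard error is not reported in the text, so here it is inferred from the reported upper offset for EA lines (an assumption made only for illustration), and `back_transform_log10` is a hypothetical helper name.

```python
import math


def back_transform_log10(mean_log, se_log):
    """Back-transform a log10-scale mean and SE to the original scale.

    Returns (center, upper_offset, lower_offset), where the offsets are
    the distances from the center to 10**(mean + SE) and 10**(mean - SE).
    """
    center = 10 ** mean_log
    upper = 10 ** (mean_log + se_log) - center
    lower = center - 10 ** (mean_log - se_log)
    return center, upper, lower


# EA lines: reported back-transformed mean 428.5 with upper offset 92.55.
mean_log = math.log10(428.5)
se_log = math.log10((428.5 + 92.55) / 428.5)  # inferred, not reported
center, upper, lower = back_transform_log10(mean_log, se_log)
```

Under these assumptions the lower offset comes out at about 76.1, matching the reported value, and the upper offset always exceeds the lower one because the back-transformation is multiplicative.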
Table 2. Mixed model analysis of variance (ANOVA) for heterogeneity in individual plant traits among or within regions (native Eurasian or introduced North American range source), populations and lines for Arabidopsis thaliana planted in a common garden experiment in Toronto, Canada. The full model (A) includes the effect of NA and EA region, and subsequent models (B) partition variation within each region separately for those traits with a significant effect of region in the full model. For random effects, we report the variance component, as well as the likelihood ratio statistic and P-value associated with removing that term from the model.
(A) Full model
Columns: trait; fixed effect of region, F (P); random effects variance components (likelihood ratio statistic, P)
Rosette leaf number; F < 0.001 (0.99); 3.33 (χ2= 3, P= 0.042); 10.81 (χ2= 30.3, P < 0.0001); 3.06 (χ2= 26.5, P < 0.0001)
Rosette diameter; F= 1.82 (0.18); 0.062 (χ2= 10.7, P < 0.0005); 0.109 (χ2= 15.7, P < 0.0001); 0.084 (χ2= 63, P < 0.0001)
(B) Within regions
Columns: trait; random effects variance components (likelihood ratio statistic, P), including line (EA) or line (population) (NA)
Fruit number (log10); 0 (χ2= 0, P= 1); 0.021 (χ2= 6.6, P= 0.005); 0.0042 (χ2= 6.5, P= 0.005); 0.0091 (χ2= 4.4, P= 0.018); 0.0277 (χ2= 128, P < 0.0001)
Days to flower; 2.80 (χ2= 7.1, P= 0.004); 1.24 (χ2= 4, P= 0.023); 0.262 (χ2= 2.4, P= 0.061); 1.31 (χ2= 25.4, P < 0.0001); 1.28 (χ2= 101.8, P < 0.0001)
Days to bolting; 5.71 (χ2= 8.7, P= 0.003); 1.62 (χ2= 2.8, P= 0.047); 0.419 (χ2= 2.9, P= 0.044); 2.33 (χ2= 49.9, P < 0.0001); 1.34 (χ2= 94.3, P < 0.0001)
As expected for A. thaliana, within continents, early bolting was strongly correlated with early flowering, and larger rosette diameter was reflected in higher rosette leaf numbers at bolting (all P < 0.0001; Fig. 4). However, we detected qualitative differences in the patterns of trait covariation between plants in native and introduced regions. For instance, among EA lines, there was a positive correlation for bolting and flowering times with rosette diameter (both r > +0.62, P < 0.05), and with rosette leaf number (both r > +0.76, P < 0.0001) at bolting. In contrast, NA flowering time was only moderately associated with rosette leaves (r=+0.15, P < 0.05) and rosette diameter (r=+0.20, P < 0.05), and bolting time was not significantly associated with either trait (Fig. 4). Comparison of the correlation coefficients between EA and NA with Z-tests suggests that the coupling of flowering time with plant size and development is weaker in populations from the introduced range, but that correlations between rosette leaf number and size are stronger.
Geographic differentiation: flowering phenotypes and fitness
Multiple regression models revealed evidence for geographic differentiation. Furthermore, comparisons with EA lines in the experiment suggested a parallel cline in flowering time between regions. In the reduced model (Model A), mean inbred line days to bolting (estimate ± SE +0.094 ± 0.034, P= 0.0057) and days to flowering (+0.082 ± 0.030, P= 0.0061) exhibited significant regressions with longitude, which remained significant above and beyond variation in neutral population structure (Model B; both P < 0.020, Table 3). A parallel, longitudinal cline was detected for days to flowering among EA lines (reduced model: –0.18 ± 0.061, P= 0.0091), although significance was reduced after controlling for population structure (P= 0.058). In both regions, inbred lines with more coastal origins (eastern in North America, western in Eurasia) flowered later than lines with more inland origins (Fig. 5). Reducing the NA sample to focus only on FRI functional genotypes (see, e.g., Stinchcombe et al. 2004) did not reveal additional geographic clines. In general, longitudinal clines were statistically significant but weak in overall magnitude, with considerable within-population variation in flowering time present.
Table 3. Multiple regression models testing for geographic clines in flowering time with and without control for neutral processes and climatic variation. All models were conducted on inbred line means for Arabidopsis thaliana plants grown in a common garden experiment in Toronto. Principal components (PC) describe mean monthly climate at home from October through to April, for the variable indicated in the column header. Gray shading indicates that the term was omitted from the model. Significant effects at α= 0.05 are bolded.
Significant positive latitudinal clines among NA samples were observed for mean inbred line rosette diameter (0.043 ± 0.014, P= 0.002), survival (0.016 ± 0.0052, P= 0.0023), and fruit production (0.019 ± 0.006, P= 0.0021) in the reduced model, although none approached significance after controlling for population structure (P > 0.12). Among samples from the native range (EA), we also observed a significant longitudinal cline for inbred line rosette leaf number (Full Model B: –0.42 ± 0.19, P= 0.044), suggesting that western lines that have functional FRI flower later and at a larger size than eastern populations.
We evaluated the contribution of climatic variation to the observed clines by incorporating climatic PCs into a multiple regression, with PCs for each climate variable estimated separately. PC analyses generated one PC for mean October–April temperature (PC1: eigenvalue 6.9, 98.3% variation explained), one PC for mean diurnal temperature range (PC1: 6.5, 92.2%) and two PCs for mean total precipitation (PC1: 5.3, 76.4%; PC2: 1.2, 17%). Inclusion of PC1 for temperature or PCs for precipitation either reduced or eliminated the significance of longitude on days to flowering (Table 3), suggesting that either of these climatic factors or factors closely correlated with them account for the longitudinal cline in flowering time. Although inclusion of a temperature PC reduced the significance of the longitude effect (P= 0.051), the parameter estimate is largely unchanged (0.075 ± 0.038 vs. 0.082 ± 0.03 for the fully reduced model). In contrast, inclusion of precipitation PCs not only eliminated the longitude effect (P= 0.91), but also dramatically reduced the parameter estimate (0.007 ± 0.06). Inclusion of PC1 for diurnal temperature range had no effect (Table 3). Taken together, these results suggest that geographic variation in flowering time across the introduced range is driven by longitudinal variation in climate, with precipitation (or variables highly correlated with it) showing the strongest statistical evidence as the mechanism.
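The eigenvalues and percentages reported for the climatic PCs can be related by a short calculation. The sketch assumes that each PCA was run on the correlation matrix of seven monthly means (October through April), in which case the eigenvalues sum to seven and the proportion of variation explained by a PC is its eigenvalue divided by seven. That assumption is not stated in the text but agrees with the reported values to rounding. The names below are introduced for illustration only.

```python
# Assumption: one standardized variable per month, October through April,
# so each correlation-matrix PCA has seven eigenvalues that sum to 7.
N_MONTHS = 7

# (label, reported eigenvalue, reported % variation explained)
reported = [
    ("temperature PC1", 6.9, 98.3),
    ("diurnal temperature range PC1", 6.5, 92.2),
    ("precipitation PC1", 5.3, 76.4),
    ("precipitation PC2", 1.2, 17.0),
]


def implied_eigenvalue(pct_explained, n_variables=N_MONTHS):
    """Eigenvalue implied by a percentage of variation explained when the
    PCA is performed on a correlation matrix of n_variables variables."""
    return pct_explained / 100.0 * n_variables


for label, eigenvalue, pct in reported:
    implied = implied_eigenvalue(pct)
    print(f"{label}: reported {eigenvalue}, implied {implied:.2f}")
```

Under this assumption, the two precipitation PCs together account for about 93% of the variation in winter precipitation (76.4% + 17%).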
Relative effects of site of origin, climate, and flowering time genes on phenotypes
In the full, genetic model (Model D), all traits displayed significant associations with FRI or FLC genotypes, or their interaction (Table 4). Compared to lines with the common, FRI–wt genotype, FRI–null lines produced more fruit (back-transformed lsmeans ± SE: FRI–wt, 306.9 fruit/plant, SE range +35.74, –32.02; FRI–null, 512.9 fruit/plant, +96.56, –81.26), flowered later and at a larger size (lsmeans ± SE; days to flowering: FRI–wt, 207.5 ± 0.34 days; FRI–null, 208.8 ± 0.53 days; rosette diameter: FRI–wt, 2.96 ± 0.11 cm; FRI–null, 3.5 ± 0.17 cm; rosette leaves: FRI–wt, 28.9 ± 0.96 leaves; FRI–null, 32.8 ± 1.50 leaves), and had higher survival (lsmean ± SE; number of five replicates surviving to reproduction: FRI–wt, 2.83 ± 0.21; FRI–null, 3.86 ± 0.33). A significant association with FLC haplotype was apparent only for days to flowering, and revealed that the uncommon FLC–B haplotype flowered on average one to two days later than the common FLC–A haplotype (lsmeans ± SE: FLC–A, 207.4 ± 0.37 days; FLC–B, 208.9 ± 0.53 days).
Table 4. Association mapping for genetic covariance with flowering time traits and fitness for North American inbred line means controlling for variation in geographic origin, genotype, and population structure.
Response traits listed include days to bolting and days to flower. Footnote 1: Log10 (mean lifetime fruit production). Significance at α= 0.05 is bolded, at α= 0.10 in italics.
We also detected an epistatic association of FRI and FLC on flowering phenotypes. For both days to bolting and days to flowering, the interaction was associated with delayed phenology in lines with the FRI–null/FLC–B genotype compared to other genotypes (days to bolting, days to flowering: FRI–wt/FLC–A 197.2 ± 0.16, 207.4 ± 0.14; FRI–wt/FLC–B 196.7 ± 0.77, 207.6 ± 0.67; FRI–null/FLC–A 196.1 ± 0.84, 207.4 ± 0.73; FRI–null/FLC–B 199.1 ± 0.92, 210.2 ± 0.80).
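The size of the interaction can be illustrated with a standard 2 × 2 interaction contrast computed from the four genotype means quoted above: the effect of FLC–B in an FRI–null background minus its effect in an FRI–wt background. This is a descriptive sketch only. It ignores the standard errors and is not the test used in the full model.

```python
# Least-squares means (days) quoted in the text, keyed by (FRI, FLC) genotype
means = {
    "days to bolting": {
        ("wt", "A"): 197.2, ("wt", "B"): 196.7,
        ("null", "A"): 196.1, ("null", "B"): 199.1,
    },
    "days to flowering": {
        ("wt", "A"): 207.4, ("wt", "B"): 207.6,
        ("null", "A"): 207.4, ("null", "B"): 210.2,
    },
}


def interaction_contrast(m):
    """Difference between the FLC-B effect in FRI-null lines and the
    FLC-B effect in FRI-wt lines (zero under purely additive effects)."""
    flc_effect_in_null = m[("null", "B")] - m[("null", "A")]
    flc_effect_in_wt = m[("wt", "B")] - m[("wt", "A")]
    return flc_effect_in_null - flc_effect_in_wt


for trait, m in means.items():
    print(f"{trait}: interaction contrast = {interaction_contrast(m):.1f} days")
```

With these means the contrast is about 3.5 days for bolting and 2.6 days for flowering, reflecting the delay specific to the FRI–null/FLC–B combination.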
Two aspects of the full, genetic model stand out. First, although the effects of genotype on flowering phenology range in strength from strong (e.g., effects of FRI on most traits, Table 4) to marginally significant (e.g., FLC on days to bolting P= 0.093; Table 4), the FRI×FLC interaction term should be interpreted with caution. Our common garden sample contained few lines that were not FRI–wt or FLC–A (6.4% and 6.9%, respectively), hence it contained few lines with genotypes other than FRI–wt/FLC–A (N for other genotype combinations, 6 FRI–null/FLC–A; 7 FRI–wt/FLC–B; 5 FRI–null/FLC–B). Second, in the full genetic model the effects of longitude on days to bolting and days to flowering were only marginally significant (P= 0.094, and P= 0.064, respectively; Table 4), although both were significant in the reduced, geographic models (Models A and B; all P < 0.020). However, parameter estimates for the longitude terms were similar between all three models (Model A, B vs. D parameter estimate ± SE for longitude: days to bolting A: +0.094 ± 0.034, B: +0.084 ± 0.034, vs. D: +0.061 ± 0.035; days to flowering A: +0.082 ± 0.029, B: +0.071 ± 0.030 vs. D: +0.058 ± 0.031). In addition, removing the FRI×FLC term from the full model restores the significance of the longitude effect on both days to bolting (P= 0.03) and days to flowering (P= 0.022). Finally, in an analysis restricted to lines with FRI–wt/FLC–A (N= 154) containing latitude, longitude, and ancestry coefficients, we still detect evidence of significant and marginally significant flowering time clines (longitude: days to bolting +0.095 ± 0.036, P= 0.009; days to flowering +0.061 ± 0.033, P= 0.069).
Introduced model species offer the potential to study convergent evolution, distinguish between adaptive and nonadaptive explanations for geographic differentiation, and evaluate underlying genetic mechanisms. Three major results emerge from our study of the evolution and genetics of naturalized populations of A. thaliana: (1) introduction and geographic expansion have been accompanied by dramatic shifts in population genetic structure and allele frequencies at major flowering time genes; (2) abundant quantitative genetic variation (within populations) and longitudinal differentiation exist in the introduced range, despite reduced variation at three known, major-effect flowering time genes; and (3) strong effects of FRI are observed on several quantitative traits. We discuss these findings below.
Variation in population structure and allele frequencies between regions
Both our STRUCTURE analysis using SNPs and assessments of allele frequencies at functional loci suggest dramatic genetic differences between introduced and native populations of A. thaliana. For example, we detected fewer clusters in North America (K= 2; Fig. 3) than have previously been observed using EA samples (K= 5–8 for primarily EA samples: Nordborg et al. 2005; Schmid et al. 2006; Fig. S2). Given bottlenecks, founder effects, and past reports of reduced variation in North America (Jorgensen and Mauricio 2004; Nordborg et al. 2005; Platt et al. 2010), these results are unsurprising. The structure plots of both EA and NA when analyzed together (Figs. S1 and S2) indicate substantial differences in patterns of population structure.
It is likely that many of the source populations that colonized North America were not included in our sample, which increases the probability that a rare set of source populations was also missed (although other explanations remain possible). Three lines of evidence support this view. First, when we analyzed NA and EA samples together without supplying geographic information, we obtained K= 3, with the predominant pattern that EA samples are almost exclusively cluster 2 (K2). In contrast, NA samples appear to be either cluster 1, cluster 3, or admixed combinations (i.e., K1–K2 or K2–K3; Fig. S1). These data indicate qualitative and quantitative differences in clustering between North America and Eurasia. Second, any structure among EA samples disappears almost entirely when analyzed with NA samples (Fig. S1), suggesting that differences between regions are appreciably more pronounced than differences within EA samples alone. Finally, when NA samples are forced to cluster with EA samples, clusters that predominate in Eurasia are less frequent in North America, and vice versa (Fig. S2).
Given differences in population genetic structure, how should the differences between regions in allele frequencies at FRI, FLC, and PHYC be interpreted? As these genes affect quantitative traits expected to be under selection, it is likely that natural selection has contributed, at least in part. However, while natural selection may drive changes in allele frequencies within ranges, it seems more likely that the dramatic differences between regions are due to stochastic effects. Functional loci such as FRI, FLC, and PHYC should be subject to the same evolutionary and demographic forces that produced the pervasive, genome-wide differences in population structure indicated by the structure plots. Moreover, comparisons of allele frequencies at 136 SNPs between the native and introduced range suggest that numerous loci throughout the genome exhibit large frequency differences (Fig. S3), again suggesting the prominent role of stochastic forces accompanying introduction.
Parallel clines and ecological mechanisms
Our common garden experiment detected statistically significant longitudinal clines in flowering time, both in NA and EA samples, although the clines for the NA samples are weaker with considerable variation present within populations. These data further strengthen past interpretations that adaptive clines in A. thaliana are associated with regional climatic forces. Samis et al. (2008) suggested that longitudinal clines among EA Arabidopsis lines were likely due to adaptation to seasonal winter conditions, leading to changes in critical photoperiods. Three additional findings from the present experiment support their interpretation: (1) we again detected longitudinal clines in EA samples, using a smaller subset of accessions and under seminatural garden conditions in Toronto (Fig. 5A), (2) we detected a weak, but statistically significant longitudinal cline in flowering time in NA samples that was mirrored in the EA samples (Fig. 5B), and (3) when we used multiple regression to explain flowering time in the introduced range, winter precipitation (or highly correlated variables) emerged as the strongest explanation of the NA cline (Tables 3 and 4), with weaker contributions from temperature. Winter temperature and precipitation (or highly correlated variables) also appeared to explain the EA cline (K. E. Samis and J. R. Stinchcombe, unpubl. data). In general, the stochastic forces and bottlenecks associated with the introduction history, along with the changes in allele frequency at SNPs and flowering time genes (see above), suggest that clines like the ones we have described here (with a great deal of scatter) should perhaps be the expectation for A. thaliana in North America.
Identifying adaptive or parallel clines may be complicated by the potential problems of nonoverlapping latitudinal ranges (Colautti et al. 2009) and cryptic population structure associated with neutral processes (Keller and Taylor 2008). Neither factor appears to influence the flowering time clines we detected. First, the primary axis of differentiation we detected was longitudinal rather than latitudinal (and we included latitude as a covariate in all models). Second, our full models accounted for population genetic structure, and suggested that both neutral and adaptive processes contributed to geographic variation in flowering time. In contrast, any latitudinal clines we detected disappeared when ancestry coefficients were considered, suggesting largely neutral or demographic mechanisms (also see Keller et al. 2009).
One noteworthy aspect of our results is that we failed to detect latitudinal clines in flowering time, in contrast to past reports (Caicedo et al. 2004; Stinchcombe et al. 2004; Lempe et al. 2005; Brachi et al. 2010; Li et al. 2010). Our relatively sparse latitudinal sampling could have contributed, as past studies have shown that sample composition can affect power to detect clines and genetic associations (Samis et al. 2008). It is also likely that climatic differences among experiments contributed. Although our study, Stinchcombe et al. (2004), and Caicedo et al. (2004) were each carried out under seminatural conditions, the climate differs markedly between locations (and from the growth chamber programs used in other studies). For example, prolonged, subzero winter temperatures and snow-thaw cycles were more common in Toronto during this experiment than in past studies (Samis and Stinchcombe, unpubl. data). The timing and duration of seasonal cues can be extremely important in determining genetic associations (Brachi et al. 2010) and life-history transitions (even in an isogenic background, Wilczek et al. 2009). As such, our failure to find latitudinal clines in flowering time may be specific to the environmental conditions of this experiment.
Phenotypic associations with major effect loci
Two important results emerge from our modified structured association analyses. First, the observed longitudinal cline for flowering time in North America is not accounted for by the effects of FRI, FLC, or their epistatic interaction (Table 4). Although we observed some evidence of differing geographic centers for different allele classes (Fig. 2), these differences were insufficient to account for the observed clines. These data suggest that the evolution of flowering time clines in North America has been achieved largely through other, unknown mutations. Although any linkage disequilibrium (LD) or haplotype sharing among NA samples (cf. Nordborg et al. 2005; Platt et al. 2010) may inhibit the resolution of association mapping, and make this conclusion conservative, we still detect geographic differentiation even after accounting for observed variation at FRI and FLC, and implicitly, other unknown loci in LD with them.
Second, we observed pervasive and dramatic effects of FRI (or genes in LD with FRI) on several quantitative traits (Table 4). Similar to previous studies (Caicedo et al. 2004; Korves et al. 2007; Scarcelli et al. 2007), we detected evidence of an epistatic interaction between FRI and FLC, although caution is warranted for two reasons. First, the minor allele frequencies are quite low, so many genotype combinations are poorly represented: the underrepresented combinations contain between 5 and 7 inbred lines, while the dominant genotype combination (FRI–wt/FLC–A) contained 181 inbred lines. Previous studies have shown that false-positive genetic associations increase in frequency when the minor allele frequency is < 10% (Brachi et al. 2010). Second, the phenotypic effects of genetic associations (here FRI–null/FLC–B genotypes bolted 1.5–3 days later and flowered 2.6–2.8 days later than the others) are likely to be overestimated due to the Beavis effect.
Aside from the effects on phenology, the effects of FRI on other quantitative traits were more dramatic: plants with the FRI–null allele had larger rosette diameters, produced more rosette leaves at flowering, produced significantly more fruit, and had higher survival (Table 4). Despite these apparent performance and fitness advantages, FRI–null alleles are relatively rare in our sample. Three potential mechanisms could explain this discrepancy: (1) FRI–null alleles may indeed have a fitness advantage, but have not had sufficient time to increase in frequency since introduction; (2) while the seminatural conditions of our experiment may have led to a fitness advantage of FRI–null alleles (also see Korves et al. 2007), FRI–wt alleles may be favored elsewhere in the introduced range or in other seasonal environments; and (3) other unknown loci in LD with FRI led to the observed performance and fitness differences (which could also explain the later flowering of lines carrying FRI–null alleles, which typically flower earlier). With the data in hand, it is not possible to distinguish these possibilities.
Our results point to three major conclusions: (1) Introduced NA samples of A. thaliana show qualitatively and quantitatively different patterns of genetic variation than EA samples, based on population structure (Fig. 3), allele frequencies at functional loci (Table 1), and covariation in quantitative traits (Fig. 4); (2) introduction to North America appears to have been accompanied by the evolution of weak but significant longitudinal clines in flowering time (Fig. 5), associated with geographic variation in winter precipitation, temperature, or closely correlated environmental variables (Table 3); and (3) variation at major flowering time genes FRI, FLC, and PHYC is insufficient to explain the cline, although variation in FRI (or factors in LD with it) appears to have dramatic effects on a host of quantitative traits. Several crucial next steps are necessary to confirm these results. First, reciprocal transplant experiments across the longitudinal range in North America are necessary, and manipulations of precipitation (either in the field or growth chambers) will be required to confirm or reject its role as a selective agent. Second, crosses and traditional QTL mapping approaches will still be necessary (see, e.g., Brachi et al. 2010; Nemri et al. 2010) to confirm or reject the associations between FRI and associated quantitative traits. Careful selection of accessions should make it possible to evaluate the genetic basis of geographic differentiation in the introduced range.
We use “parallel” in the geometric sense, without implication about genetic mechanisms.
We thank NSERC Canada and CFI for funding and B. Hall, D. Tam, and A. Petrie for logistical support. B. Harrett, A. J. Stock, H. van Tol, and K. Wang helped to collect data. We thank J. Borevitz and Y. Li for SNP genotyping assistance and manuscript comments. Comments by A. Buerkle, L. Delph, and four anonymous reviewers improved the manuscript.