This study provides a broad survey of genetic variation along the distribution range of E. sativa in Israel. There is substantial variation in environmental conditions along the narrow local geographic distribution range of E. sativa – variation that encompasses both climatic conditions (i.e., temperature and rainfall) and edaphic conditions (i.e., conductivity, CaCO3 content, and SSA), as well as vegetation. Despite this, genetic diversity and structure were not unambiguously associated with environment or geography; instead, they were best explained by genetic cluster. Also, the genetic break that divided the investigated populations into two clusters did not coincide with the most striking differences in morphology observed in nature. Indeed, differences in phenotype largely disappeared in the common-garden experiments, which showed that this variation was almost entirely due to plasticity. Nevertheless, regression analysis showed that part of the phenotypic variation measured under field conditions could be attributed to the environment. In addition, several candidate loci for selection were detected by three different methods, suggesting a role for adaptation to environmental conditions.
The present results show a moderate but clear genetic break that subdivides the nine studied populations within the narrow distribution range into a southern and a northern cluster (Figs. 3, S4). Although the existence of two genetic clusters could be attributed to local adaptation to differing climatic conditions, or to the apparent geographical distribution gap that separates the southern from the northern populations (Fig. 2), the finding of distinct genetic groups in phylogeographic surveys is usually interpreted in terms of historical demography. Thus, the two groups found here could plausibly be considered as reflecting the existence of Irano-Turanian (southern group) and Mediterranean (northern group) elements in the area (Danin and Plitmann 1987). The modern distribution range in the Jordan Valley and southern Golan Heights could represent a contact zone between formerly allopatric groups. A broader phylogeographic study of both introduced and natural populations in the eastern Mediterranean and the Near East is needed to distinguish between these hypotheses.
Because the two clusters were located in a north-south pattern, testing for effects of geographic distance without correcting for this spatial pattern might yield spurious regression coefficients. The same applies to environmental factors if they are correlated with the geographic hierarchical structure. This was exemplified by the significant effects of the geographic and environmental distances on genetic distances in multiple regressions, which do not control for hierarchical structure. Cushman and Landguth (2010) suggested using causal modeling with partial Mantel tests to identify the process responsible for genetic patterns, and this approach identified cluster as the causal process (data not shown). Conversely, if true effects of geography were correlated with the geographic clusters, correcting for hierarchical structure might remove a large part of these effects (Mulley et al. 1979). Analyses subsequent to the partial Mantel tests, however, would have to be restricted to populations belonging to the same cluster, which would reduce the number of data points and hence the statistical power. Instead, we analyzed the full distance matrices with LMM, with substructure added as a grouping variable, and used simulated response data of a null model to estimate significance. An advantage of this approach is that the contributions of all parameters, as well as their possible interactions, can potentially be estimated from the whole data set, provided that they have independent effects (e.g., Lee and Mitchell-Olds 2011). In the end, this approach identified cluster as the major influence on marker variation, and no additional effects of environment or geographic distance could be detected.
Attention has recently been given to the influence of local adaptation on genetic patterns via the ‘general barriers’ mechanism (Nosil et al. 2009), and several studies have found evidence for a role of environmental selection in shaping neutral marker patterns (Nosil et al. 2009; Freeland et al. 2010). In the light of the strong environmental gradient in the present study area (Tables 1 and S1), there should be ample opportunities for local adaptation in populations of E. sativa. However, although IBA (isolation by adaptation) patterns seem relatively common (Nosil et al. 2009), we could not detect a distinct influence of the environment on genetic diversity. It is certainly possible that neither geography nor environment plays a significant role in this system or that selection is too weak to produce local adaptation, although intuitively this seems unlikely. Accordingly, possible explanations are that the scale at which most of the habitat differentiation is found is greater than that at which gene flow occurs, which would lead to neutral patterns uncorrelated with the environment; or, alternatively, that there are selection pressures unrelated to the environmental axes used here. On the other hand, the relatively high correlation among the three explanatory variables (environment, geography, and cluster) might mask any smaller effects of geography or environment (Mulley et al. 1979). At any rate, an accurate estimation of relative effects does not seem possible when variables are correlated. Indeed, this may be common in nature; in many cases the effects of genetic barriers or geographic distance, on one hand, and adaptation to local habitat conditions, on the other hand, may not be independent because the homogenizing effect of gene flow counteracts differentiation and local adaptation (Slatkin 1987). Thus, some form of spatial restriction of gene flow between populations (e.g., distance, geographical barriers) might often be necessary for local adaptation to occur.
Phenotypic variation is thought to be more readily affected by selection than molecular variation, as the former may be the direct target of selection. Indeed, significant correlation of the phenotype data from the present common-garden experiment with the environmental characteristics implies a possible role for selection. In this case, the finding that edaphic factors accounted for more variation than climatic factors disentangles it somewhat from the geographic component. It is, however, difficult to speculate about which traits are influenced by the soil type. Indeed, most plant features monitored in the present study (i.e., performance-related characters, trichome density, and herbivore damage), as well as observed variation in seed germination in response to photo-thermal cues (Barazani et al. 2012), can reasonably be attributed to climatic factors. Similarly, phenotypic variation in East-Mediterranean plant populations has largely been associated with climatic conditions, encompassing phenology, morphology, and ecophysiological differences such as seed dormancy (Gutterman 2002; Yonash et al. 2004; Yan et al. 2008). The contrasting lack of significant regression coefficients with the data from the insect-free net-house experiment may seem surprising at first glance, as several traits were measured in both experiments. The most likely explanation is that these differences in traits resulted from the response of the plants to the ambient conditions in the common-garden experiment – for example, herbivory and radiation.
Detection of outlier loci
The three programs, Dfdist, Bayescan, and SAM, detected 1.6, 2.2, and 3.2%, respectively, of all polymorphic loci as outliers; the total proportion of outliers was 5.4% (Table 4). These numbers are within the range found in other plant species, and are slightly below the average proportion of outliers (8.5%) found in other studies (Strasburg et al. 2012). However, findings like these are difficult to compare, because the studies described by, for example, Strasburg et al. (2012) and Nosil et al. (2009) differ in the methods used and the significance levels chosen, as well as in their study systems. In E. sativa, in the present study, only three of the nine candidate loci (55GA170, 55GA254, 54GA254) were identified by more than one method. However, given that the programs use different methods to detect candidate loci, it is not surprising that they identified different markers as outliers. Even when only differentiation-based outlier methods are compared, several papers have shown that programs perform differently with both experimental and simulated data (e.g., Vitalis et al. 2001; Foll and Gaggiotti 2008; Excoffier et al. 2009; Nunes et al. 2011).
Several researchers have raised concerns about the possibility of high rates of false positives in genome scans for outliers (Foll and Gaggiotti 2008; Excoffier et al. 2009). The finding of a hierarchical genetic structure, for example, may have an influence on the detection of outliers. Excoffier et al. (2009) found a large number of false positives when using an island model of differentiation in a hierarchical population structure. In their scenarios, simulations showed that the rate of false positives for loci under divergent selection was high when differentiation between groups was strong compared to among-population differentiation, but relatively low when the structure among groups was less pronounced. In our present study, we included all populations in a global analysis, to ensure that the data would cover as much of the climatic gradient as possible.
However, the pattern of differentiation in E. sativa appears to correspond more to the case described by Excoffier et al. (2009), which produced fewer false positives; if we exclude outliers from our data set, the amounts of among-group and within-group variation in AMOVA change to 6% and 5%, respectively. We therefore assume that the relatively strict P- and q-value thresholds applied prevent a problematic rate of false positives. Nevertheless, we further tested whether pooling populations within the two clusters (excluding the admixed MZ) affected the detection of outliers with Dfdist. In this analysis, fragments 55GA170 and 54GA254 were again identified, whereas fragment 55GA254 was not (Fig. S5). However, fragment 55GA254 showed a pattern of differentiation that was most marked among populations within the genetic groups; therefore, its detection should not be affected by the hierarchical structure. This was also true of the two additional outlier loci detected by Bayescan, which suggests to us that the results of the outlier analyses are robust with respect to hierarchical structure.
Slightly more of the candidate loci detected by SAM were associated with the first PCA axis, which related mainly to temperature and rainfall, than with the second, which related mainly to LOI, CaCO3, and SSA, but also to topography. If we include candidate loci with nonsignificant associations, seven out of nine loci were most strongly associated with PC1. However, this is not significantly more than for PC2 (sign test, P = 0.18); furthermore, it may reflect the correlation between subclusters and climate, which might influence the results of this analysis. Thus, candidate loci in E. sativa showed no clear trend toward closer association with climatic than with other environmental factors. Most studies that test for associations between marker and environmental data use climatic data, most likely because of their availability. However, our results suggest that incorporating other data, such as edaphic conditions, might be a promising strategy, which is not surprising in the light of the importance of edaphic factors in ecotypic variation (e.g., Turesson 1922; Heywood 1986; Sambatti and Rice 2007).
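The sign-test result reported above can be reproduced with a short calculation. The following Python sketch assumes an exact two-sided binomial sign test with equal probability (0.5) for PC1 and PC2 under the null hypothesis; the function name is hypothetical, and the exact test variant is an assumption, as it is not specified in the text.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided sign test P-value under H0: p = 0.5."""
    # Use the larger of the two counts so that the tail is the upper tail.
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double the one-sided tail; cap at 1 for perfectly balanced counts.
    return min(1.0, 2 * tail)


# Seven of nine candidate loci most strongly associated with PC1.
p_value = sign_test_two_sided(7, 9)
print(round(p_value, 2))  # 0.18
```

The upper tail is (36 + 9 + 1)/512, so the two-sided value is 92/512 ≈ 0.18, in agreement with the reported P-value.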
Of the five FST outlier loci found in the present study, only two showed significant associations with the environment (Table 4). The three remaining outliers showed distributions of allele frequencies that did not correspond to any of the considered patterns – that is, IBD, IBA, PCs – and also did not show the north-south pattern of genetic substructure (data not shown). This lends further support to the idea that additional environmental parameters should be identified to help explain the neutral patterns of genetic differentiation. However, even if we were confident in the significant associations found, it would still be difficult to make predictions about the selective forces involved, as well as their targets. For example, any locus associated with rainfall is also likely to be associated with temperature, because of their high correlation, and further experiments would be needed to confirm whether one, both, or neither of these factors acted as selective agent(s). In the present case, however, the available phenotype data might help to provide further insights into these questions. Thus, we correlated the phenotype means with band frequencies of outlier loci in order to associate candidate loci with phenotypic traits. After correcting for multiple comparisons, and removing trait combinations with high correlations (r > 0.7), we found two correlations that were significant for fragment 54GA254, the only candidate identified by all three methods: (1) stem trichome density (net-house experiment, Pearson's r = −0.95, P = 0.033); and (2) late herbivore damage (common-garden experiment, r = 0.96, P = 0.010). It is well known that trichome production is induced under herbivore attack (Yoshida et al. 2009; Sletvold et al. 2010), which indicates a clear relationship between trichome density and plant defense against herbivore damage. In addition, it can be assumed that in the common-garden experiment the trichome density resulted from exposure to herbivores in the field, which might explain why fragment 54GA254 was correlated with trichome density in the insect-free net-house and not in the common-garden experiment. Furthermore, in many plants trichomes reflect excess light and dissipate heat (Vogelmann 1993), and thus might also have a direct association with climate, as indicated by the present SAM analysis. Thus, there is a distinct possibility that this marker is linked to a locus that regulates trichome development and that could, therefore, have different adaptive value across the range of this study. Isolation and sequencing of this and the other candidate loci are still needed, in order to identify linkages to known genes and, therefore, possible roles in local adaptation.
There is no doubt that more experiments are needed to determine the local selective pressures and the underlying relationships between genotype, phenotype, and the environment. The use of new approaches that utilize phenotypic data from genotyped individuals within populations in exploratory genome scans (Herrera and Bazaga 2009; Herrera 2012) would have enabled us to infer associations between traits and candidate loci at the individual level, and to directly estimate the genetic component of the phenotypic variation. Nevertheless, the approaches taken in the present study enabled us to formulate hypotheses about the selective pressures experienced by the various populations along an environmental gradient, and to speculate about the targets of selection – thereby demonstrating the potential value of combining environmental and phenotypic data with detailed genetic surveys.