The raw material for Lasky et al.'s (2012) study consists of spatial, environmental and genetic information from more than one thousand genotypes of Arabidopsis thaliana collected from 447 locations throughout the Eurasian range of A. thaliana (Fig. 1). Each accession is genotyped for more than 200 000 single nucleotide polymorphisms or SNPs (see Horton et al. 2012 and http://bergelson.uchicago.edu/regmap-data/regmap.html/). This SNP density provides gene-level marker resolution, meeting the goal of surveying the entire genome for its participation in geographical patterning. In addition, the authors make use of three geo-referenced climate databases to provide a robust description of climate in each locale. Lasky et al. (2012) determine the extent of genome-wide geographical differentiation and dissect the two major causes of range-wide patterning. The first cause they address is in situ adaptation driven by local differences in the environment, approximated by Lasky et al.'s ‘climate’. Their second cause, ‘space’, includes all spatial patterns driven by forces other than local adaptation, including both random genetic drift and phylogeography driven by ice-age vicariance and subsequent spread. Space also may represent uncharacterized spatially autocorrelated environmental gradients that are not detected by the climate variables, including factors affecting microclimate such as slope and aspect of a site as well as soil and biotic community characteristics.
Teasing apart this mix of potential causes across both the species' geographical range and its genome is an analytical challenge because of the high dimensionality of the system. Lasky et al. (2012) essentially build on Wright's (1932) seminal vision of genomic variation as a multidimensional space in which each gene represents one or more axes of variation. For the genome of A. thaliana, this space has well over 30 000 dimensions if all transcribed sequences and additional non-transcribed regulatory sequences are counted. For Lasky et al., this space is described by the allelic variation in 200 000 SNPs. Geographical space may seem simpler, describable in two dimensions, but what matters here is the dimensionality of the matrix of distances between genotype locales on Earth's approximately spherical surface. For n = 447 locales in this study, this geographical space has n(n−1)/2 = 99 681 dimensions, nearly as daunting as the genetic space. Environmental variation, even if confined strictly to climatic variation, also has many dimensions. With each type of data, however, the many dimensions are not entirely orthogonal. What to do? Lasky et al. (2012) use sophisticated approaches to first find the important dimensions in each data matrix and then dissect the key relationships amongst them.
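To make the counting argument concrete, here is a minimal sketch that builds a great-circle distance matrix on a spherical Earth and counts its distinct pairwise entries. The haversine formula, the 6371 km mean radius, the helper names (`haversine_matrix`, `n_pairwise`) and the randomly drawn coordinates are all illustrative assumptions; they are not the study's accession sites or the authors' distance computation.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_matrix(lat_deg, lon_deg):
    """Great-circle distance matrix (km) between all pairs of points (hypothetical helper)."""
    lat = np.radians(lat_deg)[:, None]
    lon = np.radians(lon_deg)[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    # clip guards against tiny floating-point excursions outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def n_pairwise(n):
    """Distinct off-diagonal entries in a symmetric n x n distance matrix."""
    return n * (n - 1) // 2


# Placeholder coordinates roughly spanning Eurasia (hypothetical, not the study's sites).
rng = np.random.default_rng(0)
n = 447
lat = rng.uniform(35.0, 65.0, n)
lon = rng.uniform(-10.0, 90.0, n)
D = haversine_matrix(lat, lon)

iu = np.triu_indices(n, k=1)  # upper triangle, excluding the zero diagonal
print(D.shape, iu[0].size, n_pairwise(n))  # prints: (447, 447) 99681 99681
```

Because the matrix is symmetric with a zero diagonal, only the upper triangle carries information, which is where the n(n−1)/2 count comes from.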
Using methods previously applied in community ecology (see Dray et al. 2006), and building on Manel et al.'s (2010) approach, Lasky et al. (2012) first reduce the geographical distance matrix using principal coordinates of neighbour matrices (PCNM; Borcard & Legendre 2002). Much like principal components analysis (and Moran's eigenvector maps; see Dray et al. 2006), PCNM calculates eigenvectors that reflect the hierarchy of spatial scales at which the genotypes are clustered. This provides the spatial framework within which associations between climate patterning and genetic variation can be addressed. The authors then use redundancy analysis (RDA; see Makarenkov & Legendre 2002) to ask how much of the SNP variation can be predicted by climate and space. RDA is to canonical correlation as linear regression is to simple product-moment correlation: it predicts the values of a multivariate set of response variables (SNPs) from a set of predictor variables (space and climate), taking into account covariation among both the predictors and the responses. RDA does this by maximizing the variance in the response variables explained by the predictors, producing orthogonal axes that are linear combinations of the predictor variables. The variance in the response variables accounted for by the predictors can then be partitioned into components attributable to climate alone, to space alone, and to climate and space confounded. Thus, this study provides an invaluable primer on multivariate statistical approaches for dissecting population genomic spatial patterning and its causes.
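The two steps described above (PCNM, then RDA with variance partitioning) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. The truncation threshold is taken as the longest edge of a minimum spanning tree, and longer distances are replaced by four times that threshold, following the usual PCNM recipe. RDA is computed as a multivariate least-squares regression of centred responses on centred predictors, followed by an SVD of the fitted values. r² is adjusted with the Ezekiel formula, and the three fractions come from the standard subtraction of adjusted r² values. The helper names (`mst_longest_edge`, `pcnm`, `rda`, `partition`) are hypothetical. The planar coordinates, the single "climate" gradient, the 0/1 SNP matrix and the choice of keeping only the first five PCNM vectors (instead of forward selection) are all simulated or assumed.

```python
import numpy as np


def mst_longest_edge(D):
    """Longest edge of a minimum spanning tree over distance matrix D (Prim's algorithm)."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)  # cheapest link from the tree to each vertex
    longest = 0.0
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))
        longest = max(longest, float(cand[j]))
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return longest


def pcnm(D):
    """PCNM eigenvectors: PCoA of a truncated distance matrix, broadest scales first."""
    n = D.shape[0]
    t = mst_longest_edge(D)                  # truncation threshold (assumed MST rule)
    Dt = np.where(D > t, 4.0 * t, D)         # distances beyond t set to 4t
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    G = J @ (-0.5 * Dt ** 2) @ J             # Gower-centred matrix
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1]           # largest eigenvalue = broadest scale
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-8 * vals[0]             # retain positive eigenvalues only
    return vecs[:, keep], vals[keep]


def rda(Y, X):
    """RDA of responses Y on predictors X: (r2, adjusted r2, canonical site scores)."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)    # multivariate least squares
    Yhat = Xc @ B                                  # fitted (constrained) values
    r2 = (Yhat ** 2).sum() / (Yc ** 2).sum()       # share of total variance explained
    n, m = X.shape
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - m - 1)  # Ezekiel adjustment
    U, s, _ = np.linalg.svd(Yhat, full_matrices=False)
    scores = U[:, :m] * s[:m]                      # orthogonal canonical axes
    return r2, r2_adj, scores


def partition(Y, X_climate, X_space):
    """Split joint adjusted r2 into pure climate, shared and pure space fractions."""
    ab = rda(Y, X_climate)[1]
    bc = rda(Y, X_space)[1]
    abc = rda(Y, np.hstack([X_climate, X_space]))[1]
    return {"climate": abc - bc, "shared": ab + bc - abc, "space": abc - ab}


# Simulated illustration (all data hypothetical).
rng = np.random.default_rng(1)
n_sites, n_snps = 120, 500
coords = rng.uniform(0.0, 10.0, size=(n_sites, 2))
D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
V, lam = pcnm(D)
X_space = V[:, :5]                                   # five broadest-scale vectors
climate = coords[:, 1] + rng.normal(0.0, 1.0, n_sites)
X_climate = climate[:, None]                         # one spatially structured gradient
effects = rng.normal(0.0, 1.0, size=(1, n_snps))
noise = rng.normal(0.0, 3.0, size=(n_sites, n_snps))
liability = (climate[:, None] - climate.mean()) * effects + noise
Y = (liability > 0).astype(float)                    # 0/1 placeholder genotypes
print(partition(Y, X_climate, X_space))
```

Because the PCNM vectors are sorted by eigenvalue, the last columns of `V` correspond to the finest spatial scales, the kind of small-scale eigenvector whose weak explanatory power is discussed later in this commentary. Because the simulated climate gradient is built from one spatial coordinate, part of its effect should typically land in the shared fraction, mirroring the confounding described in the next paragraph.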
Lasky et al. (2012) show that climate and spatial associations together explain 23% of SNP variation. Climate variation that is independent of spatial variation accounts for about 25% of the total adjusted r² of 0.23 for climate and space considered jointly, that is, about 6% of the total SNP variation. Spatial patterning that is independent of climate accounts for a nearly equal amount, 31% of the combined r² and 7% of the total SNP variance. The remainder of the explained variation, a little under half, is associated with both climate and space, which covary in such a way that their effects cannot be teased apart (see Lasky et al.'s supplemental materials for details). Climatically, minimum growing-season temperatures and summer precipitation explained the greatest amount of SNP variation, in agreement with Hoffmann's (2005) earlier bioclimatic study. Spatially, the greatest variation was explained by an axis that separates northern from western European genotypes.
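The fractions quoted above can be checked with a few lines of arithmetic. The inputs are only the figures reported in this paragraph (a joint adjusted r² of 0.23, with 25% and 31% of it attributable to climate alone and to space alone); the shared fraction is assumed to be the remainder. Adding the climate-only and shared fractions gives the upper end of the 6–16% range discussed further on.

```python
# Figures as reported in the text; the shared share is taken as the remainder.
joint_r2_adj = 0.23
share_climate_only = 0.25
share_space_only = 0.31
share_shared = 1.0 - share_climate_only - share_space_only  # ~0.44, "a little under half"

climate_only = share_climate_only * joint_r2_adj  # fraction of total SNP variation
space_only = share_space_only * joint_r2_adj
shared = share_shared * joint_r2_adj
climate_upper = climate_only + shared             # climate alone plus confounded part

print(f"climate only {100 * climate_only:.0f}%, space only {100 * space_only:.0f}%, "
      f"shared {100 * shared:.0f}%, climate upper bound {100 * climate_upper:.0f}%")
# prints: climate only 6%, space only 7%, shared 10%, climate upper bound 16%
```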
The magnitude of the variation that cannot be separated into the two focal components illuminates the difficulty of separating the effects of population structure from those of environmental gradients. A complementary approach to that of Lasky et al. (2012) is to work at a finer spatial scale, where strong environmental gradients can partially or wholly dissociate these factors, as in the studies by Montesinos-Navarro et al. (2011) and Lewandowska-Sabat et al. (2012).
How much variation in SNPs should we expect to be explained by associations with climate and spatial patterning? No theoretical expectation is available. That 23% of the SNP variation, spread around the genome, is predicted by climate and space suggests geographical differentiation throughout the genome. Widespread differentiation within the genome because of drift and phylogeography is not surprising. Adaptive differentiation to climate is another story. If this were a highly vagile outcrossing species, one could conclude that something in the range of 6–16% of the genome is included in the genetic architecture of climatic adaptation (i.e. variation associated strictly with climate, or with both climate and space, respectively; see Lasky et al.'s Table S4), suggesting a highly polygenic basis to adaptation that is widely distributed across the genome. However, A. thaliana is highly inbred, so it is unknown how much linkage disequilibrium amongst SNPs inflates the apparent extent of genomic involvement in the genetic architecture of adaptation. Some light is shed on this question by the observation that climate explains an excess of SNP variation in coding relative to non-coding regions. This suggests that much of the detected association with climate is adaptive rather than the result of many SNPs hitchhiking on a few adaptive differences. Lasky et al. (2012) have shown us that many genes are involved in climate adaptation in this widespread species.
The PCNM eigenvector associated with the smallest spatial scale explained little SNP variation. This contrasts with other recent studies. Manel et al. (2010), using approaches very similar to those of Lasky et al. but in the closely related Arabis alpina, found the highest number of adaptively differentiated loci at their most local scale. Two further studies, also in A. thaliana, are noteworthy for demonstrating local adaptive divergence in functional traits, which suggests that underlying differentiation in SNPs will be found as well. Montesinos-Navarro et al. (2011, 2012) observe populations along Spanish climate and altitude gradients, from Mediterranean to Pyrenean sites, that are strongly adaptively differentiated in life history, morphological and functional traits. Lewandowska-Sabat et al. (2012) report substantial variation in vernalization requirements along coastal-inland and elevation gradients in Norway. Lasky et al. (2012) suggest that the lack of climate and space patterning of SNPs at their smallest spatial scale may be due to the preponderance of accessions from Europe, where the greatest modern admixture has occurred. Montesinos-Navarro et al. (2011, 2012) and Lewandowska-Sabat et al. (2012) studied populations at the southern and northern limits of the range, respectively, where admixture is greatly reduced (see Picó et al. 2008 for the Spanish populations) and the opportunity for local adaptive differentiation therefore may be enhanced. Lasky et al. (2012) similarly find the greatest independent explanatory power for climate in Scandinavian genotypes. It may be that adaptation to local conditions is more readily evolvable at the range limits because of differences in population isolation, in the strength or divergence of selection, or both.
The next step is perhaps the most difficult: elucidating the functional genomics of climate-associated differentiation. Whether conducted bottom-up by starting with candidate genomic regions identified through the approach of Lasky et al. (alternative top-down approaches: Hancock et al. 2011; Fournier-Level et al. 2011) or top-down by locating genes associated with functional differentiation across climate gradients, the biggest challenge will be to bring field phenotyping to a level of throughput that matches the impressive prowess of collective genotyping and statistical computation demonstrated in this study.