Global spatial analysis of Arabidopsis natural variants implicates 5′UTR splicing of LATE ELONGATED HYPOCOTYL in responses to temperature

Abstract How plants perceive and respond to temperature remains an important question in the plant sciences. Temperature perception and signal transduction may occur through temperature‐sensitive intramolecular folding of primary mRNA transcripts. Recent studies suggested a role for retention of the first intron in the 5′UTR of the clock component LATE ELONGATED HYPOCOTYL (LHY) in response to changes in temperature. Here, we identified a set of haplotypes in the LHY 5′UTR, examined their global spatial distribution, and obtained evidence that haplotype can affect temperature‐dependent splicing of LHY transcripts. Correlations of haplotype spatial distributions with global bioclimatic variables and altitude point to associations with annual mean temperature and temperature fluctuation. Relatively rare relict type accessions correlate with lower mean temperature and greater temperature fluctuation and the spatial distribution of other haplotypes may be informative of evolutionary processes driving colonization of ecosystems. We propose that haplotypes may possess distinct 5′UTR pre‐mRNA folding thermodynamics and/or specific biological stabilities based around the binding of trans‐acting RNA splicing factors, a consequence of which is scalable splicing sensitivity of a central clock component that is likely tuned to specific temperature environments.

The circadian clock enhances biological fitness by allowing organisms to anticipate environmental changes (Dodd et al., 2005;Green, Tingay, Wang, & Tobin, 2002). Its pace is largely unaffected across a range of physiologically relevant temperatures (Pittendrigh, 1960); this is termed temperature compensation and is usually studied by measuring circadian period at different fixed temperatures (Gould et al., 2006;Gould et al., 2013). In contrast to focusing solely on acclimated temperatures, we examined clock genes in mature Arabidopsis thaliana plants both during and after cooling. This identified temperature-dependent AS in several genes including LATE ELON-GATED HYPOCOTYL (LHY), CIRCADIAN CLOCK ASSOCIATED1, and PSEUDO RESPONSE REGULATOR7 (James, Syed, Bordage, et al., 2012). The retention of the 5′UTR intron 1 in LHY (I1R, event UAS4 in James, Syed, Bordage, et al., 2012) and the inclusion of exon 5a (event AS5 in James, Syed, Bordage, et al., 2012) reach physiologically important levels in cooling and control LHY protein levels. Notably the former AS event is transient whereas the latter is adaptive to temperature (James, Syed, Bordage, et al., 2012;James et al., 2018). Conceptually, the regulation of LHY therefore represents an interesting model of how the clock adapts to (a) fluctuations and (b) longer term changes in temperature that are analogous in nature to unpredictable everyday changes and longer term (conceivably seasonal) changes in temperature, respectively. The LHY I1R event is of particular interest, because switching between fully spliced (FS) and I1R isoforms with temperature is rapid and reversible (James, Syed, Bordage, et al., 2012), has characteristics of a thermometer in that it is sensitive to temperature changes as modest as 2°C, and is scalable and reversible over a wide dynamic range of temperature .
Pre-mRNA processing is simultaneous, and mechanistically coupled, to transcription with splicing factors (SFs) recruited either constitutively or on an "as needed" basis to intron-containing genes (Bentley, 2014). The complexity of temperature signalling has recently been augmented with studies revealing cooling associated "splicing of the splicing factors" (Verhage et al., 2017;James et al., 2018), for example, for temperature-associated isoform switching of the polypyrimidine (pY) tract-binding (PTB) proteins and U2 auxiliary factor 65A  both of which compete for interaction with pY-rich sequences thereby influencing efficiency of splicing (Simpson et al., 2014).  (Consortium, 2016). Using high-resolution spatially interpolated climate data for global land areas, we mapped the distribution of four LHY 5′UTR haplotypes with a series of global bioclimatic variables and altitude, reasoning that if splicing of the LHY 5′ UTR constituted a bona fide temperature sensing module, haplotype distribution would stratify with temperature climatic parameters.
We found that distinct haplotypes delineate along the lines of annual mean temperature and extent of annual temperature fluctuation (annual mean diurnal range) and that haplotype accessions are distinct in their temperature-dependent splicing of LHY pre-mRNA.
We discuss these findings in the context of the transduction of temperature information to the clock via modulation of pre-mRNA intramolecular folding and temperature scalable splicing sensitivity that is likely tunable to specific climatic regions.  (Hahne & Ivanek, 2016) was used to annotate LHY gene organization features with SNP frequencies. The sample() function in R was used to obtain random accession groups from the WRLD dataset. Maps were prepared using the map_data() function and a coord_fixed(1.3) aesthetic in ggplot2 in R. Bioclimatic variables at a global resolution of 2.5 arcmin, reflecting interpolations of observed data from 1960 to 1990, were obtained at worldclim.org (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). The global index maps were created by plotting the bioclimatic raster layer objects, with projected accession coordinates made explicitly spatial using the coordinates() function from the sp package in R and a coordinate reference system using the PROJ.4 spatial projection and the Earth shape reference datum WGS84. Bioclimatic variables were extracted for spatial locations using the extract() function from the raster package in R and represented mean values of raster cells in a 5-km radius around each point location. Elevations, based on the latitude-longitude coordinates of the accession collection site, were obtained at http://www.gpsvisualizer.com/elevation.
2.2 | RNA secondary structure prediction LHY 5′UTR sequence of length 782 nt (−779 to +3 at the canonical start codon) was used in all RNA secondary structure prediction algorithms.

| Statistical analyses
Ordinary one-way analysis of variance (one-way ANOVA) with Brown-Forsythe summaries, post hoc Tukey-Kramer, and unpaired t tests with equal SD analyses were carried out in GraphPad Prism (version 6). For Tukey-Kramer, pairs of means grouped by a horizontal line were not significantly different from each other (p > .05). For t tests, threshold significance summaries were ***p = .0001 to .001, **p = .001 to .01, and *p = .01 to .05. Principal component analyses (PCAs) were carried out in R using prcomp by applying a log transformation to the continuous bioclimatic variables and employing "set center" and "scale." =TRUE to standardize the variables. PCAs were visualized using ggbiplot employing "elipse" =TRUE where contours are drawn at the default 68% probability for each haplotype group.

| RESULTS
We inspected the LHY locus (At1g01060) for SNPs and focused our attention on a subset of five SNPs ( Figure S1, Figure 1a, and Table 1) located within the 5′UTR region because the balance between constitutive splicing and retention of the 5′UTR intron 1 in the LHY pre-mRNA is scalable with temperature transitions ( (Table S1) were devoid of latitude-longitude information and were excluded from the dataset, as were accessions that possessed one or more uncalled "N" base. Inspection of this filtered dataset (our "WRLD" dataset, see Table S2 and the Supporting information Dataset file 'LHY SNP WRLD dataset.csv') revealed that 932 accessions each possessed one of four distinct 5′UTR haplotypes (Table 2). Col-0, the Arabidopsis reference strain, possesses the G/G/ U/G/C haplotype-the most prevalent LHY 5′UTR haplotype within the WRLD cohort.
We next compared pre-mRNA secondary structure models for the four haplotypes. Table 3 summarizes the predicted contacts within the LHY 5′UTR made by the individual SNPs. Haplotype A/U/G/C/A displays a quite different pattern of base pairing compared to the other three haplotypes. We visualized the predicted structural differences between the Col-0 haplotype, G/G/U/G/C, and A/U/G/C/A. To do this, we contrasted arc diagrams (R4RNA; Lai et al., 2012)-a method that uses the dot-bracket output annotation of RNA secondary structure algorithms such as mfold (Zuker, 2003). The arc diagram of  (Singh, Valcarcel, & Green, 1995;Wachter, Ruhl, & Stauffer, 2012) and SUA cis-consensus sequences (Marquez, Hopfler, Ayatollahi, Barta, & Kalyna, 2015), respectively (Figures 1c and S2). These interactions are not evident for the A/U/G/C/A haplotype (Figure 1c) but are retained in the G/G/U/G/A and A/G/U/G/A haplotypes (Table 3). In addition, SNP 37268 is also predicted to make a different base pair in the A/U/G/C/ A haplotype compared to the other three, though there is only a difference of 2 nts between contact sites. Interestingly, the thermodynamic stability (Gibbs free energy, ΔG, kcal/mol) of the G/G/U/G/C haplotype pre-mRNA is more stable compared to the A/U/G/C/A haplotype (Table 4) PTBs contribute to the temperature-dependent splicing of LHY  and as such PTB would be predicted to interact specifically with LHY mRNA. To test this, we used RNA-EMSA binding reactions with recombinant PTB1 protein and an in vitro transcribed LHY mRNA fragment comprising exons 1 through 5 and the intervening introns (representing the Col-0 G/G/U/G/C haplotype) to ask whether PTB1 bound LHY pre-mRNA. Figure 2b shows the formation of PTB1:LHY mRNA complexes that could be partially competed by the addition of excess unlabelled probe. The amount of probe bound seemed to increase cooperatively with the PTB1 concentration. Taken thus far, the structure predictions and binding data highlight that LHY 5′UTR splicing sensitivity is likely influenced by both the inherent thermodynamic stability (RNA secondary structure) of transcripts in addition to the biological stability of transcripts bound and processed by trans-acting RNA-binding SFs.
In order to investigate the potential relevance of the SNP variations to temperature-dependent splicing of the 5′UTR of LHY, we next Note. Single nucleotide polymorphisms are ordered into four distinct haplotypes. The number of accessions and proportion of the 932 accessions (the WRLD dataset) used for global spatial analysis is provided. examined global spatial distribution of the Arabidopsis natural variants.
Thirty countries, denoted as "others" in Figure S3a, have seven or less accessions ( Figure S3b). Around 50% of the accessions in the WRLD dataset originate from Sweden, the Iberian Peninsula (Spain and Portugal), and the United States ( Figure S3a). Differences in the relative make-up of haplotypes within the country populations were evident-Recently colonized U.S. accessions  are principally the Col-0-like G/G/U/G/C type with a high proportion of the U.S. cohort (66%) originating from sites adjacent to Lake Michigan split between two sites-a northern "High Coast" group and a southern "Skåne" population ( Figure S4c). These two regions display contrasting climates. The northern accessions, the majority of which are the G/G/ U/G/A type ( Figure S4c, inset), experience colder temperatures, longer snow cover, and a broader range of photoperiod, whereas the more stratified southern strains are found in agricultural meadows, fields, and on beaches along the Baltic Sea (Brachi, 2014).
Sample sizes for each haplotype within the WRLD dataset were uneven due to the relative rarity of A/U/G/C/A ( Figure S5a). We therefore prepared a dataset ("WRLD_ran50", see Supporting information Dataset file 'LHY SNP WRLD_ran50.csv') with equal haplotype sample sizes by randomly selecting 50 accessions for each of the four haplotypes from the original WRLD dataset ( Figure S5a). The WRLD dataset contained a relatively high proportion (40%) of lines originating from the same latitude-longitude coordinates, but with the WRLD_ran50 dataset, this sample site redundancy was reduced to 11.5% ( Figure S5b).
Previously, we distributed the LHY 5′UTR haplotypes at the country level (Figures S3 and S4). However, The 1001 Genomes Consortium established that genetic distances between individual accessions did not reflect geographic distance (Consortium, 2016). We therefore next sought to associate haplotypes according to their ADMIXTURE assignations, a model-based assessment of the ancestry of unrelated individuals (Alexander, Novembre, & Lange, 2009;Consortium, 2016). The clusters broadly correspond to geography (eight groups) and extreme ancestral divergence (the "relict" and "admixed" clusters).
The distribution of the different haplotypes within these ADMIXTURE groups is shown in Figure 3a for the WRLD dataset and in Figure 3b for the WRLD_ran50 dataset. These data show representation of the haplotypes, albeit with different frequencies, within the majority of these clusters (Figure 3a)-the exception being the Northern Sweden cohort that is almost universally the "G/G/U/G/A" haplotype ( Figure 3a). A similar distribution of haplotypes within the ADMIXTURE clusters for the WRLD_ran50 dataset was also seen (Figure 3b Table S2]) and, as noted earlier, probably possesses distinct secondary structure properties compared to the more prevalent haplotypes (Figure 1b and Tables 3 and 4).
We next investigated correlations between LHY 5′UTR haplotype distribution with a series of 19 bioclimatic variables at a global resolution of 2.5 arcmin (worldclim.org; Hijmans et al., 2005). These variables included monthly total precipitation, and monthly mean, minimum, and maximum temperature, and another 15 variables derived from the monthly data. Elevations for the accession collection sites were also obtained (see Section 2). Figure 4a shows the WRLD_ran50 dataset accessions mapped upon global means of monthly temperature ranges (BIO2: annual mean temperature range), an index useful for interpreting the relevance of temperature fluctuation to a species (O'Donnell & Ignizio, 2012). Figure S6a shows an equivalent plot for the 932 accessions of the WRLD dataset mapped upon BIO2 variables.
The WRLD and WRLD_ran50 datasets (Table S2) Figure 4d). A/G/U/G/As and G/G/U/G/As, which differ only in SNP 37437, share similar responses to fluctuations (i.e., the extent of habitat temperature "bandwidth"; Figure 4c) yet appear to be geared to environments with distinct annual mean temperatures (BIO1; Figure S8).
Taken together, a general picture emerges of haplotype temperature "specialisms"-Although A/U/G/C/As and G/G/U/G/As both tend to associate with cooler environments (BIO1; Figure S8), A/U/ G/C/As are prominent in environments with wider extremes of temperature (BIO7; Figure 4d). On the other hand, A/G/U/G/As and G/G/U/G/Cs both appear to correlate with milder temperatures (BIO1; Figure S8), with G/G/U/G/Cs tending to associate with wider maximum-minimum temperature habitats (BIO2; Figure 4c).
A/U/G/C/As are notably distinct from the other three haplotypes for precipitation bioclimatic variables (Table 5 and, e.g., BIO14, BIO17, and BIO18 in Figure S9). A similar picture is seen for the annual precipitation variable (BIO12, Figure S10) where A/U/G/C/A is clearly distinct from the other haplotypes. Interestingly, scatter plots of BIO1 (annual mean temperature) with BIO12 (annual precipitation) imply that for Iberian and Swedish populations ( Figures S11a,b, respectively), the G/G/U/ G/A haplotype has expanded into low temperature and high precipitation environments, whereas G/G/U/G/C haplotypes appear to occupy habitats with narrower ranges of temperature and precipitation.
The Quaternary glacial history of the Mediterranean Basin has played an important role in structuring patterns of plant biodiversity.
Consequently, the Iberian Peninsula has emerged as an important backdrop for studying the evolutionary processes underlying plant differentiation (Comes & Kadereit, 1998;Hewitt, 1999;Hughes, Woodward, & Gibbard, 2006;Marcer et al., 2017;Médail & Diadema, 2009). There are also high levels of environmental heterogeneity in the region, leading to, for example, large differences in geographic variances of minimum temperature between the north and south of the Peninsula, and these appear to correlate strongly with Arabidopsis life cycle phenology (Marcer et al., 2017). Central and northwest Spain are mountainous areas with strong altitudinal gradients and rapid changes in ecological conditions over short distances. These mountainous areas have cool climates that are largely absent in the south-west of the Iberian Peninsula (Mendez-Vigo, Pico, Ramiro, Martinez-Zapater, & Alonso-Blanco, 2011). We were therefore interested to see the global distribution of the four haplotypes for the elevation variable in the WRLD dataset (Figures 5a and S6e) and also for the WRLD-ran50 datasets (Figure 5b). These data show a global-wide high altitude distribution of the relict A/U/G/C/A haplotype compared to the other three haplotypes. Iberian A/U/G/C/As also demonstrated a tendency for a higher altitude distribution (Figure 5c,d). Similar to that observed previously for the BIO2 and BIO7 variables, A/U/G/C/A accessions classified as relict by ADMIXTURE did not appear to influence the higher altitude distribution compared to the other haplotypes ( Figure S6e).
The A/U/G/C/A haplotype was not the only high altitude specialist; there also seemed to be a prevalence for the G/G/U/G/A  (Hijmans et al., 2005) and the elevation variable was tested by ordinary one-way ANOVA. For the null hypothesis to be true the F ratio value is expected to be close to 1.0 and a large F ratio indicates that the variation among haplotype group means is more than expected by chance. A small p value (in the ANOVA column) indicates that it is unlikely that the differences observed are due to random sampling. Bioclimatic variables are summarized as either temperature ("T") or precipitation ("P") associated. The Brown-Forsythe test is reported as a p value and indicates whether the haplotype group populations have different standard deviations and is summarized at a significance level ("B-F significance") as either yes ("Y") or no ("N") (p < .05). ANOVA = analysis of variance.
haplotype at higher elevations in the Iberian cohort ( Figure 5d). Thus, there appears to be a tendency for low altitude, low latitude Iberian accessions to be G/G/U/G/C variants with higher altitude and higher latitude types to be of the G/G/U/G/A type ( Figure S4b  Levels of FS 5′UTR transcripts and transcripts retaining intron 1 (I1R transcripts) were determined (see Section 2). Previous work had demonstrated that FS:I1R levels, at dawn at 20°C for Col-0, was approximately 0.9 (James, Syed, Bordage, et al., 2012), and this ratio was therefore used as the reference against which all other accessions were compared. FS levels for the two subgroups were largely similar, with a tendency for G/G/U/G/Cs to have lower levels post-cooling compared to A/U/G/C/As (Figure 6a,b). More strikingly, however, relict accessions appear to splice a lower proportion of transcripts to I1R under all conditions (Figure 6c,d), with the concomitant result that the splice ratio-the proportion of FS as a fraction of total levels-for relicts was higher compared to G/G/U/G/Cs (Figure 6e,f). This suggests that haplotype does indeed affect the splicing of LHY transcripts in response to cooling.

| DISCUSSION
The circadian clock integrates multiple environmental stimuli, or "inputs," with physiologically relevant "output" processes, and it is now well appreciated that there are fitness costs in running a dysfunctional clock (Dodd et al., 2005;Greenham & McClung, 2015;Yerushalmi & Green, 2009). The clock is keenly tuned to alternating light:dark and temperature cycles and is said to "gate" responsiveness to the environment to particular phases of the day via precise timing -or phasing-of individual clock components and their cognate downstream signalling cascades (Fowler, Cook, & Thomashow, 2005;Hotta et al., 2007). Large portions of the Arabidopsis transcriptome are clock controlled including growth, stress responses, hormone signalling, and metabolism pathways (Covington, Maloof, Straume, Kay, & Harmer, 2008;Harmer et al., 2000). One example is circadian clock gating of the cold response via the C-REPEAT BINDING FACTOR reguloncomprising around 100 or so cold-responsive genes-many of which contain CIRCADIAN CLOCK ASSOCIATED1/LHY cis-binding elements within their promoters (Mikkelsen & Thomashow, 2009;Thomashow, 1999 and SKIP are implicated in temperature compensation (Marshall, Tartaglio, Duarte, & Harmon, 2016;Schlaen et al., 2015;Wang et al., 2012).
We based this analysis of 1,001 genomes accessions on our recent observations of the dynamic responses of LHY AS to temperature, and in particular, the "molecular thermostat" properties of 5′UTR splicingan apparently adaptive response to temperature such that I1R splicing is most prominent during temperature transitions (and not "steady state" temperatures), akin to fluctuations in temperatures that are a hallmark of natural climatic conditions . We focused our attention on five SNPs in the 5′UTR region-two exonic and three intronic-that cluster as four haplotypes. We characterized the A/U/ G/C/A haplotype, common to "relict" accessions, as the most distinct of the haplotypes in the respect that, worldwide, these accessions are found in regions of low rainfall. They are also associated with the highest elevations with low mean annual temperatures and a wider range of maximum-minimum temperatures. Two of the remaining three haplotypes seem to associate with milder annual mean temperatures (A/G/U/G/As and G/G/U/G/Cs) and lower altitude and wetter habitats. Interestingly, G/G/U/G/As seem to be a low temperature "specialist"-This haplotype is commonly found in the mountainous Pyrenees region of northern Spain and is prominent at the limit of Arabidopsis growth in northern Sweden.
It is not known whether the splicing of LHY transcripts is affected by water status, but it is clearly affected by temperature changes. Our data show that the splicing of the LHY 5′UTR on cooling does differ between a small number of representative "relict-like" A/U/G/C/A and 'Col-0 like' G/G/U/G/C accessions, consistent with the notion that the LHY 5′UTR represents a bona fide thermometer that is likely finely tuned to distinct temperature habitats. However, further work will be required to determine if other differences between the haplotypes contribute to these results. The 5′UTR is critical for ribosome recruitment to mRNAs and start codon choice plays a major role in the control of translation efficiency (Hinnebusch, Ivanov, & Sonenberg, 2016  Distinct LHY 5′UTR splicing sensitivity for two haplotypes. Expression levels, at dawn for the denoted temperature conditions, for the individual accessions (Panels (a), (c), and (e)) and means and ±SEM for the grouped haplotypes (Panels (b), (d), and (f); n = 3) for ((a) and (b)) fully spliced (FS) 5′UTR, ((c) and (d)) intron 1 retained (I1R) transcripts, and ((e) and (f)) the splice ratio (FS transcripts as a fraction of total transcripts). Expression levels for individual accessions are derived from pooled tissue (9-13 plants per temperature condition) and from two technical repeats of the qPCR assay (see Section 2) [Colour figure can be viewed at wileyonlinelibrary.com] (Greenup, Peacock, Dennis, & Trevaskis, 2009;Oliver, Finnegan, Dennis, Peacock, & Trevaskis, 2009) and natural variation in intron 1 length in VERNALIZATION1 seems to modulate vernalization sensitivity in a manner that reflects Spring and Winter flowering time habits (Szucs et al., 2007). Regulation of the MADs box FLOWERING LOCUS M (FLM) component of the ambient temperature flowering pathway also appears to be regulated by an intron-associated mechanism. In vernalization-insensitive Arabidopsis accessions flowering in ambient temperatures is largely under control of the precise balance of FLM αand β-alternatively spliced isoforms (Capovilla et al., 2017;Pose et al., 2013;Sureshkumar et al., 2016) and natural variation within intron 1 of FLM results in varying splicing sensitivities that likely are advantageous for flowering time adaptation in the ambient temperature range (Lutz et al., 2015). Here, the primary sensing mechanism feeding into FLM splicing is unknown, but as with LHY splicing may conceivably involve the perturbation of a network of temperatureassociated isoform switching RNA binding proteins that include PTB1, U2 auxiliary factor 65A, and SUA .
At this stage, we cannot rule out the possibility that other SNPs cosegregate with the LHY SNPs to provide the correlation patterns presented here, and future work will require the assessment of the influence of LHY SNPs for temperature-associated splicing sensitivity in isogenic or near isogenic backgrounds. We conclude that LHY 5′ UTR haplotypes-possessing distinct pre-mRNA folding stabilities and/or biological stabilities display a range of temperature specialisms that may have enabled Arabidopsis to colonize new temperature habitats. Given that global climate change is likely to have major but unpredictable effects on plant diversity and crop yields (Chakraborty & Newton, 2011;Hatfield et al., 2014;McClung & Davis, 2010;Moore & Lobell, 2015;Mora et al., 2015;Thuiller, Lavorel, Araujo, Sykes, & Prentice, 2005;Wheeler & von Braun, 2013), insights as to how plants perceive and integrate temperature information via the clock to physiologically relevant outputs and how evolution drives innovations in plants responses to temperature is likely of value to enhanced crop breeding programs.