Flowering time and transcriptome variation in Capsella bursa-pastoris (Brassicaceae)

Authors

  • Hui-Run Huang,

    1. Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, the Chinese Academy of Sciences, Guangzhou 510650, China
    2. State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
  • Peng-Cheng Yan,

    1. MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
  • Martin Lascoux,

    1. Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
    2. Laboratory of Evolutionary Genomics, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, the Chinese Academy of Sciences, Shanghai, China
  • Xue-Jun Ge

    1. Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, the Chinese Academy of Sciences, Guangzhou 510650, China

Author for correspondence:
Xue-Jun Ge
Tel: +86 20 3725 2551
Email: xjge@scbg.ac.cn

Summary

  • Flowering is a major developmental transition and its timing in relation to environmental conditions is of crucial importance to plant fitness. Understanding the genetic basis of flowering time variation is important to determining how plants adapt locally.
  • Here, we investigated flowering time variation of Capsella bursa-pastoris collected from different latitudes in China. We also used a digital gene expression (DGE) system to generate partial gene expression profiles for 12 selected samples.
  • We found that flowering time was highly variable and most strongly correlated with day length and winter temperature. Significant differences in gene expression between early- and late-flowering samples were detected for 72 candidate genes for flowering time. Genes related to circadian rhythms were significantly overrepresented among the differentially expressed genes.
  • Our data suggest that circadian rhythms and circadian clock genes play an important role in the evolution of flowering time, and C. bursa-pastoris plants exhibit expression differences for candidate genes likely to affect flowering time across the broad range of environments they face in China.

Introduction

Flowering time is an important fitness trait for species with short life cycles, in that flowering at the wrong time can result in the failure of a plant to reproduce. Thus, geographically widespread plant species often show extensive variation in flowering time (Riihimaki et al., 2005; Franke et al., 2006; Matsuoka et al., 2008), exhibiting genetically based clines for flowering time along latitudinal and/or altitudinal gradients, for example Solidago spp. (Weber & Schmid, 1998), Arabidopsis thaliana (Stinchcombe et al., 2004; Montesinos-Navarro et al., 2011), and Lythrum salicaria (Montague et al., 2008). Understanding the genetic mechanisms controlling flowering time, especially identifying the main genes responsible for natural variation in flowering time between different populations, is clearly important in determining how plants adapt locally and are able to reproduce over a wide range of latitudes and altitudes.

In the model plant A. thaliana, genes belonging to four main pathways (the vernalization, autonomous, light-dependent, and GA pathways) are involved in the control of flowering time (Mouradov et al., 2002; Amasino, 2010). Several flowering time pathways (e.g. vernalization and autonomous) converge on FLOWERING LOCUS C (FLC), a MADS-box transcription regulator that represses flowering (Crevillen & Dean, 2010). FLC expression levels are correlated with flowering time (Michaels et al., 2003; Lempe et al., 2005), and a flowering time quantitative trait locus (QTL) cluster has been found in a region including the locations of the entire FLC clade of transcription factor genes (Salomé et al., 2011). FLC and its activator FRIGIDA (Johanson et al., 2000; Choi et al., 2011) have been shown to be major determinants of flowering time variation in A. thaliana raised under experimental conditions (Le Corre et al., 2002; Michaels et al., 2003; Shindo et al., 2005), although circadian clock genes may play an even more important part under natural conditions. Indeed, several circadian clock-related genes, such as CIRCADIAN CLOCK ASSOCIATED 1 (CCA1), TIMING OF CAB EXPRESSION 1 (TOC1), CYCLING DOF FACTOR 3 (CDF3) and CONSTANS-LIKE 1 (COL1), were detected in an association mapping study of flowering time in A. thaliana (Brachi et al., 2010). Gene expression variation in the light-dependent pathway has been suggested to correlate with photoperiodic flowering in nonmodel species, such as soybean cultivars (Zhang et al., 2008) and common sunflower (Blackman et al., 2011). Although less is known about the control of flowering time in other plant species, the available data suggest that the same pathways are involved, but that individual genes might differ in importance (e.g. Lagercrantz, 2009).

Capsella bursa-pastoris is a herbaceous, predominantly selfing, tetraploid plant, notable for its wide geographical distribution (Neuffer et al., 2011). Capsella is a small genus that contains only three species: the tetraploid C. bursa-pastoris and the two diploids Capsella rubella and Capsella grandiflora. C. bursa-pastoris and C. rubella are selfers, while C. grandiflora is an outcrosser. A recent study indicates that C. bursa-pastoris is an autopolyploid of C. grandiflora (St. Onge et al., 2012). Because Capsella is one of the most closely related genera to A. thaliana (German et al., 2009; Franzke et al., 2011), it is easy to transfer molecular genetic resources developed for A. thaliana to C. bursa-pastoris. There are both winter-annual (late-flowering) ecotypes and summer-annual (early-flowering) ecotypes in China, and investigations in Europe and America have revealed significant geographical differences in flowering time in the species (Neuffer et al., 2011). For example, ecotypes in Scandinavia and southern Spain flower early, whereas ecotypes from intermediate latitudes flower late (Neuffer & Hurka, 1986; Neuffer & Bartelheim, 1989; Neuffer & Hoffrogge, 2000). Besides, C. bursa-pastoris showed clinal differentiation in flowering time along a 2500 km latitudinal transect in Russia: the most northern and most southern provenances flowered earlier than intermediate provenances (Neuffer, 2011). Flowering time is delayed with altitude in alpine climates in the Alps in Europe (Neuffer & Hurka, 1986) and the Sierra Nevada in North America (Neuffer & Hurka, 1999), whereas populations at high elevations in subarctic regions such as Norway (Neuffer & Hurka, 1986) and at locations where summers are hot and dry, as in southern Spain, flower early (Neuffer & Hoffrogge, 2000).

The evolutionary history of C. bursa-pastoris may have contributed to the population differentiation of flowering time. C. bursa-pastoris is believed to have originated in the eastern Mediterranean region, and subsequently spread westwards to Europe where introgressive hybridization with diploid C. rubella took place, and eastwards to Asia where C. rubella does not grow (Hurka & Neuffer, 1997; Ceplitis et al., 2005; Slotte et al., 2008). C. bursa-pastoris was very recently introduced to North America by European settlers, and variation patterns of flowering time there can be traced back to the introduction of preadapted genotypes (Neuffer & Hurka, 1999). Worldwide surveys of nuclear/chloroplast genetic diversity in C. bursa-pastoris have revealed limited variation within the species and suggest that it recently went through a rapid expansion such that ecotypic differentiation of flowering time is likely to have evolved recently (Ceplitis et al., 2005; Slotte et al., 2006, 2008). This is particularly likely for the species in China, where it has been shown to exhibit much lower nuclear genetic diversity and a less pronounced genetic structure than in Europe (Slotte et al., 2008, 2009). Although Chinese C. bursa-pastoris is derived from European material that spread to eastern Eurasia relatively recently (21–64 ka) (Slotte et al., 2008), the species is today widely distributed across China, occurring in a broad array of complex environmental conditions ranging from subtropical climatic conditions in the south to more extreme environments in the northwest and northeast. Chinese C. bursa-pastoris can thus be used as an independent replicate of studies carried out in the European part of the species range to address questions about the importance of shared ancestry and parallel evolution in the development of flowering time clines in different parts of the world.

In Capsella, Linde et al. (2001) found that three major QTLs accounted for onset of flowering, and this was the first evidence of multigenic control of this trait. Later studies have suggested that there are both differences and similarities in the genetic control of flowering time in the European and Chinese parts of the natural range. An association study between flowering time and sequence polymorphism at FLC, FRIGIDA, CRYPTOCHROME 1 (CRY1) and LUMINIDEPENDENS (LD) revealed that single nucleotide polymorphisms (SNPs) at CRY1 and FLC were significantly associated with flowering time variation in western Eurasia, whereas in China CRY1 was monomorphic and a different SNP at FLC was significantly associated with flowering time variation (Slotte et al., 2009). Other flowering time genes, such as FRIGIDA, CRY1 and LD, were either monomorphic or exhibited almost no sequence variation in the Chinese plants studied, despite notable variation in flowering time (Slotte et al., 2009). On the other hand, Slotte et al. (2007) found good agreement of flowering time gene expression differences in comparisons between two pairs of accessions, one pair comprising an early-flowering accession from Taiwan and a late-flowering accession from northern Europe, and the other comprising both early- and late-flowering accessions from California. They noted that this could indicate that the genetic basis of expression differences is shared by common ancestry, or that similar regulatory differences have evolved in parallel. Interestingly, there were many key circadian clock genes among the genes that were differentially expressed between early- and late-flowering accessions. Further analysis of gene expression differences among C. bursa-pastoris ecotypes that differ in flowering time and come from a broad array of environmental conditions in China would help to clarify the situation.

In the present study, we broadened the analysis of flowering time variation in C. bursa-pastoris to samples collected from multiple environments in China. We also constructed gene expression profiles of 12 different samples representing extremes of flowering time using the Solexa/Illumina Digital Gene Expression (DGE) system. The DGE system allows an examination of variation in gene expression across many genes at the same time and has been successfully applied to studies of gene expression in different animal species (Harhay et al., 2010; Veitch et al., 2010; Pemberton et al., 2011), including transcriptome response to virus infection (Hegedus et al., 2009; Basu et al., 2011). A few studies have also used DGE to look at gene expression differences in plant species (e.g. maize (Eveland et al., 2010) and cucumber (Qi et al., 2012)). The aims of the study, therefore, were to examine the pattern of flowering time variation for C. bursa-pastoris in China; to examine further whether circadian clock genes are strong candidates for the evolution of adaptive flowering time variation as indicated previously (Slotte et al., 2007; K. Holm et al., unpublished); and to assess whether C. bursa-pastoris plants exhibit different patterns of genetic variation from one another for candidate genes likely to affect flowering time across the broad range of environments they face in China.

Materials and Methods

Sample collection and growth chamber experiment

Seeds were collected from 37 populations of C. bursa-pastoris (L.) Medic., representing most of the species’ distribution range in China (Fig. 1, Supporting Information Table S1). Within each population, seed samples were collected at intervals of a few m to avoid sampling the same individual twice. Geographical coordinates of populations were determined using GPS. These populations are distributed in a broad range of environmental conditions, from subtropical to temperate/alpine. Data on mean temperature of the coldest quarter of a year (MTCQ) and annual precipitation (AP) of each population were extracted using the software DIVA-GIS (http://www.diva-gis.org/) based on the WORLDCLIM data set (resolution 2.5 min, Hijmans et al., 2005). Data on day length of each population on 22 June 2011 were obtained from the Era Shuttle Calendar (http://www.agr.cn/Calendar.htm). All of these data are presented in Table S1.

Figure 1.

Distribution of the 37 analyzed Capsella bursa-pastoris populations in China. The populations are presented in numerical order, and the capital letters in brackets indicate the samples used in the digital gene expression (DGE) analysis. The dashed line indicates the division between the northwest and east populations. The northwest has high plateaus with very cold winters and the lowest annual rainfall in the country. The elevations of all northwest populations are > 400 m, but the elevations of the east populations are < 200 m (except for populations 2 and 3).

Seeds were stratified on moist filter paper for 4 d at 4°C and subsequently transferred to soil in pots, which were fully randomized in a growth chamber at 23°C with a 16 : 8 h light : dark regime. Flowering time was calculated as the number of d from germination to flowering. A nonparametric test, the H-test of Kruskal & Wallis (1952), was performed to analyze the total variance of flowering time. For brevity, geographical data (latitude, longitude, and altitude), MTCQ, AP and day length are referred to as environmental factors in further analyses. Because some environmental factors were highly correlated (for example, day length with latitude: r = 0.99, P < 0.0001), principal component regression was used to investigate the relationships between flowering time and the environmental factors: first, a principal component analysis was used to project the factors onto a lower dimensional space; second, a regression analysis was performed using the first two principal components as the independent variables and flowering time as the response variable; and finally, the parameters of the regression model were computed for the environmental factors according to the coefficients of the principal components. R (http://www.r-project.org/) was used for these analyses.
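The three steps of the principal component regression can be written compactly as follows. This is a sketch in notation introduced here (X, V, Z, γ, β and s are not symbols from the original text), and it assumes that the six environmental factors were centred and scaled to unit variance before the principal component analysis, which the text does not state.

```latex
% X: n x 6 matrix of environmental factors, assumed centred and scaled
% y: flowering time (d) for the n observations
\begin{align*}
  \tfrac{1}{n-1} X^{\top} X &= V \Lambda V^{\top}
      && \text{(PCA: the columns of $V$ are the loadings)} \\
  Z &= X V_{2}
      && \text{(scores on PC1 and PC2; $V_{2}$ holds the first two columns of $V$)} \\
  y &= \gamma_{0} + Z \gamma + \varepsilon
      && \text{(regression on the two components)} \\
  \hat{\beta} &= V_{2} \hat{\gamma}
      && \text{(coefficients for the six standardized factors)} \\
  \hat{\beta}^{\mathrm{orig}}_{j} &= \hat{\beta}_{j} / s_{j}
      && \text{($s_{j}$: standard deviation of factor $j$)}
\end{align*}
```

The fourth line follows from substituting Z = X V_2 into the regression, so that y = γ_0 + X (V_2 γ) + ε; the last line only matters if coefficients are reported in the original units of each factor.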

3′ tag DGE

Eleven populations were chosen for the DGE experiment (Fig. 1, Table S1). According to the records of flowering time, three individuals with similar flowering time were chosen to represent a sample for each selected population; population 32 alone contributed two samples (F and L). We estimated the mean flowering time of the three individuals for each sample as follows: A (30 d), B (31 d), C (40 d), D (38 d), E (35 d), F (39 d), G (129 d), H (93 d), I (95 d), J (107 d), K (94 d) and L (78 d); accordingly we divided these samples into two groups: early-flowering (A, B, C, D, E, F) and late-flowering (G, H, I, J, K, L). Seeds of each individual were stored at 4°C for 4 d in distilled water to break seed dormancy, and then sterilized using 75% ethanol and 0.1% HgCl2. For each individual the surface-sterilized seeds were sown on a 0.8% agar plate with Murashige and Skoog medium (Murashige & Skoog, 1962), and subsequently the plates were placed randomly under long-day conditions as stated earlier (16 : 8 h photoperiod, 23°C). Two-week-old seedlings were sampled and immediately flash-frozen in liquid nitrogen. We measured gene expression in seedlings because previous studies have shown that several flowering time regulators are expressed at a very early stage in C. bursa-pastoris (Slotte et al., 2007) and A. thaliana (Keurentjes et al., 2007). Sampling took place 7 h after dawn. For each sample, seedlings of a similar size were chosen, and equal numbers of seedlings for each individual were pooled together for RNA extraction. RNA was extracted from frozen seedlings using RNAiso Plus and RNAiso-mate for Plant Tissue (TaKaRa, Dalian, China) according to the manufacturer’s protocols. RNA integrity and concentration were evaluated on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). In total, 12 RNA libraries were constructed.
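The grouping above implies that each early-flowering sample is later compared with each late-flowering sample. A minimal Python sketch of that design is given below; the flowering times and group labels are those listed in the text, while the variable names are illustrative only.

```python
from itertools import product

# Mean flowering time (d) of each sample, as listed in the text.
flowering_days = {
    "A": 30, "B": 31, "C": 40, "D": 38, "E": 35, "F": 39,
    "G": 129, "H": 93, "I": 95, "J": 107, "K": 94, "L": 78,
}

# Group labels follow the assignment in the text; no numeric cutoff is given there.
early = ["A", "B", "C", "D", "E", "F"]
late = ["G", "H", "I", "J", "K", "L"]

# Every early sample is paired with every late sample (6 x 6 pairs).
pairs = list(product(early, late))

# Difference in mean flowering time (late minus early) for each pair.
delta_days = {(e, l): flowering_days[l] - flowering_days[e] for e, l in pairs}
```

With these values the within-population pair (F, L) differs by 39 d, whereas (A, G) differs by 99 d.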

Approximately 6 μg of RNA representing each library was run on an Illumina Genome Analyzer for sequencing (Beijing Genome Institute, Shenzhen, China). Tag library preparation was performed using the Tag Profiling for NlaIII Sample Prep kit (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions. In brief, total RNA was incubated with magnetic oligo(dT) beads to capture mRNA. First- and second-strand cDNA was synthesized and bead-bound cDNA was subsequently digested with NlaIII. GEX NlaIII adapter 1 was ligated at the site of NlaIII cleavage. This adapter contains a restriction site for MmeI that cuts 17 bp downstream from the NlaIII site, thus creating 21 bp tags starting with the NlaIII recognition sequence, CATG. After removing the 3′ fragment via magnetic bead precipitation, GEX NlaIII adapter 2 was ligated at the site of MmeI cleavage. The adapter-ligated cDNA tags were subsequently enriched using PCR primers that annealed to the adapter ends. The amplified and purified tags were then sequenced on an Illumina Genome Analyzer according to the manufacturer’s protocols. An Illumina pipeline was used for off-instrument data processing, including image analysis, base calling, extraction of 17 bp tags and tag counting. After filtering adapter tags, low-quality tags and tags of copy number = 1, we classified the remaining tags (clean tags) according to their copy number in the library. The tag sequences and counts have been submitted to Gene Expression Omnibus (GEO) under series GSE28624.
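The tag geometry described above (CATG plus the 17 bases downstream, taken at the 3′-most NlaIII site of a bead-bound cDNA) and the removal of singleton tags can be illustrated in silico. The Python sketch below is not the Illumina pipeline; the function names are hypothetical, and it simply returns no tag when fewer than 17 bases follow the last CATG.

```python
from collections import Counter

SITE = "CATG"   # NlaIII recognition sequence
TAG_BODY = 17   # bases downstream of the site that MmeI releases

def three_prime_tag(cdna):
    """Return the 21-base tag at the 3'-most CATG, or None if unavailable."""
    seq = cdna.upper()
    pos = seq.rfind(SITE)  # 3'-most site: the fragment that stays on the bead
    if pos == -1:
        return None
    tag = seq[pos:pos + len(SITE) + TAG_BODY]
    return tag if len(tag) == len(SITE) + TAG_BODY else None

def clean_tags(tags):
    """Count tags and drop those observed only once (copy number = 1)."""
    counts = Counter(t for t in tags if t is not None)
    return {tag: n for tag, n in counts.items() if n > 1}
```

For a toy transcript with two CATG sites, only the site nearest the 3′ end contributes a tag, which is why each transcript is expected to yield one main tag sequence.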

All tags were mapped to the A. thaliana genome TAIR9 released in The Arabidopsis Information Resource (http://www.arabidopsis.org/index.jsp), the most closely related fully annotated genome available to C. bursa-pastoris, using the MAQ program, ver. 0.7.1 (Li et al., 2008), allowing for a 2 bp mismatch between the tags and the references. Tags that were generated with a poor mapping quality score (< 30) were removed from further analysis. Gene expression levels were represented by high-quality tag count numbers. When there were multiple types of tags aligned to different locations of the same gene, the gene expression levels were represented by the sum of all. Tags that mapped to multiple genomic locations were excluded from further analyses. Differential gene expression was calculated for the 36 early vs late pairwise comparisons based on the 12 libraries. The Bioconductor package edgeR (Robinson et al., 2010) was used for normalization and statistical comparisons. The edgeR package uses empirical Bayes estimation and exact tests based on the negative binomial distribution to provide P-values associated with changes in expression between samples (Robinson & Smyth, 2007, 2008). The false discovery rate (FDR) was estimated to determine the threshold of P-value in multiple tests (Benjamini & Hochberg, 1995). Our significance threshold for differential expression was P < 0.05 after correction using FDR of 1%.
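The counting rules and the multiple-testing correction described above can be sketched in a few lines of Python. This is an illustration only: the record layout (tag, gene, mapping quality, number of genomic locations, count) is a hypothetical simplification of aligner output, and the Benjamini and Hochberg adjustment is the textbook procedure rather than the edgeR implementation.

```python
from collections import defaultdict

MIN_MAPQ = 30  # tags with a mapping quality score below 30 were removed

def gene_counts(alignments):
    """Sum tag counts per gene from (tag, gene, mapq, n_locations, count) records."""
    totals = defaultdict(int)
    for tag, gene, mapq, n_locations, count in alignments:
        if mapq < MIN_MAPQ or n_locations != 1:
            continue  # skip poor-quality and multi-mapping tags
        totals[gene] += count  # distinct tags on the same gene are summed
    return dict(totals)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed in a given comparison when its adjusted value falls below the chosen threshold.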

We assembled a list of 298 candidate genes involved in flowering regulation in A. thaliana (Mouradov et al., 2002; Slotte et al., 2007), most of which have a gene ontology biological process annotation that contained the terms ‘circadian rhythm’, ‘flower development’, ‘vegetative to reproductive phase transition’, ‘photoperiod’, ‘vernalization response’, or ‘gibberellic acid’ (Table S2). We tested for statistical overrepresentation of differentially expressed genes in the above six categories, using Fisher’s exact test with P < 0.05 as the threshold to judge the significance of overrepresentation. We used complete linkage with Euclidean distance to generate the dendrograms of the 36 pairwise comparisons and the significant flowering time genes, based on the differential expression of these genes; up- and down-regulation of expression levels in late-flowering samples were represented by ‘1’ and ‘–1’, and nondifferential expression was represented by ‘0’ in the clustering matrix. The software Genesis (Sturn et al., 2002) was used for clustering. Based on the clustering pattern, we then labelled the clusters of the pairwise comparisons according to the sample sources, that is, cluster x contains the comparisons between the sample named x and other samples.
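The overrepresentation test can be illustrated with a one-sided Fisher's exact test, computed directly from the hypergeometric distribution. The Python sketch below assumes a one-sided test (the text does not say whether the test was one- or two-sided), and the argument names are introduced here for illustration.

```python
from math import comb

def fisher_overrepresentation(k, n_category, n_de, n_total):
    """One-sided Fisher's exact P-value, P(X >= k), under the hypergeometric null.

    k           differentially expressed genes inside the category
    n_category  genes in the category
    n_de        differentially expressed genes overall
    n_total     genes in the background list
    """
    denom = comb(n_total, n_de)
    upper = min(n_category, n_de)
    # Sum the probabilities of observing k or more category genes by chance.
    return sum(
        comb(n_category, x) * comb(n_total - n_category, n_de - x)
        for x in range(k, upper + 1)
    ) / denom
```

As a toy case, if all 4 genes of a category are among 5 differentially expressed genes drawn from a background of 10, the P-value is 6/252, or about 0.024.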

Results

Phenotypic variation in flowering time

Descriptive statistics for flowering time of each population are presented in Table S1. Flowering time in Chinese C. bursa-pastoris varied from 23 to > 200 d across the 387 individuals and 37 populations examined under specific long-day conditions (Fig. 2). A Kruskal–Wallis test showed that flowering time variation among all studied populations was significant (χ2 = 255.9194, P < 0.001).
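For readers unfamiliar with the statistic reported above, the following Python sketch computes the Kruskal–Wallis H with average ranks for ties and the usual tie correction. It is an illustration on toy data, not the R code used in the study; H is compared with a χ2 distribution with k − 1 degrees of freedom, where k is the number of groups.

```python
from itertools import chain

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with average ranks for ties and the tie correction."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    rank_of = {}   # average rank of each distinct value (ranks start at 1)
    tie_term = 0   # sum of t**3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # The correction is undefined when every observation has the same value.
    return h / (1 - tie_term / (n**3 - n))
```

Each inner list holds the flowering times of one population, so the 37 populations of this study would correspond to 36 degrees of freedom.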

Figure 2.

Boxplot of flowering time of the 37 Capsella bursa-pastoris populations. These boxes are presented according to the latitudes. In each box, the central mark is the median, the edges of the box represent the 25th and 75th percentiles (the lower and upper quartiles), the ends of the whiskers represent the lowest datum within 1.5 × interquartile range (IQR) of the lower quartile and the highest datum within 1.5 × IQR of the upper quartile, and outliers are plotted individually.

The proportion of variance and the loadings of each component in the principal component analysis are shown in Table S3. The first principal component (PC1), which explained 68% of the variation, was most strongly associated with MTCQ, AP, latitude and day length. The second principal component (PC2), which explained 26% of the variation, was mainly associated with longitude and altitude. A PCA plot is presented in Fig. 3, showing the similarities in environmental factors between the populations. The regression analysis was carried out with PC1 and PC2 as independent variables and flowering time as response variable, and addressed the effects of the environmental factors on flowering time (Table 1). Flowering time was most strongly influenced by day length, and was moderately associated with latitude, MTCQ and longitude. Although the regression analysis suggested a positive relationship between latitude of origin and flowering time among all studied populations, some populations in northwest China flowered very early despite their high latitude (e.g. populations 26, 28; Fig. 2). These populations are also from high altitudes, which separate them from most of the other populations (Fig. 3, Table S1). The northwest has high plateaus with very cold winters and the lowest annual rainfall in the country. To investigate the local pattern of flowering time diversification in this area, we characterized the populations located in the range with latitude > 33°50′N and longitude < 112°40′E as a group in the regression analysis (hereafter called northwest populations, for brevity) (Fig. 1). A regression analysis was also done for the remaining populations (east populations). We found that the regression pattern of the east populations was quite similar to that of the overall populations. However, the pattern in the northwest was somewhat different from the overall or east populations, with a contrasting relationship between flowering time and MTCQ and AP. Flowering time decreased with MTCQ and AP in the east populations, but increased with MTCQ and AP in the northwest populations.

Figure 3.

Principal component analysis of the environmental factors of the Capsella bursa-pastoris populations studied. Component 1 explains 68% of the variability, while component 2 explains 26%. The populations in the open circle represent the northwest populations. LAT, latitude; LONG, longitude; ALT, altitude; MTCQ, mean temperature of the coldest quarter; AP, annual precipitation; DL, day length.

Table 1.   Coefficients of the regression models for overall, east and northwest populations of Capsella bursa-pastoris, using environmental factors as independent variables and flowering time as response variable

Populations | Latitude | Longitude | Altitude | Day length | MTCQ | AP
Overall | 0.651 | 0.336 | −0.002 | 5.975 | −0.409 | −0.004
East | 0.511 | 0.476 | −0.003 | 4.653 | −0.479 | −0.005
Northwest | 0.474 | 0.813 | −0.017 | 4.133 | 1.132 | 0.005

The six data columns are the environmental factors. MTCQ, mean temperature of the coldest quarter; AP, annual precipitation.

Digital gene expression

The major characteristics of the 12 DGE libraries are summarized in Table 2. After filtering adapter tags, low-quality tags, and tags of copy number = 1, there were c. 42.8 million clean tags in all libraries, with the total number of clean tags per library ranging from 3.3 to 3.8 million and the number of distinct clean tag sequences ranging from 63 595 to 95 541. The G library had the highest number of total sequence tags (3 742 840), while the I library had the highest number of distinct sequence tags (95 541). In all libraries high-expression tags with copy numbers > 100 were in absolute dominance (> 70%), whereas low-expression tags with copy numbers < 10 occupied the majority of distinct tag distributions (> 68%) (Fig. S1). There were 11 283 tag-mapped genes in the 12 libraries. The dynamic range of DGE spanned five orders of magnitude. The most abundant transcript in all 12 libraries was that for AT2G26010, a gene predicted to encode a pathogenesis-related protein, with a maximum tag count in the A library of 64 892 tags. However, the tag counts for the majority of genes were low in all libraries. Between 12.67 and 16.09% of distinct clean tags were mapped to the Arabidopsis database and the ratio of number of tag-mapped genes to reference genes per library ranged from 15.73 to 19.85% (Table 2).

Table 2.   Basic characteristics of the 12 digital gene expression (DGE) libraries

Library | Raw tags (total) | Raw tags (distinct) | Clean tags (total) | Clean tags (distinct) | Tags mapped to genes (total) | % of total clean tags | Tags mapped to genes (distinct) | % of distinct clean tags | Tag-mapped genes (number) | % of reference genes
A | 3585496 | 171544 | 3457671 | 71397 | 524887 | 15.20 | 9410 | 13.18 | 5972 | 16.26
B | 3740053 | 193809 | 3606068 | 87861 | 647916 | 17.97 | 11272 | 12.83 | 6667 | 18.15
C | 3504123 | 179774 | 3385247 | 86678 | 612121 | 18.08 | 13123 | 15.14 | 7291 | 19.85
D | 3822115 | 169586 | 3726373 | 82386 | 601474 | 16.14 | 11846 | 14.38 | 6820 | 18.57
E | 3726679 | 126115 | 3633975 | 63595 | 653263 | 17.98 | 9785 | 15.39 | 5779 | 15.73
F | 3617956 | 167299 | 3520072 | 77623 | 677547 | 19.24 | 12304 | 15.85 | 6827 | 18.59
G | 3836892 | 161056 | 3742840 | 75556 | 684991 | 18.30 | 11308 | 14.97 | 6317 | 17.20
H | 3601474 | 177992 | 3490017 | 74353 | 544730 | 15.61 | 10237 | 13.77 | 6135 | 16.70
I | 3656621 | 208585 | 3536294 | 95541 | 594573 | 16.81 | 12105 | 12.67 | 7022 | 19.12
J | 3536171 | 181963 | 3427875 | 81185 | 609477 | 17.78 | 12277 | 15.12 | 6924 | 18.85
K | 3736730 | 182827 | 3624568 | 78732 | 595565 | 16.43 | 11075 | 14.07 | 6290 | 17.12
L | 3744683 | 147805 | 3661334 | 73105 | 690463 | 18.86 | 11761 | 16.09 | 6471 | 17.62

Approximately 1000–2500 differentially expressed genes were found in the 36 early vs late comparisons under specific long-day conditions. Seventy-two genes listed in the prior flowering time gene list were found to be differentially expressed in at least one of the 36 pairwise comparisons (hereafter called ‘candidate genes for flowering time’; Fig. 4, Table 3). The ratio of number of candidate genes to the tag-mapped genes in the prior list for each comparison ranged from 13.21% (D vs G) to 33.63% (B vs G). The expression differences of flowering time genes within the same population (F vs L) were among the smallest, with a ratio of 13.27%. Interestingly, Fisher’s exact test showed a significant overrepresentation of differentially expressed genes only in the category ‘circadian rhythm’ (circadian rhythm, P = 0.032; flower development, P = 1; vegetative to reproductive phase transition, P = 0.785; photoperiod, P = 0.068; vernalization response, P = 0.213; gibberellic acid, P = 0.868). These genes were ARR4, LWD1, GI, FKF1, MPK7, CCA1, CRY1, PRR7, CHE, PRR5, ZTL, and TOC1, which are part of, or closely related to, the circadian clock. Unexpectedly, the expression of FLC, which is involved in the convergence of the autonomous and vernalization pathways in Arabidopsis, was not detected in this study.

Figure 4.

The regulatory pattern of the 72 candidate genes for flowering time in Capsella bursa-pastoris. Each column represents expression differences of all candidate genes in a pairwise comparison between early- and late-flowering samples; each row represents expression differences for one candidate gene in all comparisons between early- and late-flowering samples; red and green, genes were up- and down-regulated in late-flowering samples, respectively; black, no differences of gene expression in the comparison. Based on the expression differences of these candidate genes, the 36 comparisons were divided into six clusters (A, B, C, E and F, I, D). Each cluster contains the comparisons comprising one to two specific samples, for example, cluster A contains the comparisons of sample A vs other samples; the 72 candidate genes were divided into two large clusters (a and b).

Table 3.   List of candidate genes for flowering time in the digital gene expression (DGE) analysis

Gene number | Gene name | Gene ontology biological process | Reference
AT1G06180 | MYB13 | Gibberellic acid | Kirik et al. (1998)
AT1G10470 | ARR4 | Circadian rhythm | Salomé et al. (2006)
AT1G12910 | LWD1 | Circadian rhythm | Wu et al. (2008)
AT1G13260 | RAV1 | Flower development | Hu et al. (2004)
AT1G14920 | GAI | Gibberellic acid | Dill & Sun (2001)
AT1G22690 | AT1G22690 | Gibberellic acid |
AT1G22770 | GI | Circadian rhythm, flower development | Mizoguchi et al. (2005)
AT1G25560 | TEM1 | Photoperiod | Castillejo & Pelaz (2008)
AT1G26830 | CUL3 | Flower development | Dieterle et al. (2005)
AT1G61040 | VIP5 | Flower development | Oh et al. (2004)
AT1G68050 | FKF1 | Circadian rhythm, flower development | Imaizumi et al. (2003)
AT1G68480 | JAG | Flower development | Dinneny et al. (2004)
AT1G68840 | TEM2 |  | Castillejo & Pelaz (2008)
AT1G69490 | NAP | Flower development | Sablowski & Meyerowitz (1998)
AT1G69570 | CDF5 |  | Fornara et al. (2009)
AT1G74840 | AT1G74840 | Gibberellic acid | Chen et al. (2006)
AT2G01570 | RGA | Gibberellic acid | Dill & Sun (2001)
AT2G02760 | UBC2 | Vegetative to reproductive phase transition | Xu et al. (2009)
AT2G14900 | AT2G14900 | Gibberellic acid |
AT2G16720 | MYB7 | Gibberellic acid | Li & Parish (1995)
AT2G18170 | MPK7 | Circadian rhythm | Rao et al. (2009)
AT2G19520 | FVE | Flower development | Pazhouhandeh et al. (2011)
AT2G26300 | GPA1 | Gibberellic acid | Ullah et al. (2002)
AT2G34720 | NF-YA4 | Vegetative to reproductive phase transition | Wenkel et al. (2006)
AT2G36830 | AT2G36830 | Gibberellic acid |
AT2G42200 | SPL9 | Vegetative to reproductive phase transition | Wang et al. (2009)
AT2G46830 | CCA1 | Circadian rhythm, gibberellic acid | Alabadi et al. (2001)
AT3G02380 | COL2 | Flower development | Ledger et al. (2001)
AT3G02885 | GASA5 | Gibberellic acid | Zhang et al. (2009)
AT3G03450 | RGL2 | Gibberellic acid | Tyler et al. (2004)
AT3G04610 | FLK | Flower development | Lim et al. (2004)
AT3G12810 | PIE1 | Flower development | Noh & Amasino (2003)
AT3G15354 | SPA3 |  | Laubinger et al. (2006)
AT3G26640 | LWD2 | Photoperiod | Wu et al. (2008)
AT3G28910 | MYB30 | Gibberellic acid | Vailleau et al. (2002)
AT3G33520 | ESD1 | Flower development | Martin-Trillo et al. (2006)
AT3G44880 | ACD1 | Flower development | Pruzinska et al. (2003)
AT3G47500 | CDF3 |  | Fornara et al. (2009)
AT3G48590 | NF-YC1 |  | Wenkel et al. (2006)
AT3G49600 | UBP26 |  | Schmitz et al. (2009)
AT3G50060 | MYB77 | Gibberellic acid | Shin et al. (2007)
AT3G55730 | MYB109 | Gibberellic acid | Chen et al. (2006)
AT3G63010 | GID1B | Gibberellic acid | Griffiths et al. (2006)
AT4G02440 | EID1 | Photoperiod | Dieterle et al. (2001)
AT4G08920 | CRY1 | Circadian rhythm | Lin (2002)
AT4G14540 | NF-YB3 |  | Wenkel et al. (2006)
AT4G16250 | PHYD |  | Aukerman et al. (1997)
AT4G24540 | AGL24 | Vegetative to reproductive phase transition, vernalization response, gibberellic acid | Yu et al. (2002)
AT4G29010 | AIM1 | Flower development | Richmond & Bleecker (1999)
AT4G30270 | SEN4 | Gibberellic acid | Gan & Amasino (1997)
AT4G32551 | LUG | Flower development | Liu & Meyerowitz (1995)
AT4G32980 | ATH1 | Gibberellic acid | Proveniers et al. (2007)
AT4G36920 | AP2 | Flower development | Jofuku et al. (1994)
AT4G38620 | MYB4 | Gibberellic acid | Hemm et al. (2001)
AT4G39400 | BRI1 | Flower development | Domagalska et al. (2007)
AT5G02030 | PNY | Vegetative to reproductive phase transition | Kanrar et al. (2008)
AT5G02810 | PRR7 | Circadian rhythm | Nakamichi et al. (2007)
AT5G08330 | CHE | Circadian rhythm | Pruneda-Paz et al. (2009)
AT5G13480 | FY | Flower development | Simpson et al. (2003)
AT5G17490 | RGL3 | Gibberellic acid | Tyler et al. (2004)
AT5G23150 | HUA2 | Vegetative to reproductive phase transition, flower development | Doyle et al. (2005)
AT5G24470 | PRR5 | Circadian rhythm | Nakamichi et al. (2007)
AT5G25900 | GA3 | Gibberellic acid | Helliwell et al. (1998)
AT5G37260 | CIR1 | Gibberellic acid | Zhang et al. (2007)
AT5G46210 | CUL4 | Flower development, photoperiod | Chen et al. (2010)
AT5G47390 | MQL5.25 | Gibberellic acid | Ikeda & Ohme-Takagi (2009)
AT5G47640 | NF-YB2 |  | Wenkel et al. (2006)
AT5G51810 | GA20ox2 | Flower development, gibberellic acid | Rieu et al. (2008)
AT5G57360 | ZTL | Circadian rhythm, flower development | Somers et al. (2000)
AT5G60120 | TOE2 |  | Aukerman & Sakai (2003)
AT5G61380 | TOC1 | Circadian rhythm | Alabadi et al. (2001)
AT5G65790 | MYB68 | Gibberellic acid | Chen et al. (2006)

Based on the expression differences of these candidate genes, the 36 pairwise comparisons between early- and late-flowering samples could be grouped into six clusters: A, B, C, E and F, I and D (Fig. 4). Each cluster contained several comparisons involving one or two specific early-flowering samples (with the exception of cluster I), suggesting that regulatory patterns of flowering time differed among the early-flowering samples. We divided the candidate genes for flowering time into two clusters (a and b, Fig. 4). Half of the genes in cluster b (e.g. CHE, UBC2, CRY1, TOC1 and SEN4), but only 13.2% of the genes in cluster a, were differentially expressed in at least half of the comparisons between early- and late-flowering samples. Only seven genes (CCA1, FKF1, ZTL, RGL2, RGL3, VIP5 and AIM1) were regulated in the same direction in every comparison in which they were differentially expressed; all seven were up-regulated in late-flowering samples. The evening-phased clock genes TOC1, CHE and GI were differentially expressed in many comparisons (69.4, 77.8 and 50% of comparisons, respectively), whereas the morning-phased clock genes CCA1 and PRR7 were differentially expressed in only 16.7 and 11.1% of comparisons, respectively. However, ZTL, which encodes a protein that interacts with GI to degrade TOC1 in the dark, was differentially expressed in only 8.3% of comparisons, in contrast to the other evening-phased genes. Apart from the evening-phased clock genes, several other genes related to the circadian rhythm were also differentially expressed in many comparisons (ARR4, 47.2%; LWD1, 63.9%; CRY1, 72.2%).
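As an arithmetic check on the percentages above, the following Python sketch converts numbers of comparisons into percentages of the 36 pairwise comparisons. The counts are not reported in the text; they are back-calculated here from the published percentages and should be read as an assumption, and the variable names are illustrative only.

```python
# Total number of early- vs late-flowering pairwise comparisons (from the text).
N_COMPARISONS = 36

# ASSUMPTION: counts of comparisons with differential expression,
# back-calculated from the reported percentages (not given in the text).
diff_counts = {
    "TOC1": 25, "CHE": 28, "GI": 18,      # evening-phased clock genes
    "CCA1": 6, "PRR7": 4,                 # morning-phased clock genes
    "ZTL": 3,                             # evening-phased, rarely differential
    "ARR4": 17, "LWD1": 23, "CRY1": 26,   # other circadian rhythm genes
}

# Percentages as reported in the text.
reported = {
    "TOC1": 69.4, "CHE": 77.8, "GI": 50.0,
    "CCA1": 16.7, "PRR7": 11.1,
    "ZTL": 8.3,
    "ARR4": 47.2, "LWD1": 63.9, "CRY1": 72.2,
}

# Convert each count to a percentage of all comparisons, to one decimal place.
percentages = {
    gene: round(100 * count / N_COMPARISONS, 1)
    for gene, count in diff_counts.items()
}

for gene, pct in percentages.items():
    print(f"{gene}: {diff_counts[gene]}/{N_COMPARISONS} = {pct}% (reported {reported[gene]}%)")
```

Each reported percentage corresponds to a whole number of comparisons out of 36, which is consistent with the figures being simple proportions of the pairwise comparisons.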

Discussion

Variation in flowering time

In the present study, as in previous ones (Neuffer, 1990; Hurka & Neuffer, 1997; Ceplitis et al., 2005; Slotte et al., 2009), it was found that flowering time is highly variable among C. bursa-pastoris populations. There was a significant, although relatively weak, latitudinal cline in flowering time, suggesting that flowering time variation among Chinese populations could be adaptive. The weakness of the cline, also observed for circadian rhythms (K. Holm et al., unpublished), could be the result of a high diversity of environments across China, since different patterns of correlations between flowering time and environmental factors were suggested to exist in diverse environments (Table 1). The weak correlation could also be the result of the young age of the Chinese populations (Slotte et al., 2008).

Day length is not only highly correlated with flowering time among Chinese C. bursa-pastoris, but is also a key environmental signal that affects flowering time in other temperate plant species. For instance, strong correlations between day length and phenotypes related to development time and flowering were observed in A. thaliana (Hancock et al., 2011). Photoperiodic control of flowering time is also believed to affect the latitudinal distribution of soybean (Zhang et al., 2008). Apart from day length, winter temperature might be one of the factors shaping the flowering time variation of C. bursa-pastoris in local environments. In the eastern part of China, early-flowering plants were found to occur at low latitudes with mild winters where they can have multiple generations per yr, as recorded for other species (e.g. A. thaliana (Le Corre et al., 2002), Beta vulgaris (VanDijk et al., 1997) and Delphinium (Katsutani & Ikeda, 1997)). In northwest China, early-flowering phenotypes predominate in areas with severe winters, where rapid growth is needed before the temperatures become too low. In China, the 6°C isotherm of average January temperature has been identified as the cultivation border between single- and double-crop rice, with the latter growing in areas above 6°C; the −6°C January isotherm is also the boundary between spring and winter wheat, with spring wheat growing in regions below −6°C (Lu, 1946). Because the distributions of C. bursa-pastoris, rice and wheat generally overlap in China, the coincidence of generation time between C. bursa-pastoris and crop cultivation supports the hypothesis that flowering time differences in C. bursa-pastoris could be affected by winter temperature. For C. bursa-pastoris, it was suggested that flowering was strongly influenced not only by temperature but also by rainfall over a wide range of different climatic conditions (Steinmeyer et al., 1985); however, the present study indicated that annual precipitation had little effect on flowering time (Table 1).

Although we used seeds collected from plants grown in the field, maternal effects probably contributed little to the among-population differentiation and the latitudinal cline of flowering time observed, because seeds of C. bursa-pastoris are very small, and the differences in flowering time among populations from large-scale geographic ranges were considerable (23–200 d). In addition, maternal effects are usually most pronounced in early life-history stages and are less likely to account for large trait differences that persist during later stages (Rossiter, 1996).

Candidate genes for flowering time

Only 12.67–16.09% of distinct clean tags in the DGE analysis could be mapped to the Arabidopsis database. One explanation for the occurrence of a large number of unknown tags could be the c. 10 million yr of divergence and the difference in number of chromosomes between C. bursa-pastoris and A. thaliana (Koch et al., 2000, 2001). For example, the expression of FLC was not detected in this study; as FLC is a rapidly evolving gene, at least in Arabidopsis (Nah & Chen, 2010), the sequence corresponding to the A. thaliana FLC tag may have changed in Capsella. The proportion of tags mapped to genes was very variable between some samples (e.g. > 4% between A and F, Table 2), which could be the result of the rapid divergence of expression levels between the samples, as rapid evolution of quantitative traits can occur in a few generations given a strong selection force (Cheptou et al., 2008). Alternatively, this could be the result of isolation by geographic distance; however, we did not detect a significant link between geographic distance and variability of the ratios using linear regression (R² = 0.0002, P = 0.9177).

In this study, genes related to circadian rhythms were significantly overrepresented among the differentially expressed genes, in agreement with our suggestion that day length is one of the key environmental signals affecting flowering time in C. bursa-pastoris. The Arabidopsis clock is composed of at least three interlocking loops (Imaizumi, 2010; Pruneda-Paz & Kay, 2010), including a morning loop formed by CCA1/LHY and two PSEUDO-RESPONSE REGULATORS (PRR7 and PRR9) and an evening loop formed by TOC1 and a hypothetical clock component Y (e.g. GI; Locke et al., 2005). Here we found that the morning-expressed genes were differentially expressed in only a few comparisons between early- and late-flowering C. bursa-pastoris (CCA1, 16.7%; PRR7, 11.1%), whereas the evening-expressed genes were differentially expressed in many comparisons (TOC1, 69.4%; CHE, 77.8%; GI, 50%). Thus, our results suggest that genes of the evening loop are more involved in flowering time variation than those of the morning loop, but the reason for this difference is unclear. Circadian rhythm was found to be important for flowering time variation in both C. bursa-pastoris and A. thaliana (Slotte et al., 2007; Brachi et al., 2010; Hancock et al., 2011), suggesting a parallel evolution of similar regulatory differences in these closely related species.

Although C. bursa-pastoris is widely distributed across China, it seems unlikely that the Chinese populations have multiple origins, because there is no population structure among the Chinese plants (Slotte et al., 2009). Also, the results of genotyping PHYTOCHROME B suggest that all Russian populations belong to one group and all the Chinese populations to another, even for those close to the Chinese border (A. Keele & K. Holm, unpublished). Despite a likely single and recent origin of Chinese populations, the DGE analysis showed that most genes were polymorphic with respect to the direction of expression differences among different comparisons, indicating that the evolution of flowering time in Chinese C. bursa-pastoris could involve different sets of genes in different regions. Samples A, B, C and D came from a subtropical zone (Zheng et al., 2010), where early-flowering plants may be favored in mild climates or have multiple generations in 1 yr. The two high-latitude samples, E and F, originated from a temperate zone where they may be selected to escape severe winters or forced by the very long photoperiod (Neuffer, 2011). E and F were grouped together, as were the two low-latitude samples A and B (Fig. 4), indicating that populations from similar latitudes can exhibit similar expression patterns. However, owing to the young age of the Chinese populations, the clinal variation for flowering time and circadian rhythm was much weaker in China than in Europe (K. Holm et al., unpublished), and thus the cline of expression changes may not yet have had time to develop. For example, sample I was not grouped with E and F, which were from similar latitudes, but with sample D, which was from a lower latitude. The independent local adaptation by different genetic mechanisms among early-flowering C. bursa-pastoris from different regions may represent replicated events of adaptive evolution, that is, parallel evolution (Elmer & Meyer, 2011), and may facilitate a flexible evolutionary response to changing environments across the species range (Fournier-Level et al., 2011).

A different set of candidate genes that might affect flowering time was found in the present DGE study relative to the previous microarray study reported by Slotte et al. (2007). In total, a common set of 60 genes was analyzed for differential expression in both studies (Table S4). Of these genes, 28 and 13 were identified as candidate genes for flowering time in the DGE and microarray studies, respectively. Seven genes (CCA1, CIR1, GA20OX2, MPK7, MQL5.25, MYB68, TOC1) were characterized as candidate genes in both studies, which is roughly what one would expect by chance. Five of these genes (CCA1 and GA20OX2 excepted) showed different regulatory directions between some early vs late comparisons in the DGE study relative to the microarray study. The use of different samples in the two studies could be a reason for this difference. The samples used in the DGE study were drawn from many regions of mainland China, whereas those included in the microarray study were from Sweden, Taiwan, and the United States. There was no overlap in the distribution of samples used in the two studies, although the sample PL (Taiwan) likely shares the same origin as the mainland China samples. A shared genetic basis of expression differences in Europe and North America, but a different one in China, is in agreement with the evolutionary scenario that C. bursa-pastoris originated in the eastern Mediterranean region, and subsequently spread eastwards to Asia and westwards to Europe, but that it was recently introduced into North America by European settlers (Hurka & Neuffer, 1997; Neuffer & Hurka, 1999; Ceplitis et al., 2005; Slotte et al., 2008).
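The statement that an overlap of seven genes is roughly what chance would produce can be illustrated with a short calculation. The Python sketch below assumes that the two candidate lists behave as independent random draws from the 60 shared genes, so that the overlap follows a hypergeometric distribution. This model and the variable names are an illustration under that assumption, not an analysis reported in the study.

```python
from math import comb

# Values taken from the text.
N_SHARED = 60   # genes analyzed in both studies
N_DGE = 28      # candidate genes in the DGE study
N_ARRAY = 13    # candidate genes in the microarray study
OBSERVED = 7    # candidate genes identified in both studies


def hypergeom_pmf(k: int, total: int, marked: int, drawn: int) -> float:
    """Probability of exactly k marked items in `drawn` draws without replacement."""
    return comb(marked, k) * comb(total - marked, drawn - k) / comb(total, drawn)


# ASSUMPTION: the two candidate lists are independent random subsets of the
# 60 shared genes. Expected overlap is then the product of the list sizes
# divided by the total number of genes.
expected_overlap = N_DGE * N_ARRAY / N_SHARED

# One-sided probability of an overlap at least as large as the observed one.
max_overlap = min(N_DGE, N_ARRAY)
p_at_least_observed = sum(
    hypergeom_pmf(k, N_SHARED, N_DGE, N_ARRAY)
    for k in range(OBSERVED, max_overlap + 1)
)

print(f"Expected overlap by chance: {expected_overlap:.2f} genes")
print(f"P(overlap >= {OBSERVED}): {p_at_least_observed:.3f}")
```

Under this assumption the expected overlap is 28 × 13/60 ≈ 6.1 genes, close to the seven genes observed, so the overlap between the two studies gives no indication of a shared set of candidates beyond chance.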

Conclusions

Asian populations of C. bursa-pastoris are believed to have recently evolved independently of those in Europe and North America, thereby allowing us to use them as an independent ‘replicate’ when trying to understand the genetic basis of flowering time in this species. In the present study, we found that flowering time of C. bursa-pastoris was highly variable in China. Day length and winter temperature were found to be key environmental signals that affected flowering time differentiation. There was a significant overrepresentation of differentially expressed genes in the category ‘circadian rhythm’. We suggest that genes involved in regulation of the circadian clock are strong candidates for the evolution of adaptive flowering time variation in this species, especially since some of those genes were also identified in previous experiments. Finally, C. bursa-pastoris plants exhibit expression differences for candidate genes likely to affect flowering time across the broad range of environments they face in China.

Acknowledgements

Special thanks are given to all seed collectors. We are grateful to Prof. Richard Abbott for his comments on the manuscript. This work was financially supported by the National Natural Science Foundation of China (grant no. 31000108), the National Basic Research Program of China (973 Program) (2009CB119200), State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, and Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, the Chinese Academy of Sciences. M.L. thanks the Chinese Academy of Sciences (visiting professorship), the Swedish Research Council (VR) and the Erik Philip-Sörensens Stiftelse for support.
