Transcriptome divergence between introduced and native populations of Canada thistle, Cirsium arvense


  • Alessia Guggisberg,

    Corresponding author
    1. Institute of Integrative Biology (IBZ), ETH Zürich, Zürich, Switzerland
    2. Botany Department, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  • Zhao Lai,

    1. Department of Biology and Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
  • Jie Huang,

    1. Department of Biology and Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
  • Loren H. Rieseberg

    1. Botany Department, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
    2. Department of Biology and Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA

Author for correspondence:

Alessia Guggisberg

Tel: +41 44 632 74 10



  • Introduced plants may quickly evolve new adaptive traits upon their introduction. Canada thistle (Cirsium arvense – Cardueae, Asteraceae) is one of the worst invasive weeds worldwide. The goal of this study is to compare gene expression profiles of native (European) and introduced (North American) populations of this species, to elucidate the genetic mechanisms that may underlie such rapid adaptation.
  • We explored the transcriptome of ten populations (five per range) of C. arvense in response to three treatments (control, nutrient deficiency and shading) using a customized microarray chip containing 63 690 expressed sequence tags (ESTs), and verified the expression level of 13 loci through real-time quantitative PCR.
  • Only 2116 ESTs (3.5%) were found to be differentially expressed between the ranges, and 4458 ESTs (7.1%) exhibited a significant treatment-by-range effect. Among them was an overrepresentation of loci involved in stimulus and stress responses.
  • Cirsium arvense has evolved different life history strategies on each continent. The two ranges notably differ with regard to R-protein mediated defence, sensitivity to abiotic stresses, and developmental timing. The fact that genotypes from the Midwest exhibit different expression kinetics than remaining North American samples further corroborates the hypothesis that the New World has been colonized twice, independently.


As a result of global trade and climate change, indigenous floras are ‘bombarded’ with exotic species at an unprecedented pace (Engelkes et al., 2008; Bradley et al., 2012). Although 99% of alien species may never proliferate in the introduced range (see the tens rule by Williamson, 1996), those that do may impose substantial losses in agriculture, irrevocably threaten native biodiversity and ecosystem functions, or cause major health risks (Duncan et al., 2004; Taramarcaz et al., 2005; Pyšek et al., 2012). To counteract the negative impact of human-induced plant expansions, it has therefore become critically important to understand the genetic and ecological conditions underlying the evolution of introduced plants.

Exotic plants are often exposed to novel biotic and abiotic conditions (i.e. different stress factors and selection pressures) upon their introduction, leading to the emergence of new adaptive traits (Prentis et al., 2008; Whitney & Gabler, 2008). A growing body of studies indeed shows that the introduced range may differ from the native one with regard to herbivore communities (Cripps et al., 2006), mutualistic interactions such as pollinator faunas or mycorrhizal symbionts (Valentine, 1977; Richardson et al., 1994), precipitation rate, drought (Broennimann et al., 2007) and/or day length (Godoy et al., 2009). Because all of these differences can affect fitness, they may also act as barriers to invasion, unless the introduced species evolves quickly enough to avoid demographic stasis or extinction (Richardson & Pyšek, 2012). Yet, the relative importance of these different conditions as elicitors of the rapid adaptation of plant invaders remains unclear (Dlugosch & Parker, 2008).

Alien plants may evolve increased competitive ability in their new range, due to the absence/reduction of specialist herbivores, allowing them to invest their energy in improved growth and/or reproduction rather than protection (Keane & Crawley, 2002; but see Colautti et al., 2004; Bossdorf et al., 2005). Constitutive defences are considered to be especially costly, because they typically lead to reduced fitness in the absence of targeted pests, particularly when nutrients are limiting or when competition is high (Baldwin, 1998; Agrawal, 2000; Björkman et al., 2008; Sampedro et al., 2011). Recent work showing that the tug-of-war between plant growth, defence and reproduction may be regulated by the same genetic pathways suggests that such trade-offs may exist. Todesco et al. (2010), for example, identified a gene in Arabidopsis thaliana, which provides enhanced resistance to plant pathogens, while simultaneously inhibiting the production of leaf biomass.

More generally, the fate of alien species may be determined by resource-based trade-offs between cell growth and various differentiation processes (i.e. not only defence against herbivores, but also, for example, attraction of pollinators, protection from UV light, drought resistance or facilitation of nutrient uptake; Grime, 1977; Herms & Mattson, 1992). In other words, genetic changes conferring increased fitness in a given set of circumstances may be (directly or indirectly) disadvantageous in other situations, due to competing demands on available energy supplies (so-called allocation costs; see He et al., 2010; Hodgins & Rieseberg, 2011). Accordingly, if alien plants encounter more favourable biotic and/or abiotic conditions in their introduced range, they may preferentially invest their energy into increased competitive ability, at the cost of lower stress resistance and reduced defence response (Alpert et al., 2000). Alternatively, successful invaders may simply exhibit greater plasticity in morphological and physiological traits than noninvasive exotic species (Richards et al., 2006; Davidson et al., 2011; but see Palacio-López & Gianoli, 2011).

Ultimately, the fate of alien species may depend on the genetic makeup at the time of release in the new range. Interspecific hybridization and admixture between differentiated populations of the same species have been shown to facilitate adaptive evolution, for they not only generate genetic novelty and variation upon which selection can act, but also rescue recent invaders from the detrimental effects of demographic bottlenecks inherent to stochastic long-distance dispersals (Ellstrand & Schierenbeck, 2000; Dlugosch & Parker, 2008). For the same reasons, polyploidy (i.e. whole-genome duplication) may play a key role in the successful establishment of exotic plant species (Pandit et al., 2011; te Beest et al., 2012). More generally, repeated introductions from different native source areas may enhance the adaptive potential of introduced species (Lavergne & Molofsky, 2007; Dlugosch & Parker, 2008).

So far, most studies on introduced species have hypothesized rapid evolution based on phenotypic differences between native and introduced populations using common garden experiments (reviewed in Colautti et al., 2009). More recently, scientists have also started to use resurrection protocols to follow these phenotypic changes over time (Sultan et al., 2012). While the latter approach allows one to test whether the changes occurred in the introduced range or whether introduced species were already pre-adapted to the different conditions encountered, few investigations have been undertaken at the genetic/genomic level to identify putative genes underlying the emergence of new adaptive attributes (Kane & Rieseberg, 2008; Lai et al., 2008; Yuan et al., 2010; Mayrose et al., 2011; Anderson et al., 2012; Hodgins et al., 2013). Rather, most genetic studies have focused on inferring the levels of genetic diversity of introduced populations and assessing their geographical origin (Marrs et al., 2008; Schlaepfer et al., 2008; Gaudeul et al., 2011; Guggisberg et al., 2012).

Cirsium arvense Scop. (Cardueae: Asteraceae; also known as Canada, Californian or creeping thistle) is native to temperate Eurasia, where it is recognized as the third most noxious agricultural weed (Schröder et al., 1993). It is also recorded as one of the worst invasive plants worldwide, because it threatens both natural and cultivated communities in the remaining four continents where it has been unintentionally introduced (Tiley, 2010 and references therein). Consequently, this species has been repeatedly proposed for development as a model system, to warrant financial support of studies aimed at regulating its spread and generating genetic or genomic resources for its investigation (Chao et al., 2005; Stewart et al., 2009).

Cirsium arvense is one of the most (if not the most) frequently listed weeds in North America (Moore, 1975). Genetic data (Guggisberg et al., 2012) in combination with historical records (Stevens, 1847; Hansen, 1918; Hodgson, 1968; Evans, 2002) suggest that the species initially entered the continent from Western Europe with the arrival of the first European settlers in the early 17th century, before being re-introduced in the prairie states from Eastern Europe during the agricultural boom in the late 19th century. Yet, North American populations differ genetically from their European ancestors, implying that different allelic combinations were retained during the colonization process. Results from bioclimatic niche modelling further indicate that genotypes from Western Europe likely had to evolve new adaptive traits in order to spread across the continent, while genotypes from Eastern Europe were probably pre-adapted to the local conditions encountered in the Midwest upon their arrival (Guggisberg et al., 2012). Altogether, these results not only suggest that C. arvense evolved different life history traits in each range, but also that North American populations are likely to vary with regard to expression profiles in accordance with their ancestral origin.

In this paper, we examine the transcriptional response of native (European) and introduced (North American) populations of C. arvense to various growth conditions, in order to shed light on the genetic mechanisms underlying its adaptation to new biotic and abiotic conditions, and provide a list of candidate genes for future investigations. We specifically ask: (1) Do North American populations express different stimulus and stress responses than their native counterparts?; (2) If so, are these differences consistent with changes in trade-off strategies?; (3) And does the variability in gene expression profiles correlate with the geographic origin of the given genotypes within each range? To answer these questions, we explored the transcriptome of ten populations (five per range) of C. arvense in response to three treatments (control, nutrient deficiency and shading) with a customized microarray chip.

Materials and Methods

Study system

Cirsium arvense (L.) is an erect, diploid (2n = 34), perennial herb that can reach 1 m in height and reproduces both sexually and asexually via creeping roots (Hamdoun, 1972; Moore, 1975; Lloyd & Myall, 1976; Tiley, 2010). It grows in a variety of habitats, but prefers open, disturbed environments such as roadsides and abandoned fields with mesic soils, for it does not tolerate shade (Hunter & Smith, 1972; Moore, 1975; Tiley, 2010).

Plant material

Seeds were collected from five populations in Europe (i.e. native range) and five populations in North America (i.e. introduced range) during summer 2008 (Table 1). The populations were chosen to best represent the phenotypic diversity observed within each continent during a large-scale common garden experiment meant to identify putative differences in growth and reproductive output between the ranges in response to selected growth conditions (A. Guggisberg, unpublished). Plant material from Europe was quarantined before being shipped to Canada under CFIA import permit no. P-2008-03604.

Table 1. Population acronym, collector and voucher number (in italics) followed by herbarium abbreviation (in parentheses), collecting locality, coordinates and altitude of Cirsium arvense accessions investigated in this study
Acronym | Voucher information | Locality | Latitude | Longitude | Altitude
Native range
H1 | Guggisberg, Bretagnolle & Zeltner 310808-1 (UBC) | Hungary, S of Nagyiván, Hortobágyi Nemzeti Park | 47.4152 | 20.8845 | 98 m
I1 | Guggisberg, Bretagnolle & Zeltner 170808-1 (UBC) | Italy, Mirano, between Padova and Venezia | 45.4698 | 12.1090 | 14 m
PL1 | Guggisberg & Zeltner 030908-2 (UBC) | Poland, S of Nowy Targ, between Kraków and the Slovakian border | 49.4534 | 20.0797 | 607 m
ROM2 | Guggisberg, Bretagnolle & Zeltner 280808-2 (UBC) | Romania, c. 5 km S of Lugoj, between Craiova and Timişoara | 45.6591 | 21.9541 | 145 m
SK | Guggisberg & Zeltner 030908-1 (UBC) | Slovakia, 7 km from Dedinky towards Poprad, W of Košice | 48.8544 | 20.3542 | 889 m
Introduced range
AB4 | King & Olson King132 (UBC) | Canada, AB, along TWP Rd 42, near Magrath | 49.2906 | −113.1650 | 975 m
MN1 | Hodgins CA-6 (UBC) | USA, MN, SE of Crookston | 47.7606 | −96.5926 | 270 m
ND1 | Hodgins CA-4 (UBC) | USA, ND, along Hgw. 81, S of St.-Thomas | 48.5718 | −97.4468 | 256 m
ON2 | Hodgins KN-ON (UBC) | Canada, ON, on Jane St, N of King St (by #14449), N of Richmond Hill | 43.9572 | −79.5615 | c. 178 m
ON4 | Hodgins CU-ON (UBC) | Canada, ON, Barrie, Cundles Rd E at Pacific Ave | 44.4150 | −79.6809 | c. 250 m

Common garden experiment

In April 2010, seeds from five families (i.e. mother plants) from each population were scarified with 100 grit sand paper, and sown in Petri dishes, on damp filter paper soaked with 1% Plant Preservative Mixture (Plant Cell Technology, Washington, DC, USA) to prevent fungal growth during germination. Petri dishes were then placed in a germination chamber set for 30°C, 80% humidity and 16 h daylight. Radicle emergence was checked on a daily basis. Newly-emerged seedlings were transferred onto a 1 : 1 sand : soil mixture when root growth reached 1–2 cm length, and moved to a growth chamber set for 20–22°C, 40% humidity and 16 h daylight. After 1–2 wk acclimation in the growth chamber, when 1–2 pairs of true leaves were out, young plantlets were transferred into 9-cm (3½”) pots filled with the same sand : soil mixture, and moved to a flood bench at the UBC Horticulture glasshouse.

Five individuals from each family (i.e. 25 plants per population from five different mother plants) were randomly assigned to each of three growth conditions: a ‘control’, nonstressful condition; a nutrient stress; and a light stress. The stresses were chosen to compare the competitive ability of C. arvense seedlings from the native and introduced range. All plants, with the exception of those assigned to the nutrient-stress treatment, received 1.5 ml Osmocote® 13-13-13 slow release fertilizer. The light stress was simulated by growing the plants in a shade box (c. 4.5 × 3.5 × 1.8 m) made of PVC pipes and covered by 121 Lee green filters (Lee Filters, Andover, UK) and neutral density shade cloths, to mimic the spectral quality of light that is transmitted through the canopy of neighbouring plants (Bonser & Aarssen, 2003). The green filter reduced light transmittance by 73%, and the shade cloth reduced it further, for a total reduction of c. 92% relative to the original value (light intensities were reduced from an average of 873.3 to 66.1 μmol m−2 s−1, based on an average of three measurements taken at noon on sunny days). Plants were randomly distributed within each treatment block on the bench, for a total of 250 plants per treatment.
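The light-reduction figures can be checked with a few lines of arithmetic. The sketch below uses only the two intensities and the 73% filter value reported above; the assumption that the filter and the shade cloth act multiplicatively is ours, not a statement from the protocol, and the function names are illustrative.

```python
# Light intensities reported in the text (umol m^-2 s^-1, noon, sunny days).
FULL_LIGHT = 873.3
SHADE_BOX = 66.1
FILTER_REDUCTION = 0.73  # fraction of light removed by the green filter


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# Overall reduction achieved by filter plus shade cloth.
total_reduction = percent_reduction(FULL_LIGHT, SHADE_BOX)

# ASSUMPTION: the two layers act multiplicatively, so the cloth's own
# transmission is the final intensity divided by what passes the filter.
after_filter = FULL_LIGHT * (1.0 - FILTER_REDUCTION)
cloth_transmission = SHADE_BOX / after_filter

print(f"total reduction: {total_reduction:.1f}%")
print(f"implied shade-cloth transmission: {cloth_transmission:.2f}")
```

Under that assumption, the shade cloth alone would transmit a bit more than a quarter of the light that passes the filter.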

Three weeks following the last transplant, young, similar-sized leaves were randomly harvested, quick frozen in liquid nitrogen, and stored at −80°C until RNA extraction. To minimize the effect of individual differences within populations, three average-sized plants from three different families were pooled as one biological sample, for a total of four samples per population (representing 12 plants) for subsequent microarray experiments.

Microarray construction

The C. arvense high-density oligonucleotide microarray was customized by Roche NimbleGen (Madison, WI, USA), as described in Lai et al. (2012). Briefly, 12-plex expression arrays were designed upon an expressed sequence tag (EST) library generated from leaf, root and flower tissues of a single accession from North Dakota, USA (i.e. the introduced range) using the 454 titanium sequencing technology (454 Life Sciences, Branford, CT, USA). The microarray chip finally contained 63 690 unigenes represented by one to three unique probes, as well as random probes for background correction, for a total of 136 582 features on the array. The microarray data are freely available at the public repository ArrayExpress under the accession numbers E-MTAB-1517 and A-MEXP-2290.

RNA isolation, probe preparation, labelling and hybridization

Total RNA was extracted using the TRIzol reagent (Invitrogen, Carlsbad, CA, USA)/RNeasy (Qiagen, Valencia, CA, USA) approach described in Lai et al. (2006) and quantified by spectrophotometry (NanoDrop, Wilmington, DE, USA). Total RNA was further treated with RNase-free DNase I (Qiagen) to eliminate possible genomic contamination.

cDNA synthesis, labelling, hybridization and scanning were performed according to protocols outlined in the NimbleGen Arrays User's Guide: Gene Expression Analysis, v5.0 (Roche NimbleGen). Briefly, 10 μg of total RNA from each RNA sample was reverse transcribed to double-strand cDNA (ds-cDNA) using oligo(dT)15 primers (Promega) and a SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). Before labelling, cDNA samples were quantified by spectrophotometry using a Nanodrop, and checked for adequate quality using the Agilent RNA 6000 Nano Kit on an Agilent 2100 Bioanalyser (Agilent Technologies, Waldbronn, Germany). The ds-cDNA samples were then labelled using the NimbleGen One-Color DNA Labeling kit, and hybridized onto the arrays using the NimbleGen Hybridisation kit and the NimbleGen Hybridisation system in combination with proprietary NimbleGen mixers, before being scanned with the MS 200 Microarray Scanner to obtain high-resolution image files, which were finally analysed with the NimbleScan software (Roche NimbleGen).

Gene expression data analysis

All arrays were inspected for possible artefacts (e.g. dust particles or bubbles). Small, suspicious areas were corrected by imputing the expression intensity of the given features using the k-nearest neighbour (KNN) averaging method implemented in the R package impute (Troyanskaya et al., 2001). Arrays were then processed with the robust multichip average (RMA) algorithm implemented in the R package oligo (Irizarry et al., 2003a,b) from Bioconductor. This method corrects for background noise, and accomplishes quantile normalization, followed by log2 transformation and median polish summarization. Unigenes for which > 95% of the arrays showed a signal lower than the average expression value of random probes plus twice their standard deviations, were eliminated. Lastly, the dataset was divided into five subsets comprising (1) arrays from the control plot (subset C); (2) arrays from the nutrient-deficient plot (subset N); (3) arrays from the shading plot (subset S); (4) arrays from the control and nutrient-deficient plots (subset CN); and (5) arrays from the control and shading plots (subset CS), respectively, and analysed using mixed-models with the R package maanova from Bioconductor (Cui & Churchill, 2003; Wu et al., 2003). This uncovered loci that were differentially expressed (DE) between the ranges (subsets C, N and S) or exhibited significant treatment-by-range interactions (subsets CN and CS). The following terms (if available) were treated as fixed in the various models: Range (the term of interest), Treatment, and Range × Treatment (the interaction term), while Population was always included as a random term. Statistical testing for differential expression between groups of interest was based on an F-like statistic (the so-called FS statistic) that incorporates information across all of the unigenes through the use of the James–Stein shrinkage estimator (Cui et al., 2005). P-values were determined by comparing obtained FS values to null distributions approximated by 1000 array permutations, and corrected a posteriori for multiple comparisons by false discovery rate (FDR) transformation using the R package qvalue from Bioconductor (Storey, 2002) with a 5% FDR cut-off. To identify putative subgroups among investigated samples, loci found to be systematically DE between the ranges were also explored through hierarchical clustering using the function heatmap.2 with distance set to Euclidean, as implemented in the R package gplots. All analyses were run in R 2.12.2 (R Development Core Team, 2010).
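The permutation and FDR steps can be illustrated with a simplified sketch. It is not the maanova/qvalue pipeline used in the study: the test statistic here is a plain difference in group means rather than the shrinkage-based FS statistic, and the multiple-testing step uses the Benjamini–Hochberg adjustment instead of Storey's q-value. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation P-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle labels by shuffling the pooled values, then re-split.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible P = 0.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy log2 expression values for one unigene in two ranges (hypothetical).
native = [5.0, 5.1, 5.2, 4.9, 5.0]
introduced = [8.0, 8.1, 7.9, 8.2, 8.0]
p = permutation_p_value(native, introduced)
print(p, benjamini_hochberg([p, 0.04, 0.03, 0.5]))
```

A unigene would then be called DE when its adjusted value falls below the chosen cut-off (5% in the text).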

Gene annotation and GO term analysis

The 63 690 ESTs were aligned against the Arabidopsis thaliana protein database using Blast (Altschul et al., 1990, 1997), and filtered for best hits using an e-value cut-off of 1e-10. Unigenes that were differentially expressed between the ranges were tested for over-represented biological gene ontology (GO) categories using the Term Enrichment tool in AmiGO (Boyle et al., 2004; Carbon et al., 2009). P-values for enrichment were calculated using a hypergeometric distribution and adjusted for multiple comparisons using the Bonferroni correction. The following subsets were investigated: (1) transcripts DE in the control plot (subset C); (2) transcripts DE under nutrient stress (subset N); (3) transcripts DE in the shading plot (subset S); and transcripts showing significant treatment-by-range interactions in (4) control/nutrient-stress comparisons (subset CN); and (5) control/shading comparisons (subset CS), respectively.
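As an illustration of the enrichment test, the sketch below computes an upper-tail hypergeometric P-value and a Bonferroni adjustment from first principles. It does not reproduce the AmiGO Term Enrichment tool; the function names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term.

    k: DE genes annotated with the term
    n: DE genes in total
    K: background genes annotated with the term
    N: background genes in total
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy example (hypothetical counts): 12 of 40 DE genes carry a term that
# annotates 100 of 1000 background genes; 50 GO terms were tested.
p_raw = hypergeom_upper_tail(12, 40, 100, 1000)
p_adj = bonferroni(p_raw, 50)
print(p_raw, p_adj)
```

The sample and background frequencies reported alongside such tests correspond to k/n and K/N in this notation.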

Quantitative PCR

In order to technically validate the microarray results, 20 RNAs from the control plot, representing two samples per population, were investigated through real-time quantitative PCR (qPCR) for 13 candidate loci shown to be differentially expressed between the ranges (Table 3): eight loci that were shown to be significantly DE (FDR < 0.05) in the control plot (loci numbered 1–8); two loci that were shown to be marginally DE (FDR = 0.05–0.1) in the control plot (loci numbered 9–10); and three loci that showed no expression difference in the control plot, but were found to be significantly DE (FDR < 0.05) under stressful conditions (loci numbered 11–13). They were chosen on the basis of: biological relevance regarding predicted function among loci involved in stimulus responses (a repeatedly over-represented GO category among DE loci; see the Results); and available information from A. thaliana homologues. A homologue to actin 7 (i.e. a putative housekeeping gene) was further co-amplified to normalize the actual amount of cDNA of each sample (Supporting Information Table S1). Primers were designed using default settings in Primer3 (Rozen & Skaletsky, 2000).

qPCRs were carried out on a Mx3000P™ Real-Time PCR System (Stratagene, La Jolla, CA, USA) in 15 μl total volume containing 7.5 μl Platinum® SYBR® Green qPCR SuperMix-UDG (Invitrogen), 0.3 μl SYBR® Green I dye, 0.3 μl ROX reference dye, and 1.5 μl four-fold diluted first-strand cDNA in the presence of 200 nM of each transcript-specific forward and reverse primer. Each reaction was run in triplicate following a two-step cycling program: 50°C for 2 min hold, 95°C for 2 min hold, 40 cycles of 95°C for 15 s and 60°C for 30 s. Dissociation curve analysis followed PCR amplification and standard curves were generated with the Stratagene qPCR software using a cDNA mix as template and corresponding transcript-specific forward and reverse primers. Relative fold changes (FC) estimated from the microarrays were compared to those obtained from qPCRs with a one-tailed Pearson correlation test in R.
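The comparison between microarray and qPCR fold changes amounts to a Pearson correlation. The sketch below computes the coefficient and the associated t statistic (n − 2 degrees of freedom) in plain Python; the study ran the test in R, and the fold-change values used here are invented for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# HYPOTHETICAL fold changes for five loci (not data from the study).
microarray_fc = [1.6, 6.0, 2.1, 2.3, 0.5]
qpcr_fc = [1.9, 5.2, 2.5, 2.0, 0.6]

r = pearson_r(microarray_fc, qpcr_fc)
print(r, t_statistic(r, len(microarray_fc)))
```

For a one-tailed test of positive concordance, the t statistic would be compared with the upper critical value of the t distribution for n − 2 degrees of freedom.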

Results

A total of 40 pooled RNA samples were investigated from each of the control and nutrient-stress plots, but only 32 samples could be analysed from the shading plot, because of 40% mortality among plants from the introduced range (populations AB4 and MN1 were represented by only two and three samples, respectively, while population ON4 completely died out; Table 1).

Of the 15 297 184 features available (136 582 features/array × 112 samples), 761 (0.005%) had to be corrected by KNN imputation due to the presence of dust particles and small bubbles on the slides. After quantile normalization and median-polish summarization, 63 216 (99.3%) of the 63 690 unigenes represented on the array showed expression levels above those of random probes and were kept for further analyses. Of those 63 216 ESTs, 37 831 (59.8%) could be annotated, 40 546 (64.1%) showed DE due to nutrient deficiency, and 46 901 (74.2%) showed DE due to shading (data not shown).
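The bookkeeping in this paragraph, and the detection filter described in the Methods, can be sketched as follows. The counts are those reported in the text; the filter functions are a simplified reading of the rule (one global threshold, sample standard deviation), which is an assumption on our part, and their names are hypothetical.

```python
from statistics import mean, stdev

FEATURES_PER_ARRAY = 136_582
N_ARRAYS = 40 + 40 + 32  # control + nutrient-stress + shading samples
N_FEATURES = FEATURES_PER_ARRAY * N_ARRAYS
pct_imputed = 100 * 761 / N_FEATURES
pct_retained = 100 * 63_216 / 63_690


def detection_threshold(random_probe_signals):
    """Mean signal of random probes plus twice their standard deviation.

    ASSUMPTION: a single threshold and the sample standard deviation are used.
    """
    return mean(random_probe_signals) + 2 * stdev(random_probe_signals)


def keep_unigene(signals_across_arrays, threshold, max_fraction_below=0.95):
    """Discard a unigene only if more than 95% of arrays fall below threshold."""
    below = sum(s < threshold for s in signals_across_arrays)
    return below / len(signals_across_arrays) <= max_fraction_below


print(N_FEATURES, round(pct_imputed, 3), round(pct_retained, 1))
```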

Expression differences between the ranges

Under nonstressful conditions (subset C), 1223 (1.9%) loci were found to be DE between the ranges with mean fold changes (± SD) of log2(FCpositive) = 1.07 ± 0.77 and log2(FCnegative) = −0.80 ± 0.49, respectively. Among these DE loci was an over-representation of sites involved in response to stimulus (GO:0050896, Padj = 0.004), response to stress (GO:0006950, Padj = 0.016) and response to biotic stimulus (GO:0009607, Padj = 0.037). Under nutrient stress (subset N), 968 (1.5%) loci were found to be DE between the ranges with mean fold changes (± SD) of log2(FCpositive) = 1.16 ± 0.80 and log2(FCnegative) = −0.78 ± 0.45, respectively. Among these DE loci was an over-representation of sites involved in response to stimulus (Padj = 0.002). Under light stress (subset S), 307 (0.5%) loci were found to be DE between the ranges with mean fold changes (± SD) of log2(FCpositive) = 1.50 ± 0.82 and log2(FCnegative) = −1.02 ± 0.52, respectively. Among these DE loci was an over-representation of sites involved in response to biotic stimulus (GO:0009607, Padj = 0.004) and response to other organisms (GO:0051707, Padj = 0.004). Altogether, this summed up to a total of 2116 ESTs (3.5%) that were found to be DE between the ranges, of which 1076 (50.9%) were upregulated in the introduced range and a significant number were involved in stimulus responses (Figs 1, 2; Table 2).
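Because fold changes are reported here on a log2 scale but as plain ratios in Table 3, a small conversion helper may be useful when reading the two side by side. This is a sketch with illustrative function names; note that back-transforming a mean of log2 values gives a geometric, not an arithmetic, mean fold change.

```python
from math import log2


def log2_to_fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to a plain ratio (introduced / native)."""
    return 2.0 ** log2_fc


def fold_change_to_log2(fc: float) -> float:
    """Convert a plain ratio to a log2 fold change."""
    return log2(fc)


# Mean log2 fold changes reported for the control plot (subset C).
print(round(log2_to_fold_change(1.07), 2))   # upregulated loci
print(round(log2_to_fold_change(-0.80), 2))  # downregulated loci
# A ratio of 0.51 (downregulation) expressed on the log2 scale.
print(round(fold_change_to_log2(0.51), 2))
```

Ratios above 1 (positive log2 values) indicate higher expression in the introduced range, and ratios below 1 the opposite.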

Table 2. Significant over-representation of gene ontology (GO) categories among unigenes, which are either differentially expressed (DE) between introduced and native populations of Cirsium arvense, or show significant treatment-by-range interactions
Subset: Term of interest | No. DE unigenes | GO term | P adj | Sample frequency | Background frequency
C: Range | 1223 | Response to stimulus (GO:0050896) | 0.004 | 0.27 | 0.19
 | | Response to stress (GO:0006950) | 0.016 | 0.16 | 0.10
 | | Response to biotic stimulus (GO:0009607) | 0.037 | 0.07 | 0.03
N: Range | 968 | Response to stimulus (GO:0050896) | 0.002 | 0.27 | 0.19
S: Range | 307 | Response to biotic stimulus (GO:0009607) | 0.004 | 0.11 | 0.03
 | | Response to other organisms (GO:0051707) | 0.004 | 0.11 | 0.03
CN: Treatment × Range | 4458 | Response to stress (GO:0006950) | < 0.001 | 0.15 | 0.10
 | | Response to stimulus (GO:0050896) | < 0.001 | 0.24 | 0.19
 | | Defence response (GO:0006952) | < 0.001 | 0.06 | 0.04
 | | Response to cold (GO:0009409) | 0.004 | 0.03 | 0.01
 | | Response to cadmium ion (GO:0046686) | 0.009 | 0.03 | 0.02
 | | Response to temperature stimulus (GO:0009266) | 0.020 | 0.04 | 0.02
 | | Response to inorganic substance (GO:0010035) | 0.026 | 0.05 | 0.03
 | | Response to biotic stimulus (GO:0009607) | 0.026 | 0.05 | 0.03
 | | Response to other organisms (GO:0051707) | 0.027 | 0.05 | 0.03
 | | Response to osmotic stress (GO:0006970) | 0.047 | 0.04 | 0.02
CS: Treatment × Range | 5 | None | | |

The P-values were adjusted for multiple comparisons using a Bonferroni correction. C, control plot; N, nutrient-deficient plot; S, shading plot.

Figure 1.

Venn diagrams showing the number of unigenes with significant differential expression (DE) between introduced and native populations of Cirsium arvense in control (C), nutrient-deficient (N) and shading (S) plots, based on a false-discovery rate (FDR) threshold of 0.05. UP, upregulated in the introduced range; DOWN, downregulated in the introduced range.

Figure 2.

Network of significantly over-represented gene ontology (GO) categories among unigenes, which are either differentially expressed (DE) between introduced and native populations of Cirsium arvense (black squares with double border), or show significant treatment-by-range interactions (black squares with simple or double border).

Eighty-four ESTs were shown to be consistently DE between the ranges across all treatments investigated with mean fold changes (± SD) of log2(FCpositive) = 2.15 ± 1.00 and log2(FCnegative)= −1.55 ± 0.69, respectively. Hierarchical clustering analysis of these ESTs on samples from the control plot identified three gene clusters (Fig. 3). Cluster A groups ESTs that are upregulated in the introduced range, but weakly expressed in the native range, cluster B groups ESTs that are downregulated in the introduced range, and cluster C groups ESTs that are upregulated in the introduced range, but strongly expressed in both ranges. Thirteen of those loci could further be assigned to a putative, studied homologue of A. thaliana with known function (Table S2). In addition, samples were grouped according to their continental origin, and populations sampled in the prairie states of North America (aka Midwest) formed a sub-clade in the North American clade (shown in black on Fig. 3).

Figure 3.

Centred, log2-transformed expression profiles and hierarchical clustering for 84 expressed sequence tags (ESTs), which are consistently differentially expressed between introduced (in black) and native (in grey) populations of Cirsium arvense. Clusters A and C group ESTs that are upregulated in the introduced range, while cluster B groups ESTs that are downregulated in the introduced range. Only samples from the control plot are shown. Population and EST acronyms are given in Tables 1 and 3, respectively.
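The clustering behind this figure can be illustrated with a minimal agglomerative procedure on Euclidean distances. The study used heatmap.2 in R; this sketch assumes complete linkage (the linkage method is not stated in the text), and the toy profiles and function names are hypothetical.

```python
from itertools import combinations
from math import dist


def complete_linkage(cluster_a, cluster_b, profiles):
    """Largest Euclidean distance between any two members of two clusters."""
    return max(dist(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b)


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: complete_linkage(clusters[ab[0]], clusters[ab[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# HYPOTHETICAL centred log2 expression profiles for five ESTs in two samples.
toy_profiles = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(agglomerate(toy_profiles, 3))
```

The same procedure applied to the transposed matrix would group samples instead of ESTs, which is how the continental and Midwest groupings would emerge.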

Treatment-by-range interactions

When analysing the control and nutrient-stress plots simultaneously (subset CN), 4458 (7.1%) loci showed a significant treatment-by-range effect. Among those loci was an over-representation of sites involved in response to stress (Padj < 0.001), response to stimulus (Padj < 0.001), defence response (Padj < 0.001), response to cold (GO:0009409, Padj = 0.004), response to cadmium ion (GO:0046686, Padj = 0.009), response to temperature stimulus (GO:0009266, Padj = 0.020), response to inorganic substance (GO:0010035, Padj = 0.026), response to biotic stimulus (GO:0009607, Padj = 0.026), response to other organisms (GO:0051707, Padj = 0.027), and response to osmotic stress (GO:0006970, Padj = 0.047; Table 2; Fig. 2). By contrast, when analysing the control and shading plots simultaneously (subset CS), only five loci of unknown biological function showed a significant treatment-by-range effect (Table 2).

Microarray and candidate gene validation

Thirteen ESTs that exhibited differential expression between the ranges in the microarray analyses were investigated using qPCR (Table 3). Eleven of them (85%) could be validated by qPCR: ALDH12A1, AOX1A, RNS1, PHS1, THE1, α-DOX1 and HSC70 were shown to be significantly upregulated in the introduced range under nonstressful conditions as expected based on the microarray results; FER was shown to be significantly downregulated in the introduced range under nonstressful conditions, also consistent with the microarrays; and FUS3, ACO4 and the unknown gene showed no differential expression among samples from the control plot as predicted (Fig. 4a). Results from one-tailed Pearson correlation tests further indicate that the microarray-based fold change estimates for these 11 ESTs are highly concordant with those approximated with qPCR (r = 0.91, P < 0.001; Fig. 4b). The correlation coefficient, however, drops drastically (r = 0.50, P = 0.041; Fig. 4b) when SGT1B and ERA1 are included in the calculations.

Table 3. False discovery rates (FDR) calculated from permutated P-values, overall expression pattern (with fold changes in parentheses), best hit on Arabidopsis thaliana genome, and putative gene homologue and function for 13 expressed sequence tags (ESTs) tested for differential expression between the native and introduced range of Cirsium arvense using microarray data and later investigated through quantitative PCR

| No. | EST name | FDRC | FDRN | FDRS | Expression pattern | TAIR ID | Locus name | Function |
|---|---|---|---|---|---|---|---|---|
| 1 | CGB091029R2CTFR REP C47080 | **0.024** | 0.351 | 0.644 | UP in control plot (FC = 1.64) | At5g62530 | Δ1-PYRROLINE-5-CARBOXYLATE DEHYDROGENASE (P5CDH) | Proline degradation |
| 2 | contig18249 | **<0.001** | **0.034** | **<0.001** | UP in all plots (FC = 5.99–7.09) | At4g11260 | SUPPRESSOR OF G2 ALLELE OF SKP1 (SGT1B) | R-protein mediated defence |
| 3 | contig3635 | **0.035** | 0.167 | 0.641 | UP in control plot (FC = 2.10) | At3g22370 | ALTERNATIVE OXIDASE 1A (AOX1A) | Alternative respiration, redox homeostasis |
| 4 | CTLF CAP contig1227 | **0.038** | 0.118 | 0.625 | UP in control plot (FC = 2.34) | At2g02990 | RIBONUCLEASE 1 (RNS1) | Pi remobilization, JA-independent wound signalling |
| 5 | CTLF CAP contig6586 | **0.014** | 0.492 | 0.173 | DOWN in control plot (FC = 0.51) | At3g51550 | FERONIA (FER) | Cell-cell communication, cell elongation |
| 6 | CTRT CAP 219 contig23912 | **0.015** | **0.028** | *0.067* | UP in all plots (FC = 1.99–2.33) | At3g47390 | PHOTOSENSITIVE 1 (PHS1) | Riboflavin biosynthesis |
| 7 | CTRT CAP contig7326 | **0.041** | 0.635 | 0.377 | UP in control plot (FC = 1.66) | At5g54380 | THESEUS 1 (THE1) | Cell wall-integrity signalling, cell elongation |
| 8 | CTRT CAP contig9615 | **0.044** | 0.209 | 0.368 | UP in control plot (FC = 3.58) | At3g01420 | ALPHA-DIOXYGENASE 1 (α-DOX1) | Oxylipin biosynthesis |
| 9 | contig13047 | *0.055* | *0.085* | 0.257 | UP in control and nutrient-deficient plots (FC = 1.61–1.67) | At5g40280 | ENHANCED RESPONSE TO ABA 1 (ERA1) | Guard cell ABA signalling, IAA-induced lateral root formation |
| 10 | CTLF CAP contig6238 | *0.060* | *0.051* | 0.184 | UP in control and nutrient-deficient plots (FC = 2.71–3.51) | At5g02500 | HEAT SHOCK COGNATE PROTEIN 70 (HSC70) | Cellular protein homeostasis |
| 11 | CGB091029R2CTFR C6506 | 0.459 | **0.039** | 0.734 | DOWN in nutrient-deficient plot (FC = 0.61) | At3g26790 | FUSCA 3 (FUS3) | Embryogenesis, cellular morphogenesis |
| 12 | CGB091029R2CTFR CAP contig4183 | 0.322 | **0.039** | 0.331 | DOWN in nutrient-deficient plot (FC = 0.63) | At1g24620 | na | Pi deficiency-induced root hair elongation |
| 13 | CTLF CAP contig9650 | 0.106 | 0.393 | **0.026** | UP in shading plot (FC = 1.66) | At1g05010 | 1-AMINOCYCLOPROPANE-1-CARBOXYLIC ACID OXYDASE 4 (ACO4) | Ethylene biosynthesis |

Significant FDR (i.e. FDR < 0.05) for control (FDRC), nutrient-deficient (FDRN), and shading (FDRS) plots are highlighted in bold, while marginal values (i.e. FDR = 0.05–0.1) are indicated in italics. UP, upregulated in the introduced range; DOWN, downregulated in the introduced range; Pi, phosphate; JA, jasmonic acid; ABA, abscisic acid; IAA, indole-3-acetic acid (aka auxin); na, not available.
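The thresholds given in the Table 3 footnote (significant at FDR < 0.05, marginal at FDR = 0.05–0.1) can be applied mechanically to the three FDR columns. The sketch below does so using the values from Table 3, keyed by the gene abbreviations in the locus-name column. The dictionary layout, the `classify` helper and the class labels are illustrative choices, and entries reported as "<0.001" are stored as 0.001, which does not change their class.

```python
# FDR values for the control, nutrient-deficient and shading plots (Table 3).
# Entries reported as "<0.001" are stored as 0.001 (an upper bound).
TABLE3_FDR = {
    "P5CDH": (0.024, 0.351, 0.644),
    "SGT1B": (0.001, 0.034, 0.001),
    "AOX1A": (0.035, 0.167, 0.641),
    "RNS1": (0.038, 0.118, 0.625),
    "FER": (0.014, 0.492, 0.173),
    "PHS1": (0.015, 0.028, 0.067),
    "THE1": (0.041, 0.635, 0.377),
    "alpha-DOX1": (0.044, 0.209, 0.368),
    "ERA1": (0.055, 0.085, 0.257),
    "HSC70": (0.060, 0.051, 0.184),
    "FUS3": (0.459, 0.039, 0.734),
    "At1g24620": (0.322, 0.039, 0.331),
    "ACO4": (0.106, 0.393, 0.026),
}


def classify(fdr):
    """Apply the thresholds stated in the Table 3 footnote."""
    if fdr < 0.05:
        return "significant"
    if fdr <= 0.1:
        return "marginal"
    return "ns"


classes = {gene: tuple(classify(v) for v in fdrs)
           for gene, fdrs in TABLE3_FDR.items()}

significant_in_control = [g for g, c in classes.items() if c[0] == "significant"]
print(significant_in_control)
```

The resulting classes agree with the expression-pattern column: for instance, the eight ESTs that are significant in the control plot are those described as UP or DOWN in the control plot or in all plots.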
Figure 4. Validation of 13 expressed sequence tags (ESTs) by real-time quantitative PCR (qPCR) using 20 samples of Cirsium arvense (ten from each range) grown under nonstressful conditions, and comparison with corresponding microarray (MC) data. (a) Expression differences (with standard error bars) as measured by MC and qPCR. The y-axis displays the fold changes (FC) in transcript levels between samples from the introduced range (in black) relative to those from the native range (in grey). (b) Log2-transformed expression FC between samples from the introduced range relative to those from the native range as measured by MC (on the y-axis) and qPCR (on the x-axis). The black dotted and black solid lines represent the least-squares regression line fitted to all 13 ESTs investigated via qPCR, and the regression line fitted to validated ESTs, respectively, while the grey dashed line symbolizes the identity line. Numbers refer to ESTs listed under (a).

In the microarray analyses, SGT1B shows a highly significant, six- to seven-fold expression difference between the ranges, but no significant variation could be detected via qPCR (Table 3; Fig. 4a). Similarly, the marginally significant, nearly two-fold expression difference detected for ERA1 using the microarrays could not be confirmed via qPCR (Table 3; Fig. 4a). Careful examination of the raw microarray data revealed strong expression discrepancies between probes; in each case, one of the three probes did not support the expression difference obtained after normalization and summarization. For SGT1B, the region amplified by qPCR flanked the probe that showed no expression difference, while for ERA1, the targeted sequence was separated by 188–585 base pairs from the divergently expressed probes. Considering the inconsistencies between microarray probes, it is likely that the qPCRs targeted paralogues that were not detected during transcriptome sequencing/assembly, and were therefore missing from the customized microarray chip.


Discussion

Only 2116 ESTs (3.5%) were found to be DE between native and introduced populations of Cirsium arvense (Fig. 1). Similarly, only 4458 ESTs (7.1%) exhibited a significant treatment-by-range effect. Yet, among them was an overrepresentation of loci involved in stimulus and stress responses, suggesting that C. arvense has evolved different life history strategies on each continent (Fig. 2). North American populations notably differ from their European ancestors with regard to R-protein mediated defence, sensitivity to abiotic stresses and developmental timing (Table 3). The fact that genotypes from the Midwest exhibit expression profiles that differ from those of the remaining North American samples further corroborates the hypothesis that the New World has been colonized twice, independently (Fig. 3).

Introduced populations differ with regard to constitutive response

Under benign, nonstressful conditions, 1223 (1.9%) unigenes are DE between European and North American populations of C. arvense, and they contain an over-representation of genes putatively involved in stimulus response (Table 2; Figs 1, 2). This evidence suggests that C. arvense has evolved different adaptive traits on each continent. An obvious reason for such differences would be climatic mismatches between source and destination areas. In fact, the first genotypes introduced from Western Europe into what is currently the north-eastern USA and Canada likely had to adapt to stronger temperature seasonality and, to a lesser extent, different precipitation regimes, before being able to spread westwards (Peel et al., 2007; Guggisberg et al., 2012 and references therein).

The fact that native and introduced genotypes of C. arvense differ significantly with regard to constitutive stress response further indicates that European and North American populations do not share the same susceptibility to stress (Figs 1, 2; Table 2). Among the DE ESTs, the significant upregulation of SGT1B and HSC70 in North American samples is of particular interest, because these genes mediate cellular plant defence pathways (Table 3; Fig. 4). Plants possess intracellular receptor-like proteins, called resistance (R) proteins, that sense pathogen attacks and trigger, among other responses, localized programmed cell death (also known as a hypersensitive reaction) to limit further spread of the infection. Although the exact mechanisms underlying such responses are not yet fully understood, it is now well accepted that SGT1B regulates R protein accumulation and/or activation during plant disease resistance (reviewed in Muskett & Parker, 2003). Within this context, the chaperone HSC70 has been shown to form a stable complex with SGT1, whereby the latter likely acts as a modulator of the former (Noël et al., 2007). Altogether, these data suggest that the HSC70-SGT1 complex plays a crucial role in resistance to both biotic and abiotic stresses (in particular pathogen and heat-shock resistance). Recent evidence further indicates that HSC70 is involved in guard cell abscisic acid (ABA) signalling (Clément et al., 2011), and hence in plant immunity and adaptation to drought, because ABA triggers stomatal closure in response to water deprivation or pathogen intrusion. As expected, both SGT1 and HSC70 were significantly upregulated in C. arvense grown under nutrient- and light-limiting conditions (data not shown).

Likewise, the significant upregulation of AOX1A in North American plants from the control plot is noteworthy (Table 3; Fig. 4). In plants, the mitochondrial electron transport chain consists of two partially overlapping pathways, the cytochrome respiratory pathway (using cytochrome c oxidases, COX) and the alternative respiratory pathway (using nuclear-encoded alternative oxidases, AOX). The pathway involving AOX1A produces little energy, and mainly acts as an antioxidant, reducing the amount of reactive oxygen species (ROS) accumulating in the mitochondria as a result of adverse growth conditions (e.g. chilling, drought, nutrient deficiency or fluctuating light conditions), while simultaneously re-establishing optimal photosynthetic performance (Clifton et al., 2005; Yoshida et al., 2011). In this study, the transcript level of AOX1A indeed significantly increased in response to nutrient and light stress (data not shown).

The possibility that variation in stimulus response may ultimately lead to changes in developmental timing and phenotype is illustrated by the DE of FER and THE1 between native and introduced genotypes of C. arvense from the control plot (Table 3; Fig. 4). The FER receptor-like kinase was first shown to regulate male–female interactions during pollen tube reception (Escobar-Restrepo et al., 2007). FER was later hypothesized to play a role in plant resistance to pathogen attacks, because fungal invasion involves processes similar to those of pollen tube reception (i.e. hydration and germination, followed by penetration of the host cell; Kessler et al., 2010). Indeed, fer mutants are resistant to powdery mildew infection. Together with THE1 and HERCULES RECEPTOR KINASE 1 (HERK1, At3g46290), FER is also implicated in cell elongation during vegetative growth; knockout/knockdown mutants of these genes exhibit dwarfism (Guo et al., 2010). In addition to its stimulating effect on cell elongation during vegetative growth, THE1 also has an inhibitory effect on cellular growth following cell wall damage (Hématy et al., 2007).

Introduced populations differ with regard to induced stress response

Introduced populations of C. arvense also differ from their European ancestors in how they respond to stress. Under limiting resource and light conditions, 968 (1.5%) and 307 (0.5%) unigenes are DE between the ranges, respectively (Table 2; Figs 1, 2). The DE of ERA1 and At1g24620 under nutrient deficiency (Table 3; Fig. 4) might contribute to previously reported variation in root morphogenesis (Bommarco et al., 2010). ERA1 is a negative regulator of the ABA response, and its deletion causes guard cells to be hypersensitive to ABA, ultimately reducing transpiration/wilting during desiccation (Pei et al., 1998). Accordingly, era1 mutants exhibit slowed growth compared to wild-type A. thaliana, probably as a result of decreased carbon fixation due to stomatal closure. ERA1 also has a repressive effect on ABSCISIC ACID INSENSITIVE 3 (ABI3, At3g24650), which exerts a positive effect on lateral root development (Brady et al., 2003). Consistent with this, era1 mutants produce significantly more lateral roots than wild-type A. thaliana. Similarly, locus At1g24620 codes for an EF-hand calcium binding protein that is involved in phosphate (Pi) deficiency-induced root hair elongation (Lin et al., 2011). Under Pi limitation, the gene is repressed, and mutant alleles accordingly exhibit longer root hairs than wild-type A. thaliana.

In the case of light stress, the DE of ACO4 between native and introduced populations of C. arvense is noteworthy, because it implies variation in developmental programming (Table 3; Fig. 4). Ethylene acts as an endogenous regulator of plant development (e.g. during root/shoot expansion, fruit ripening or leaf abscission), whereby ACO4 is involved in the final step of the conversion of methionine to ethylene (Gómez-Lim et al., 1993). It is therefore not surprising that the concentration of ethylene (and consequently ACO transcripts) varies with environmental conditions such as fluctuating light intensities (Vandenbussche et al., 2003, 2012). ACO4 has also been shown to be drastically upregulated in response to wounding, implicating ethylene in biotic stress regulation (Gómez-Lim et al., 1993; Adie et al., 2007). Interestingly, the upregulation of ACO4 in North American C. arvense coincides with significant upregulation in the shade treatment (i.e. a significant treatment effect; data not shown).

Numerous genes exhibit treatment-by-range interactions

The combined analysis of the control and nutrient-deficient plots (subset CN) identified 4458 ESTs (7.1%) exhibiting a treatment-by-range interaction, and a significant proportion of these are involved in various stress responses (e.g. cold, heat, drought, biotic factors; Table 2; Fig. 2). While these results suggest that introduced, North American populations of C. arvense have evolved different strategies for responding to biotic and abiotic stresses than their European ancestors, this pattern could not be confirmed for the shade treatment (i.e. when analysing subset CS; Table 2). The low number (5) of significant interaction terms detected in this case may be due to the severely reduced statistical power of this dataset, as shading caused 40% mortality among plants from the introduced range (see 'Study system').

Despite the large number of ESTs with significant treatment-by-range interactions, the expression data alone provide little information regarding the phenotypic traits likely to be affected by the expression variation. However, the treatment-by-range interactions correlate with preliminary evidence for trade-offs between growth rates and resistance to both biotic and abiotic stressors from a large common garden experiment (A. Guggisberg, unpublished). Under favourable (control) conditions, introduced populations produced more leaves than their European ancestors, but this trend was reversed when the plants were grown under various stress conditions. In the latter cases, native populations produced more shoots and leaves under limiting resource and light conditions, respectively, supporting the hypothesis that introduced plant species may establish in a new environment if they trade their tolerance to stress for increased growth performance (Hodgins & Rieseberg, 2011; Mayrose et al., 2011; Koziol et al., 2012).

Gene expression profiles correlate with geographic origin

Eighty-four loci that are consistently DE between native and introduced populations of C. arvense were used to explore the relationship between gene expression variation and genetic distance. The resulting hierarchical clustering coincides with the geographical provenance of the given populations (Fig. 3). Gene expression profiles further confirm phylogeographic patterns recovered with nuclear microsatellites, in that populations sampled in the prairie states (the Midwest) differ from the remaining North American populations with regard to their expression profiles, hence corroborating the hypothesis that the New World has been colonized twice, independently (Guggisberg et al., 2012). Alternatively, the geographical structure in expression profiles detected among introduced genotypes may reflect adaptation to different local conditions. Results from bioclimatic niche modelling indeed indicate that the genotypes from Western Europe likely had to evolve new adaptive traits in order to spread across the continent, while genotypes from Eastern Europe were probably pre-adapted to the conditions encountered in the Midwest upon their arrival (Guggisberg et al., 2012).
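To make the clustering step concrete, the sketch below groups populations by the similarity of their expression profiles. It assumes a correlation-based distance and average linkage, neither of which is specified in this section, and it runs on four hypothetical populations with five invented loci; the population names and all values are placeholders, not data from the study.

```python
from itertools import combinations
from statistics import mean


def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1 - cov / (var_a * var_b) ** 0.5


def cluster_distance(profiles, a, b):
    """Average linkage: mean pairwise distance between members of a and b."""
    return mean([correlation_distance(profiles[p], profiles[q])
                 for p in a for q in b])


def average_linkage(profiles):
    """Agglomerative clustering; returns the sequence of merged cluster pairs."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(profiles, *pair))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log2 expression profiles (five loci per population).
profiles = {
    "EU_1": [1.0, 2.0, 3.0, 2.0, 1.0],
    "EU_2": [1.1, 2.1, 2.9, 2.2, 0.9],
    "NA_1": [3.0, 1.0, 1.0, 2.0, 3.0],
    "NA_2": [2.9, 1.2, 0.8, 2.1, 3.2],
}

merges = average_linkage(profiles)
for a, b in merges:
    print(a, "+", b)
```

In this toy example the two placeholder ranges are joined only at the final merge, which is the kind of pattern described for Fig. 3, where clusters coincide with geographical provenance.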

The present paper constitutes an important step towards our understanding of the genetic mechanisms underlying the adaptation of introduced plant species to their new habitat. It not only depicts which processes are likely crucial in exotic plant establishment, but also provides a valuable database for future studies aimed at investigating candidate genes that may account for the success of cosmopolitan weeds such as C. arvense. In an attempt to further our knowledge in invasion biology, we are currently comparing the transcriptome of progenies from controlled crosses of C. arvense, to determine whether the different expression patterns uncovered here are caused by selection in the vicinity of the genes themselves (cis) or by selection on interacting regulatory genes (trans).


Acknowledgements

The authors thank all colleagues listed in Table 1 for collecting seed material for the purpose of this study. A.G. thanks François Bretagnolle and Louis Zeltner for accompanying her into the field, as well as the following colleagues for their generous help with microarray analyses: Aurélie Bonin, Kathryn Hodgins, Margot Paris, and Sébastien Renaut. Technical assistance by Katharina Dlugosch, Maya Mayrose, Kristin Nurkowski, and Moira Scascitelli was also greatly appreciated. The authors finally acknowledge Elena Kramer, Guilhem Mansion and three anonymous reviewers for providing helpful comments on earlier versions of this manuscript. This work was supported by two grants (nos. PBZHP3-123301 and PA00P3_134180) from the Swiss National Science Foundation to A.G., and the Natural Sciences and Engineering Research Council of Canada Award 353026 to L.H.R.