Examining the molecular mechanisms contributing to the success of an invasive species across different ecosystems

Abstract Invasive species provide an opportune system to investigate how populations respond to new environments. Baby's breath (Gypsophila paniculata) was introduced to North America in the 1800s and has since spread throughout the United States and western Canada. We used an RNA‐seq approach to explore how molecular processes contribute to the success of invasive populations with similar genetic backgrounds across distinct habitats. Transcription profiles were constructed from seedlings collected from a sand dune ecosystem in Petoskey, MI (PSMI), and a sagebrush ecosystem in Chelan, WA (CHWA). We assessed differential gene expression and identified SNPs within differentially expressed genes. We identified 1,146 differentially expressed transcripts across all sampled tissues between the two populations. GO processes enriched in PSMI were associated with nutrient starvation, while enriched processes in CHWA were associated with abiotic stress. Only 7.4% of the differentially expressed transcripts contained SNPs differing in allele frequencies of at least 0.5 between populations. Common garden studies found the two populations differed in germination rate and seedling emergence success. Our results suggest the success of G. paniculata in these two environments is likely due to plasticity in specific molecular processes responding to different environmental conditions, although some genetic divergence may be contributing to these differences.


| INTRODUC TI ON
The ability of invasive species to invade, adapt, and thrive in novel ecosystems has long been a focus of ecological research. Coined the "paradox of invasions," examining how invasive populations respond to novel environmental stressors after an assumed reduction in population size during introduction has become an entire field of scientific inquiry (Dlugosch, Anderson, Braasch, Cang, & Gillette, 2015;Sax & Brown, 2000;Sork, 2018). However, this paradox has been called into question as research shows that while many invasive populations may undergo a reduction in demographic and/or effective population size after an invasion event, this is not always linked with a subsequent reduction in genetic diversity (Dlugosch et al., 2015;Frankham, 2005). Additionally, differences between the total genetic diversity of a population and the adaptive variation of a population can be large (Leinonen, O'Hara, Cano, & Merilä, 2008;McKay & Latta, 2002). For these reasons, using total genetic diversity as a measure of invasive potential can be complex and potentially misleading. Instead, a better approach may be to examine how invasive species functionally respond to novel environments and assess how specific molecular processes may be contributing to invasive success (Kawecki & Ebert, 2004;Lande, 2015;Sork, 2018).
Local adaptive evolution and phenotypic plasticity represent two strategies for coping with novel environmental stressors, although they are not mutually exclusive (Kawecki & Ebert, 2004;Lande, 2015). Phenotypic plasticity can be adaptive, maladaptive, or neutral, and can occur independently or in conjunction with shifts in allele frequencies that also alter mean trait values (Ghalambor, McKay, Carroll, & Reznick, 2007;Van Kleunen & Fischer, 2005).
When phenotypic plasticity is adaptive, the population's trait value moves closer to the new environment's optimum. This can allow populations to persist through the sudden application of strong directional selection that often accompanies an introduction, particularly a founder event, without the more time-consuming process of having to wait for fortuitous mutations to arise (Conover & Schultz, 1995;Ghalambor et al., 2007;López-Maury, Marguerat, & Bähler, 2008;Van Tienderen, 1997). Over time, if there are population distributional changes in allele frequencies associated with fitness, then the invasive population will have on average a phenotype that is more fit in its current range than it would be in other environments, including the native range. Regardless of the mechanism, these shifts in fitness-related traits are the difference between persistence and perishing for an introduced population (Joshi, 2001;Kawecki & Ebert, 2004;Richards, Bossdorf, Muth, Gurevitch, & Pigliucci, 2006).
In the study of invasive species, the ability to examine molecular processes associated with phenotypically plastic responses (e.g., through environmentally driven gene expression differences) and those indicative of local adaptive evolution (e.g., through changes in allele frequencies) is often limited by the relative lack of background genetic data available, particularly for nonmodel species (Ekblom & Galindo, 2011). Examining these two processes can be further complicated when traditional methods used to assess local adaptation, such as reciprocal translocation experiments, bring up ethical concerns since moving invasive populations to new locations may increase their potential spread (Bunting & Coleman, 2014). This concern may be especially true for highly prolific invasive species. However, with the development of technologies such as RNA-seq, which allows for the assembly of transcriptomes de novo, gene expression and sequence data have become more widely available for nonmodel systems (Ekblom & Galindo, 2011;Sork, 2018;Wang, Gerstein, & Snyder, 2009). RNA-seq-derived gene expression data can be used to answer questions related to how different environments influence changes in gene expression, which can help address how plastic these responses may be (Des Marais, Hernandez, & Juenger, 2013;Lande, 2015;Via & Lande, 2006). In addition, because RNA-seq also produces sequence data, we can assess allele frequency differences for genes that are differentially expressed, which can give initial insight into population divergence and potential processes driving local adaptive evolution (Costa, Angelini, De Feis, & Ciccodicola, 2010). Thus, the combination of expression and sequence data produced from RNA-seq methods can allow researchers to estimate the prevalence of plasticity in response to novel environmental stressors and begin to address questions about how invasive species adapt to their introduced environments (Lande, 2015;Sork, 2018).
In this study, we take advantage of RNA-seq technology to examine changes in different molecular processes that may allow invasive populations with similar genetic backgrounds to establish across different ecosystems. The system we are using to explore this question is invasive populations of baby's breath (Gypsophila paniculata L.; Caryophyllaceae), which inhabits different regions of the continental United States and Canada. Gypsophila paniculata is a perennial forb native to Eurasia. It is thought to be a long-lived herbaceous perennial (at least 7 years, C. G. Partridge, personal observation), although the full life span has not been assessed, and flowers are not produced until the second or third year of growing (Darwent & Coupland, 1966). As is characteristic of most members of the genus Gypsophila, it thrives in environments with dry, well-draining, calcareous soils with warm summers and cool winters (Barkoudah, 1962). However, it has one of the largest geographic distributions of the genus, stretching from eastern Europe to North China (Barkoudah, 1962;CABI, 2015). Originally introduced into North America in the late 1800s for use in the floral industry (Darwent, 1975;Darwent & Coupland, 1966), G. paniculata quickly spread and can now be found growing in diverse ecosystems across North America, often outcompeting and crowding out the native species (Baskett, Emery, & Rudgers, 2011). While relatively little is known about the history of invasive baby's breath populations in the United States, a recent population genetic analysis using 14 microsatellite markers identified at least two distinct population clusters, with one of these clusters including populations that span from the upper portion of Michigan's lower peninsula to the eastern side of the Cascade Mountains (Lamar & Partridge, 2019). The environments that these populations occur in range from quartz-sand dunes in Michigan to disturbed roadsides in Minnesota, prairies in North Dakota, and sagebrush steppes in eastern Washington.
While these populations may share a similar genetic background, understanding how they are responding to different environments will help shed light on how this invasive is able to thrive across distinct habitats.
For this study, we examined differential gene expression and identified single nucleotide polymorphisms (SNPs) within differentially expressed genes from two G. paniculata populations within the same genetic cluster that inhabit divergent ecosystems: (1) the coastal sand dunes in Petoskey, Michigan, and (2) sagebrush steppe regions around Chelan, WA. These two habitats were chosen because they represent ecologically distinct ecosystems, with divergent environmental characteristics (see results). In addition, we conducted a common garden growth trial to examine differences in germination rates, seedling emergence success, and above-and belowground tissue allocation between these two populations. We predict that the populations will differ in gene expression patterns and that those differences will be reflective of the environment in which they inhabit. Given that baby's breath established in these environments approximately 100 years ago (Lamar & Partridge, 2019), we also predict that this should be enough time to see divergence in allele frequencies for genes that are important to these distinct habitats. This will allow us to identify potential targets of local adaptive evolution for future testing. Finally, we hypothesize that different environmental conditions (i.e., growing degree day, precipitation, and nutrient availability (see Section 3)) between these two habitats have likely led to differences in growth responses. Therefore, we predict that these populations will differ in certain phenotypic traits, such as germination rate, seedling emergence success, and above-and belowground tissue allocation, when grown in a common garden environment. Thus, the overall goal of this work was to examine how G. paniculata populations that have shared genetic backgrounds but differ in their invaded habitats (i.e., sand dunes in Petoskey, Michigan, and sagebrush steppe in Chelan, Washington) are responding to these different environments and to explore how different molecular processes are contributing to their success as an invasive species.

| Study site characterization
Petoskey, Michigan (PSMI), is -located along Lake Michigan's primary-successional quartz-sand dune system. Vegetation is sparse and is chiefly comprised of Ammophila breviligulata (dune grass), Silene vulgaris (bladder campion), Juniperus horizontalis (creeping juniper), J. communis (common juniper), and Cirsium pitcheri (Pitcher's thistle; Figure 1a,b). Chelan, Washington (CHWA), is a disturbed habitat situated on slopes surrounding Lake Chelan and dominated by sagebrush (Artemisia spp.; Figure 1a,c). Average climate data for these two locations were collected from stations operated by the National Oceanic and Atmospheric Organization (NOAA) in Petoskey, MI, and Entiat, WA (near Chelan, WA), and are summarized in Table 1.

| Soil analysis
In June 2018, we collected soil samples from PSMI (45.4037°N 84.9121°W) and CHWA (47.7421°N 120.2177°W; Figure 1a-c). In PSMI, we collected soil from 10 cm, 50 cm, and 1 m, while in CHWA, we collected soil from 10, 25, and 50 cm depths. Sampling locations differed in collection depths due to soil characteristics in CHWA that made deeper collection impossible (large boulders, hard soil). At both locations, we collected two sets of soil samples from all depths.
We stored samples in airtight plastic bags and maintained them at 4°C until analysis.
We sent soil samples collected from all depths at PSMI and CHWA to A&L Great Lakes Laboratories (Fort Wayne, IN) for nutrient analysis.
Samples were tested for organic matter (%), phosphorus (P), potassium (K), magnesium (Mg), calcium (Ca), soil pH, total nitrogen (N), cation exchange capacity (CEC), and percent cation saturation of K, Mg, and Ca. At the laboratories, samples were dried overnight at 40°C before being crushed and filtered through a 2-mm sieve. The following methods were then used for each analysis: organic matter content (loss on ignition at 360°C), pH (pH meter), phosphorus, potassium, magnesium, and calcium content (Mehlich III extraction and inductively coupled plasma mass spectrometry). Total nitrogen was determined using the Dumas method (thermal conductance). Results of nutrient testing were analyzed using a principal component analysis (PCA) in the statistical program R v3.6.0 (R Development Core Team, 2017).
We then dissected seedlings into three tissue types (root, stem, and leaf), placed tissue in RNAlater™ (Thermo Fisher Scientific), and flash-froze them in an ethanol and dry ice bath. Samples were kept on dry ice for transport and maintained at −80°C until RNA extractions were performed.
We extracted total RNA from frozen tissue using a standard TRIzol ® (Thermo Fisher Scientific) extraction protocol (https:// F I G U R E 1 (a) Map identifying sample locations for Gypsophila paniculata populations used in this study. (b) Petoskey, Michigan (PSMI), study site, and (c) Chelan, Washington (CHWA), study site assets.therm ofish er.com/TFS-Asset s/LSG/manua ls/trizol_reage nt.pdf). We resuspended the extracted RNA pellet in DNase/RNasefree water. The samples were then treated with DNase to remove any residual DNA using a DNA-Free Kit (Invitrogen). We assessed RNA quality with a Bioanalyzer 2100 (Agilent Technologies) and NanoDrop™ 2000 (Thermo Fisher Scientific). RNA integrity number (RIN) values for individuals used in this study ranged from 6.1 to 8.3.
However, because both chloroplast and mitochondrial rRNA can artificially deflate RIN values in plant leaf tissue, we deemed these values to be sufficient for further analysis based upon visualization of the 18S and 28S fragment peaks (Babu & Gassmann, 2016). This resulted in high-quality total RNA from 10 PSMI leaf, 10 PSMI stem, 10 PSMI root, 10 CHWA leaf, 9 CHWA stem, and 10 CHWA root samples. Finally, we submitted the total RNA samples to the Van Andel Research Institute for cDNA library construction and sequencing.

| cDNA library construction and sequencing
Prior to sequencing, all samples were treated with a Ribo-Zero rRNA Removal Kit (Illumina). cDNA libraries were constructed using the Collibri Stranded Library Prep Kit (Thermo Fisher Scientific) before being sequenced on a NovaSeq 6000 (Illumina) using S1 and S2 flow cells. Sequencing was performed using a 2 × 100 bp paired-end read format and produced approximately 60 million reads per sample, with 94% of reads having a Q-score > 30 (Table S1).

| Differential expression
To quantify transcript expression, reads were mapped back to the assembly using bowtie and quantified using the RSEM method as implemented in Trinity. Counts were generated for genes and TA B L E 1 Location and climate data for sampling sites, taken from National Oceanic and Atmospheric Organization (NOAA) weather stations in Petoskey, MI, and Entiat, WA (near Chelan, WA) To be considered significantly differentially expressed, transcripts needed to have an adjusted p-value (BH method [Benjamini & Hochberg, 1995]) below 0.05 and a log2 fold change greater than 2.
For transcripts that were differentially expressed, we identified Gene Ontology (GO) biological processes that were either over-or underrepresented using the PANTHER classification system v14.

| Single nucleotide polymorphism variant calling
We used the HaplotypeCaller tool from GATK4 to identify potential single nucleotide polymorphisms (SNPs) that were present in transcripts that were differentially expressed between populations (Depristo, 2011;McKenna, 2010). The bowtie mapped files were used to jointly genotype all 59 samples simultaneously with a minimum base quality and mapping quality of 30. Variant data were visualized using the vcfR package v1.8.0 (Knaus & Grünwald, 2017).
We identified variants associated with nonsynonymous SNPs, synonymous SNPs, 5ʹ and 3ʹ UTR SNPs, 5ʹ and 3ʹ UTR indels, frameshift and in-frame indels, premature or changes in stop codons and changes in start codons, and calculated population diversity estimates for all SNP types. The effect prediction was done using custom scripts (which can be found in the Dryad repository) and the TransDecoder predicted annotation in conjunction with the base change. We set a hard filter for the SNPs so that only those with QD scores > 2, MQ scores > 50, SOR scores < 3, and read postrank sums between −5 and 3 passed. We then calculated the allele frequencies for each SNP within PSMI and CHWA.
For the subsequent evaluation, we focused on SNPs that had potential functional effects (i.e., they were not listed as "synonymous" or "unclassified"), were in transcripts differentially expressed between PSMI and CHWA across all three tissues, and that exhibited differences in SNP allele frequencies between the populations by at least 0.5. We used the R package metacoder v0.3.3 (Foster et al., 2017) to visualize the GO biological process hierarchies associated with transcripts containing these SNPs.

| Common garden trials
Finally, to examine whether environmental differences between these two locations has led to different growth responses, we conducted common garden trials to examine differences in germination rate (functionally defined as radicle emergence (Baskin & Baskin, 2001)), seedling emergence success (defined as successful cotyledon emergence from the soil), and the ratio of above-and belowground tissue allocation between the populations.

| Germination trial
On 11 August 2018, we returned to our sample sites in CHWA and PSMI and collected seeds from 20 plants per location. This date was chosen because it was previously determined that this collection time can yield over 90% seed germination for G. paniculata collected from Empire, MI (Rice, Martínez-Oquendo, & McNair, 2019). To collect seeds, we manually broke seed pods off and placed them inside paper envelopes in bags half-filled with silica beads. We stored bags in the dark at 20 to 23°C until the germination trial began one month later.
We counted one hundred seeds from twenty plants per population and placed them in a petri dish lined with wet filter paper (n = 2,000 seeds per population). We established a control dish using 100 seeds from the "Early Snowball" commercial cultivar (G.

| Growth trials
To examine population differences on seedling emergence success and above-and belowground tissue allocation, we planted 6 seeds collected from 20 individual plants per population (n = 120 per population, n = 240 total). All seeds were planted on the same day to a standardized depth of 5 mm in a sand/potting soil mixture.
Greenhouse conditions were set at 7:17-hr dark:light photoperiod.
Relative humidity and temperature settings during the day were 55% and 21°C, while nighttime conditions were 60% and 15.5°C. Each day we watered plants until the soil appeared fully wet and we randomized plant position to prevent bias in temperature, light, or water regime. At the end of the seven-week trial period, we carefully removed plants from the soil and measured the length of tissue above and below the caudex using a caliper.
To compare the proportion of seedlings that successfully emerged between the populations, we ran a two-sided proportion test in the R statistical program v3.6.0 (R Development Core Team, 2017). We analyzed differences in the ratio of above-and belowground tissue between populations for seedlings that successfully emerged and examined the presence or absence of family effects using a completely randomized design with subsampling ANOVA in SAS v9.4 (SAS Institute Inc., 2013).

| Habitat characterization
Climate data collected from NOAA monitoring stations revealed differences in mean temperature, precipitation, and growing degree day (GDD) between our two sampling locations. CHWA had a 3°C

| Differential gene expression
Across all three tissue types, there were 1,146 transcripts that were differentially expressed between the PSMI and CHWA populations ( Figure 3a, Table S3), with the majority of the differences in expression being driven by sampling location and tissue type (Figure 3b).
Root tissue contained the highest number of differentially expressed transcripts between the two populations (8,135 transcripts, Table S4), followed by leaf tissue (5,666 transcripts, Table S5) and stem tissue (5,374 transcripts; Figure 3a, Table S6).

| Enriched GO processes in CHWA
GO biological processes that were enriched with transcripts displaying higher expression in CHWA relative to PSMI across all three tissue types were primarily associated with different stress responses (

| Enriched GO processes in PSMI
For the PSMI population, GO terms that were enriched with transcripts that showed significantly higher expression across all three tissues were associated with nutrient response, development, and transcriptome processes (Table 2) were specifically enriched in CHWA root tissue (Table S7). For the PSMI population, GO terms specifically associated with root tissue included cellular response to nitrogen starvation (GO:0006995), nitrate assimilation (GO:0042128), and organophosphate metabolic processes (GO:0019637; Table S8).

| Stem tissue
There were 5,374 differentially expressed transcripts in stem tissue collected from CHWA and PSMI (Figure 3a) and systemic acquired resistance (GO:0009627; Table S10).

| Leaf tissue
Of the 5,666 transcripts that were differentially expressed between leaf tissues from CHWA and PSMI (Figure 3a  PSMI. Some of the enriched GO terms that were specific to leaf tissue from the CHWA population included fatty acid beta-oxidation (GO:0006635) and positive regulation of salicylic acid-mediated signaling pathway (GO:0080151; Table S11). The enriched GO terms that were specific to PSMI leaf tissue included vitamin biosynthetic process (GO:0009110), long-day photoperiodism, flowering (GO:0048574), and response to UV-A (GO:0070141; Table S12).

| Comparison of gene expression and SNP GO biological processes
Of the transcripts that were differentially expressed between CHWA and PSMI across all three tissues, 85 (7.4%) of those transcripts contained potentially functional SNPs, which displayed allele frequencies that differed between the two populations by at least 0.5 (Table S13). Enrichment analysis did not identify any GO processes that were statistically enriched for these 85 transcripts, although GO biological terms associated with these transcripts can be viewed in Figure 4b.

| Germination trial
Results of a log-rank test comparing time-to-germination curves for each locality indicated strong statistical differences between seeds collected from PSMI and CHWA, with seeds from CHWA germinating more quickly (p < 2.0 × 10 -16 ; Figure 5). While there was a difference in germination curves, both localities reached 90% germination by the end of the germination trial. Log-rank tests looking at homogeneity within groups found strong statistical support for variation among time-to-germination curves for seeds from different parent plants for both populations (both p < 2.0 × 10 -16 ), suggesting potential family effects.

| Growth trial
A two-sided proportion test indicated a significant difference in the total number of seedlings that emerged between seeds collected from CHWA and PSMI, with CHWA seedlings emerging more often than PSMI (p < 0.0002; Figure 6a). When excluding plants that did not emerge, ANOVA results indicated no significant difference in the ratio of above-and belowground tissue allocation between populations (p = 0.61; Figure 6b). However, there were significant family effects in above-and belowground tissue allocation (p = 0.03; Figure S1).

| D ISCUSS I ON
The primary drivers that allow invasive species to adapt to novel environments over relatively short periods of evolutionary time is a process not yet fully understood. To better understand these mech-  (Lamar & Partridge, 2019). Using RNA-seq data (which gives orders of magnitude more informative data than microsatellites), we found that there were a number of transcripts differentially expressed between these populations and that many of these genes were involved in processes directly related to their different environments, particularly those associated with abiotic stress response in CHWA and nutrient starvation in PSMI. Of the genes that were differentially expressed across all three tissues, only 7.4% contained potential SNPs that differed in frequency by at least 0.5 between the populations. In addition, while we identified differences in germination rates and seedling emergence success between the two populations in a common garden experiment, we did not observe differences in above-and belowground tissue allocation as we initially predicted. From these data, we suggest that the success of invasive G. paniculata across these distinct ecosystems is likely the result of plasticity in molecular processes responding to these different environmental conditions, although some genetic divergence over the past 100 years may also be contributing to these differences.

F I G U R E 4
Heat trees displaying (a) GO biological processes that are enriched with transcripts with significant differential expression between each population, and (b) GO biological processes that are represented by transcripts differentially expressed between the two populations and contain SNPs that differ in allele frequency by at least 0.5. The size of each node is representative of the number of transcripts assigned to each GO term. The color of each branch represents increased expression, with green displaying higher expression in Petoskey, Michigan (PSMI), and brown displaying higher expression in Chelan, Washington (CHWA)

| Stress response in CHWA
The sagebrush ecosystem of the eastern Cascade Mountains is characterized by a semi-arid, temperate environment with a drought-resistant plant community (Miller et al., 2011). The environmental data obtained from our sampling regions suggest that the CHWA population experiences less precipitation and higher temperatures than G. paniculata growing in PSMI. As such, many of the enriched GO processes with higher expression in the CHWA population were related to a suite of stress responses indicative of abiotic stress. Some of these included response to abscisic acid (ABA), response to reactive oxygen species, response to heat, response to salt stress, response to water deprivation, and response to topologically incorrect folded proteins (Table 2, Figure 4a).
During abiotic stress, many of these processes interact with one another to help maintain cellular homeostasis (Shinozaki & Yamaguchi-Shinozaki, 2000;Tuteja, 2007). In our data, transcripts that were associated with protein folding GO processes mainly corresponded to heat-shock proteins (Hsps). While Hsps are most notably involved in protein stability during heat stress, they can also respond when plants experience osmotic, cold, or oxidative stress (Boston, Viitanen, & Vierling, 1996;Vierling, 1991;Wang, Vinocur, Shoseyov, & Altman, 2004;Waters, Lee, & Vierling, 1996). Hsps can also interact with ABA, often considered a "plant stress hormone" because it can be induced by multiple abiotic stressors (Mahajan & Tuteja, 2005;Swamy & Smith, 1999). Arabidopsis mutants that are deficient in ABA do less well under drought or osmotic stress conditions than those with sufficient ABA (Tuteja, 2007). Under heat and drought stress, increased production of ABA can lead to higher levels of hydrogen peroxide and result in oxidative stress.
But, this effect can be mediated as increased oxidative stress triggers synthesis of Hsp70, which upregulates antioxidant enzymes that control reactive oxygen species and protects against oxidative injury (Fauconneau, Petegnief, Sanfeliu, Piriou, & Planas, 2002;Hu et al., 2010). Thus, the enrichment of genes involved in these interacting processes suggests CHWA populations are under higher levels of abiotic stress, particularly heat and drought stress, compared with PSMI populations, and these data provide insight into the molecular response to these stressors.
When examining leaf, root, and stem tissue from CHWA seedlings separately, additional GO processes related to stress responses were observed. "Response to salicylic acid" was enriched in both the leaf and root tissue. Salicylic acid (SA) is a phytohormone that is involved in immunity and defense response to pathogens (Dempsey, Shah, & Klessig, 1999;Vlot, Dempsey, & Klessig, 2009). It also plays an important role in a plant's response to abiotic stress, including metal, salinity, ozone, UV-B radiation, temperature, and drought stress (Khan, Fatma, Per, Anjum, & Khan, 2015). For example, in Mitragyna speciose, the application of SA led to increased expression of chaperone proteins F I G U R E 5 Germination curves for Gypsophila paniculata seeds collected from Petoskey, Michigan (PSMI, n = 2,000), and Chelan, Washington (CHWA, n = 2,000), on 11 August 2018 and incubated for 12 days. Burpee commercial cultivar seeds (n = 100) known to have germination success in excess of 90% were used for an experimental control F I G U R E 6 Results of a common garden growth trial of Gypsophila paniculata plants conducted for seven weeks (n = 120 per population).
(a) Seedling emergence per sampling location, (b) ratio of aboveground: belowground tissue allocation per sampling location. Location codes: Chelan, Washington (CHWA); Petoskey, Michigan (PSMI). and heat-shock proteins when plants were in drought conditions (Jumali, Said, Ismail, & Zainal, 2011;Khan et al., 2015). As previously stated, the arid environment of the sagebrush ecosystem is likely to result in higher drought stress, and increased expression of genes associated with SA pathways may be an additional mediating factor allowing invasive G. paniculata to thrive in this system.

While a number of genes involved in abiotic stress response
showed higher expression in CHWA, the majority of these genes did not have SNPs with divergent allele frequencies between the two populations, suggesting that some of this response is likely due to plasticity. However, a few genes involved in different stress responses and chaperon-mediated protein folding processes did have SNPs that differed in allele frequency by at least 0.5 or greater. One of the genes involved in oxidative stress was caffeoylshikimate esterase (CSE). CSE is an important enzyme in the synthesis of lignin, a major component of the cell wall (Vanholme, 2013). Plants with mutations in the CSE gene display increased sensitivity to hydrogen peroxide and oxidative stress, which were enriched in our GO analysis (Gao, Li, Xiao, & Chye, 2010). In addition, another transcript that displayed divergent allele frequencies was peptidyl-prolyl cis-trans isomerase (FKBP62), which is involved in chaperone-mediate protein folding. FKBP62 interacts with the heat-shock protein 90 (HSP90.1) complex to positively regulate thermotolerance in Arabidopsis (Meiri & Breiman, 2009).
Expression of this gene is induced in Arabidopsis during heat stress, and those that overexpress this gene show higher survival at temperatures above 45°C after a 37°C acclimation period (Meiri & Breiman, 2009). This increased heat tolerance could be helpful in the warmer, arid climate of CHWA. Differences in allele frequencies between PSMI and CHWA associated with these genes suggest that there could be local adaptive evolution occurring due to different selection pressures associated with abiotic stress.
However, additional work needs to be conducted to more thoroughly examine these distinct SNPs and fully assess their relationship to population divergence and local adaptive evolution.

| Nutrient starvation in PSMI
The G. paniculata population in PSMI is located in the coastal sand dunes of northwest Michigan. This area is a primary-successional dune habitat where G. paniculata grows in the foredune region.
The sand dune environment can present strong selection pressure on plants in the form of sand burial, limited soil moisture, and lack of nutrients (Maun, 1994). One of the main limiting factors for seedling success in dune systems is nutrient deficiency, especially nitrogen, phosphorus, and potassium (Hawke & Maun, 1988;Willis & Yemm, 1961). Our soil analysis shows that PSMI soil contained low concentrations of organic matter, total nitrogen, phosphorus, and potassium, suggesting this is a very nutrient-limited environment. In conjunction with these environmental differences, the GO enrichment analysis showed that "regulation of response to nutrient levels" and "cellular response to phosphate starvation" were both significantly enriched in PSMI in all three tissues compared with CHWA. In addition, there were a number of processes associated with nitrate regulation (nitrate assimilation and nitrogen cycle metabolic process) specifically enriched in the root tissue from PSMI. Some of the differentially expressed genes associated with these processes included phospholipase D zeta 2 (PLPZ2), transcription factor HRS1 (HRS1), and SPX domain-containing protein 3 (SPX3). In Arabidopsis thaliana, PLPZ2 can aid in phosphate recycling and has been shown to be upregulated during phosphate starvation (Misson, 2005). Additionally, SPX3 helps regulate phosphate homeostasis (Secco et al., 2012;Shi et al., 2014), while HRS1 is a major regulator of both nitrogen and phosphate starvation (Kiba, 2018). The increased expression of these genes may help G.
paniculata survive in PSMI, where the limited levels of nitrate and phosphorus in the soil make this ecosystem a challenge for many plant species. However, these specific genes did not display SNPs that differed in frequency between our populations, suggesting that expression differences related to nutrient deprivation are environmentally driven, potentially epigenetically maintained, and/ or are regulated by nontranscribed regions, and these differences exist in response to the low nitrogen and phosphorus environment experienced in the dune system.
When examining PSMI GO processes enriched with differentially expressed genes that contain SNPs differing in frequency between the two populations, the only nutrient-associated process was "phosphorus metabolic processes." The gene involved in this process was CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase 1 (PGPS1), which is involved in phosphatidylglycerol (PG) biosynthesis (Müller & Frentzen, 2001). While this gene itself has not directly been associated with nutrient homeostasis, PG can be used as a phosphate reserve during phosphate starvation, and rapidly decreases in cells when phosphate is limited (Jouhet, Maréchal, Bligny, Joyard, & Block, 2003;Nakamura, 2013). Thus, it is possible that the increase in PGPS1 may be needed to maintain PG levels under these nutrient-limited environments. However, further analysis needs to be performed to determine whether the SNPs identified alter the function of this gene.

| Circadian rhythm expression in PSMI
There were also a number of enriched GO processes in PSMI related to different timing processes, including circadian rhythm and flowering-associated photoperiod. These two processes can be linked, with the circadian clock mechanisms that drive 24-hr cycles also significantly influencing plant phenology (Salmela, McMinn, Guadagno, Ewers, & Weinig, 2018). Ideally, circadian cycles should be optimized to match environmental parameters (West & Bechtold, 2015;Yerushalmi & Green, 2009), and a disruption in circadian rhythm cycles can result in decreased fitness (Green, Tingay, Wang, & Tobin, 2002;Michael et al., 2003). Given differences in both latitude and growing degree days between PSMI and CHWA, we would expect there to be differences in phenology between the populations, and this was evident during our collecting period. Even though we collected from both populations within one week of each other, and we tried to sample from both locations at the same time of day, some mature plants in CHWA were already budding, while mature plants in PSMI were still in the growth stage of their yearly life cycle. For most of the transcripts involved in these processes, there was not a corresponding SNP between the populations, suggesting these differences may be environmentally driven. However, a transcript associated with early flowering 3 protein (ELF3) displayed increased expression in the CHWA population and contained a SNP that differed in frequency between these populations. ELF3 has been shown to modulate both flowering time and circadian rhythm (Carré, 2002), and interestingly, it can also lead to increased salt tolerance (osmotic stress) in Arabidopsis (Sakuraba, Bülbül, Piao, Choi, & Paek, 2017).
These results suggest that environmental factors eliciting changes in timing and phenology may be helping to maintain these invasive populations.

| Phenotypic comparisons: germination and growth trials
To see what effects environmental factors might be having on different life-history traits of our populations, we set up common garden growth trials. Different environmental factors can have varying selective pressures on germination rates, seedling emergence success, and above-and belowground tissue allocation (Chauhan & Johnson, 2008;Taylor et al., 1995). In our common garden experiments, we initially observed that seeds collected from CHWA germinated quicker and had higher seedling emergence success than those collected from PSMI. The better performance of the CHWA population could be due to release from the abiotic stress factors that were indicated by our gene expression data. Improved performance when a species is removed from an environment imposing abiotic stressors is a common hypothesis and is used as one explanation for the success of invasive species (Catford, Jansson, & Nilsson, 2009 We saw no differences in above-and belowground tissue allocation after seedling emergence between populations, suggesting there are no genetic differences between these populations in relation to these growth measures. We expected the nutrient limitation in PSMI to have an influence on the above-and belowground tissue allocation of seedlings. In environments where nitrogen and phosphorus are the main limiting nutrients, root growth can be favored in seedlings relative to aboveground growth (Ericsson, 1995).
In contrast, shortage of Ca, which was present in higher quantities in PSMI than in CHWA, has been found to have little or no influence on above-and belowground tissue allocation in laboratory experiments (Ericsson, 1995). The lack of difference observed in root:shoot ratios in our plants could indicate that these factors do not influence tissue allocation resources in G. paniculata seedlings, or that these differences are not seen when G. paniculata is grown in a nutrient-sufficient environment.
For our common garden trials, in addition to some of the population differences identified, we also observed significant family effects in germination rate, seedling emergence success, and aboveand belowground tissue allocation ratios, suggesting the potential for genetic effects. Variation in these traits is known to be driven, in part, by genetic factors in other plants. For example, for Brassica oleracea, heritability estimates of mean seed germination time and root:shoot length are approximately 14% and 12%, respectively.
Looking to our gene expression data from our field-collected seedlings, we did not observe differential expression of candidate genes proposed to be involved in germination timing (i.e., AHG1, ANAC060, PDF1 (Footitt et al., 2020)); however, this could be due to the age of the seedlings upon collection. While the family effects we observed could be a function of genetic differences between seeds from dif- the demographic history of these populations. Thus, the genetic differences that we are observing may be confounded by the past history of these populations prior to initial introduction to these areas. Secondly, in this study we only examined one population within a sand dune habitat and one population from a sagebrush habitat. Again, because demographic history can be a confounding factor, we cannot explicitly state that differences between these environments are solely driving the differences in gene expression patterns we observed or that SNP differences between these populations are not simply due to genetic drift. In the future, we plan to include more populations from each habitat, as well as additional prairie habitats, to explore this further. However, given the close relationship between the environmental characteristics of these habitats and the GO processes that were enriched within each population, we think that these processes are worthy of further evaluation of how molecular mechanisms may be driving the success of G. paniculata in these distinct ecosystems. Third, while RNA-seq analysis allowed us to examine SNPs in differentially expressed genes, there could also be genetic differences in nontranscribed regions that regulate gene expression between these populations. In these cases, some of the differential gene expression that we are observing could still be due to genetic differences between these populations, even though no SNPs were observed between the transcripts. To capture this information, further genetic analysis comparing these two populations would need to be conducted. Fourth, while we only identified a small number of differentially expressed genes with potentially functional SNPs that differed in allele frequency by 0.5 between the two populations, we acknowledge that this is a conservative cutoff and we have not considered the potential pleiotropic effects these genes may have on the different enriched processes. Additionally, further work needs to be conducted to identify any functional effects of these identified SNP differences and assess whether they drive differences between populations. Finally, to fully assess local adaptation, more traditional approaches such as reciprocal transplant experiments are needed. Although given that G. paniculata is a prolific reproducer, transplanting more individuals into these sensitive habitats may bring significant ethical concerns. However, by identifying SNPs in differentially expressed genes that are divergent between these populations these data can provide an initial starting point to identify potential candidate genes that may be involved in adaptation to these novel habitats. Thus, regardless of these caveats, we feel that this work provides a good starting point toward identifying how different molecular processes influence G.
paniculata's success across these distinct ecosystems.
In conclusion, we found that G. paniculata seedlings from CHWA and PSMI displayed differential gene expression that was characteristic of the environment in which they were collected. In the nutrient-limited sand dune ecosystem, genes involved in responding to nutrients and phosphate starvation were upregulated. In the arid sagebrush ecosystem, genes involved in regulating responses to abiotic stress were upregulated. Given the small number of differentially expressed transcripts that contained divergent SNPs, we suggest that the majority of the expression differences associated with these enriched GO processes are likely driven by plastic responses to these different environments. Genetic divergence, however, cannot be completely dismissed given the differences in germination rates and seedling emergence success between the two populations in the common garden setting, although these seeds were collected from wild populations and maternal, environmental, and epigenetic variables could be contributing factors. Overall, this study reveals how variation in molecular processes can aid invasive species in adapting to a wide range of environmental conditions and stressors found in their introduced range.

ACK N OWLED G M ENTS
We would like to thank Emma Rice and Hailee Leimbach-Maus for assistance during seed collection and with the germination study.
We would also like to thank

CO N FLI C T O F I NTE R E S T
Thermo Fisher Scientific funded the majority of the sequencing and bioinformatics costs for this study. In exchange, we provided Thermo Fisher with QA/QC data regarding the sequencing performance obtained with their Collibri Stranded Library Prep Kits.
Thermo Fisher did not have any input regarding the design of the study, analysis of the data, interpretation of the data, or development of the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
All raw sequence reads associated with these data were deposited to the Sequence Read Archives (BioProject Accession #: PRJNA606240). Raw growth, germination data files, and R code for differential expression analysis and SNP identification are available on Dryad for review and will be made public once the manuscript is available (https://datad ryad.org/stash/ share/ 7XUtU P1t7w bBU6d d-hYM-FQss7 XOknK 2lf-HSCw9oSQ).