Multiple modes of selection can influence the role of phenotypic plasticity in species' invasions: Evidence from a manipulative field experiment

Abstract In exploring the roles of phenotypic plasticity in the establishment and early evolution of invading species, little empirical attention has been given to the importance of correlational selection acting upon suites of functionally related plastic traits in nature. We illustrate how this lack of attention has limited our ability to evaluate plasticity's role during invasion and also, the costs and benefits of plasticity. We addressed these issues by transplanting clones of European‐derived Plantago lanceolata L. genotypes into two temporally variable habitats in the species' introduced range in North America. Phenotypic selection analyses were performed for each habitat to estimate linear, quadratic, and correlational selection on phenotypic trait values and plasticities in the reproductive traits: flowering onset and spike and scape lengths. Also, we measured pairwise genetic correlations for our “colonists.” Results showed that (a) correlational selection acted on trait plasticity after transplantation, (b) selection favored certain combinations of genetically correlated and uncorrelated trait values and plasticities, and (c) using signed, instead of absolute, values of plasticity in analyses facilitated the detection of correlational selection on trait value‐plasticity combinations and their adaptive value. Based on our results, we urge future studies on species invasions to (a) measure correlational selection and (b) retain signed values of plasticity in order to better discriminate between adaptive and maladaptive plasticity.


| INTRODUC TI ON
The role of phenotypic plasticity in species invasions has long been debated (e.g., Baker, 1965;Ghalambor et al., 2007;Richards et al., 2006). Empirical comparisons of ancestral and derived populations, invasive and native noninvasive species, and species varying in invasiveness have provided valuable insights about this role and the evolution of plasticity (e.g., Bossdorf et al., 2005;Colautti et al., 2017;Colautti & Lau, 2015;Davidson et al., 2011;Godoy et al., 2011;Matzek, 2012;Molina-Montenegro et al., 2012;Morris | 4141 LACEY Et AL. et al., 2014;Muth & Pigliucci, 2007;Parsons & Robinson, 2006;van Kleunen et al., 2011;Yeh & Price, 2004). However, the debate continues because of the paucity of empirical attention given to several aspects of the invasion process. First, multiple traits, as opposed to a single trait, generally influence the fitness of an individual in a natural habitat. Whereas many empirical studies have examined the role of a single trait value or plasticity on invasion success, few have looked at the role of suites of traits or the responses of selection for genotypes varying in plasticity in nature (Ghalambor et al., 2007Hendry, 2016;Lacey et al., 2012). Because of this, the possibility of correlational selection has been ignored. Second, individuals entering a new habitat are likely characterized by genetic correlations among trait values and plasticities, and these correlations could affect the responses to selection (Auld et al., 2010). However, seldom have the interactions between these correlations and modes of selection been examined empirically (Schrieber et al., 2017). Third, costs and benefits of plasticity have generally been assessed using absolute values of plasticity in empirical studies (e.g., van Kleunen & Fischer, 2005Palacio-Lopez et al., 2015;Volis, 2009;Wang et al., 2018). This practice can, however, obscure fitness effects, that is, adaptive versus maladaptive plasticity (Dechaine et al., 2007;van Kleunen & Fischer, 2005Weinig et al.. 2006). Below we explore these aspects of colonization, using data from a manipulative experiment in which we transplanted individuals from a species' native range into its introduced range.
When a colonizing individual enters a new habitat, the individual brings a suite of genes that underlie the phenotypic plasticities of multiple traits, that is, underlie the reaction norms often used to describe plasticity. We define plasticity as the ability of a genotype to modify the phenotype of a trait in response to environmental change. Multiple molecular studies now support the existence of such plasticity genes (e.g., Cho et al., 2017;Corl et al., 2018;Des Marais et al., 2013;Gu et al., 2019;Han et al., 2019;Knies et al., 2009;Marshall et al., 2020;Morris et al., 2014;Nilson & Assmann, 2010).
Ideally, it would be helpful to have measures of these plasticities, that is, the potential for environmental flexibility in arriving colonists, and have measures of their genetic correlations, for example, among plasticities and trait values. These correlations could potentially influence the range of phenotypes expressed by colonists in a new habitat, thereby affecting the direction and magnitude of early evolution (Antonovics, 1976).
One would also like to identify which plasticities are the targets of phenotypic selection and quantify the direct and indirect effects on fitness (Antonovics, 1976;Endler, 1986;Lande & Arnold, 1983;Mitchell-Olds & Shaw, 1987). Empirically identifying these targets has not been easy. While we recognize that the phenotypic value of a trait can directly or indirectly affect fitness in a local environment (Via, 1993;Via et al., 1995), the plasticity of that trait may affect fitness directly or indirectly, and independently of the fitness effect of the trait's phenotypic value (Bradshaw, 1965;de Witt & Scheiner, 2004;Mousseau & Fox, 1998;Roff, 1997;Scheiner, 1993;Scheiner & Callahan, 1999;Schlichting, 1986;Sultan, 1987;van Tienderen, 1991;West-Eberhard, 2003). However, when a phenotypic value of a trait and its plasticity are genetically correlated, estimating independent fitness effects can be more difficult (Auld et al., 2010).
Similarly, few empirical studies of colonists have quantified the independent and correlated effects of a trait's value and its plasticity on fitness (Colautti & Lau, 2015;Davidson et al., 2011;Godoy et al., 2011;Matzek, 2012). Ideally, we would like estimates of selection on plasticity that account for both the independent effects of trait phenotypic values and plasticities and the genetic correlations among phenotypic values and plasticities for these functionally related traits (e.g., Endler, 1986;Kingsolver et al., 2001;Lande & Arnold, 1983).
We would also like estimates of linear and nonlinear modes of selection (Antonovics, 1976;Endler, 1986;Lande & Arnold, 1983;Mitchell-Olds & Shaw, 1987). Theoretically, selection can be linear (directional), quadratic (e.g., stabilizing, disruptive), or correlational, and selection on various combinations could potentially contribute to the evolution of a trait value and its plasticity in natural populations. Quadratic and correlational selection are expected to be common in nature Blows & Brooks, 2003;Roff & Fairbairn, 2007;Schluter & Nychka, 1994;Sinervo & Svensson, 2002), and in particular, correlational selection, that is, selection that favors certain combinations of traits, is expected to cause the evolution of genetic correlations. Measures of this nonlinear mode on plasticity in nature are very rare (Donohue et al., 2000;Roff & Fairbairn, 2012;Tucić et al., 1998), as they are for trait values generally (e.g., Kingsolver et al., 2001Kingsolver et al., , 2012. In thinking about the possible roles of plasticity in the process of early colonization and evolution in a new landscape, we envision that plasticity could influence the process in several ways. (a) Plasticity could, by itself, be adaptive, or maladaptive, thereby directly causing the mean fitness of a population to move toward or away from a fitness peak, respectively, as defined by the new adaptive landscape (e.g., Ghalambor et al., 2007;Hendry, 2016). (b) Plasticity could indirectly facilitate or impede movement toward a fitness peak because of a pre-existing genetic correlation with a correlated trait value or plasticity that is, itself, under direct selection (Auld et al., 2010;Ghalambor et al., 2007). (c) Plasticity could facilitate or impede movement toward multiple fitness peaks because of correlational selection, which favors multiple combinations of traits involving plasticity. (d) Plasticity could facilitate the maintenance of genetic diversity when a fitness peak is broad.
To explore these possibilities, we conducted an experiment that assessed the fitness effects of trait value and plasticity in reproductive traits in two temporally variable habitats in nature. Our habitats were far enough apart that a population in one habitat would not experience the environment of the other. Habitats resembled each other in terms of photoperiod but differed in temperature regime ( Figure S1). Also, each habitat was itself temporally variable (e.g., with respect to temperature, precipitation). Temporal variability could favor the evolution of plasticity, depending on the frequency and pattern of variation (Gabriel & Lynch, 1992;Gomulkiewicz & Kirkpatrick, 1992;Marshall et al., 2019;Moran, 1992;van Tienderen, 1991) We focused on reproductive traits of Plantago lanceolata L. because they contribute strongly to reproductive success, and, thus, to individual fitness and because the thermal environment during the long reproductive season is predictably variable at scales of weeks and months, as well as being diurnally variable.
Clones of native European Plantago lanceolata genotypes were transplanted into two habitats within the species' introduced range in North America. To expand the range of phenotypic variation upon which selection could act, we used genotypes derived from populations that spanned most of the latitudinal range of the species in Europe. Expanding phenotypic variation beyond that which currently exists in a habitat increases the probability of detecting selection (cf. Mitchell-Olds & Shaw, 1987;Wade & Kalisz, 1990).
Our study allowed us to explore how different modes of selection on plasticity might have affected the early evolution of P. lanceolata, given initial trait correlations. Our first goal was to determine which reproductive traits showed genetic variation in plasticity. We defined plasticity as a property of a genotype and as the ability, or flexibility, to modify its phenotype in response to environmental change. Focusing on this subset of traits, we then addressed the following questions for our European genotypes at each transplant site: (a) What are the genetic correlations between trait values and plasticities? (b) What is the most appropriate statistical model (e.g., linear, quadratic, correlational) for assessing selection on plasticities and trait values, given our data? (c) What modes of selection are acting on trait values and plasticities? Here we looked for evidence of directional, quadratic (i.e., curvilinear), and correlational selection.
(d) How are genetic correlations expected to affect the responses to selection? (e) What are the effects of using signed values of plasticity versus using absolute values to estimate costs and benefits of plasticity? Using clones of the same European genotypes provided the statistical power for these assessments.

| The study species and focal traits
Plantago lanceolata L. (English, or ribwort, plantain), Plantaginaceae, is a temperate, weedy, gynodioeceous, herbaceous perennial, native to Eurasia but now well established in disturbed areas, lawns, and grasslands in North America (Cavers et al., 1980). Many traits show intragenerational and intergenerational plasticity (Bradshaw, 1965;Caseet al., 1996;Lacey & Herr, 2000Primack & Antonovics, 1981;Schmitt et al., 1992;van Hinsberg, 1997;van Tienderen, 1990van Tienderen, , 1992van Tienderen & van der Toorn, 1991a, 1991bWolff & van Delden, 1987). Throughout the reproductive season, which in some regions lasts for 6 months, spikes (inflorescences) of tightly packed flowers on leafless scapes rise from basal rosettes. One can usually see multiple spikes in different stages of development on a plant once reproduction begins. Spike development is sequential and overlapping. Floral development on a single spike and scape elongation on that spike are influenced by temperature and can continue for several weeks (Lacey, pers. Obs.). The time from spike appearance to seed maturity on the spike typically ranges from 2 to 6 weeks, depending on the temperature and water availability. Floral reflectance is temperature-sensitive, and consequently, spikes of flowers change color throughout a reproductive season (Lacey & Herr, 2005;Stiles et al., 2007). Protogynous flowers are self-incompatible (Ross, 1973;van Damme, 1983) and predominately wind-pollinated (Cavers et al., 1980). Flowering onset and end times are predominately photoperiodically controlled (Snyder, 1948). and their time of appearance (e.g., Lacey et al., 2003). Finally, scape length in a wind-pollinated species should influence pollen dispersal distance and possibly also the temperature of reproductive tissues.
Longer scapes should increase the exposure of reproductive tissues to cooling by wind, whereas shorter scapes should enhance radiative warming from the ground.

| Genotype selection
In 2010, 50 European genotypes from 14 populations (1-4 genotypes/population) used in an earlier study (Lacey et al., 2010) were chosen from the latitudinal range of the species in Europe (Latitude = 41-62°N: Italy to Scandinavia) and across the range of thermal plasticities in floral reflectance (Lacey et al., 2010). This variation was expected to expand the range of phenotypic plasticity for the focal reproductive traits. All populations grew at altitudes <250 m above sea level. Parents of the experimental genotypes had been collected as seeds in Europe in 2000. To reduce maternal effects, parents had been grown in a greenhouse and then isolated by population for wind pollination and seed production under similar environmental conditions (Lacey et al., 2010). and May 13 at Blacksburg, VA. and Greensboro, NC, respectively), but differ in the thermal environment during the growing season.

| Transplant sites
MLBS has, on average, a colder and shorter reproductive season with a greater range of temperatures ( Figure S1: maximum temperature -minimum temp. from April 1 to October 1: MLBS = 23.8°C; NCAT = 19.5°C). A few scattered P. lanceolata growing naturally with forbs and grasses at both sites were removed from the plots before transplanting. During the experiment, we trimmed the vegetation surrounding transplants so that we could find P. lanceolata spikes.

| Experimental design
We grew clones of the experimental genotypes in teacups (approx. April 15, Julian Day 105). MLBS not uncommonly has snow-covered or frozen ground in early March. Clones were randomly assigned positions 20 cm apart. That summer, we recorded and marked biweekly spike production per clone. As spikes matured, we collected the spikes and recorded scape and spike lengths.
For our analyses, flowering onset of each clone was measured as the week of first spike appearance based on Julian days. Flowering duration was measured as the number of weeks between first and last spike appearance. Median week of spike production served as the proxy for the timing of spike production within the flowering season. Median week was the number of weeks that a clone took to reach 50% total spike production, starting from flowering onset. Because each clone produced multiple spikes and scapes, we calculated the mean length for each of these two traits for up to the first 20 scapes/spikes produced per clone and used the mean values per clone as the trait values for spike and scape length in the selection analyses. Scape lengths within an individual clone varied little over time ( Figure S2). Most plants stopped flowering by the 20th spike (Herrera, 2013). Spike lengths predictably declined throughout the reproductive season, which is typical of declining resources allocated to reproduction as a reproduction season progresses ( Figure S3 and Lacey et al., 2003).
Total seasonal seed production, the fitness proxy for each clone, was estimated by first estimating seed production per spike. We individually weighed a sample of spikes and counted their seeds (spike sample size: MLBS = 53, NCAT = 69). Linear regression models (PROC REG, SAS version 9.4, 2008) of these data were used to estimate seed production for all other spikes (MLBS: seeds = 380.6891 [spike weight], r 2 = .82; NCAT: seeds = 403.4024 [spike weight], r 2 = .77). Seed numbers were summed over spikes for each clone.

| Plasticity measurements
By necessity, plasticity, a genotype's ability to modify a trait in response to environmental change, was measured at the genotype level rather than the clonal level. Plasticity for each reproductive trait per genotype was determined by subtracting the mean value of a genotype's trait (i.e., mean value of clones per genotype) at MLBS from its mean value at NCAT. Thus, we examined the effects of both magnitude and direction of plasticity as expressed in natural settings.
A positive value meant that a trait had a higher mean trait value at NCAT, whereas a negative value meant that a trait had higher value at MLBS. This method provided information about the direction of phenotypic change, which would be lost if we used absolute values of plasticity in analyses. In the Discussion, we address the effects of using signed versus absolute values of plasticity. Calculating plasticity as the difference between trait means per site raised the question of whether or not a trait's value and plasticity were correlated with each other. Our analyses addressed this possibility. First, using PROC MIXED (SAS version 9.4), we examined the effect of transplant site and genotype separately on six reproductive traits: flowering onset, flowering duration, median week of spike production, total seasonal spike number, scape length, and spike length. G (Genotype) × E (Site) interactions were used to identify which traits were genetically variable for plasticity. Subsequent analyses were restricted to traits for which interactions were statistically significant (p < .05). These traits, in contrast to those showing only significant E(Site) effects, showed evidence of genetically different responses to the experimental sites, that is, showed evidence that the sites favored some genotypes over others based on their plasticity. We did not include genetically variable traits lacking plasticity, so that the number of observations in our dataset would exceed the number of predictors in selection models. Before all analyses, trait values and plasticities were standardized to a mean = 0 and variance = 1 within site.

| Statistical analyses
To see if local microsite variation affected the covariances between trait means and fitness, we used multilevel regression models (PROC MIXED) to conduct both phenotypic and genotypic selection analyses on the trait values. Local microsite variation can produce biased phenotypic results if the microsite variation alters the covariances (Lande & Arnold, 1983;Rausher, 1992). Such covariances can result either from environmental conditions that independently influence fitness and the focal traits or from selection on unmeasured traits that are themselves correlated with the traits of interest. Results of the regression analyses showed that the differences between genotypic and phenotypic trends were negligible, that is, there was little evidence of microenvironmental bias ( Figure S4). Therefore, we conducted all selection analyses using phenotypic measures for our trait values. This took advantage of all our data and increased our sample size and statistical power to detect selection.
Phenotypic selection (multivariate regression) analysis was used to estimate the direction and intensity of linear (directional) and nonlinear (quadratic and correlational) selection on each trait found to be genetically variable for plasticity (Lande & Arnold, 1983;Phillips & Arnold, 1989;van Tienderen, 1991). We used multilevel regression models (PROC MIXED) instead of least squares regression because clonal data were grouped by genotype. Multilevel regression properly incorporates predictors associated with both clone and genotype. This allowed us to calculate error terms at the clonal level for trait values and error terms at the genotypic level for plasticity (e.g., Raudenbush & Bryk, 2002;Singer, 1998). Also, we could examine the effects of plasticity predictors on our fitness proxy, independent of any possible correlations between trait values and trait plasticities. The number of observations in our models exceeded the number of independent predictors.
In order to determine the most efficient model for analyzing our data, we compared various subset multivariate regression models per site, depending on the question being addressed. The form for the full model 1 (example shown here includes 2 of the 3 traits) was: where W is the fitness proxy at a transplant site, α is a constant, X i-j are the site trait values, and pX i-j are the plasticity values of traits. Because traits were measured on clones nested in genotype, a multilevel model with random intercepts was fit, and thus, the model included error terms for both clone (ε) and genotype (ε g ). Linear selection gradients (e.g., β i , β j ) estimated the strength of directional selection on trait values, plasticities, and latitude. Linear selection gradients were estimated from models that included only linear terms. Quadratic selection gradients (e.g., ɣ ii , ɣ jj ) measured the curvature in fitness functions and also estimated stabilizing (e.g., negative ɣ ii ) and disruptive (e.g., positive ɣ ii ) selection on a trait value or plasticity. Cross-product selection gradients (e.g., ɣ ij , ɣ ipi , ɣ jpj , ɣ ipj , ɣ pij , ɣ pipj ) estimated correlational selection between pairs of trait values and trait plasticities. This type of model measures the change in distribution of a trait value or plasticity due to selection acting directly on the predictor, independent of changes due to correlations with other predictors included in the model (Lande & Arnold, 1983;Mitchell-Olds & Shaw, 1987). A positive ɣ ij value indicated that natural selection favored a positive correlation, whereas a negative ɣ ij indicated that selection favored a negative correlation. Because the experimental genotypes were progeny of parents collected from a range of latitudes in Europe, we included a linear term to account for source latitude of the parents. Quadratic coefficients and their standard errors were doubled (Stinchcombe et al., 2008).
To address the question of whether or not plasticity was a target of selection at each site, we compared the full model with a model including all trait values but lacking all plasticity predictors.
Second, to address whether linear, quadratic, and/or correlational selection had occurred, we compared the full model with the following reduced models: (a) a "quadratic" model with only linear and quadratic terms, (b) a "cross-product" model with only linear and cross-product terms, and (c) a "linear" model with only linear terms.
The goal was to determine if we could reduce the full model at each site and not lose predictive power to estimate fitness. We felt that this was the best way to address the global question of whether there was statistical evidence for a particular type of selection (e.g., quadratic). For example, our alternative hypothesis for the quadratic selection test was that there was at least one nonzero coefficient among the quadratic selection terms. If we did not have evidence to support this, we proceeded to drop the quadratic terms and refit the model. We compared the log-likelihood statistics of each pair of models using a chi-square test to determine if the full model was a better fit, or if we could reduce the full model by eliminating a set of predictors without losing much predictive power. Finally, using the most efficient model for each site, we examined the selection coefficients of individual trait values and plasticities.
The selection coefficients can be used to construct selection surfaces (Lande & Arnold, 1983;Phillips & Arnold, 1989). Therefore, to visualize how correlational selection might act on pairs of trait values and/or plasticities, we used PROC TPSPLINE (SAS version 9.4) to produce fitness surfaces for the pairs. This nonparametric procedure can complement the selection gradient analyses by revealing subtleties in selection surfaces (Brodie et al., 1995;Schluter & Nychka, 1994

| RE SULTS
Flowering began earlier and lasted longer at NCAT than at MLBS, as evidenced by genotypic differences in flowering onset, median week of flowering, and duration ( Figure 1 and Figure S1: NCAT week 17-31 [April-August] versus MLBS week 25-34 [June-September]). Also, genotypes at NCAT produced more and longer spikes (Figure 1). Site differences were statistically significant for all traits except scape length. All traits showed genetic variation, but only flowering onset, scape length, and spike length showed evidence of genotypic variation in plasticity, that is, genotype by site interactions. Therefore, the following analyses addressed these three traits.
Fitness showed a downward-concave relationship with flowering time (Figure 2a; Table 2). Thus, there was evidence for directional selection favoring early flowering, but not too early as indicated by the significant negative quadratic coefficient. In contrast, fitness increased with increased spike length and showed an upward concave relationship with spike length (Figure 2b; Table 2). The significant positive quadratic coefficient (ɣ ii ) indicated that fitness increased more than one would predict under directional selection alone.
These combined results provided strong evidence for selection favoring early flowering and long spikes. The curvature for scape length was slightly concave downward, but there was no evidence of selection on this trait (Figure 2c; Table 2A).
All possible cross-products (ɣ ij ) were statistically significant ( Table 2). The cross-products for onset by spike length and spike length by scape length were negative, whereas, the opposite was true for onset by scape length. These data provided  Comparisons of the reduced models (2-4) with the full model showed that with very little loss of predictive power, we could use "cross-product" model 3 (only linear and cross-product predictors) for our selection analysis. The "cross-product" model did not significantly differ from the full model (Χ 2 = 4.2 p = .6496, df = 6, log-likelihood for cross-product model = 115.9), and the "cross-product" model was superior to linear and quadratic models. These latter models showed lower predictive power when compared to the full model, that is, they significantly differed from the full model (difference for linear model Consequently, we used the cross-product model that included plasticity to examine individual traits at NCAT (Table 2B).
The regression analysis showed that the intensity and direction of selection for a trait most often depended on the value of another trait, be it a trait value or plasticity. Plasticity's contribution to fitness was manifest only via correlational selection, and all the statistically significant cross-products involved plasticity of at least one trait (Table 2B).
These contributions to fitness were not detected by using linear and quadratic models (Table S1). The quadratic model showed no significant effect of either spike or scape plasticity on fitness (Figure 4e,f).
In contrast, the correlational selection model showed disruptive selection on spike and scape plasticity ( Figure 5). The only evidence of direct selection in the correlational model was on spike length (Table 2B).
Evidence for correlational selection most frequently involved spike and scape traits. Data for spike length by spike plasticity and TA B L E 2 Linear and nonlinear selection coefficients (±1 SE) and p values from phenotypic selection analyses of effects of trait values, plasticities (plast) and cross-products on fitness at each transplant site scape length by scape plasticity showed evidence of disruptive, negative correlational selection (Figure 5a,b; Table 2B). Each fitness surface showed 2 peaks. One peak showed increased fitness for genotypes producing longer spikes and scapes, as well as "negative" plasticity. Negative plasticity meant that spikes and scapes at NCAT were shorter than at MLBS (genotypic plasticity = NCAT genotypic mean value − MLBS mean value). A second fitness peak was associated with "positive" plasticity (i.e., spikes/scapes at NCAT were longer than at MLBS), coupled with shorter spikes relative to the mean value at NCAT (Figure 5a) and scapes of average length (Figure 5b).
These fitness surface patterns were concordant with the negative ɣ ij values (Table 2).
Fitness surfaces for spike length by scape plasticity and scape length by spike plasticity also each showed two fitness peaks (Figure 5c,d). However, the cross-products were positive indicating positive correlational selection (Table 2B). For each surface, one fitness peak showed increased fitness for genotypes producing longer spikes and scapes and negative plasticity, as described above.
The other peak, however, was shifted toward the mean spike length at NCAT, that is, toward longer spikes (Compare Figure 5a and c) and scapes longer than the mean scape length (Compare Figure 5b and d). Lastly, the fitness surface for spike and scape plasticities ( Figure 5e) showed a somewhat wavy ridge extending generally from higher fitness for genotypes producing shorter spikes and scapes at NCAT than at MLBS (lower left) toward higher fitness for those producing longer spikes and scapes at NCAT (upper right). This pattern was consistent with the statistically significant positive ɣ ij value resulting from the regression analysis (Table 2). The pattern also paralleled the statistically significant genetic correlation between spike and scape plasticities detected at NCAT (Table 1).
Fitness was also influenced by correlational selection involving onset plasticity (Table 2B). The cross-product for onset plasticity by F I G U R E 2 MLBS phenotypic associations between reproductive traits and seed production. Trendlines show quadratic relationships spike length was negative, which was concordant with the highly significant negative genetic correlation between onset plasticity and spike length (Table 1). In contrast, the cross-product for onset plasticity by scape length was positive. Fitness surfaces for both crossproducts involving onset plasticity showed only one peak, more a mesa, where fitness was higher for genotypes flowering earlier at NCAT than at MLBS, that is, negative plasticity, over a range of spike and scape lengths close to their mean values (Figure 5f,g).
Finally, the analysis showed that parental latitude influenced offspring fitness at NCAT (Figure 6), though not at MLBS (Table 2). Offspring of southern European parents showed higher reproductive output than did offspring of northern parents. This was not unexpected given the warmer temperature regimes during the reproductive season at NCAT ( Figure S1), although we did not see the opposite pattern at MLBS.

| D ISCUSS I ON
To our knowledge, this is the first empirical experiment that has es- An evolutionary question that has been debated for several decades is whether or not plasticity can be a direct target of selection in nature (e.g., Sarkar & Fuller, 2003;Via et al., 1995). The answer has been elusive both because few studies have simultaneously estimated selection on both a trait value and plasticity in a natural setting and because even fewer have assessed nonlinear, as well and direct, selection (Baythavong & Stanton, 2010;Callahan & Pigliucci, 2002;Donohue et al., 2000;McIntyre & Strauss, 2014;Steinger et al., 2003;Tucić et al., 1998;van Kleunen & Fischer, 2005). Blows and Brooks (2003)  Targets of selection and evolutionary potential are well known to be environment-dependent (e.g., Auld et al., 2010;Bégin & Roff, 2001;Donohue et al., 2001), and our results are consistent with this environmental dependence. In contrast to the NCAT site, we found no evidence that plasticity influenced fitness at the MLBS site. Both of our transplant sites had similar temporal patterns of photoperiod and shade during the reproductive season. However, temperatures differed between transplant sites and changed over the course of the season. For many weedy species that reproduce over several months, abiotic and biotic environments can change as a reproductive season progresses (e.g., for P. lanceolata: temperature, seed predation, Lacey et al., 2003). Consequently, selective landscapes are likely to be phenologically dynamic and may help explain why one observes multiple peaks, or ridges, or large mesas on a fitness surface, as we did for spike and scape plasticities.
Additionally, nutrient and water availability may have changed phenologically and differed between sites. All have been shown to modify flowering onset and spike and scape lengths (Lacey & Herr, 2005;Primack & Antonovics, 1981;van Tienderen, 1990van Tienderen, , 1992, and any could have contributed to the differences in statistical results that we detected between sites. Several conclusions can be drawn from the NCAT results for P. lanceolata. The most obvious is that selection, both direct and correlational, favored longer spikes. From a fitness perspective, this makes intuitive sense. Spike length was strongly correlated with flower number (Herrera, 2013). Longer spikes, that is, inflorescences, increase potential seed production. The second is that correlational selection favored earlier onset of flowering. Given the warmer climate at NCAT, flowering earlier would potentially allow more time for reproduction, which can extend through August if late-summer rainfall occurs. Also, earlier flowering would allow plants to escape seed predation by grasshoppers, whose populations can explode in mid-summer in some years (Lacey et al., 2003). Third, fitness surfaces (Figure 5c-e) show that the combinations of long spikes coupled with negative scape plasticity F I G U R E 5 NCAT fitness surfaces showing correlational selection for pairwise combinations of trait values and plasticities: (a) spike length by plasticity, (b) scape length by plasticity, (c) spike length by scape plasticity, (d) scape length by spike plasticity, (e) spike plasticity by scape plasticity, (f) spike length by onset plasticity, (g) scape length by onset plasticity. color:dark blue = lowest fitness (− sign), dark red = highest fitness (+ sign). Each contour line represents a line of equal fitness. Values on axes and contour lines show standard deviations (Some genotypes can be seen as dots). Genetic correlations shown by straight lines (p = .0001). The dashed ellipses in (a), (b), and (e) represent three hypothetical samples of six individuals from a source population that have dispersed into a new hypothetical fitness landscape (a, b, or e). Individuals within each ellipse in (a) and (b) have relatively low fitness and the centroid is approximately the same distance from two fitness peaks. Individuals within the ellipse in 5E lie along the saddle of a fitness ridge (See the Discussion for the potential implications)

F I G U R E 6
The association between mean seed production per genotype (i.e., average of clones within each genotype) and the source latitude of each genotype shown for transplant sites: MLBS (solid triangles, dashed line) and NCAT (open squares, solid line) (i.e., shorter scapes at NCAT than at MLBS) or with no plasticity in scape length were less fit than the combination of long spikes and positive plasticity (i.e., longer scapes at NCAT than at MLBS). This result is consistent with the hypothesis that this trait-plasticity combination is advantageous in warmer climates. Longer scapes raise spikes farther from the ground, likely facilitating the cooling of reproductive tissues. Likewise, scape shortening in cooler climates, for example, MLBS, can increase the potential for both radiative warming from the ground and protection from cold winds.
Future research could further test this hypothesis.
Because natural selection acts on multiple traits simultaneously in nature, many evolutionary biologists have argued that correlational selection should be common, especially for functionally related traits (Blows, 2007;Blows & Brooks, 2003;Brodie & McGlothlin, 2007;Lande & Arnold, 1983;Phillips & Arnold, 1989;Schluter & Nychka, 1994). Moreover, correlational selection is posited to be a strong evolutionary force by which traits become functionally and genetically integrated (Brodie, 1992;Lande, 1980Lande, , 1984. Few empirical measures of correlational selection on a suite of functional traits in nature have existed to test this belief (Blows & Brooks, 2003;Brodie et al., 1995;Kingsolver et al., 2001;Nicolaus et al., 2016;Sinervo & Svensson, 2002;Wise & Rausher, 2016), and we have found only one plant study that provides evidence of correlational selection on plasticity: shade-dependent leaf-length plasticity in Iris (Tucić et al., 1998). For this reason, our study is noteworthy. We detected evidence for correlational selection acting on key reproductive traits, values, and plasticities, where P. lanceolata grows naturally. Results are consistent with the above argument that correlational selection can influence evolutionary change in a population and population responses to novel habitats. Had our data set been large enough to include 3-way interactions in a more complicated regression model, we might have detected more peaks and valleys on our fitness surface. However, there is still a need to develop analytical techniques that can integrate information from more complex models in a biologically understandable way (Brodie & McGlothlin, 2007;Brodie et al., 1995).
A colonizing species enters a new habitat with a set of genetic correlations among functionally related traits, and these correlations could constrain or facilitate adaptation to the new habitat. Our data showed evidence of both possibilities. At MLBS, correlational selection favored genotypes with long spikes and short scapes (Figure 3c).
However, the positive genetic correlation between spike and scape length of our European "colonist" experimental population would be likely to constrain evolutionary change in that direction. In contrast, at NCAT, spike and scape plasticities were positively genetically correlated with each other, perhaps reflecting the single developmental pathway controlling growth in these structures. At this site, however, the fitness surface produced from these data suggested a ridge of higher fitness rather than a peak (Figure 5e), and the ridge loosely paralleled the direction of the genetic correlation. This ridge could allow evolutionary change toward increased fitness in more than one direction. In general, correlational selection should promote and maintain genetic correlations (Cheverud, 1984;Lande, 1980Lande, , 1984, If so, we would predict that this genetic correlation between spike and scape plasticities would be strengthened.
Our data suggest that correlational selection can influence the direction of early evolutionary change, depending on where dispersing individuals land on an adaptive landscape. While the true adaptive landscape in the North Carolina Piedmont, where the NCAT plot was located, is not known, our fitness surfaces suggest that multiple peaks, ridges, or even mesas of higher fitness could characterize a landscape for an introduced species. Such topographical variation creates the potential for multiple responses to correlational selection. For example, suppose that a few individuals arriving from a source population (e.g., those within the dashed ellipses in Figure 5a,b) are deposited over multiple years in a fitness valley (e.g., While these are hypothetical scenarios, they illustrate the value of experiments that can measure correlational selection.
Our experiment differed in two methodological ways from those that have previously explored selection on plasticity in nature. First, rather than using trait value means averaged across transplant sites for our analyses, we estimated the effects of plasticities against trait values expressed in a single environment. This allowed us to assess the independent effects of a trait value and its plasticity in each environment where selection occurred. Second, we retained the signs of plasticity values in statistical analyses to explore fitness effects.
This allowed us to examine both the magnitude and direction of plasticity on fitness.
Biologists estimating plasticity's adaptive significance in nature have often used absolute values of plasticity to assess its costs and benefits (e.g., Palacio-Lopez et al., 2015;van Kleunen & Fischer, 2005Volis, 2009;Wang et al., 2018). However, this has likely obscured the fitness effects of plasticity (e.g., Auld et al., 2010;de Witt et al., 1998;Murren et al., 2015;Scheiner & Berrigan, 1998;van Buskirk & Steiner, 2009;van Kleunen & Fisher, 2005Weinig et al., 2006). Relyea (2002), van Kleunen andFischer (2005, 2007), Weinig et al. (2006), and Dechaine et al. (2007)  to produce a new 3D fitness surface that ignores direction (i.e., used absolute values), the two distinct fitness peaks disappear. Had we not retained the signs in our analysis, we could not have detected the fitness benefit of producing longer spikes and scapes at NCAT than at MLBS. Negative disruptive correlational selection for these plasticities would not have been detected. In another example, we could not have observed that earlier flowering at NCAT relative to MLBS, as opposed to plasticity in the opposite direction, was favored at NCAT (Figure 5e,f). Auld et al. (2010) pointed out that estimates of selection coefficients, and thus costs and benefits of plasticity, arising from multivariate regression analyses could be biased when two predictors are highly correlated, but only one is correlated with fitness.
Because of this problem, Auld et al. urged that data from relevant published studies be reanalyzed. We detected only one such case in the NCAT data. Flowering onset and plasticity were highly genetically correlated but only onset plasticity was found to affect fitness.
However, we detected this fitness effect by including direction as well as magnitude of plasticity in our analyses and by including correlational selection in our regression model ( Figure 5f). Thus, it is unclear how much, or if, there was identifiable bias. Correlational selection and signs for plasticity have typically not been included in studies examining costs of plasticity (See references cited above).
Therefore, in addition to the suggestion by Auld et al. (2010) to reanalyze data from published studies in order to examine the genetic correlations between trait values and plasticities, we urge that data be reanalyzed to include the direction of plasticity and correlational selection in regression models.
In conclusion, our study illustrates the importance of examining all modes of selection when assessing the fitness effects of plasticity when a species experiences habitat change, not only during invasions, but also more generally during habitat modification, for example, via urbanization, climate change. Also, the study reinforces the call for retaining the signs of plasticity in statistical analyses. Future studies might consider addressing these ques- Greensboro and the National Science Foundation for financial support (DEB 0236526 to EPL).

CO N FLI C T O F I NTE R E S T
The authors disclose no conflicts. Writing-review & editing (equal).

E TH I C A L A PPROVA L
This manuscript is not published elsewhere, and we have used no animals in this study.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data are available on Dryad (https://doi.org/10.5061/dryad. wwpzg msj1).