It’s about time: the temporal dynamics of phenotypic selection in the wild

Authors


* E-mail:Adam.M.Siepielski@Dartmouth.edu

Abstract

Selection is a central process in nature. Although our understanding of the strength and form of selection has increased, a general understanding of the temporal dynamics of selection in nature is lacking. Here, we assembled a database of temporal replicates of selection from studies of wild populations to synthesize what we do (and do not) know about the temporal dynamics of selection. Our database contains 5519 estimates of selection from 89 studies, including estimates of both direct and indirect selection as well as linear and nonlinear selection. Morphological traits and studies focused on vertebrates were well-represented, with other traits and taxonomic groups less well-represented. Overall, three major features characterize the temporal dynamics of selection. First, the strength of selection often varies considerably from year to year, although random sampling error of selection coefficients may impose bias in estimates of the magnitude of such variation. Second, changes in the direction of selection are frequent. Third, changes in the form of selection are likely common, but harder to quantify. Although few studies have identified causal mechanisms underlying temporal variation in the strength, direction and form of selection, variation in environmental conditions driven by climatic fluctuations appears to be common and important.

‘There is no substitute for careful and intensive field work if one wants to find out what is happening in natural populations.’ Endler (1986, p. 125)

Introduction

Selection is the mechanism of adaptive evolution. Indeed, selection may be responsible for much of the phenotypic diversification observed in nature (Rieseberg et al. 2002), and it has a prominent role in driving population divergence and speciation (Schluter 2000; Funk et al. 2006). Given the centrality of these processes in nature, it is not surprising that studies of selection have continually increased since Darwin (1859) introduced his theory 150 years ago (see reviews in Endler 1986; Kingsolver et al. 2001; Pagel 2009).

Selection is not only a central force in nature but it is also a dynamic one. For instance, selection can vary in strength (e.g. strong or weak; Kingsolver et al. 2001; Hereford et al. 2004), direction (e.g. positive or negative), form (e.g. linear or nonlinear; Brodie et al. 1995), space (e.g. Thompson 2005) and time (e.g. Grant & Grant 2002). Moreover, selection can interact to produce more complex patterns of variation (e.g. spatiotemporal variation in selection; Blanckenhorn et al. 1999). Understanding the dynamics of selection is a major focus of modern evolutionary and ecological studies.

Seminal to our current understanding of how selection operates in nature was the development of a statistical framework for quantifying selection on multiple quantitative traits by Lande & Arnold (1983). This development was particularly influential because it provided researchers with a simple tool with which to obtain standardized estimates of the strength, direction and form of selection (so-called ‘differentials’ and ‘gradients’), a means by which to understand how selection acts simultaneously on multiple traits, and a way to detect the targets of selection. The strength and direction of selection are often a key focus because, when combined with estimates of trait heritabilities, estimates of selection can be used to predict evolutionary change (e.g. Grant & Grant 1995). Although this framework has been employed by numerous researchers to investigate the strength and direction of selection (see reviews in Endler 1986; Kingsolver et al. 2001; Hoekstra et al. 2001; Hereford et al. 2004), we know surprisingly little about the temporal dynamics of selection in nature (Grant & Grant 2002). In the review by Kingsolver et al. (2001) of studies measuring the strength of selection, most studies were not temporally replicated (the mean and median number of temporal replicates of selection was one). This leaves us with a gap in our basic knowledge of how selection operates as an important evolutionary force. For instance, what does the ‘temporal landscape’ of selection look like? Are changes in the strength of selection common? How consistent is the direction of selection? Does the form of selection vary among years? These and other relevant questions remain unanswered, yet they are important for understanding the patterns and processes of selection and adaptive evolution in natural populations.

Given the tremendous effort aimed at quantifying patterns of selection, a number of temporally replicated estimates of selection now exist, thereby providing an exceptional opportunity to investigate the temporal dynamics of selection. Here, we assembled a database of selection differentials and gradients across several taxonomic groups, covering a range of quantitative traits compiled from temporally replicated studies of natural and sexual selection. We begin by refining what we mean by temporal variation in selection in the context of this review. We then highlight several reasons why understanding temporal variation in selection is important. Next, we address a number of outstanding questions concerning the temporal dynamics of selection in wild populations. Finally, we consider potential sources of bias in studies included in the database we assembled. Our perusal of the past several decades of research into selection in natural populations reveals that selection is wildly dynamic.

What we mean by ‘temporal’ variation in selection

There are several ways to envisage the dynamics of temporal variation in selection. Our focus here will be on interannual differences in selection on a given trait within a population, as this is largely the unit of replication in most temporally replicated studies. Because a population may experience selection every year, but the various traits under selection may vary between years, we explicitly consider temporal variability in selection to reflect variation in selection on each trait measured, not variation in whether a population experiences selection. Admittedly, even these multi-year datasets may be too short to detect rare episodes of selection (e.g. Gutschick & BassiriRad 2003). Indeed, because the temporal scale of environmental change can be quite great, differences detected among years in a 30-year study may not be more informative than differences detected in a 3-year study. In contrast, given the accumulating evidence for ‘rapid’ or ‘contemporary’ evolution on so-called ‘ecological time scales’ (e.g. Hendry & Kinnison 1999; Carroll et al. 2007), short-term fluctuations may be very important. Later, we discuss the importance of selection varying over different time scales.

Importance of understanding temporal variation in selection

Here, we highlight a number of issues ranging from adaptive evolution to conservation biology that illustrate the importance of considering temporal variation in selection. First, the overall magnitude and direction of adaptive evolutionary change is the result of selection on traits varying over time. Second, temporal variation in selection is thought to be an important mechanism maintaining variation within populations, and may thus limit or slow processes such as population divergence (Levins 1968; Sasaki & Ellner 1997) and local adaptation (Kawecki & Ebert 2004), particularly when selection fluctuates in direction among years (Frank & Slatkin 1990). Third, understanding patterns of selection among years and the responses to such selection are important because they can inform us about the speed with which a population can adapt to variable environmental conditions. This may be vital for understanding the evolutionary potential of populations to respond to a number of anthropogenic threats, such as introduced species, climate change and harvest-selection, among others (e.g. Western 2001; Carroll et al. 2007; Darimont et al. 2009). Fourth, environmental conditions vary over short and long time scales and the environmental coupling of selection and heritability can limit trait evolution (Merilä et al. 2001; Wilson et al. 2006). For instance, in the Soay sheep (Ovis aries), during years when environmental conditions are harsh, there is strong selection on birth weight but little genetic variance in this trait, whereas when conditions are favourable, selection is weak and there is ample genetic variance. Such environmental coupling can limit the rate of evolution, maintain genetic variation and favour phenotypic stasis (Merilä et al. 2001; Wilson et al. 2006). Finally, the importance of considering microevolution in conservation, restoration and management is becoming increasingly recognized (e.g. Rice & Emery 2003; Stockwell et al. 2003). For example, Hendry et al. (2003) suggest that a priori estimates of selection on donor population traits in a new environment can be used to guide restoration efforts by indicating which trait values, and thus which family groups or individuals, are likely to respond favourably to the new environment. Given such potential application of the results of selection analyses to conservation biology, there is a need to measure the potential error of basing management decisions on estimates of selection derived without temporal replication.

Methods

Assembly of the temporal selection database

We reviewed the primary literature by searching the ISI Web of Knowledge database (v4.2; The Thomson Corporation, Chicago, IL, USA) using a keyword search for one or combinations of the following terms: annual variation, fluctuating selection, natural selection, phenotypic selection, sexual selection and temporal variation in selection. We also searched for all papers citing Lande & Arnold (1983), assuming that most authors using this standard approach for estimating selection would cite this paper. A number of authors estimated selection over multiple years, but presented their data averaged over the duration of their study and so we contacted these authors directly to obtain year-specific selection coefficients. When data were presented in figure format, we also contacted authors directly to obtain values of the selection coefficients. In one case (Grant & Grant 2002), we extracted selection coefficients from a figure using the digitizing software, Engauge Digitizer (http://digitizer.sourceforge.net). We performed our literature review during March 2008, and the resulting database therefore includes studies published up to this point (1986–2008). In contacting authors for annual coefficients or data presented in figure format, we also learned of several papers including pertinent data that were in preparation, in review or in press at the time, and we have included those as well (n = 3, two of which are now published).

We used four criteria, comparable with Kingsolver et al. (2001), for identifying studies to include in our study. First, we only included studies that presented selection differentials or gradients for two or more years. Differentials describe total selection from both indirect and direct sources, whereas gradients describe the direct force of selection on a trait (Lande & Arnold 1983; Brodie et al. 1995). Most studies included estimates from consecutive years, but gaps in temporal intervals did not preclude inclusion in our database. Second, we only included studies from selection in the wild where no experimental/genetic manipulations were performed. Third, we only considered studies on quantitative traits showing continuous trait variation. Fourth, we only used standardized differentials or gradients (sensu Lande & Arnold 1983). These measures reflect selection on traits in terms of the relationship between relative fitness and variation in a quantitative trait measured in standard SD units, and are desirable because they allow for cross-study comparisons, irrespective of study organism, fitness measure or trait studied (Lande & Arnold 1983; Kingsolver et al. 2001). Recently, Hereford et al. (2004) recommended using mean-standardized coefficients because these allow for a more objective criterion for evaluating the strength of selection. Most of the studies we included were conducted before this recommendation, and not all studies had the necessary data to convert variance-standardized coefficients to mean-standardized coefficients and so we focused our efforts on variance-standardized coefficients. Regardless, as we note below, this should make little difference in our interpretations because we mostly focus on relative differences among years. We considered both linear and quadratic selection coefficients, but did not consider correlational selection coefficients because few studies reported them. 
We are aware that estimates of the strength of quadratic selection are often underestimated by one-half (Stinchcombe et al. 2008); however, most of our analyses are concerned only with relative differences among years within a given study and so our results are robust whether quadratic coefficients were calculated correctly (i.e. doubled) or not. We note where caution in interpreting patterns of quadratic coefficients is warranted. After identifying potential studies, we entered the coefficients that met the aforementioned criteria into a database (hereafter, ‘temporal selection database’). All of the data were then error-checked by at least two of us. We attempted to perform an exhaustive search and were exceedingly careful when entering the data (in many cases, each record was triple checked), but it is worth noting that we almost certainly overlooked some relevant studies and potentially introduced some errors when transcribing the data to our database.
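To make the notion of a variance-standardized coefficient concrete, the following Python sketch computes a standardized linear selection differential for a single trait as the covariance between relative fitness and the trait expressed in SD units, in the spirit of Lande & Arnold (1983). The function names and the toy data are hypothetical, and the helper for quadratic coefficients only illustrates the doubling discussed by Stinchcombe et al. (2008); none of this is the procedure used to assemble the database.

```python
from statistics import mean, stdev

def standardized_linear_differential(trait, fitness):
    """Covariance of relative fitness with the trait measured in SD units."""
    w_bar = mean(fitness)
    rel_w = [w / w_bar for w in fitness]           # relative fitness (mean = 1)
    z_bar, z_sd = mean(trait), stdev(trait)
    z_std = [(z - z_bar) / z_sd for z in trait]    # trait with mean 0, SD 1
    rel_w_bar = mean(rel_w)
    n = len(trait)
    # Sample covariance (n - 1 denominator); z_std already has mean 0.
    return sum((w - rel_w_bar) * z for w, z in zip(rel_w, z_std)) / (n - 1)

def quadratic_gradient(regression_coefficient):
    """Double the coefficient of the squared term from a quadratic regression."""
    return 2.0 * regression_coefficient

# Hypothetical data: five individuals, one trait, one fitness measure.
trait = [1.0, 2.0, 3.0, 4.0, 5.0]
fitness = [0.0, 1.0, 1.0, 2.0, 1.0]
s = standardized_linear_differential(trait, fitness)
print(round(s, 3))
```

Because the trait is rescaled to unit variance before the covariance is taken, the resulting coefficient can be compared across traits measured in different units, which is the property the cross-study comparisons rely on.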

In many studies, multiple datasets existed within studies. These within-study datasets represent selection estimated on different species, traits, fitness components, sexes, ages, populations or seasons. For clarity and where necessary, we use the term ‘dataset’ in this regard. In most cases, we included all datasets in our analyses although a few exceptions occurred, which we have noted below. We did not attempt a formal meta-analysis for two main reasons. First, because multiple traits and measures of fitness are reported per study, each estimate of selection is not independent (Gurevitch & Hedges 1999). Second, and as pointed out by Kingsolver et al. (2001), meta-analyses require information on the entire phenotypic variance–covariance matrix for each study, which was not available for most studies. Therefore, we rely mainly on graphical analyses and comparisons of distributions to assess the temporal dynamics of selection in the wild.

Overview of the database

We reviewed a total of 1569 studies. Of these, 89 studies met the aforementioned criteria and were included in our database. Eighteen studies included in the earlier review by Kingsolver et al. (2001) met our criteria for inclusion and so are also included in the temporal selection database. The database includes 3414 records and 5519 estimates of selection (Table 1). The number of temporal replicates ranged between 2 and 45 years, with a median of 3 and a mean of 7.6 years among studies (Table 2 and Fig. 1). The database is biased in favour of linear coefficients, vertebrates (especially birds), studies of natural selection and morphological traits (Table 1). We have provided the database itself as Appendix S1 and have posted it in the DRYAD website (http://www.datadryad.org/repo/), a repository for such databases.

Table 1. Summary of the temporal selection database

Item                          Number of items in the database
Studies                       89
Journals                      23
Records                       3414
Selection coefficients        5519
  Linear differentials        1989
  Linear gradients            1989
  Quadratic differentials     776
  Quadratic gradients         765
Species                       73
Genera                        61
Taxon type
  Invertebrates               482 (number of studies = 13)
  Plants                      365 (number of studies = 28)
  Vertebrates                 2567 (number of studies = 48)
Total types of selection
  Sexual selection            512
  Natural selection           2902
Trait type
  Behavioural                 21
  Life history                1244
  Morphological               1839
  Principal components        310
  Physiological               0

Table 2. Summary of the studies included in the temporal selection database

                                   Median  Mode  Range (Min)  Range (Max)
Total number of traits             2       1     1            17
Total number of fitness measures   1       1     1            12
Total number of datasets           5       6     1            60
Number of temporal replicates      3       2     2            45

Figure 1.

 Frequency distribution of the number of temporal replicates among studies included in the temporal selection database. Because many studies included multiple datasets (see ‘Methods’), we have plotted the maximum duration of study among datasets within a given study.

Results

Overall, we found that the strength of selection on traits (averaged over years within a dataset) included in our review is exceedingly similar to that reported in Kingsolver et al. (2001), which included studies dominated by little to no temporal replication (Fig. 2). However, we did find a greater frequency of larger coefficients, indicating that consideration of long-term selection on traits increases the chance of detecting infrequent bouts of strong selection (Figs 2 and 3).

Figure 2.

 Frequency distributions of the mean of the absolute values of annual selection coefficients (binned at 0.05 intervals) are exceedingly similar to those presented earlier by Kingsolver et al. (2001) in that the strength of selection follows a negative exponential distribution with many examples of weak selection and fewer examples of strong selection. The top row corresponds to linear gradients (a; n = 449) and differentials (b; n = 333) and the bottom row corresponds to quadratic gradients (c; n = 168) and differentials (d; n = 144). The mean was calculated for each dataset within a study (see ‘Methods’) to reflect the average strength of selection for a given trait in a given study.

Figure 3.

 The ‘temporal landscape’ of selection shows that the strength and the direction of selection vary through time. Shown are selection coefficients for one randomly drawn dataset per study (see ‘Methods’) plotted against the generic year of the study. The first two columns are linear gradients and differentials, respectively, and the last two columns are nonlinear quadratic gradients and differentials, respectively. Different rows correspond to different taxonomic groups. Different lines correspond to different studies; there is no correspondence in line colour among the different panels.

We now shift our attention to the temporal dynamics of selection. Because of the large number of datasets reported per study (see ‘Methods’), we first present a single randomly chosen dataset within each study to graphically depict temporal variation in selection (Fig. 3). This figure shows that across major taxonomic groups, selection on traits is not constant through time but rather varies in a number of ways suggesting that the ‘temporal landscape’ of selection in nature is quite rugged (Fig. 3). We next focus on four aspects of temporal variation in selection: overall variation, strength, direction and form.

Overall patterns of temporal variation in selection

To provide an overall measure of temporal variation in selection on traits, we calculated the SD among the selection coefficients included within each dataset:

$$\mathrm{SD} = \sqrt{\frac{\sum_{t=1}^{n} (s_t - \bar{s})^2}{n - 1}}$$

where $s_t$ are the selection coefficients for each year ($s_1, s_2, \ldots, s_n$), and $\bar{s}$ is the mean of the selection coefficients over the total number of years of the study $n$. The SD is ideal because it describes the distribution of selection coefficients and is reported in the same units as the original measures (standardized selection coefficients). This measure describes temporal variation in selection due to changes in both the strength and direction of selection on traits among years.
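As a minimal sketch of this calculation, the Python below computes the SD among the annual coefficients of one dataset. It assumes the sample form of the SD (n − 1 in the denominator), and the function name and coefficient values are hypothetical rather than taken from the database.

```python
from statistics import stdev

def temporal_sd(coefficients):
    """SD among annual selection coefficients for one trait in one dataset."""
    if len(coefficients) < 2:
        raise ValueError("at least two temporal replicates are required")
    return stdev(coefficients)   # sample SD, n - 1 denominator (assumed)

# Hypothetical annual selection gradients for a single trait.
annual = [0.10, -0.05, 0.20, 0.15]
print(round(temporal_sd(annual), 3))
```

Note that the sign of each coefficient is retained here, so a change from positive to negative selection contributes to the SD just as a change in magnitude does.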

Overall, the median SD of selection coefficients was similar across linear and nonlinear selection coefficients, with a right-skewed frequency distribution in all cases (Fig. 4). The distributions of SD values for linear coefficients relative to nonlinear coefficients were not significantly different for gradients (Wilcoxon rank sum test: Z = 0.568, P = 0.570), but were significantly different for differentials (Wilcoxon rank sum test: Z = 3.920, P < 0.0001), with nonlinear coefficients tending to have slightly larger SDs (Fig. 4). These analyses show that there is often considerable variation in selection from year to year; however, this variation reflects changes in selection due to both the strength and direction of selection. We next explore these two aspects of temporal variation in selection in isolation.
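For readers unfamiliar with the comparison used here, the following Python sketch implements a Wilcoxon rank sum test with the large-sample normal approximation. It is an illustration under stated assumptions: no tie or continuity correction is applied, the function name and example values are hypothetical, and the software and settings actually used for the analyses above are not stated in the text.

```python
import math

def rank_sum_z(x, y):
    """Wilcoxon rank sum test (normal approximation, no tie correction).

    Returns the Z statistic for sample x relative to sample y and the
    two-sided P-value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail area
    return z, p

# Hypothetical SD values for linear and quadratic coefficients.
linear_sd = [0.05, 0.08, 0.10]
quadratic_sd = [0.12, 0.20, 0.25, 0.30]
z, p = rank_sum_z(linear_sd, quadratic_sd)
print(round(z, 3), round(p, 3))
```

A rank-based test is a natural choice for these comparisons because the SD distributions are strongly right-skewed, so a test that assumes normality would be poorly suited to them.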

Figure 4.

 Frequency distributions of the SD of selection coefficients (binned at 0.05 intervals) show that overall variation in selection is often considerable through time. The top row corresponds to linear gradients (a; n = 449) and differentials (b; n = 333) and the bottom row corresponds to quadratic gradients (c; n = 168) and differentials (d; n = 144). Values ≥ 1.0 have been binned into a single bin so that the majority of the data can be displayed.

How consistent is the strength of selection among years?

To quantify temporal variation in the strength of selection on a given trait, we calculated the SD (as before) among the absolute values of the selection coefficients included within each dataset. We used the absolute values of the selection coefficients because here we are only interested in the strength of selection, not the direction.
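The effect of taking absolute values can be seen in a small Python sketch. The function name and the coefficients are hypothetical, and the sample (n − 1) form of the SD is assumed.

```python
from statistics import stdev

def strength_sd(coefficients):
    """SD among the absolute values of annual selection coefficients."""
    return stdev([abs(s) for s in coefficients])

# Hypothetical trait under selection of constant strength but alternating sign.
annual = [0.2, -0.2, 0.2, -0.2]
raw_sd = stdev(annual)            # reflects strength and direction together
abs_sd = strength_sd(annual)      # reflects strength only
print(round(raw_sd, 3), round(abs_sd, 3))
```

In this contrived case all of the temporal variation comes from reversals in direction, so the SD of the raw coefficients is large while the SD of their absolute values is zero.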

The median SD of the absolute values of the selection coefficients was consistently ≥0.08 across both linear and nonlinear coefficients, with right-skewed frequency distributions in all cases (Fig. 5). Rarely were there no differences in the strength of selection on traits among years. The distributions of values of the SD for linear coefficients relative to nonlinear coefficients were not significantly different for gradients (Wilcoxon rank sum test: Z = −0.071, P = 0.944), but were significantly different for differentials (Wilcoxon rank sum test: Z = 2.751, P = 0.006), with nonlinear coefficients tending to have a slightly larger SD (Fig. 5).

Figure 5.

 Frequency distributions of the SD of the absolute values of selection coefficients (binned at 0.05 intervals) show that the strength of selection often varies considerably through time. We calculated the SD using the absolute values of the selection coefficients so that the SD reflects temporal variation in strength but not the direction of selection. The top row corresponds to linear gradients (a; n = 449) and differentials (b; n = 333) and the bottom row corresponds to quadratic gradients (c; n = 168) and differentials (d; n = 144). Values ≥ 1.0 have been binned into a single bin so that the majority of the data can be displayed.

Overall, these patterns suggest that the strength of selection on traits often varies among years. In fact, the median SD of the absolute values of the selection coefficients (Fig. 5) approaches the median strength of selection on traits (Fig. 2). Because values of quadratic terms are often not doubled (Stinchcombe et al. 2008), we exercise caution in interpreting this finding for nonlinear coefficients. To explore this relationship more fully, we also plotted the SD (from Fig. 5) as a function of the mean strength of selection on traits (from Fig. 2). This graphical analysis indicates that as the strength of selection on a trait increases, there is a tendency for annual variation in the strength to increase as well (Fig. 6); however, the average strength of selection tends to be greater than the average interannual SD of selection (i.e. most points fall below the 1 : 1 line; Fig. 6).

Figure 6.

 The relationship between the SD of the absolute values of selection coefficients (from Fig. 5) and the mean of the absolute values of selection coefficients (from Fig. 2) reveals that there is a tendency for variation in the strength of selection to increase with stronger average selection on a trait. The line represents 1 : 1.

How consistent is the direction of selection among years?

We quantified consistency in the direction of selection for a given trait by calculating the proportion of positive selection coefficients relative to the total number of years selection was estimated on that trait. The choice of sign in this calculation is arbitrary because the proportion of positive coefficients = 1 – the proportion of negative coefficients. Thus, this index ranges from 0 to 1.0, with a value of 0 indicating only negative coefficients, a value of 1 indicating only positive coefficients and a value of 0.5 indicating equal numbers of negative and positive coefficients. Changes in the direction of linear selection imply shifts from positive directional to negative directional selection, or vice versa, whereas changes in the direction of nonlinear quadratic coefficients imply shifts from positive quadratic selection (i.e. disruptive selection, favouring trait extremes) to negative quadratic selection (i.e. stabilizing selection, favouring intermediate trait values), or vice versa (but see ‘Discussion’).
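The index described above can be sketched in a few lines of Python. The function names and coefficients are hypothetical, and one detail is an assumption not addressed in the text: a coefficient of exactly zero is counted as not positive.

```python
def proportion_positive(coefficients):
    """Proportion of years in which the selection coefficient was positive."""
    if not coefficients:
        raise ValueError("at least one annual coefficient is required")
    # Assumption: a coefficient of exactly zero is not counted as positive.
    n_positive = sum(1 for s in coefficients if s > 0)
    return n_positive / len(coefficients)

def direction_changed(coefficients):
    """True when positive and non-positive coefficients both occur."""
    return 0.0 < proportion_positive(coefficients) < 1.0

# Hypothetical annual selection differentials for one trait.
annual = [0.10, -0.20, 0.30, 0.05]
print(proportion_positive(annual), direction_changed(annual))
```

Values of 0 or 1 from `proportion_positive` correspond to datasets in which the direction of selection never changed, which is how the 'no change' categories are defined.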

Overall, changes in the direction of selection are relatively common (Fig. 7). The median proportion of coefficients that were positive approached 0.5 in all cases (Fig. 7). The distributions of values of the proportion of positive coefficients for linear coefficients relative to nonlinear coefficients were significantly different for both differentials (Wilcoxon rank sum test: Z = −5.302, P < 0.0001) and gradients (Wilcoxon rank sum test: Z = −3.197, P = 0.001), with linear coefficients tending to have a slightly higher proportion of positive coefficients. In no case was the proportion of no change in the direction of selection (i.e. the sum of the 0 and 1.0 categories) greater than 0.5; however, linear coefficients tended to have a higher proportion of no changes in the direction of selection relative to quadratic coefficients (Fig. 7). This suggests that, perhaps, the direction of nonlinear selection is less consistent than that of linear selection. Note also that the difference between the plots showing the SD of raw values (Fig. 4) and the SD of absolute values of the selection coefficients (Fig. 5), which isolates variation in the strength of selection, provides a measure of temporal variation in selection on traits that is due to variation in direction alone.

Figure 7.

 Frequency distributions of the proportion of positive coefficients relative to the total number of years (binned at 0.05 intervals) show that selection tends to vary in direction from year to year. The top row corresponds to linear gradients (a; n = 449) and differentials (b; n = 333) and the bottom row corresponds to quadratic gradients (c; n = 168) and differentials (d; n = 144). One minus the proportion of positive coefficients equals the proportion of negative coefficients. The proportion of no changes is the sum of the 0 and 1.0 categories.

Does the form of selection vary among years?

Changes in the form of selection broadly refer to temporal variation in selection from linear (affecting the mean; e.g. directional selection) to nonlinear (affecting the variance; e.g. disruptive and stabilizing) selection (Brodie et al. 1995). However, we want to emphasize that the true form of selection does not necessarily fit neatly into one of these definitions, and so in many cases quadratic coefficients may be misleading (e.g. Schluter 1988; Schluter & Nychka 1994). This is the value of graphical approaches like cubic splines, which can allow for a better understanding of the true form of selection (e.g. Schluter 1988; Brodie et al. 1995). Indeed, fitness functions can take on decidedly complex shapes where visualization is key to their description (e.g. Phillips & Arnold 1989; Schluter & Nychka 1994). Although many researchers confirm the form of selection with cubic splines and visual analyses, these results are often not reported and, even if they are, the results do not lend themselves to later quantitative analyses. Hence, we lack a method to precisely quantify temporal variation in the form of selection in a way conducive to formal analyses. Therefore, we restrict our review of temporal variation in the form of selection to case studies.

Overall, studies that have considered temporal variation in the form of selection reveal that the form of selection can take on myriad shapes across years. For example, in pike (Esox lucius), the form of selection on body size varied from directional, to stabilizing, to more complex shapes showing, for example, several fitness peaks and valleys (Carlson et al. 2007). Similarly, in lark buntings (Calamospiza melanocorys), the form of selection varied from directional, to stabilizing, to complex shapes for a number of phenotypic traits, with different traits usually experiencing different forms of selection in different years (Chaine & Lyon 2008). In the latter example, selection combined over all years was, however, weakly directional. In perch (Perca fluviatilis; Svanbäck & Persson 2009) and damselflies (Ischnura elegans; Gosden & Svensson 2008), selection varied from directional to disruptive to stabilizing. In salmon (Oncorhynchus nerka), overall selection was consistently directional, although there were subtle nuances in the fitness surface among years (Carlson & Quinn 2007). Similarly, in the brown anole (Anolis sagrei), selection varied among years from directional to disruptive (Calsbeek et al. 2009). Together, these studies suggest that changes in the form of selection among years may be common, but such verbal arguments are unsatisfying and highlight the lack of appropriate methods for quantifying variation in the form of selection.

Discussion

Our perusal of the vast literature on selection has provided us with a better understanding of the major features of the temporal dynamics of selection in the wild. This exercise has revealed that selection varies considerably among years (Figs 3 and 4), including differences in strength (Figs 3 and 5), direction (Figs 3 and 7) and likely form. We discuss the implications of these results in the following sections.

The strength of selection varies among years

We found a wide range of variation in the strength of selection among years (Fig. 5). Thus, the seemingly conflicting viewpoints regarding the strength of selection in the wild being weak (Kingsolver et al. 2001) vs. strong (Hereford et al. 2004) are perhaps best viewed in light of annual variation in the strength of selection (e.g. Figs 3 and 5), in addition to how the strength of selection is inferred (e.g. Hereford et al. 2004). The answer is not as simple as whether selection is strong or weak; rather, it can depend on when selection is quantified. In other words, our results suggest that most populations may experience infrequent bouts of strong selection tempered with other bouts of weaker selection (e.g. Figs 3 and 5), and there is a tendency for traits experiencing stronger selection to be accompanied by greater temporal variation in the strength of such selection (Fig. 6).

Concerns about the strength of selection have often focused on the fact that very strong selection cannot be sustained in a population for a variety of reasons, and our results provide support that strong selection is apparently rarely sustained. For example, demographically, strong and persistent viability selection can increase the risk of extinction because of the associated high mortality accompanying strong selection (Gomulkiewicz & Holt 1995). Genetically, populations tend to have a limited reservoir of genetic variation for directional selection to act on (but see McGuigan & Sgró 2009), although some forms of selection can increase genetic variation (e.g. Kaeuffer et al. 2006). The strength of observed selection is also sometimes seemingly at odds with indirect estimates of selection based on observed rates of evolutionary change in wild populations (Hendry & Kinnison 1999; Kinnison & Hendry 2001). By back-calculating the net intensity of selection, which is the hypothetical strength of ‘constant’ directional selection per generation needed to produce the extent of observed evolutionary change between generations, Kinnison & Hendry (2001) found that mean estimates of the strength of selection from Kingsolver et al.’s (2001) review were much stronger than their back-calculated estimates, so that observed levels of adaptive evolutionary change should be greater. This is a general feature of studies of evolution over the short run: ‘strong’ selection, ample variation, modest heritabilities, yet no or limited adaptive evolution (Merilä et al. 2001). Our analysis provides quantitative support for what these various authors thought might be happening: selection may be particularly strong at times; however, it is rarely consistent in strength (e.g. Figs 3 and 5).

Another factor potentially limiting adaptive evolution is that measures of selection can be biased by the effects of condition, environment and nutrition (e.g. Price et al. 1988; Schluter et al. 1991; Rausher 1992; Stinchcombe et al. 2002; Kruuk et al. 2003). In brief, environmentally or conditionally induced covariances between traits and fitness can cause biased estimates of selection and thus poor predictions of the extent of adaptive evolution. This effect is relevant here because it has consequences for temporal variability in apparent selection on traits (if condition varies from year to year) and for consistency of the direction of selection (because higher condition is always favoured). Using estimates of breeding values, as opposed to phenotypic values, may help in eliminating this bias (Kruuk et al. 2001; Stinchcombe et al. 2002).

The direction of selection varies among years

Understanding changes in the direction of selection is important because the overall extent of adaptive evolution is ultimately the long-term result of selection (and other evolutionary processes) varying over time. Our analysis reveals that the most common pattern is apparently for linear and nonlinear selection to occur in one direction or the other about half the time (Fig. 7). This suggests that changes in the direction of selection are common, although exceptions certainly exist (e.g. see the proportion of no changes in Fig. 7). If such a pattern reflects what many populations experience, we would expect to see, for example, little overall change, and any directional change should be gradual. Recent studies of younger cases of the fossil record have shown that directional change is often gradual, and in a pattern consistent with our analysis of ‘long-term’ studies. In an impressive 21 500-year time series of stickleback fossils, Hunt et al. (2008) showed that phenotypic evolution in sticklebacks gradually proceeded towards a new optimum driven by natural selection, with apparent occasional changes in the direction of trait evolution. Such changes in direction could reflect sampling error; however, in a contemporary analysis of selection on threespine stickleback (Gasterosteus aculeatus), Reimchen & Nosil (2002) showed that changes in the direction of selection among years on some traits do occur. Of course, there is a certain danger in drawing analogies over disparate time scales (fossils over thousands of years vs. contemporary selection; see also Clegg et al. 2008), but overall these patterns mirror each other.

The finding of common changes in the direction of linear selection also supports the idea that most populations are reasonably well-adapted to local environmental conditions (Estes & Arnold 2007; Hereford 2009). Using selection estimates and quantitative genetics models, Estes & Arnold (2007) recently inferred that most adaptive evolution involves climbing a stationary peak in the adaptive landscape, with consistent directional change occurring early on and then wobble around the peak keeping a population near an adaptive optimum. In this case, we would expect regular changes in the direction of linear selection as populations wobble around the adaptive peak (e.g. Fig. 7). Alternatively, much of the variation observed in changes in the direction of selection may be a response to small changes in the environment (e.g. instability in the adaptive peak; Grant & Grant (2002); see also Clegg et al. (2008)). Finding relatively common changes in the sign of quadratic coefficients (Fig. 7) paints a muddled picture though, as one would expect the sign to always be negative (indicative of stabilizing selection) if a population were near an adaptive peak. However, we are less certain of the interpretation of these latter changes, because as noted earlier, the sign of quadratic coefficients does not necessarily reveal the true form of selection.

The extent of temporal variation in the form of selection is poorly understood

We suspect that the form of selection likely varies among years but were unable to address this question in a rigorous quantitative framework. On the one hand, a comparison of quadratic coefficients using the simple regression-based approach for estimating selection allows a quantitative test of nonlinear selection, but likely fails to capture the true form of selection in nature. On the other hand, nonparametric cubic spline analyses (e.g. Schluter 1988) allow a visual assessment of the true form of selection but do not facilitate quantitative comparisons among datasets. We are not the first to note this problem. For example, Blows (2007, and references therein) notes that nonlinear selection is often poorly estimated using the traditional regression approach for a variety of reasons and, thus, advocates the use of canonical analyses because they allow for both the strength and form to be quantified (see also Shaw et al. 2008). We therefore conclude that our understanding of temporal variation in the form of selection is poor, at best. There is clearly much analytical work to be performed in this area before we are better able to understand how the form of selection varies in wild populations.

Why does selection vary with time?

Many studies included in our review discuss the potential causes of temporal variation in selection on traits including variation in predation (e.g. Reimchen & Nosil 2002), pollinator assemblages (e.g. Schemske & Horvitz 1989), density of mate colour morphs (e.g. Gosden & Svensson 2008), density of conspecifics (Calsbeek & Smith 2007; Svanbäck & Persson 2009) and operational sex ratio (e.g. Madsen & Shine 1993), all of which vary temporally. Several studies identified fluctuating climate as an important factor causing selection to vary. Patterns of rainfall, in particular, emerged as a principal cause of temporal variation in selection although its effect was more often indirect. Studies included in our review, for instance, examined the importance of rainfall-mediated variation in flowering synchrony (Domínguez & Dirzo 1995), drought-mediated changes in food supply (Gibbs & Grant 1987; Grant & Grant 2002), drought-mediated changes in habitat structure (Calsbeek et al. 2009), drought-mediated selection on flower spur length (e.g. Maad & Alexandersson 2004) and rainfall-mediated variation in lake water level (Carlson & Quinn 2007). Still other studies considered the importance of fluctuating temperatures as a cause of temporal variation in selection (e.g. McAdam & Boutin 2003).

In general, determining the causes of selection is more difficult than quantifying selection or testing for its statistical significance. Consequently, our understanding of the causes of selection has greatly lagged behind our ability to detect selection, which clearly hinders our ability to predict how fitness landscapes (and thus the strength, direction and form of selection) will shift in response to climate change or other perturbations. A handful of studies included in our review tested for statistical correlates of selection (i.e. using regression or path analysis) and most of these were also related to variation in climate. For example, McAdam & Boutin (2003) report that spring temperatures were negatively correlated with red squirrel (Tamiasciurus hudsonicus) growth rates prior to recruitment across 13 cohorts. Carlson & Quinn (2007) show that across 10 years of varying lake levels, the largest salmon (Oncorhynchus nerka) strand in areas of low water and die at the stream mouth rather than reach the breeding grounds in the stream itself, especially in years of low lake level. Charmantier et al. (2008) demonstrated that the interval between great tit (Parus major) laying day and the timing of the caterpillar emergence is positively correlated with selection on egg-laying date. Two studies focused on systems subject to environmental change but found no evidence of concomitant changes in the strength, direction or form of selection (e.g. Reed et al. 2006; Charmantier et al. 2008).

Studies are needed that examine the relative importance of both abiotic and biotic drivers of selection in the same system. As an example, Coulson et al. (2003) report that neither variation in climate nor variation in conspecific density correlate with selection acting on red deer (Cervus elaphus) because selection operated on different fitness components in each year. Understanding the relative importance of the various drivers of selection within and among years is a challenging but exciting future direction for researchers measuring temporal variation in selection.

Selection on other time scales

Although we have considered the temporal dynamics of selection operating on an annual time scale, selection certainly varies on other time scales as well (i.e. over days, e.g. Blanckenhorn et al. 1999; or seasons, e.g. Hendry et al. 2003). Consequently, it is often observed that selection measured at one point in time may not reflect the strength, direction or form of selection acting across generations, or even among different life stages within generations (e.g. Schluter et al. 1991; Merilä et al. 2001; DiBattista et al. 2007). Hoekstra et al. (2001), for example, showed that the strength of viability selection varied inversely with the length of time over which selection was estimated (i.e. comparisons between days vs. years). Selection may fluctuate between seasons based on shifts in resource use (Benkman & Miller 1996). Selection may also differ among life stages within generations (Price & Grant 1984; Schluter & Smith 1986). In the medium ground finch (Geospiza fortis), for example, smaller birds are selected as juveniles vs. larger birds as adults (Price & Grant 1984). Local selection pressures are also likely to vary over longer time scales. For instance, post-Pleistocene rearrangements of local communities of interacting species result in differing selection pressures on plant seed defenses (e.g. Siepielski & Benkman 2007b). Indeed, most ecological communities are prone to temporal reshuffling in community membership (and thus putative selective pressures) owing to periodic changes in the Earth’s orbit on the scale of 10–100 thousand years caused by Milankovitch oscillations (Dynesius & Jansson 2000). Community interactions, particularly those between coevolving parasites and hosts (e.g. Red Queen dynamics), or predators and their prey, are systems in which selection pressures will change as one species undergoes evolutionary change (Thompson 2005) or as ecological outcomes of interactions differ between years (e.g. Thompson & Fernandez 2006). Selection can also occur at the species level over long time scales (i.e. millions of years), so-called ‘species selection’ (Jablonski 2008). In sum, these and other examples indicate that selection can vary with time in many ways. Unfortunately, for most of these temporal scales, data are sparse. Until larger datasets are amassed, our understanding of the dynamics of temporal variation in selection over these and other time scales will remain obscured.

Potential sources of bias

We used graphical analyses to examine potential sources of bias in the temporal selection database, many of which were also highlighted in the earlier review by Kingsolver et al. (2001) and follow-up papers (e.g. Hereford et al. 2004; Hersch & Phillips 2004; Knapczyk & Conner 2007).

First, there could be a publication bias against studies with small sample sizes and evidence of weak selection. To test for evidence of this source of bias, we examined the relationship between selection coefficients and sample sizes (see Kingsolver et al. 2001; Knapczyk & Conner 2007). Overall, we found that the strength of selection seemed invariant to sample sizes and studies with small sample sizes that reported weak selection were well-represented (Fig. S1). This source of bias is also presumably reduced when focusing on replicated estimates of selection because multiple estimates (including weak and non-significant ones) are presented. Nevertheless, we urge researchers to report non-significant results to minimize this potential source of bias.

Second, studies included in our database may have insufficient power to detect statistically significant selection (e.g. Hersch & Phillips 2004). Indeed, because selection is often weak (e.g. Kingsolver et al. 2001; Figs 2 and S1), large sample sizes are often needed to achieve the power necessary to detect statistically significant selection. Although we did not explicitly focus on whether coefficients were statistically significant or not, we note that sample sizes within a year ranged from 4 to 8088 (means across years for datasets, range = 15–6836), suggesting that small sample sizes and low power to detect selection almost certainly plagued some studies included in our review. To examine whether small sample sizes could potentially affect our results, we plotted the relationship between the SD of the absolute values of the selection coefficients and the proportion of positive coefficients in relation to sample size. This graphical analysis suggests that these results are not appreciably influenced by sample size (Figs S2 and S3, respectively). Regardless, we echo the call of Hersch & Phillips (2004) to estimate selection on several hundred individuals, if possible, and to report an estimate of the power to detect statistically significant selection.
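The two summary statistics used in this graphical check can be sketched as follows. This is a minimal illustration with hypothetical coefficients for one trait measured in five years. The function names are ours and the sample SD is assumed; the exact computation behind Figs S2 and S3 is not specified here.

```python
from statistics import stdev

def sd_of_absolute_values(coefficients):
    """Sample SD of |coefficient| across temporal replicates of one trait."""
    return stdev([abs(c) for c in coefficients])

def proportion_positive(coefficients):
    """Fraction of temporal replicates with a positive coefficient."""
    return sum(1 for c in coefficients if c > 0) / len(coefficients)

# Hypothetical linear gradients for a single trait across five years.
gradients = [0.12, -0.04, 0.30, -0.10, 0.06]

sd_abs = sd_of_absolute_values(gradients)
prop_pos = proportion_positive(gradients)
print(f"SD of |gradient|: {sd_abs:.3f}")
print(f"proportion positive: {prop_pos:.2f}")
```

Plotting each statistic against the mean sample size of the dataset would then show whether either depends on sample size.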

Third, variation in the number of temporal replicates among studies could also affect our results. This is especially the case here because most of our estimates come from studies with few temporal replicates (Fig. 1). Overall, as the duration of the study increased, the SD of the absolute values of the selection coefficients tended to decline (Fig. S4). This pattern occurs for two reasons. First, as the duration of the study increases, the chance of detecting rare bouts of strong selection also increases (e.g. Fig. 3), but these more extreme values are effectively masked by the preponderance of weaker average selection when calculating the SD. Second, the number of years included in a study is used as the sample size when calculating the SD, and so the sampling distribution of the SD should be wider when fewer years are used to estimate it. The proportion of positive coefficients was more likely to be zero (indicating all negative coefficients) or one (indicating all positive coefficients) for short duration studies than for long duration studies, suggesting a weak effect of the number of temporal replicates on these results (Fig. S5).
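The second reason, that an SD estimated from few years has a wider sampling distribution, can be checked with a small simulation. The sketch below rests on assumptions that are ours rather than the review's: yearly coefficients are drawn from a normal distribution with a fixed true SD, and the numbers of years, the number of simulated studies and the seed are arbitrary.

```python
import random
from statistics import stdev

def sd_spread(n_years, n_studies=2000, true_sd=0.1, seed=1):
    """Simulate many studies of n_years each and return the SD of the
    per-study estimates of the SD of absolute selection coefficients."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_studies):
        # Hypothetical yearly coefficients; absolute values taken as in the text.
        coefficients = [abs(rng.gauss(0.0, true_sd)) for _ in range(n_years)]
        estimates.append(stdev(coefficients))
    return stdev(estimates)

spread_short = sd_spread(n_years=3)   # short-duration studies
spread_long = sd_spread(n_years=20)   # long-duration studies
print(f"spread of SD estimates, 3 years:  {spread_short:.4f}")
print(f"spread of SD estimates, 20 years: {spread_long:.4f}")
```

A larger spread for the three-year studies is the pattern expected if short studies yield noisier estimates of temporal variation.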

Fourth, our use of the SD of the absolute values of the selection coefficients to address variation in the magnitude of temporal variation in the strength of selection may also have introduced bias. Using absolute values introduces an upward bias in the value of the selection coefficients when the confidence limits of the coefficients overlap with zero (see Hereford et al. 2004). Hereford et al. (2004) showed that relative bias of absolute values of linear selection coefficients increases as a function of the relative error of the selection coefficients. The quantity s/|b| reflects relative error, where s is the standard error of the estimated linear selection coefficient, and b is the estimated value of the selection coefficient. When relative error is greater than 100% (i.e. when the SE is greater than the absolute value of the estimated selection coefficient; see Fig. 1 in Hereford et al. 2004), bias is often large. Much as reported in Hereford et al. (2004), relative error was often greater than 100% for selection coefficients included in the temporal selection database. To explore the consequences of this potential bias, we performed a second analysis in which we included only those estimates where the relative error was ≤ 100%. We found that estimates of the SD of the absolute values of selection coefficients for both linear gradients (median SD = 0.08, n = 105) and linear differentials (median SD = 0.09, n = 40) were nearly identical compared with the full dataset (cf. Fig. 5). Consequently, we suspect our estimates of the magnitude of temporal variation in the strength of selection are not appreciably biased.
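The relative-error filter described above can be sketched in a few lines. The (estimate, SE) pairs are hypothetical and the function name is ours. The sketch shows only the s/|b| ≤ 100% criterion, not the full reanalysis.

```python
def relative_error(estimate, standard_error):
    """Relative error s/|b|; treated as infinite when the estimate is zero."""
    if estimate == 0:
        return float("inf")
    return standard_error / abs(estimate)

# Hypothetical (selection coefficient b, standard error s) pairs.
estimates = [(0.20, 0.05), (-0.03, 0.06), (0.10, 0.10), (-0.15, 0.04), (0.01, 0.08)]

# Keep only estimates whose relative error is at most 100%.
retained = [(b, s) for b, s in estimates if relative_error(b, s) <= 1.0]

for b, s in retained:
    print(f"b = {b:+.2f}, SE = {s:.2f}, relative error = {relative_error(b, s):.0%}")
```

Weak estimates with large SEs are the ones dropped, and these are the estimates for which taking absolute values inflates apparent strength the most.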

Finally, although not a source of bias per se, random sampling error of selection coefficients will affect the estimates of variation in selection that we have compiled. In other words, there is actual variation in selection on traits among years and there is also a sampling error of the selection coefficients. It is possible to estimate the fraction of the variation in selection on traits among years that is ‘real’ using variance components analysis (Cooper & Hedges 1994), which requires information on SEs of the estimated selection coefficients. Unfortunately, SEs were reported for only 38 of the 89 studies included in the temporal selection database. Despite this, we conducted such an analysis (Appendix S2). Overall, this analysis suggests that the per cent of ‘real’ variation in selection on traits among years was often quite small, but there was considerable variation among studies: linear gradients (mean = 37%, range = 0–99%, n = 32 datasets); linear differentials (mean = 26%, range = 0–90%, n = 13); quadratic gradients (mean = 32%, range = 0–99%, n = 14) and quadratic differentials (mean = 8%, range = 0–71%, n = 9). The magnitude of random sampling error is largely dependent on sample size, which, again, highlights the importance of large sample sizes in studies of selection. Finally, we note that the aforementioned analyses are only possible if SEs are presented together with selection coefficients, and we urge researchers to do so, thereby allowing future studies to completely address this issue.
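The logic of partitioning observed variation into ‘real’ variation and sampling error can be sketched with a simple method-of-moments calculation. This is an assumption-laden illustration, not the procedure of Appendix S2 or the exact estimator of Cooper & Hedges (1994). It treats the observed among-year variance as the sum of the real variance and the mean squared SE. All data and names are hypothetical.

```python
from statistics import mean, variance

def percent_real_variation(estimates, standard_errors):
    """Percentage of among-year variance not attributable to sampling error,
    assuming observed variance = real variance + mean sampling variance."""
    observed = variance(estimates)                           # among-year variance
    sampling = mean([se ** 2 for se in standard_errors])     # mean sampling variance
    if observed == 0:
        return 0.0
    real = max(0.0, observed - sampling)                     # truncate at zero
    return 100.0 * real / observed

# Hypothetical yearly selection gradients and their standard errors.
yearly_gradients = [0.30, -0.10, 0.05, 0.20, -0.15]
yearly_ses = [0.10, 0.08, 0.12, 0.09, 0.11]

pct_real = percent_real_variation(yearly_gradients, yearly_ses)
print(f"estimated 'real' variation: {pct_real:.1f}%")
```

When the SEs are large relative to the spread of the estimates, the estimated real component is truncated at zero, which corresponds to the lower end of the ranges reported above.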

For all of the aforementioned reasons, we urge readers to be mindful that the database we have compiled includes various sources of bias and undoubtedly includes both real variation and sampling error.

Conclusion

In summary, our review suggests that a better understanding of adaptive evolutionary dynamics will require the inclusion of the temporal dynamics of selection and continued gathering of long-term datasets. Although short-term studies are informative and tell us that selection operates, they do not allow for a long-term perspective of anticipated evolutionary change, as evolution is very much unpredictable because environments are often unstable (e.g. Grant & Grant 2002). It is this instability of environments, genetic variation in traits and temporal variation in selection that continue to provide the raw material underlying the tempo of adaptive evolutionary change.

Acknowledgements

The authors thank C. Benkman, R. Calsbeek, R. Cox, J. Evans, A. Hendry, R. Irwin, L. Kruuk, T. Lenormand, M. McPeek, T. Parchman and S. Diamond for critical, helpful and thoughtful comments on earlier versions of this paper. Joel Kingsolver and Dolph Schluter deserve special thanks for many fine suggestions and additions. They are also extremely grateful to several authors who graciously provided the datasets and in some cases reanalysed the data. A.M. Siepielski was supported by the NSF (DEB-0515735, DEB-0714782), awarded to C. Benkman and M. McPeek, respectively. J.D. DiBattista was supported by an NSERC post-graduate fellowship. S.M. Carlson was supported by NSF (DBI-0630626) and U.C. Berkeley.
