Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares–based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.

Determining the form of relationships between traits and fitness is central to many evolutionary studies. Under the assumption that a trait has a causative effect on fitness, the form of the relationship between a trait and fitness gives insight into the form of natural selection. Given this central importance in evolutionary biology, it is not surprising that a great many techniques for quantifying and characterizing the form of natural selection have been developed (Robertson 1966; Lande and Arnold 1983; Arnold and Wade 1984; Schluter 1988; Janzen and Stern 1998; Shaw et al. 2008), and also that long-running discussions exist regarding the merits and interpretation of different approaches (see, e.g., Endler 1986; Schluter 1988; Brodie et al. 1995).

Regression-based approaches have been central to the analysis of natural selection, and two regression-based techniques are particularly important. Following Pearson (1903), Lande and Arnold (1983) presented a means by which selection coefficients (gradients and differentials) can be obtained using parametric (multiple) regression analysis. In particular, their results lead to the simple regression equation

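$$w = a + \sum_{j} \beta_j z_j + \frac{1}{2}\sum_{j} \gamma_{jj} z_j^2 + \sum_{j}\sum_{k>j} \gamma_{jk} z_j z_k + e, \tag{1}$$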

where w is relative fitness, that is, $w = W/\bar{W}$ (absolute fitness divided by population mean absolute fitness), z are phenotypic trait values, a is an intercept, j and k index traits, β are directional selection gradients, γ are quadratic selection gradients, and e are residual errors (Walsh and Lynch 2012a). Assuming multivariate normality of $\mathbf{z}$ before selection, the estimates of β and γ that can be obtained via equation (1) are directly applicable in a quantitative genetic framework for the prediction of evolution, and in standardized forms are easily comparable across studies. However, these estimates are of limited use for assessing the form of selection (e.g., whether selection is predominantly directional, stabilizing, or disruptive). Essentially, although this approach yields very useful estimates of selection coefficients based on approximation of the (relative) fitness function, the shape approximated by the quadratic function itself should not be used to make quantitative inference of the form of selection. For example, positive or negative values of γ should not be interpreted as evidence for disruptive or stabilizing selection, respectively, even if they represent minima or maxima of the quadratic approximation within the range of observed phenotypes in a population (Schluter 1988).

The main alternative to parametric regression has been the application of spline-based semiparametric regression analyses, as first advocated by Schluter (1988). These analyses are desirable because they lead to inferences of the form of selection that make few a priori assumptions. In addition, these models can be fitted with much more sensitivity to nonnormal distributions of fitness residuals than the linear models used in the Lande–Arnold approach. Thus far these spline-based techniques have not replaced the parametric analysis because biologists cannot relate them to evolutionary parameters such as selection gradients, or compare them among studies in a standardized form.
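
The least squares regression in equation (1) can be sketched for a single trait as follows. This is an illustrative Python sketch rather than code from the gsg package: the function names and the data are hypothetical, and for simplicity β and γ are read from one quadratic fit to mean-centred phenotype, whereas in practice β is often taken from a fit with linear terms only.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def quadratic_gradients(z, absolute_fitness):
    """Least squares fit of w = a + beta*z + 0.5*gamma*z^2 for one trait."""
    n = len(z)
    mean_fitness = sum(absolute_fitness) / n
    w = [f / mean_fitness for f in absolute_fitness]   # relative fitness
    mean_z = sum(z) / n
    zc = [v - mean_z for v in z]                       # mean-centred trait
    columns = [[1.0] * n, zc, [0.5 * v * v for v in zc]]
    # Normal equations (X'X) coef = X'w for the three regression columns.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, w)) for ci in columns]
    _, beta, gamma = solve_linear(xtx, xty)
    return beta, gamma


# Hypothetical data: five trait values and made-up absolute fitnesses.
trait = [-2.0, -1.0, 0.0, 1.0, 2.0]
fitness = [2.0 + 0.4 * v - 0.1 * v * v for v in trait]
beta, gamma = quadratic_gradients(trait, fitness)
print(beta, gamma)
```

Because the quadratic term enters the design matrix as one half of the squared trait, the fitted coefficient is γ itself rather than γ/2.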

As a statistical model of the relationship between phenotype and fitness, the least squares regression analysis used to generate selection gradient estimates using equation (1) is undesirable for statistical hypothesis testing. Residuals of fitness, or fitness components, are highly nonnormal. For example, viability over particular episodes of selection is typically a binary trait, and measures of reproductive success are generally better modeled as Poisson variables. Consequently, it has become common practice to use generalized linear models for statistically testing (null) hypotheses about the relationship between fitness and phenotype. However, because generalized linear models evaluate linear relationships between regression coefficients on imaginary but useful latent scales, they cannot be directly used for quantitative estimation of selection gradients. It is also very difficult to use them to test hypotheses about the shape of phenotype-fitness functions, because link functions (i.e., typically logistic for binary data, logarithmic for Poisson data) impose curvature on the data scale even when only linear terms are modeled on the latent scale.
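
The point about link functions can be illustrated numerically. In the sketch below, survival is linear in phenotype on the logit scale, with a hypothetical intercept and slope, yet the expected survival on the data scale has nonzero second derivative almost everywhere. The function names are illustrative only.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps the latent scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def expected_survival(z, intercept=2.0, slope=1.0):
    # The latent (logit) scale is linear in z: no quadratic term is modeled.
    return inv_logit(intercept + slope * z)


def second_derivative(f, z, h=1e-3):
    """Central finite-difference approximation of f''(z)."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)


for z in (-4.0, -2.0, 0.0, 2.0):
    curvature = second_derivative(expected_survival, z)
    print(f"z = {z:+.1f}  survival = {expected_survival(z):.4f}  curvature = {curvature:+.4f}")
```

The curvature changes sign at the point where expected survival is one half, so a purely linear latent-scale model can appear either concave or convex on the data scale depending on where the observed phenotypes fall.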

Here we develop and illustrate an approach for obtaining quantitatively informative and interpretable selection gradients from any statistical characterizations of fitness functions, including spline-based selection analyses. The approach uses generalized models, allowing nonnormal distributions of fitness (components) to be explicitly modeled, while still yielding the quantitatively interpretable selection gradients that are typically obtained via least squares–based regression of relative fitness on phenotype. Given sufficient data, the approach is applicable using generalized additive models, that is, models with smooth regression terms and generalized response distributions, allowing flexible models of phenotype-fitness functions to be characterized in terms of explicit quantitative genetic metrics of selection.


Lande and Arnold (1983) showed how directional and quadratic selection gradients, that is, the first-order and second-order partial derivatives, respectively, of relative fitness, w, with respect to phenotype z, averaged over the multivariate normal phenotypic distribution observed in a population, $p(\mathbf{z})$, are quantitatively related to evolutionary change. Specifically, for directional selection gradients and multivariate normal phenotype,

$$\boldsymbol{\beta} = \int \frac{\partial w(\mathbf{z})}{\partial \mathbf{z}}\, p(\mathbf{z})\, d\mathbf{z},$$

and for quadratic selection gradients,

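$$\boldsymbol{\gamma} = \int \frac{\partial^2 w(\mathbf{z})}{\partial \mathbf{z}^2}\, p(\mathbf{z})\, d\mathbf{z}.$$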

Alternatively, the selection gradients can be written as

$$\boldsymbol{\beta} = \bar{W}^{-1} \frac{\partial \bar{W}}{\partial \bar{\mathbf{z}}} \tag{2}$$

(as in Lande 1982, where population growth rate λ is the measure of population mean absolute fitness; see also Walsh and Lynch 2012b), and

$$\boldsymbol{\gamma} = \bar{W}^{-1} \frac{\partial^2 \bar{W}}{\partial \bar{\mathbf{z}}^2}. \tag{3}$$

Given any function $W(\mathbf{z})$ that can predict an individual's expected fitness, and a representative sample of the distribution of phenotype in a population, the population's expected mean absolute fitness can be obtained by predicting and averaging individual expected fitness, that is, simply $\bar{W} = \frac{1}{n}\sum_{i=1}^{n} W(\mathbf{z}_i)$. Consequently, all that is needed to calculate selection gradients with equations (2) and (3) from general functions relating individual fitness to phenotype are first and second (partial) derivatives of absolute fitness with respect to (multivariate) phenotype. With the widespread use of computers, powerful and precise numerical methods (e.g., Richardson 1911) for obtaining these derivatives are relatively easily implemented. Numerical methods could be used to obtain selection gradients using any of the four expressions above, but we focus on the latter two formulae to reduce the need for computationally expensive numerical integration. The importance of the first two expressions is that they are central to the derivation of the relationships between selection gradients and coefficients of least squares regressions (i.e., via eq. (1); Lande and Arnold 1983), which provided a highly desirable analysis before the general availability of flexible statistical computing packages.
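
The calculation described above can be sketched as follows: every observed phenotype is shifted by a small amount, mean predicted fitness is recomputed, and finite differences of mean fitness give the derivatives in equations (2) and (3). This is a minimal Python illustration under stated assumptions, not the gsg implementation. It uses plain central differences rather than Richardson extrapolation, and the fitness function, its coefficients, and the phenotype sample are hypothetical.

```python
import math


def mean_fitness(fitness_fn, phenotypes, shift):
    """Mean absolute fitness after shifting every individual's phenotype by `shift`."""
    total = 0.0
    for z in phenotypes:
        total += fitness_fn([zj + dj for zj, dj in zip(z, shift)])
    return total / len(phenotypes)


def selection_gradients(fitness_fn, phenotypes, h=1e-3):
    """Directional (beta) and quadratic (gamma) gradients by central differences."""
    k = len(phenotypes[0])
    zero = [0.0] * k
    w_bar = mean_fitness(fitness_fn, phenotypes, zero)

    def shifted(*pairs):
        # Mean fitness with trait j moved by `step` for each (j, step) pair.
        d = zero[:]
        for j, step in pairs:
            d[j] += step
        return mean_fitness(fitness_fn, phenotypes, d)

    beta = [(shifted((j, h)) - shifted((j, -h))) / (2 * h) / w_bar for j in range(k)]
    gamma = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for l in range(j, k):
            if j == l:
                d2 = (shifted((j, h)) - 2 * w_bar + shifted((j, -h))) / (h * h)
            else:
                d2 = (shifted((j, h), (l, h)) - shifted((j, h), (l, -h))
                      - shifted((j, -h), (l, h)) + shifted((j, -h), (l, -h))) / (4 * h * h)
            gamma[j][l] = gamma[l][j] = d2 / w_bar
    return beta, gamma


def survival(z):
    # Hypothetical fitted fitness function on the probability scale.
    eta = 2.0 + 0.5 * z[0] - 0.3 * z[0] ** 2 + 0.1 * z[1]
    return 1.0 / (1.0 + math.exp(-eta))


sample = [(-1.2, 0.3), (-0.4, -0.8), (0.0, 0.1), (0.5, 1.1), (1.3, -0.6)]
beta, gamma = selection_gradients(survival, sample)
print(beta)
print(gamma)
```

Any function that returns predicted absolute fitness for a given phenotype could be passed as `fitness_fn`, which is what makes the calculation independent of how the fitness function was estimated.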

Thus, arbitrary functions describing the dependence of absolute fitness on phenotype, that is, $W(\mathbf{z})$, can be used for the estimation of selection gradients. If $W(\mathbf{z})$ is approximated by least squares regression, the analysis will be essentially identical to the least squares–based multiple regression of relative fitness on phenotype advocated by Lande and Arnold (1983), and widely applied since (e.g., Endler 1986; Kingsolver et al. 2001; Morrissey and Hadfield 2012). Alternatively, any parametric or semiparametric function for which predictions can be made of expected absolute fitness for any given phenotype will be applicable. As such, spline-based semiparametric regressions such as those popularized by Schluter (1988) can be used for estimation of $W(\mathbf{z})$ and subsequent calculation of selection gradients.

To maintain generality, we suggest characterization of uncertainty of selection gradients via bootstrapping. The R package gsg provides standard errors using the parametric bootstrap as the default for Poisson and binomial fitness responses, and also provides optional algorithms based on case bootstrapping and simulation from the normal approximation to the posterior solution to the parameters of the fitness function. Standard errors are calculated from the standard deviation of the bootstrapped estimates (or estimates from the posterior distribution of the model parameters), and P-values are calculated based on the proportion of estimates above or below an a priori null value, typically zero. By default, smoothing parameters are not refitted for each bootstrap replicate (following Schluter 1988), but an option for reestimation of the smoothing parameter for each bootstrap replicate is available. All bootstrap-based inference of statistical uncertainty should be regarded as a pragmatic, rather than exact, approach; and in particular, bootstrap-based inferences about selection gradients derived from smooth function-based inferences about fitness functions should be regarded as experimental, as by Schluter (1988). In addition, a permutation algorithm is available where fitness records are randomized across phenotypes, yielding P-values, and approximations of the joint null distributions of β and γ, facilitating application of Reynolds et al. (2010)'s method for statistical hypothesis tests of diagonalized quadratic selection gradients.
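
The case-bootstrap logic described above can be sketched as follows. This is not the gsg code: to stay self-contained it resamples (phenotype, fitness) records and recomputes a simple univariate least squares gradient, whereas the full approach would refit the fitness function in each replicate. The two-sided P-value rule (twice the smaller tail proportion) is an assumption of this sketch, and the simulated data are hypothetical.

```python
import math
import random
import statistics


def directional_gradient(pairs):
    """Univariate least squares gradient: cov(w, z) / var(z), with w relative fitness."""
    n = len(pairs)
    mean_z = sum(p[0] for p in pairs) / n
    mean_fit = sum(p[1] for p in pairs) / n
    cov = sum((z - mean_z) * (fit / mean_fit - 1.0) for z, fit in pairs)
    var = sum((z - mean_z) ** 2 for z, _ in pairs)
    return cov / var


def case_bootstrap(pairs, estimator, n_boot=1000, null=0.0, seed=1):
    """Bootstrap standard error and two-sided P-value by resampling whole records."""
    rng = random.Random(seed)
    n = len(pairs)
    reps = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(resample))
    se = statistics.stdev(reps)
    below = sum(r <= null for r in reps) / n_boot
    above = sum(r >= null for r in reps) / n_boot
    p_value = min(1.0, 2.0 * min(below, above))  # assumed two-sided convention
    return se, p_value


# Hypothetical simulated data: binary survival depending on one trait.
rng = random.Random(42)
records = []
for _ in range(200):
    z = rng.gauss(0.0, 1.0)
    p_survive = 1.0 / (1.0 + math.exp(-(1.0 + 0.8 * z)))
    records.append((z, 1.0 if rng.random() < p_survive else 0.0))

estimate = directional_gradient(records)
se, p_value = case_bootstrap(records, directional_gradient)
print(estimate, se, p_value)
```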

Application and Discussion

We demonstrate calculation of quantitatively useful selection gradients from estimation of a fitness function generated with a semiparametric smoothing algorithm, using Karn and Penrose's (1951) data on neonatal survival as a function of birth mass and gestation length. The data are composed of 7037 records of live male births. The specific dataset was transcribed by Dolph Schluter for initial demonstrations of semiparametric estimation of fitness functions (Schluter, 1988; Schluter and Nychka, 1994). Gestation length is recorded in 5-day increments, and birth mass is recorded in 0.23 kg (0.5 pound) increments.

We estimated the function relating neonatal fitness, that is, survival recorded as zero or one, as a function of gestation length and birth mass using a tensor product smoother-based generalized additive model (GAM), using the default settings in the function gam() in the package MGCV (Wood 2006a). The method we present is general insofar as the particular method for characterizing the phenotype-(absolute) fitness map is not a critical feature of the analysis. We use the tensor product smooth in this case because (a) we deem the dataset to be large enough to support application of a bivariate smooth function, and (b) the tensor product smooth is desirable because it is scale invariant (Wood 2006b). The fitness function relating neonatal survival to birth mass and gestation length as estimated using the tensor product smooth is very similar to the function reported by Schluter and Nychka (1994); their figure 4 and the current Figure 1.

Figure 1.

The fitted tensor product smooth-based estimate of the function relating probability of neonatal survival to birth mass and gestation length for infant human males. Note that the phenotypic data are available with 0.23 kg (half pound) precision for birth mass, and 5-day precision for gestation length. However, to avoid superimposition of the datapoints to provide a more complete depiction of the distribution of these traits, small random deviates were added to all datapoints for the purpose of plotting only.

We applied equations (2) and (3) to the fitted GAM relating neonatal survival to birth mass and gestation length to obtain directional and quadratic selection gradients. For brevity, we will refer to these estimates as “derivative-based estimates.” We evaluated statistical uncertainty in the estimated selection gradients using the case-bootstrapping algorithm described above. Both the application of equations (2) and (3) and the bootstrapping were carried out using the function gam.gradients() in the R package gsg, which we distribute through the standard R package repository at http://cran.r-project.org/. For comparison, we also calculated selection gradients using equation (1), and calculated standard errors and P-values based both on standard t-statistics and by case bootstrapping. We detected significant directional selection of birth mass, but little or no directional selection of gestation length (Table 1a). We detected highly significant negative quadratic selection of birth mass and gestation length via neonatal survival, but no evidence of correlational selection (Table 1a).

Table 1. Estimated standardized selection gradients for birth mass and gestation length via variation in neonatal viability in human infant males. In (a), estimates are made directly from a generalized additive model-based characterization of the relationship between absolute fitness (neonatal viability) and the two traits. In (b), selection gradients are estimated using the classical least squares–based regression analysis proposed by Lande and Arnold (1983). In both (a) and (b), standard errors and P-values are obtained from bootstrapping procedures, except those values in brackets, which are based on t-tests.
(a) GAM-based selection gradients
(b) Least squares–based selection gradients
Gradient      Estimate    SE                 P
β_bm           0.0292     0.0040 (0.0029)    0.000 (0.000)
β_gest         0.0045     0.0036 (0.0030)    0.200 (0.138)
γ_bm          –0.0599     0.0060 (0.0040)    0.000 (0.000)
γ_gest        –0.0171     0.0050 (0.0033)    0.000 (0.000)
γ_bm,gest     –0.0102     0.0042 (0.0028)    0.020 (0.000)

Broadly, selection gradients estimated via least squares multiple regression of relative fitness on bivariate phenotype are similar to the derivative-based estimates (Table 1). All least squares–based estimated selection gradients were of the same direction as the derivative-based estimates, and relative to the distributions of estimated selection gradients reported in the literature (Kingsolver et al. 2001), the point estimates generated by the two methods differ little. Differences in point estimates likely represent different effects of nonnormality of phenotype on the two estimation procedures. The difference of statistical significance of the correlational selection gradient (Table 1) seems relatively unimportant, given the very large sample size and very modest gradient estimates by either technique.

Our proposal that quantitatively interpretable selection gradients can be obtained by calculating derivatives of the mean (relative) fitness surface with respect to mean phenotype, based on general estimates of fitness functions, is applicable in any situation where classical least squares regression-based estimation of selection gradients is appropriate. Estimation of a fitness function based on semiparametric smooth functions will often be impractical because of limited sample sizes in many studies of natural populations. However, the general method we propose is equally applicable when less flexible models of the fitness function are more appropriate. For example, we might have characterized the fitness function using a fully parametric generalized linear model such as math formula, where math formula and math formula denote birth mass and gestation length. Given an intermediate amount of data, a bivariate smooth function of birth mass and gestation length might be inappropriate, whereas separate smooth functions of each trait might be feasible to fit. In that case, the fitness function might most appropriately be modeled as math formula, where s() denotes a univariate smooth function such as a cubic spline. In the case of the example analysis of selection of human birth mass and gestation length, both of these simpler alternative models yield very similar selection gradient estimates (not shown) to those we report based on the estimation of the fitness function with the bivariate tensor product smoother.

The rather dramatic fitness function, that is, an expected reduction of fitness of over 80% for individuals with the smallest birth mass (Fig. 1), corresponds to very modest selection gradients. The corresponding directional selection gradients, although highly statistically significant in the case of birth mass, are much smaller than the average reported estimated selection gradients in the literature (e.g., Kingsolver et al. 2001). It is worth noting that the relatively large average absolute values of other estimated selection gradients are in part due to much smaller average sample sizes, and thus to inflation of the apparent average absolute selection gradient by sampling error. Biases affecting the perceived magnitude of selection gradients based on literature reports aside, the selection gradients we report here are indeed very modest. Although dramatic, the decline in fitness at small birth mass influences only a very small portion of the population, and so the effect on the average slope of the fitness function is relatively small. Thinking about the consequences of individual variation in terms of population mean fitness, that is, in line with equations (2) and (3), yields this insight.

To further this perspective, we can consider the fitness landscape, that is, the relationship between population mean fitness and population mean phenotype. Although the term “landscape” (“fitness,” “adaptive,” or “selective” landscapes) has been used loosely, we reiterate that in the context of selection of a quantitative trait, it should refer to population mean fitness as a function of population mean phenotype, or potentially of higher moments of the phenotypic distribution, which are also properties of the population, not of individuals. The interpretive benefits of distinguishing between individual “fitness functions,” or “surfaces,” and population-level “landscapes” are outlined by Arnold (2003). This usage is consistent with the original formulation of the concept, with fitness depicted as a function of gene frequencies, that is, a property of a population (Wright 1932). Representations of the fitness landscape for human neonatal viability as a function of birth mass and gestation length are shown in Figure 2. Within a broad range of the currently observed population mean phenotype, population mean fitness varies little. A change of population mean birth mass of more than a standard deviation of the currently observed distribution would have to occur to reduce population mean fitness by as little as 5%. Because a representation of a fitness landscape integrates information from the fitness function with information about distributions of phenotype, it is potentially a more informative way to present a visualization of the magnitude of natural selection. The R package gsg also contains a function for the estimation of fitness landscapes based on general models of fitness functions.
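
A fitness landscape of the kind described above can be sketched by translating the observed phenotype sample to a range of hypothetical means and averaging predicted individual fitness at each one. The sketch below assumes that only the mean changes while the shape of the phenotypic distribution is held fixed. The fitness function and the simulated sample are hypothetical, and this is not the gsg function.

```python
import math
import random


def fitness_landscape(fitness_fn, phenotypes, shifts):
    """Mean fitness of the population after moving every phenotype by each shift."""
    n = len(phenotypes)
    return [sum(fitness_fn(z + d) for z in phenotypes) / n for d in shifts]


def survival(z):
    # Hypothetical individual fitness function: survival collapses only for
    # phenotypes far below the mean, and is nearly flat elsewhere.
    return 0.98 / (1.0 + math.exp(-3.0 * (z + 2.5)))


rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # trait in SD units
shifts = [-1.0, -0.5, 0.0, 0.5, 1.0]
for d, w_bar in zip(shifts, fitness_landscape(survival, sample, shifts)):
    print(f"mean shifted by {d:+.1f} SD -> mean fitness {w_bar:.3f}")
```

The individual fitness function here is steep, yet the landscape is comparatively flat near the observed mean because few individuals occupy the region where survival declines.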

Figure 2.

Representations of the fitness landscape for human neonatal survival as a function of population mean phenotype values for birth mass and gestation length. (a) shows the expected population mean fitness as a function of hypothetical population mean values of the two phenotypic traits ranging from one standard deviation below, to one standard deviation above, the observed means. (b) and (c) show population mean fitness as a function of mean birth mass and mean gestation length separately, each evaluated holding the other constant at the observed population mean value. Dashed lines in (b) and (c) represent bootstrap-based 50% prediction intervals, which are roughly interpretable as standard errors.

Our approach may be viewed as a generalization of a method proposed by Janzen and Stern (1998). Their method allows quantitatively useful directional selection gradients to be calculated from models based on logistic regression analysis of trait–(absolute) fitness relationships where the logistic regression models are composed only of intercepts and linear terms. Our method allows arbitrary curvature of the fitness function and arbitrary error structures; for example, Poisson or exponential distributions could be used to model fitness via reproductive success or longevity, and the resulting fitness functions can then be related to quantitatively useful selection gradients. The software we provide accommodates this. We note that in several instances latent-scale regression coefficients of generalized linear models, which are not quantitatively interpretable as selection gradients (i.e., are not quantitatively useful for predicting evolution when used with the breeder's equation), have been called “selection gradients” in the literature.

The approach we suggest is compatible with another recent development in the analysis of natural selection, the advent of “aster models” (Shaw et al. 2008; Shaw and Geyer 2010). When applied to selection, the primary general goal of aster models is to allow investigation of phenotype-fitness relationships via regression analyses that make statistically defensible assumptions about the distribution of total fitness for organisms with complex life cycles. Aster model-based selection analysis primarily focuses on parametric estimation of fitness functions, not selection coefficients. Given an aster model–based inference of a fitness function, the method outlined here could be applied to obtain selection gradients.
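
For a logistic fitness function with only an intercept and a linear term, the special case discussed above, the derivative of expected fitness with respect to phenotype is available analytically: it is the latent-scale slope multiplied by W(1 − W). Averaging over the observed phenotypes and dividing by mean fitness then gives the directional gradient. The sketch below uses hypothetical coefficients and phenotypes, and shows that the latent-scale coefficient itself differs from the resulting selection gradient.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps the latent scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def average_gradient_logistic(intercept, slope, z_values):
    """Directional gradient implied by a logistic model with linear terms only."""
    w = [inv_logit(intercept + slope * z) for z in z_values]
    mean_w = sum(w) / len(w)
    # d/dz inv_logit(a + b z) = b * W * (1 - W), averaged over individuals.
    mean_slope = sum(slope * wi * (1.0 - wi) for wi in w) / len(w)
    return mean_slope / mean_w


phenotypes = [-2.0, -1.0, 0.0, 0.5, 1.5]   # hypothetical trait values
latent_slope = 0.8                          # hypothetical logit-scale coefficient
beta = average_gradient_logistic(1.0, latent_slope, phenotypes)
print("latent-scale coefficient:", latent_slope)
print("data-scale selection gradient:", beta)
```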

The parameter estimates provided by the method proposed here have the same biological interpretation as selection gradients estimated by least squares regression following Lande and Arnold (1983), and are equivalent when phenotype is multivariate normal, and if the estimated fitness function adequately reflects the true fitness function. If the phenotype is not multivariate normal, our method will provide average slopes and curvatures of the fitness function, given the observed distribution of phenotype; the classical analysis will not. Selection gradients obtained from either approach relate quantitatively to expected evolution (following Lande 1979) assuming that breeding values are multivariate normal. The proposed method does not necessarily or entirely supplant Lande and Arnold (1983)'s analysis. Point estimates from the traditional analysis are unbiased, and also because selection gradients are simple regression parameters (not derived parameters, as in our analysis) the Lande–Arnold regression can potentially be incorporated directly into more general models of selection. For example, equation (1) may be more easily and directly integrated into hierarchical models (Royle and Dorazio 2008) of ecological covariates of selection, allowing flexible and comprehensive analyses of the form suggested by Wade and Kalisz (1990).


The important features of the methods proposed here are: (1) they are as generally applicable as previous methods, (2) they allow statistical inference to be made based on statistical models that accommodate reasonable distributions of fitness component response variables, providing more defensible inference of statistical uncertainty, and (3) they provide all of the inferences of three different and currently commonly applied regression-based analyses of selection in a single statistical approach. The resulting selection gradients retain the useful features of classical estimates: they are compatible with the breeder's equation (or Lande 1979's formulation in terms of selection gradients, $\Delta\bar{\mathbf{z}} = \mathbf{G}\boldsymbol{\beta}$), assuming multivariate normality of phenotypes and breeding values, and they are comparable across studies when calculated in units of standard deviations. However, using methods that are now easily implemented with personal computers, we can obtain these useful estimates in a more general way, relaxing assumptions about distributions of both fitness and phenotype, with more statistically defensible hypothesis testing, and in analyses that are unified with other regression-based approaches for characterizing natural selection. Thus, with the statistical functions that we provide for use with the free and widely used statistical platform R, this updated approach should be generally applicable in studies of selection in nature.
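
As a reminder of how selection gradients enter the prediction of evolution, Lande's (1979) multivariate formulation gives the expected per-generation change in mean phenotype as the product of the additive genetic variance-covariance matrix G and the vector of directional selection gradients β. The sketch below uses entirely hypothetical values for both; they are not estimates from the data analyzed here.

```python
def predicted_response(g_matrix, beta):
    """Expected change in mean phenotype: matrix-vector product G * beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in g_matrix]


# Hypothetical additive genetic (co)variances for two traits.
g_matrix = [[1.0, 0.5],
            [0.5, 2.0]]
# Hypothetical directional selection gradients for the same two traits.
beta = [0.1, -0.2]

print(predicted_response(g_matrix, beta))
```

With these made-up numbers, the genetic covariance causes the first trait's direct response to be offset by selection on the second trait, which is why gradients rather than latent-scale regression coefficients are needed for this calculation.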


We thank D. Schluter for comments and for supplying the transcribed dataset. B. Walsh, S. Wood, C. Geyer, K. Johnson, and an anonymous reviewer also provided valuable comments.