Individual heterogeneity in life histories and eco-evolutionary dynamics

Individual heterogeneity in life history shapes eco-evolutionary processes, and unobserved heterogeneity can affect demographic outputs characterising life history and population dynamical properties. Demographic frameworks like matrix models or integral projection models represent powerful approaches to disentangle mechanisms linking individual life histories and population-level processes. Recent developments have provided important steps towards their application to study eco-evolutionary dynamics, but so far individual heterogeneity has largely been ignored. Here, we present a general demographic framework that incorporates individual heterogeneity in a flexible way, by separating static and dynamic traits (discrete or continuous). First, we apply the framework to derive the consequences of ignoring heterogeneity for a range of widely used demographic outputs. A general conclusion is that besides the long-term growth rate lambda, all parameters can be affected. Second, we discuss how the framework can help advance current demographic models of eco-evolutionary dynamics, by incorporating individual heterogeneity. For both applications numerical examples are provided, including an empirical example for pike. For instance, we demonstrate that predicted demographic responses to climate warming can be reversed by increased heritability. We discuss how applications of this demographic framework incorporating individual heterogeneity can help answer key biological questions that require a detailed understanding of eco-evolutionary dynamics.

Appendix S1: Details of model framework In this appendix we present in more detail the conceptual model framework including a dynamic trait x and a static trait y, as well as external variables (e.g. temperature or food level) denoted as θ. The framework is based on IPM/matrix models (Caswell, 2001;Easterling et al., 2000;Ellner & Rees, 2007). We present the model using IPM notation for continuous traits, but it can easily be converted for application to discrete traits as well (matrix model, replacing integrals by sums). We begin with a deterministic model, which is later extended to include demographic or environmental stochasticity. Key life history variables and population parameters are calculated using methods from matrix models (Caswell, 2001) and IPMs (Easterling et al., 2000;Ellner & Rees, 2006). A summary of the model parameters and variables is given in table S1.1.
The key features of this model compared to a "standard" IPM/matrix model is the separation between two kinds of traits: static traits are permanent over an individual's lifetime, while dynamic traits change over the lifetime (the most important examples of dynamic traits are age and size). For simplicity we present a model with just one dynamic and one static trait, but the framework can also include cases of more than one of each type. Including a static and dynamic trait requires the definition of a joint distribution for offspring values of the traits, and applying a Dirac delta function (or Kronecker delta in the case of discrete traits) to ensure that the static trait remains constant over the lifetime. . Ω x , Ω y Domains of the traits x and y, respectively. θ External variable(s)/ environment, e.g. temperature. n(x, y) Joint population density distribution of the traits (not a probability distribution). N = Ωx Ωy n(x, y)dydx Total population size. Vital rates s(x, y, θ) Survival probability. g(x ; x, y, θ) Distribution of x . δ(y) The Dirac delta function. b (x, y, θ) Fecundity (offspring number). f (x , y ; x, y, θ) Joint trait distribution for offspring traits. Kernels K = K(x , y ; x, y, θ) Projection kernel (function of all vital rate functions). S = S(x , y ; x, y, θ) Survival kernel. B = B(x , y ; x, y, θ) Reproduction kernel. I Identity kernel, same dimension as K (analogue to identity matrix). R = B(I − S) −1 Net reproduction kernel. Demographic outputs λ Expected long-term population growth rate (average fitness), the dominant eigenvalue of K. u(x, y) Joint stable structure of x and y, Ωx Ωy u(x, y)dydx = 1, right eigenfunction associated with λ. v (x, y) Reproductive values, Ωx Ωy v(x, y)u(x, y)dydx = 1, left eigenfunction associated with λ. V = Ωx Ωy v(x, y)n(x, y)dydx Total reproductive value. R 0 Net reproductive rate. Dominant eigenvalue of R. G 1 = ln R 0 / ln λ Generation time measured as time for population to grow by a factor R 0 . G 2 = λ v T Bu Generation time measured as mean age of mothers at stable distribution. σ 2 e Environmental variance, Var(ln V t+1 − ln V t |V t )λ 2 = σ 2 e + σ 2 d /V t . σ 2 d Demographic variance.

S1-1. Vital rates and projection function
The population density, describing the expected number of individuals having traits x and y in the population, is given by a joint distribution n(x, y) (not to be confused with a probability distribution), and the total population size is given by N = Ωy Ωx n(x, y)dxdy (where Ω x and Ω y are the sample spaces of x and y, respectively). The change in n(x, y) over time is described by the projection kernel, which is a function of all the vital rates.
For a given value of the environment θ, an individual with current trait values x and y will survive until next year with a probability s(x, y, θ). Next year it will (if it survived) receive a new dynamic trait value x according to a distribution g(x ; x, y, θ). Thus, the static trait y may also influence the transitions of the dynamic trait throughout the entire lifetime of the individual.
The individual also produces a number of offspring with expectation b(x, y, θ), that will enter the population next year. Each offspring will have some trait combination [x , y ], which is determined by the distribution f (x , y ; x, y, θ). This is a joint density distribution for x and y , recognizing that the traits will often be correlated at birth. In general this distribution may also include heritability (i.e., part of the variation in a trait that is genetically transmitted to offspring; Danchin, 2013). The definition of f (x , y ; x, y, θ) is flexible and enables modeling of a range of mechanisms for inheritance (for a recent discussion of different mechanisms, see Danchin, 2013), such as additive genetic inheritance (through y), parental effects including epigenetic effects (through x and y and their interaction), or cohort effects (through θ).
The projection kernel is a function of the vital rates describing how the joint distribution of x and y changes from one time step to the next, and is given by K(x , y ; x, y, θ) = S(x , y ; x, y, θ) + B(x , y ; x, y, θ) (S1.1) = s(x, y, θ)g(x ; x, y, θ)δ(y − y) + b(x, y, θ)f (x , y ; x, y, θ), where δ(y − y) is the Dirac delta function, described in more detail below. This function ensures that the static trait y does not change over the lifetime of an individual. The projection kernel consists of two main parts, a survival kernel S(x , y ; x, y, θ) and a reproduction kernel B(x , y ; x, y, θ). For a simpler notation, we will denote the kernels using K (projection kernel), S (survival kernel) B (reproduction kernel).
The Dirac delta function is not strictly a function, but can be loosely defined as A reduced model treating the static trait as a parameter It can sometimes be useful to consider a reduced version of the model, where the static trait y is treated as a parameter influencing the life history rather than a state variable. This corresponds to a model assuming the entire population has the same value of y which is perfectly inherited, as for instance in ESS (evolutionary stable strategy) analyses where the trait is varied to find a strategy (i.e. life history) that cannot be "invaded" by other strategies. The projection kernel of the reduced model is given by The assumption that all individuals have the same value of y may be more reasonable for studies comparing macro-evolutionary (long-term) patterns, such as ESS analyses, than for micro-evolutionary (i.e. short term, eco-evolutionary) processes where variation in y can be an important factor. However, due to the reduced dimensionality such a model is computationally easier to explore than the full model including variation in y, and may therefore be useful at least for initial explorations. S1-2. Demographic outputs calculated from the deterministic model Long-term growth rate, stable trait structure, and reproductive value The projection function K can be analyzed using standard matrix methods (after discretization; Caswell, 2001;Ellner & Rees, 2006;Rees et al., 2014) to obtain the long-term population growth rate λ (the dominant eigenvalue of K), the stable structure u(x, y) (right eigenfunction) and reproductive value function v(x, y) (left eigenfunction), scaled so that Ωy Ωx u(x, y)dxdy = 1 and Ωy Ωx v(x, y)u(x, y)dxdy = 1. We generally assume that all assumptions required for a unique λ to exist are met (Caswell, 2001;Ellner & Rees, 2006).

Net reproductive rate
The net reproductive rate R 0 is the expected lifetime reproduction of an individual, and a measure of the generation-to-generation population growth rate (Caswell, 2001). When offspring are born with different trait values (of x and/or y), the expected lifetime reproduction is conditional on the traits at birth. The generation to generation population growth is described by the kernel (Ellner & Rees, 2006), where B is the reproduction kernel as defined above, S is the survival/transition kernel, and I is the identity kernel (analogue to the identity matrix). Additional requirements need to be fulfilled for the existence of a unique dominant eigenvalue R 0 for this kernel, which defines the net reproductive rate (Section 5.3.4 in Caswell, 2001).

Generation time
Generation time is an important property used to characterize life histories.
Several processes also scale with generation time, both evolutionary (e.g. the fixation probability of a new allele) and ecological (e.g. extinction).
Generation time is intuitively understood as the time between generations, but for structured populations with overlapping generations the definition is not straightforward. Different measures have been used, and they usually produce slightly different values, depending on the life history (Caswell, 2001;Bienvenu et al., 2013). We consider two of these measures. The first is the time it takes for the population to increase by a factor R 0 , G 1 = ln R 0 / ln λ, while the second is the mean age of mothers in a population at the stable structure. In the heterogeneous population this is given by (Bienvenu et al., 2013)

S1-3. Including stochasticity
Demographic and environmental stochasticity in unstructured populations Natural populations will show random fluctuations due to stochasticity in the environment as well as inherent randomness in the individual events of survival, reproduction and trait transitions (in addition to observation errors, which we ignore here). Environmental stochasticity represents fluctuations in the annual growth rate arising from environmental effects that are common for all individuals, while demographic stochasticity arises from random variation among individuals in the events of survival, reproduction, and state transitions (May, 1973b;Lande et al., 2003).
For an unstructured density-independent population process (assuming no autocorrelation in the environment), the population size N follows a Markovian process, where the variance in population growth can be decomposed as (Engen et al., 1998) where ∆N = N t+1 − N t . The constants σ 2 e and σ 2 d are denoted the environmental variance and demographic variance, respectively, assuming no demographic covariance between individuals. Using a similar approach of Engen et al. (2007) where the last sum is a Taylor expansion of the term ln(1 + ε/(λN t )). Including terms up to the second order, we get the expression The long-term (stochastic) growth rate is the expectation of ln N t+1 − ln N t (conditional on the current population size N t ), and is given by Thus, unless the demographic variance is zero the stochastic growth rate is a function of N t and increases with increasing population size (Lande et al., 2003).
Assuming the expectation E[ε 4 ] is so small that it can be ignored, the variance of the population growth increment on log scale is given by

Stochasticity in structured populations
Several studies contributed to develop the theory of stochastic dynamics in structured populations, both considering demographic stochasticity (Bartlett, 1960) and environmental stochasticity (Lewontin & Cohen, 1969;May, 1973a;Turelli, 1977;Tuljapurkar, 1990). Later, Engen et al. (2005) combined the two types of stochasticity in an age-structured model, and demonstrated that the three parameters λ, σ 2 d and σ 2 e could be used to define an accurate diffusion approximation (a continuous time approximation of the discrete time population process) of the population dynamics, extending the result of Lande & Orzack (1988) to include demographic stochasticity. This demonstrated that these three parameters alone contain the necessary information to describe the population dynamics even with complex life histories. A number of results are also available from diffusion models that enables prediction of extinction risk and characterization of time to extinction (Karlin & Taylor, 1981). By a first order approximation they also showed that the stochastic growth rate takes the same form as in the unstructured case, extending the small-noise approximation of Tuljapurkar (1982) to include demographic stochasticity.
Describing the stochastic dynamics of structured populations is complicated by the fact that the population size will be autocorrelated due to transient fluctuations in the trait distribution, so that N is no longer Markovian as in the unstructured case. Engen et al. (2007) showed how the dynamics could be described by considering the total reproductive value V , calculated using the reproductive values of the mean environment. This quantity will follow the process of N , and the dynamics are approximately Markovian, just as in the unstructured case. Thus, the reproductive values work as a filter removing the transient fluctuations in the trait distribution, and the environmental and demographic variance can be calculated from this process. The above equations for the unstructured case are basically the same in the structured case, replacing N by V (Engen et al., 2007).

Stochasticity in IPMs
Both environmental stochasticity (Ellner & Rees, 2007;Rees & Ellner, 2009;Vindenes et al., 2011) and demographic stochasticity (Vindenes et al., 2011(Vindenes et al., , 2012  The effects of environmental stochasticity can be estimated through the regression models (e.g. mixed models) for the vital rates (Ellner & Rees, 2007), via specific environmental drivers (explained variation and covariation), via random year effects (unexplained variation and covariation), or both (e.g. temperature and random year effects; Vindenes et al., 2014). An important question to consider is at which scale the vital rates are modeled (e.g., the log scale, logit scale, square root squale, or absolute scale), as the environmental random effects are often assumed to be normally distributed on that scale. This normality assumption can be checked, for instance with diagnostic plots from the regressions or specific tests. If these show that the normality assumption is not met, another scale transformation may be needed.
Demographic stochasticity arises from inherent stochasticity in the processes of survival, transition and offspring production. For some of these processes the variance is always given by the mean. For instance survival is always a binomial process and the demographic variance in survival is entirely described by the survival probability. The variance in offspring number can be a function of the mean, but the distribution of offspring number will vary depending on species and population. If individual data on reproduction are lacking, the best we can do is to make some assumption on the nature of the distribution of offspring number, based on knowledge from other populations or from general knowledge of the life history of the species. Demographic variance in trait transitions largely depend on the type of trait. In matrix models trait transitions always follow a multinomial process, but calculating the probability of each possible event is not always straightforward. In IPMs, transitions in size (growth) are often assumed to follow a normal distribution, which may not always be biologically reasonable (although such a model may still provide an adequate description of the average dynamics). A normal distribution is not ideal as in theory it could produce negative body sizes, as well as impose shrinking when this is not necessarily biologically reasonable. Thus, defining a reasonable distribution for variance in dynamic transitions is not straightforward, and should be done with some care.

Demographic and environmental variance in IPMs
The demographic and environmental variance partition the variance in population growth, but can also be defined at the individual level (Engen et al., 2009). The demographic variance σ 2 d can be defined as the expected variance in an individual contribution to next year's total reproductive value (expectation with respect to environment, and variance with respect to individual contributions), while the environmental variance σ 2 e can be defined as the variance of the expected contribution (variance with respect to environment, and expectation with respect to individual contributions Engen et al., 2009;Vindenes et al., 2012). Ignoring environmental stochasticity (to simplify notation), the demographic variance of our model is given by (Vindenes et al., 2011) is the covariance between survival and reproduction, µ V S (x, y) and σ 2 V S (x, y) are the expectation and variance, respectively, of next year's reproductive value for an individual with states x and y, and µ V B (x, y) and are the corresponding parameters for an offspring. These are given by Thus, the reproductive value plays an important role in the definition of the demographic variance. When demographic stochasticity is ignored (large populations), the environmental variance as a function of the individual vital rates is given by (Vindenes et al., 2011) σ 2 e ≈ Ωy Ωx Ωy Ωx where w(x 1 , y 1 ) is the contribution of an individual with trait value [x 1 , y 1 ] to next year's total reproductive value, while w(x 2 , y 2 ) is the contribution from an individual of traits [x 2 , y 2 ]. Estimating this covariance across all combinations of trait values is generally difficult. Instead, we estimate σ 2 e from simulation of the process of V , using (Engen et al., 2007;Vindenes et al., 2011)

Extinction risk from diffusion approximation
Based on the diffusion approximation one can also calculate the risk of ultimate extinction U starting from a population size N 0 (Karlin & Taylor, 1981). For r s < 1 this probability is 1, but for r s > 1 the probability of extinction at N = 1 is given by (Engen et al., 2005) starting from a population of size N 0 . When both demographic and environmental stochasticity is included there is no simple expression for the distribution of time to extinction, T E . When demographic stochasticity is ignored, the time to extinction follows an inverse Gaussian distribution (Karlin & Taylor, 1981;Engen et al., 2005). If environmental stochasticity is ignored (only demographic stochasticity) the cumulative distribution of time to extinction at N = 0, conditional on starting at population size N 0 , is given by (Cox & Miller, 1965;Engen et al., 2005) .

S1-4. Notes on numerical implementation
All calculations were done with the software R (R Development Core Team,

2013). We have provided R code for all our examples as supplementary material,
where additional plots of various parts of the model are also provided. This code also demonstrates how the IPM can be discretized for numerical calculations.
Detailed procedures for general numerical implementation of IPMs are also described elsewhere (for instance Merow et al., 2014;Rees et al., 2014).
The discretization of an IPM is generally done after the vital rates have been estimated using regression, turning the model into a large matrix model before calculation of model parameters (Ellner & Rees, 2006). The dimension of the model (the number of "mesh points") determines the accuracy of the results, but computation efficiency can be reduced if the dimensions of the matrix is very large. This can lead to long computing times in particular if multiple matrix operations are involved, for instance in stochastic models. For initial exploration of the models we recommend using a lower dimension, and then for the final calculations to use a higher dimension. The dimension should be high enough that results are not altered (to a desired degree of accuracy) if it is increased further. To discretize the model we divide the trait space of the dynamic trait x into m stages (each of length ∆x) and the trait space of the static trait y into n stages (each of length ∆y). The vital rate functions for survival and fecundity are discretized into vectors of length mn (stacking the trait vectors), whereas the transition functions are discretized into matrices of dimension mn × mn. The resulting projection matrix is also a mn × mn projection matrix.

Appendix S2: Effects of ignoring heterogeneity
In this appendix we derive the results on consequences of ignoring heterogeneity presented in the main text. Here we assume that the underlying heterogeneous population (the "true" process) is defined by the model including a static trait y and a dynamic trait x, defined above. We then consider to the general cases where either all of the heterogeneity (i.e. both traits x and y) is ignored, or just part of the heterogeneity is ignored (i.e. ignoring the static trait y but still including x).
For each of the two cases we need to derive the corresponding model applying to the same underlying heterogeneous population. Such a model will be an unconditional model (i.e. unconditional with respect to y, or to x and y) and its parameters can therefore be calculated as functions of the parameters of the underlying model, applying formulas for unconditional mean, variance and covariance. The case where all structure is ignored corresponds, naturally, to an unstructured model, which is a simpler case than that of ignoring only part of the heterogeneity. In the latter case, the model will still be structured in x and new transition distributions for this trait must also be derived. Vindenes et al. (2008) applied the approach to study consequences of heterogeneity for demographic variance, but did not consider the case where only part of heterogeneity is ignored. Here we extend the approach to consider such cases, and also apply the approach to a range of parameters in addition to the demographic variance.
In this appendix we first derive the two comparison models ignoring all or part of the heterogeneity. In the following sections we apply these models to evaluate the consequences of ignoring heterogeneity for each of the demographic outputs mentioned in Appendix S1 (the long-term growth rate λ, the net reproductive rate R 0 , generation time G 1 and G 2 , demographic variance σ 2 d and environmental variance σ 2 e , and extinction risk). To help distinguish parameters from the heterogeneous and homogeneous models, we use a notation where parameters representing the model ignoring y are marked with an asterisk * (when necessary), while parameters for the model ignoring both x and y are marked with a double asterisk * * . For simplicity, we do not include the environmental variable θ in the equations here, but every vital rate can in principle depend on this variable as well.

S2-1. Comparison model ignoring all heterogeneity
From the heterogeneous model (model structured according to x and y) we have the joint stable trait distribution u(x, y) (scaled so that Ωy Ωx u(x, y)dxdx = 1).
This function can be considered as the joint probability distribution for sampling an individual with trait combination x and y (from a large population at stable distribution). Denoting the events of survival and reproduction (offspring number) as S and B, respectively, the estimates for survival probability and fecundity in this homogenous model are (Vindenes et al., 2008) These values are weighted averages, weighted by the stable structure. For the deterministic case, these are all the parameters needed to define the model, and the population growth rate is given by λ = s + b (which is exactly the same as in the heterogeneous model, as shown later).
To calculate the demographic variance, three more parameters need to be defined describing the unconditional variance and covariance of survival and fecundity. These are given by (Vindenes et al., 2008) Thus, even if the covariance from the heterogeneous model σ 2 BS (x, y) is zero, the covariance can be non-zero in the homogeneous model ignoring heterogeneity.

S2-2. Comparison model ignoring part of the heterogeneity
Many models account for some variation among individuals, for instance due to age or size, but other sources of heterogeneity may still be ignored. The derivation of a model ignoring the variation in trait y but keeping the trait x follows the same approach as above, but for this model we also need to derive the transition functions for the trait x that is still included.
Using the joint stable distribution u(x, y) from the heterogeneous model together with standard formulas for manipulation of conditional probabilities and expectations (in particular Bayes' theorem, the multiplication rule, and the law of total probability), obtaining expressions for the vital rates unconditional on y is relatively straightforward. As before, let the events of survival and reproduction (offspring number) be denoted as S and B, respectively. The unconditional survival probability and fecundity as a function of x are given by The offspring trait distribution also satisfies the condition Ωx f (x ; x)dx = 1.
The projection kernel of this model ignoring y is given by Using the same approach as in the previous subsection, the demographic variance components for survival, fecundity, and the covariance between survival and offspring are given by For this model calculation of demographic variance also requires that we consider demographic stochasticity arising from transitions in x and in the process of offspring allocation of x. Given the distributions g(x ; x) and f (x ; x), and the reproductive values v(x) calculated from the projection kernel above, we have

S2-3. Consequences of ignoring heterogeneity
In this section we use the comparison models to evaluate the consequences of ignoring heterogeneity for different model outputs, by comparison with the full heterogeneous model.

Consequences for the growth rate λ
Whether or not the underlying heterogeneity is recognized does not affect the value of the expected growth rate λ. This follows from the well known result in matrix models that when the stable distribution is reached, all stages will grow with the same rate λ (Leslie, 1945;Caswell, 2001). Thus, we should see the same growth rate λ no matter how the stages are defined.
Here we demonstrate this result using the model ignoring both x and y, although it holds also for the case where only y is ignored. Here, λ is given by To prove that this is the same as in the heterogeneous model, consider the equation for the right eigenfunction (Caswell, 2001), Ωy Ωx K(x , y , x, y)u(x, y)dx dy = λu(x, y). From this, and using the properties Thus, ignoring heterogeneity does not influence our estimate of long-term growth rate λ. We emphasize that this result applies to cases of hidden heterogeneity, i.e.
where the underlying population (the "true process") is the same, and only the model we use to describe it is changing. If the underlying population is not the same, i.e. when different populations are compared, any model parameter can be affected, including λ (Kendall et al., 2011).
Consequences for the net reproductive rate R 0 For the model ignoring all heterogeneity, we have R * * 0 = b/(1 − s). We could also find this using the standard formula for age-structured models (e.g. Caswell, 2001). Letting a denote age class, R * * 0 = ∞ a=1 s (a−1) b = b/(1 − s) (where the last equality is applying a standard formula for geometric series). For the comparison model ignoring only y, R * 0 is found in the same way as in the heterogeneous model, using the formula R * = B * (I − S * ) −1 , where the kernels S * and B * are now defined from the vital rates of this model.
From this we see that in general the net reproductive rate will be affected by ignoring heterogeneity, but the amount and direction of the bias will depend on the details of the underlying model. Some examples are provided in the main text.

Consequences for generation time
Since λ is not affected by ignoring heterogeneity, the effect on the first generation time estimate G 1 = ln R 0 / ln λ is determined entirely by the effect on R 0 . Thus, for the comparison models we have G * * 1 = ln R * * 0 / ln λ and G * 1 = ln R * 0 / ln λ. For the other measure of generation time, mean age of mothers at stable distribution, ignoring all heterogeneity gives the estimate G * * 2 = λ/b. Again, using the age-structured measure we get G * * 2 = ∞ a=1 bλ −a s (a−1) (Caswell, 2001), and from the formula of the sum of a geometric series this is also equal to λ/b.

Consequences for the demographic variance
Effects of heterogeneity on demographic variance was studied by Vindenes et al.
(2008) using matrix models, but not for the case ignoring just part of the heterogeneity. For the homogeneous model ignoring both x and y the demographic variance is given by while the demographic variance of the model ignoring y is given by Vindenes et al. (2008) showed that the effects of ignoring heterogeneity on demographic variance could be both positive and negative, depending on the details of the underlying model and the combination of heterogeneity in survival, fecundity, and transitions. The direction of the bias in environmental stochasticity introduced by ignoring heterogeneity depends on how the stochasticity enters the model, in particular on which vital rates are affected. In general, with stochasticity in fecundity the environmental variance is likely to be overestimated when heterogeneity is ignored, and vice versa for stochastic survival (Engen et al., 2007). Some examples are provided in the supplementary R code for the theoretical color example. The bias can be large, depending on the underlying heterogeneity.

Consequences for extinction risk
Ignoring heterogeneity affects the risk of ultimate extinction conditional on current population size P (U |N 0 ) through the demographic and environmental variance. Thus, to the extent that ignoring heterogeneity affects σ 2 d and σ 2 e , it will also affect estimates of extinction risk through these parameters. The estimated distribution of time to extinction will also generally be affected. Again, the direction and magnitude of the bias will depend on the underlying model. In the supplementary R code for the theoretical color example we provide an example of the different extinction risks calculated when all or part of the heterogeneity is ignored. One example is shown here in figure 1, with environmental stochasticity in survival. The model presented here is a one-sex model with two individual state variables, body length x and length at age 1 y, in addition to temperature T as environmental driver. Thus, in this particular case x = y for offspring. We assume a pre-breeding census so that the fecundity function includes offspring survival. Each year we assume that an individual first reproduces, then, if it survives, grows to reach a new size next year.

S3-1. Study system and data
The study system is described in detail elsewhere ( numbers, and continues today with few changes in methods other than a much reduced sampling effort (Le Cren, 2001). Each fall/winter (in most years from mid October to December), pike have been captured with 64 mm mesh gillnets in shallow areas of the lake. Captured pike were killed, measured for body length (in centimeters, measured as fork length), weighed (in kilograms), and sexed.
Opercular bones were removed for age and length back-calculation, following a method validated for Windermere pike by Frost & Kipling (1959). Since 1963, data on female reproductive investment (gonad weight, egg number, and egg weight) have also been collected (Frost & Kipling, 1967). In this study we use body length estimates that are measured and back-calculated using catch data for females from 1946-1990 (7939 individuals).
The survival data set (Winfield et al., 2013c) contains survival data from , for both males and females (3992 individuals), based on the capture mark recapture (CMR) study (for details on this study see Kipling & Le Cren, 1984;Haugen et al., 2007). There is no age information on these individuals, so survival is assumed to depend on length only. Data from both males, females and individuals of unknown sex (in total 3992 individuals) are used, in order to increase the statistical strength of the estimation procedure. Analysis of this data set is complicated by the fact that for individuals who were never recaptured, the time of death and therefore also the length at death are unknown (Windermere is a closed system so individuals do not leave the system). Vindenes et al. (2014) used a Monte Carlo resampling method to correct for this bias, including information from the somatic growth model.
The four data sets used by Vindenes et al. (2014) have been made available online (see URL for each data set in the references). They are i) Fecundity data (Winfield et al., 2013a), ii) Growth data (Winfield et al., 2013b), iii) Survival data (Winfield et al., 2013c), and iv) Lake temperature data (Winfield & Fletcher, 2013). The extended model presented here is based on the same data.

S3-2. Vital rate functions
Estimation of the vital rates from data was done with mixed effects models using the software R (R Development Core Team, 2013). Model selection between candidate models was based on AIC, details are provided by Vindenes et al. (2014). The main extensions in this model are inclusion of size at age 1 (heterogeneity) in the growth model and the survival model. We also evaluated potential effects of size at age 1 for fecundity (egg number), but these were so small that they were excluded from further analyses.
Below we describe each of the four main vital rate functions included in the model. The random year effects all have expectation zero, and are assumed to be multivariate Gaussian distributed in the stochastic calculations (Vindenes et al., 2014).

Fecundity
Fecundity is defined as the number of 1-year old female offspring produced by a female of length x at the temperature T , and is given by where p m (x) is the probability of maturity, m(x, T ) is the number of eggs produced, and s 0 is the survival probability from egg to 1-year old. The factor 0.5 enters because the model is female based, assuming a 1:1 sex ratio. The function m(x, T ) describing egg numbers was estimated by (Vindenes et al., 2014) and is given by where ε m is the random year effect for egg number. Data on offspring survival (from egg to age 1) is not available for this system, thus s 0 is a free parameter in the model. The value (s 0 = 0.00018) was chosen so that the population growth rate predicted by the model roughly matches the observed growth rate, as estimated by Langangen et al. (2011). This value is consistent with values reported from other studies (Wright, 1990). Egg survival is thus modeled as where ε s0 is the random year effect for egg survival (chosen to be normally distributed with standard deviation 0.1). The probability of maturity p m (x, T ) is also modeled on a logit scale, The values of β p0 and β px are chosen based on information from a study by Frost & Kipling (1967), who found that females mature at a size between 31-50 cm, with a mean size of 41.5. They also noted that in pike maturity is largely determined by size rather than age. Figure S3.1 shows the underlying functions and the resulting total fecundity b(x, T ).
The fecundity function presented here is slightly modified compared to the model of Vindenes et al. (2014) as here we included a reaction norm for size at maturity (without large changes in the overall fecundity function) and included two subsequent temperatures as egg formation happens in the year preceding the spawning year, while survival from egg to age 1 depends on temperature in the spawning year. This has no consequences for the deterministic model (assuming constant temperature), and only small consequences for the stochastic model. depends on temperature but the variance σ 2 y is constant. The mean (on absolute scale) is given by (Vindenes et al., 2014)

Offspring length distribution
where ε L1 is the random year effect for the offspring length distribution. The variance is constant. Figure S3.1 shows the distribution f (y ; T ) at three different temperatures.
If y has a heritable component, so that relatively larger parents (large y) tend to produce relatively larger offspring, this can be included in the offspring size distribution by adding an effect of parental size x in the mean, while reducing the variance σ 2 L1 accordingly.

Survival
The survival probability was estimated as a function of length and temperature by Vindenes et al. (2014). Here we use this function and also added an effect α of the static trait y, adjusting the intercept accordingly so that the average survival is the same as before. The survival model is then given by logit s(x, y, T ) = (β 0s − αȳ) + β xs x + β T s T + β T xs T x + αy, whereȳ is the average of y in the mean environment. Figure S3.2 shows the survival function for two values of α, a negative value corresponding to a trade-off between growth and survival, and a positive corresponding to an "individual quality" effect.  Figure S3.2: Survival as a function of length x shown for three values of y (length at age 1). The panels represent two scenarios for the effect of y on survival. A. A "trade-off" scenario, where rapid growth is associated with increased mortality (see also figure S3.3). B. A "quality effect" scenario, where individuals with rapid growth also have higher survival.
The parameter α measuring the effect of y on survival cannot be estimated from the CMR data. However, using the data from the winter fishery where age as well as y is measured, we considered the effect of y on age at capture, to get some idea of whether the effect is present and the direction of the effect. This analysis shows that age at capture was negatively associated with y, which may indicate a trade-off between rapid early growth and later survival ( figure S3.3).
Increased early growth will also make individuals susceptible to the fishery at an earlier age, but the fishing pressure from this scientific fishery is so low that it seems unlikely that this could explains all of the effect on mortality. Thus, this analysis at least indicates that such a negative trade-off might be present, although it cannot be used to give a reliable measure of α for the survival model.
In the results provided in the main text (also below) we therefore explore consequences of ignoring heterogeneity for different values of α ranging from strong negative values (trade-off effects) to positive ("individual quality" effects) values. This exercise also provides a good illustration of how the results depend on the degree of underlying heterogeneity (varying with α).

Effect of early size
Capture age (years) Length at age 1 y (cm) Figure S3.3: Effect of the static trait y (length at age 1) on age at capture. The estimated relationship (red line) is given by ln Age = 1.71 − 0.00984y. This result is only included as an indication that there might be a trade-off between rapid growth and mortality, and is not used in the model.

Somatic growth
The somatic growth model measures the distribution of next year's length x as a function of current length x, temperature T , and length at age 1 y. The mean and variance of this distribution was estimated by a mixed model including year as random effect and the other variables as fixed effects. Several candidate models were considered including second order effects of x, as well as interaction effects between x, y, and T . The variance of x was modeled as an exponential function of current length x (with no effect of y and T ). The final model (lowest AIC) for mean length is given by where ε G is the random year effect, while the variance function is given by where τ g = 10.7 and ν g = −0.0061. The resulting model for mean length next year includes an interaction between x and y as well as an interaction between x 2 and y (see figure S3.4). This second order interaction occurs because of stronger non-linearity in the growth trajectory of individuals of small y, while those of large y have more linear trajectories. The growth data thus suggest that initially small individuals increase their growth rates more at low sizes (an "attempt to catch up" with the larger ones), but then stop growing at a smaller final size, while the larger individuals show a more steady decline in the growth rate over their lifetime.
The final growth distribution g(x ; x, y, T ) is modeled as a lognormal distribution with mean µ G (x, y, T ) and variance σ 2 G (x, y, T ), truncated to zero for x < x to prevent individuals from shrinking in length (Vindenes et al., 2014).     denes et al., 2014). Egg survival to age 1 s * 0 is not estimated from data and covariance with other vital rates is set to zero.

S3-3. Projection kernel and results
For a given temperature T , the projection kernel for this model is given by In this particular example the distributions of the dynamic and static traits are exactly the same at age 1, therefore a delta function also enters the second term corresponding to the reproduction kernel. This ensures that if x = y for offspring, the probability of offspring obtaining a length combination x , y is zero.
To evaluate consequences of environmental stochasticity, we generate a sequence of projection kernels from the model. Here the stochasticity arises from temperature (a normal distribution; Vindenes et al., 2014) and through residual variation and covariation between the year effects of the estimated vital rates (variance/covariance matrix provided in R code, further details provided by Vindenes et al., 2014). These residuals were simulated using a multivariate Gaussian distribution. In the stochastic IPM the effects of temperature occurs at different time steps. Egg number depends on temperature the year before (as egg formation occurs in the female the autumn before the spawning year) while all other temperature-dependent vital rates depend on temperature of the current year. This was not included in the original model of Vindenes et al. (2014) but has very small effects on the results.

Reduced model treating y as a parameter
If y is treated as a parameter rather than a state variable (which is causing some conflicting implications as it implicitly assumes perfect heritability of y but at the same time allowing lengths at age 1 x different from y) the resulting model is only length-structured (not to be confused with the model ignoring y used to evaluate consequences of ignoring heterogeneity). Figure  1.8

Reduced model (y as parameter)
λ Length age 1 y (cm) α = -0.1 α = 0 α = 0.1 Figure S3.5: Effect of y on long-term growth rate λ in the reduced model treating y as a parameter rather than a state variable (assuming all individuals have the same y value), estimated for the mean temperature T = 10.5 • C. The different lines represent different assumptions of the effect of y on survival, where α = −0.1 corresponds to a growth/survival trade-off, α = 0 represents the case of no effect (y only affects growth) and α = 0.1 represents a case of a "quality effect" (individuals with rapid growth also have higher survival).
(pike with larger y have a slightly larger reproductive value at intermediate sizes). Note that age differences in reproductive value will still be large, as pike with large y will reach a given size x at a younger age than individuals of low y.
Thus, there are still large individual differences, but because they only affect growth they are largely captured by the dynamic trait x.
Effects of ignoring heterogeneity in y    Figure S3.6: Stable trait distribution (A-C, smallest lengths x corresponding to age 1 not shown) and reproductive value (D-F) for three values of α, where α = −0.1 (A,D) corresponds to a growth/survival trade-off scenario (λ = 0.942), α = 0 (B,E) represents the case where y affects growth only (λ = 1.074) and α = 0.1 (C,F) represents a "quality effect" (rapid growth is associated with higher survival, λ = 1.294). The spiky points at low lengths in the stable distribution occur because x = y for offspring, while after the first year there is a spread in x-values for each y.

S4-1: Vital rates, effect of environment and harvesting
The survival probability, fecundity, and mean and variability of the somatic growth rate (transition in x) of the model are shown in figure S4.1. We now include an environmental variable θ that has opposite effect on survival of red and green individuals, as shown in figure S4.2. Thus, for a given range of θ green individuals will be favored towards one end while red individuals are favored towards the other, with an unstable equilibrium at an intermediate value. Using this model we will also consider a case of harvesting, where a constant proportion of the population above a certain body size (both colors) will be be harvested. This is also included in the survival function.

S4-2: Genetic mechanism of inheritance of color
We will compare a model where color is genetically determined to the model used in example 1 of the main text, where color is randomly assigned at birth with probability 0.5 (an arbitrary selected value). The genetic model assumes a Mendelian mechanism of inheritance with three genotypes determined by two alleles (A and a) on one locus. Genotypes AA and Aa both produce the red phenotype, while genotype aa produces the green phenotype.
The model is female based, and we assume that the genotype distribution is the same in males and females (i.e., males and females have the same life history). Females reproduce once per year and choose a mate randomly from the population. Let r be the frequency of red individuals in the population, while p is the frequency of allele A. Since p is not observed, we aim to express the offspring color distribution as a function of r, for each color of parent. Using the Hardy Weinberg frequencies of the genotypes (p 2 , 2p(1 − p) and (1 − p) 2 ), the relationship between p and r is given by where the other solution is not applicable as 0 < p < 1. Let O R and O G denote the events that an offspring is red and green, respectively, while M R and M G denote the events of a red and green parent. Further, let M AA denote the event that a parent has genotype AA, and similarly for the other two genotypes. The conditional probabilities of offspring color given parental ( Figure S4.3: Predicted growth rate as a function of environment θ, for the nongenetic model with random assignment of color (red line) and the genetic model with Mendelian determination of color, for three values of r, the proportion of red individuals in the population. Figure S4.3 shows the predicted long-term growth rate λ (average fitness) as a function of the environment θ, for the two models (genetic and non-genetic).
For the intermediate values of the environment neither red or green individuals are well adapted and predicted growth rate is below 1. In the genetic model a population consisting of mostly red individuals has highest average fitness for low values of θ (for simplicity we call this a "red" environment), while for high values of θ a "green" population has highest fitness. The fitness of the "non-genetic" population is also highest towards either end of the environmental range, but not as high as in the best adapted population with genetically determined color. The difference can be interpreted as the genetic load of having a higher variability of colors. In other words, local adaptation is generally higher in the case where color is genetically determined, but this comes at the cost of loosing color variation.
If the environment is now changing, for instance from a value that favors red individuals to one favoring green, the response time of the population will depend on the initial distribution of red and green individuals at the time of the change, but also on the generation time relative to the rate of environmental change. If generation time is long compared to the rate of change, the population with genetically determined color will be slower to respond than the population where color is randomly assigned. Figure S4.4 shows an example where the environment suddenly changes from a value favoring red (θ = −0.4) to one favoring green (θ = 0.6) individuals, and the projected population growth in the models with random (non-genetic) and genetic determination of color. Both projections start from the same initial population of N = 100 at the predicted stable trait distribution for r = 0.5 in the genetic model. As the environment changes after a few time steps, the "genetic" population which had already become largely adapted to the red environment experiences a much larger decline than the "non-genetic" population, but then starts to increase and the growth rate of the population increases continually with the proportion of green individuals. The "non-genetic" population experiences only a small decline before increasing at a steady rate. Had the initial period of adaptation been a bit longer, the "genetic" population would not have had enough green individuals present (genetic variability) to respond to the change and would instead have gone extinct.
Many studies of eco-evolutionary dynamics focus on consequences of  Figure S4.4: Projected changes in population size and color distribution through a sudden environmental shift, for a population where color is genetically determined ("Genetic") and a population where color is randomly assigned ("Non-genetic"). The populations both start from a density of 100 individuals at the stable distribution (estimated for the genetic model with r = 0.5). The environment has a value of θ = −0.4 during the first time steps ("red" environment), then suddenly changes to a value of θ = 0.6 ("green" environment) at the time indicated by the vertical dashed line.
harvesting, and in particular harvesting based on demographic traits such as size (e.g., Traill et al., 2014). Therefore, we also considered what would happen in this example if we added proportional harvesting above some threshold size. Figure S4.5 shows the result for the same projections as in figure S4.4 (including the environmental change), but harvesting a proportion 0.8 of all individuals (both colors) above size x = 60. In this example, the growth rate of the "non-genetic" population becomes negative, leading the population process towards extinction, whereas the population with genetic heritability of color still manages to grow after a while, increasing the growth rate as the population is adapting to its new environment.