SUBSTITUTION RATES AT NEUTRAL GENES DEPEND ON POPULATION SIZE UNDER FLUCTUATING DEMOGRAPHY AND OVERLAPPING GENERATIONS

Abstract

It is widely accepted that the rate of evolution (substitution rate) at neutral genes is unaffected by population size fluctuations. This result has implications for the analysis of genetic data in population genetics and phylogenetics, and provides, in particular, a justification for the concept of the molecular clock. Here, we show that the substitution rate at neutral genes does depend on population size fluctuations in the presence of overlapping generations. As both population size fluctuations and overlapping generations are expected to be the norm rather than the exception in natural populations, this observation may be relevant for understanding variation in substitution rates within and between lineages.

One of the central results of the Neutral Theory of evolution (Kimura and Ohta 1971; Kimura 1983) states that the rate k of allele substitution (rate of evolution) at neutral loci is unaffected by fluctuations in population size and is simply equal to the mutation rate. The explanation behind this result goes as follows. The number of mutants that enter a haploid population of size N each generation is equal to Nμ, that is, the number of individuals born in the last generation times the mutation rate μ at the locus considered. In turn, the probability that any new mutant allele reaches ultimate fixation (i.e., replaces the other alleles at this locus) corresponds to its initial frequency 1/N in the population. Because the substitution rate k is the product of the number of mutants (Nμ) and the fixation probability (1/N), k becomes independent of N: k = μ. This result provides a justification for the assumption of a molecular clock (Zuckerkandl and Pauling 1962; Margoliash 1963), which allows dating evolutionary events such as host jumps, age of infectious disease outbreaks, and speciation events (e.g., Ingman et al. 2000; Korber et al. 2000; Nübel et al. 2010).
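The claim that a new neutral mutant fixes with probability 1/N can be checked numerically. The following is a minimal sketch, assuming a haploid Wright-Fisher population with nonoverlapping generations (binomial resampling of N individuals each generation); the function name `fixation_frequency` and the parameter values are illustrative choices and do not come from the article.

```python
import random


def fixation_frequency(N, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of one neutral mutant.

    Assumes a haploid Wright-Fisher population of constant size N with
    nonoverlapping generations (an illustrative assumption).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new mutant copy
        while 0 < count < N:
            freq = count / N
            # Each of the N offspring picks a mutant parent with probability freq.
            count = sum(1 for _ in range(N) if rng.random() < freq)
        fixed += (count == N)
    return fixed / trials


if __name__ == "__main__":
    N = 10
    estimate = fixation_frequency(N, trials=20000)
    print(f"estimated fixation probability: {estimate:.4f} (1/N = {1 / N:.4f})")
```

Multiplying the estimate by the Nμ new mutants per generation gives a rate close to μ, whatever value of N is used.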

It is well known that for alleles under natural selection the substitution rate is not independent of effective population size, which is itself affected by past population sizes (Kimura 1983; Otto and Whitlock 1997; Orr 2000; Bromham and Penny 2003; Waples 2010). However, effective population size is generally difficult to estimate accurately for natural populations. As such, a frequent assumption in population genetics and phylogenetics is that the genetic markers under study are strictly neutral, which, in practice, corresponds to using genetic markers deemed largely neutral, such as synonymous mutations in sequence data. Under such conditions, the result that the substitution rate is independent of population size fluctuations can be applied and is often invoked in molecular population biology. However, the range of biological assumptions under which the substitution rate is independent of population size fluctuations may not have been fully explored.

Our attention has been drawn to a possible dependency of substitution rate on demography by empirical work on the plague (Yersinia pestis). Morelli et al. (2010) observed an excess of apparently neutral mutations reaching fixation following population size expansion during new plague epidemics. Additional empirical work on 118 complete genomes points to repeated episodes of acceleration of the molecular clock at apparently strictly neutral genes during episodes of population expansion of the plague. Qualitatively similar patterns could be generated with stochastic simulations of serial outbreaks of plague lineages with overlapping generations (Cui Y. et al. unpubl. ms.). These results suggest that even for strictly neutral genes, the substitution rate may depend on population size fluctuations.

Indeed, one may expect that the combination of overlapping generations and fluctuations in population size affects the product of the number of new alleles entering the population through mutation and the probability of each such new mutation reaching eventual fixation. Assuming that all mutations happen at reproduction (replication), the number of new mutants entering the system varies under overlapping generations and fluctuating demography. Thus, in a growing population, there will be a relatively large number of new offspring, but this number will not necessarily equal the population size (the inverse of the fixation probability of a new neutral mutant) if some adults survive. Conversely, in a declining population there will be little or no space left for newborn individuals carrying new mutations to enter the population if adults have a high survival probability. Here again, the number of new offspring is unlikely to equal the inverse of the fixation probability.

In this article, we present an analytical model for the substitution rate under the joint effect of overlapping generations and population size fluctuations. Although the two effects are hardly ever considered together in the same population genetics model, both are very reasonable assumptions. Population fluctuations and overlapping generations represent the norm rather than the exception throughout natural populations, and their dynamic interactions are actively studied in evolutionary demography (Tuljapurkar 1989). The analytical model we develop allows us to show that the substitution rate at neutral genes does depend on demography in populations with overlapping generations and population size fluctuations. Moreover, the quantitative deviation can be strong depending on the demography considered.

The Model

Let us consider a haploid panmictic population of finite size with a discrete number of age classes (inline image) and consisting of individuals reproducing at discrete time points (inline image). This population can be characterized at each time point by its demographic state inline image, which could, for instance, be the size of the population at that time, the distribution of the number of individuals in each age class, or the current environmental conditions.

We assume that the demography of this population follows a time homogeneous Markov chain and denote by $\Pr(j \mid i)$ the transition probability from demographic state $i$ in a parental generation at time $t$ to demographic state $j$ in the descendant generation at time $t + 1$. We further assume that the Markov chain is aperiodic and irreducible so that it eventually enters a stationary distribution, with the probability of occurrence of state $i$ at stationarity being denoted by $\Pr(i)$ (Karlin and Taylor 1975; Grimmett and Stirzaker 2001).

Our aim is to evaluate the substitution rate (Kimura and Ohta 1971) in the population once it has reached the stationary demographic regime. We assume that mutant individuals are only produced upon reproduction (replication), with probability inline image, and enter the population as age class one individuals (inline image). We also assume an infinite site model of mutation (Kimura and Crow 1964) so that each mutant is of a novel type and may eventually reach fixation in the population.

We define the quantity

image(1)

where inline image is the expected number of age class one mutant individuals produced over one unit of time when the population is in state inline image and that will ultimately reach fixation in the population so that inline image is the expectation of this number over all demographic states. We have

image(2)

where $n_{ji}$ is the expected number of individuals of age class one in demographic state $j$ in a descendant generation that have been produced by individuals in demographic state $i$ in the parental generation and $\pi_j$ is the probability of ultimate fixation in the population of a single copy of an allele in a population in demographic state $j$. When the process is run over a very long time, $k$ gives the expected number of mutants reaching fixation in the population per unit time and thus qualifies as the substitution rate. The substitution rate can also be expressed as

$$k = \mu \sum_{j} n_j \, \pi_j \, \Pr(j) \tag{3}$$

where

$$n_j = \sum_{i} n_{ji} \, \Pr(i \mid j) \tag{4}$$

which is the expected number of individuals in age class one when the population is in state $j$, and where $\Pr(i \mid j)$ is the backward transition probability that a population taken in demographic state $j$ descends from a parental population in demographic state $i$. The expression for $n_j$ emphasizes that the number of mutants entering the population in a given demographic state depends on both the states of the parental and the descendant generations, and thus possibly on the sizes of the parental and descendant generations. We see from equation (3) that for k to become independent of population size, the values of population sizes appearing in $n_j$ must cancel those appearing in $\pi_j$. To understand when this may be the case, we now consider several explicit examples.
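The general recipe can be evaluated numerically for any small demographic chain. The sketch below assumes that equations (3) and (4) read k = μ Σ_j n_j π_j Pr(j) and n_j = Σ_i n_ji Pr(i | j), with the backward probability obtained from Bayes' rule as Pr(i | j) = Pr(j | i) Pr(i) / Pr(j); this is our reading of the verbal definitions. The function names, the power-iteration solver, and the example parameter values are illustrative assumptions.

```python
def stationary(P, iters=2000):
    """Stationary distribution of a row-stochastic matrix P[i][j] = Pr(j | i).

    Uses plain power iteration, which converges for an aperiodic,
    irreducible chain (as assumed in the text).
    """
    m = len(P)
    dist = [1.0 / m] * m
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]
    return dist


def substitution_rate(mu, P, n, fix):
    """k = mu * sum_j n_j * fix[j] * Pr(j), with n_j = sum_i n[j][i] * Pr(i | j).

    n[j][i]: expected number of age class one individuals in descendant
             state j produced by a parental population in state i.
    fix[j]:  fixation probability of a single mutant copy in state j.
    """
    m = len(P)
    st = stationary(P)
    k = 0.0
    for j in range(m):
        # Backward probability Pr(i | j) = Pr(j | i) * Pr(i) / Pr(j).
        n_j = sum(n[j][i] * P[i][j] * st[i] / st[j] for i in range(m))
        k += n_j * fix[j] * st[j]
    return mu * k


# Illustrative two-state chain (hypothetical values).
P = [[0.7, 0.3], [0.4, 0.6]]
mu = 1e-3

# Nonoverlapping generations, sizes 100 and 1000: n_ji = N_j, pi_j = 1 / N_j.
sizes = [100, 1000]
n_discrete = [[sizes[j]] * 2 for j in range(2)]
k_discrete = substitution_rate(mu, P, n_discrete, [1 / N for N in sizes])

# Overlapping generations, constant size N: n_ji = N * (1 - s_i), pi_j = 1 / N.
N, s = 500, [0.2, 0.6]
n_overlap = [[N * (1 - s[i]) for i in range(2)] for _ in range(2)]
k_overlap = substitution_rate(mu, P, n_overlap, [1 / N, 1 / N])

if __name__ == "__main__":
    print(k_discrete / mu, k_overlap / mu)
```

Under these assumptions, the first case returns k = μ up to rounding and the second returns μ times the stationary average of 1 − s_i.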

FLUCTUATING DEMOGRAPHY WITHOUT OVERLAPPING GENERATIONS

Let us first consider the case in which there are no overlapping generations but fluctuations in population size are allowed. Further, consider that the demographic states determine population sizes so that in state $i$ the total population size is $N_i$. Because all individuals in the parental generation die, we have $n_{ji} = N_j$, whereby $n_j = N_j$. Further, it is well established that in this case, the fixation probability of a single mutant is simply the inverse of the number of individuals in the population: $\pi_j = 1/N_j$ (Charlesworth 1980; Kimura 1983).

Substituting these expressions into equation (3), we obtain $k = \mu$, which states that the mutation rate (μ) equals the substitution rate ($k$) for a neutral allele. In other words, we recover the classical result that population size fluctuations do not affect substitution rates at neutral loci in a population with discrete nonoverlapping generations.
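The substitution step can be written out explicitly. The lines below are a sketch assuming that equation (3) has the form k = μ Σ_j n_j π_j Pr(j), where Pr(j) denotes the stationary probability of state j (our notation for the stationary distribution).

```latex
\begin{align*}
k &= \mu \sum_{j} n_j \, \pi_j \, \Pr(j)
   = \mu \sum_{j} N_j \cdot \frac{1}{N_j} \cdot \Pr(j)
   = \mu \sum_{j} \Pr(j)
   = \mu .
\end{align*}
```

Each population size N_j cancels within its own term, so no assumption about the transition probabilities is needed beyond the existence of a stationary distribution.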

OVERLAPPING GENERATIONS WITHOUT FLUCTUATING DEMOGRAPHY

We now consider the case of a population with overlapping generations but constant size N. For simplicity, we assume that every individual in a given demographic state has the same survival probability to the next time step as any other individual (i.e., no senescence). We further assume that the demographic state determines the survival probability of individuals from one generation to the next so that $n_{ji} = N(1 - s_i)$, where $s_i$ is the survival probability of an individual in state i. With this, the expected number of individuals in age class one when the population is in state $j$ is $n_j = N \sum_i (1 - s_i) \Pr(i \mid j)$ and thus depends on the transition probabilities of the demographic process. But as every individual in a given generation (newborn or surviving one) has exactly the same fitness, the fixation probability of a single mutant is $1/N$ in any state.

Substituting these expressions into equation (3), we have

$$k = \mu \sum_{i} (1 - s_i) \, \Pr(i) \tag{5}$$

which is the mean number of mutant age class one individuals produced in the population per individual. Hence, introducing overlapping generations reduces the substitution rate because fewer age class one individuals, and therefore fewer mutants, are produced per generation. If the survival probability is constant ($s_i = s$), the substitution rate further simplifies to $k = \mu(1 - s)$. This is again a classical population genetics result (Charlesworth 1980).
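The reduction to a single sum over parental states can be made explicit. This is a sketch under the assumption that equations (3) and (4) read k = μ Σ_j n_j π_j Pr(j) and n_j = Σ_i n_ji Pr(i | j), with n_ji = N(1 − s_i) and π_j = 1/N as described in the text.

```latex
\begin{align*}
k &= \mu \sum_{j} \Big[ N \sum_{i} (1 - s_i) \Pr(i \mid j) \Big] \frac{1}{N} \, \Pr(j) \\
  &= \mu \sum_{i} (1 - s_i) \sum_{j} \Pr(i \mid j) \Pr(j) \\
  &= \mu \sum_{i} (1 - s_i) \Pr(i) ,
\end{align*}
```

where the last step uses the law of total probability, Σ_j Pr(i | j) Pr(j) = Pr(i). Setting s_i = s for all i then gives k = μ(1 − s), because the stationary probabilities sum to one.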

FLUCTUATING DEMOGRAPHY AND OVERLAPPING GENERATIONS

We will now consider two different demographic models in which population sizes fluctuate over time and there are overlapping generations. To obtain explicit analytical expressions, we assume as above that individuals do not senesce and all have equal survival probabilities within a given demographic state, which is assumed to determine population size. However, this survival probability may depend on the population size of successive demographic states, for example, during a decline of population size. Under these assumptions, the expected number of individuals of age class one in state $j$ descending from state $i$ can be written as

$$n_{ji} = N_j - s_{ij} N_i \tag{6}$$

where $s_{ij}$ is interpreted as the survival probability of a random individual living in a population in demographic state $i$ in a parental generation and that will be in demographic state $j$ in the descendant generation. Because $n_{ji} \geq 0$, the population sizes must satisfy the constraint $N_j \geq s_{ij} N_i$ for a given $s_{ij}$ value. The expression for $n_{ji}$ says that the number of age class one individuals in demographic state j is the total number of individuals in that state minus the expected number of adults in state i that survived the transition. The fixation probability of a single mutant in state $j$ in this model is still $\pi_j = 1/N_j$ as every individual has exactly the same fitness.

Two-state demography

We now consider the case in which the population fluctuates between two sizes inline image and inline image. The transition probabilities of the demography are written as inline image and inline image for inline image, where inline image. The stationary probability that the population takes size inline image is then given by inline image and with complementary probability it takes size inline image (Grimmett and Stirzaker 2001).

Substituting the stationary probabilities, the transition probabilities, and equation (6) into equation (4), we find after algebraic simplification that

image(7)

for inline image and inline image. This equation illustrates that the number of age class one individuals in a given demographic state may depend on both the sizes of the population in the parental and descendant generation. Hence, population size is unlikely to cancel out from the product inline image in equation (3), and we find that the substitution rate can be expressed under the present assumptions as

image(8)

The population sizes, $N_1$ and $N_2$, cannot be factored out from this equation and the substitution rate thus depends on all the features of the demographic process.

To be able to compare the effect on inline image of varying population size relative to the case in which demography is constant, we introduce the standardized substitution rate inline image, which is the substitution rate inline image divided by its equivalent under constant population size (inline image). An example of the substitution rate with constant population size in the presence of overlapping generations was provided above (see eq. 5).

Using equation (8), we find that the standardized substitution rate can be written as

image(9)

If we consider that $N_2 > N_1$, holding everything else constant, then we can see that the substitution rate is increased relative to what would be expected in the absence of population size fluctuations if inline image and reduced otherwise. That is, population size fluctuations positively affect the substitution rate when the survival probability during a transition from the small to the large population size ($s_{12}$) tends to be large, and the survival probability during a transition from the large to the small population size ($s_{21}$) tends to be small.

Why does an increase in the survival probability during a transition from the small to the large population ($s_{12}$) tend to increase $k_s$? From equation (7), we can see that when a transition from small to large population size occurs (from state 1 to 2), the reduction in k due to overlapping generations in a situation of constant demography is equal to $s_{12}$. By contrast, under fluctuating demography and overlapping generations, this reduction is given by $s_{12}N_1/N_2$, which is lower than $s_{12}$ by our assumption that $N_2 > N_1$. Population expansion will thus lessen the decrease of k due to the presence of overlapping generations (because $s_{12}N_1/N_2$ is lower than $s_{12}$); in other words, population expansion will increase $k_s$. Conversely, a reduction in population size will exacerbate the reduction of k due to overlapping generations (because $s_{21}N_2/N_1$ is higher than $s_{21}$).

In the case in which we enforce inline image, so that the probabilities of occurrence of the demographic states are the same, we further obtain

image(10)

and if we consider the extreme (arguably biologically unrealistic) values for survival rates, where all individuals survive when the population is growing or remains stable but all individuals born at previous time steps die during a bottleneck ($s_{11} = s_{22} = s_{12} = 1$ and $s_{21} = 0$), we obtain

$$k_s = 2 - \frac{N_1}{N_2} \tag{11}$$

If $N_2$ is much larger than $N_1$, the ratio in $k_s$ will eventually vanish, and then $k_s = 2$, which represents an upper bound for the standardized substitution rate for a demography with only two population sizes. In the following, we will expand the model to an arbitrary number of population sizes to explore whether we can generate stronger quantitative effects on the standardized substitution rate.
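The two-state results can be explored numerically. The sketch below assumes that equations (3), (4), and (6) read k = μ Σ_j n_j π_j Pr(j), n_j = Σ_i n_ji Pr(i | j), and n_ji = N_j − s_ij N_i (our reading of the verbal definitions). The parametrization by a = Pr(2 | 1) and b = Pr(1 | 2), the function names, and the choice of setting both sizes to N1 for the constant-size baseline are illustrative assumptions rather than the article's notation.

```python
def two_state_rate(mu, N1, N2, a, b, s11, s12, s21, s22):
    """Substitution rate for a population switching between sizes N1 and N2.

    a = Pr(2 | 1) and b = Pr(1 | 2) are the switching probabilities
    (hypothetical parametrization); s_ij is the survival probability during
    a transition from state i to state j. Requires N_j >= s_ij * N_i.
    """
    pr1, pr2 = b / (a + b), a / (a + b)  # stationary probabilities
    # Backward probabilities: a population in state 1 stayed in 1 with
    # probability 1 - a and came from state 2 with probability a
    # (and symmetrically for state 2 with b).
    n1 = (1 - a) * (N1 - s11 * N1) + a * (N1 - s21 * N2)
    n2 = (1 - b) * (N2 - s22 * N2) + b * (N2 - s12 * N1)
    # Fixation probability is 1 / N_j in state j.
    return mu * (n1 / N1 * pr1 + n2 / N2 * pr2)


def standardized_rate(N1, N2, a, b, s11, s12, s21, s22):
    """Rate under fluctuating sizes divided by the rate with both sizes equal."""
    k = two_state_rate(1.0, N1, N2, a, b, s11, s12, s21, s22)
    k0 = two_state_rate(1.0, N1, N1, a, b, s11, s12, s21, s22)
    return k / k0


# Extreme survival values: everyone survives except during the bottleneck.
ks_up = standardized_rate(100, 1000, 0.5, 0.5, s11=1, s12=1, s21=0, s22=1)

# Reversed pattern: no survival during expansion, some during contraction.
ks_down = standardized_rate(100, 1000, 0.5, 0.5, s11=1, s12=0, s21=0.1, s22=1)

if __name__ == "__main__":
    print(ks_up, ks_down)
```

With N1 = 100 and N2 = 1000, the first call gives 2 − N1/N2 = 1.9, whereas the reversed survival pattern gives a value below one. This illustrates that the direction of the effect depends on when adults survive.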

More demographic states

We now consider a situation with c demographic states, with transition probabilities inline image, inline image, and inline image for inline image. That is, the population either remains in state inline image with probability inline image or moves to state inline image with complementary probability and when it has reached the last state returns to state 1. The stationary distribution of this model is inline image. Substituting these stationary probabilities, transition probabilities, and equation (6) into equation (4), we find that

image(12)

which illustrates again that the number of age class one individuals in a given demographic state may depend on both the sizes of the population in the parental and descendant generation. The substitution rate (eq. 3) can be expressed as

image(13)

We now consider the case in which $N_{i+1} > N_i$ for all i so that the population follows a dynamic with increasing population size followed by a drastic bottleneck once it has reached state c. Then, we can further assume that $s_{ii} = 1$ and $s_{i,i+1} = 1$, except $s_{c1} = 0$, so that as above all individuals survive when the population is growing or remains stable but all individuals born at previous time steps die during a bottleneck. In this case, the standardized substitution rate becomes

$$k_s = c - \sum_{i=1}^{c-1} \frac{N_i}{N_{i+1}} \tag{14}$$

which can be made approximately as large as the number of demographic states c (population sizes) if the difference in successive population sizes is extremely large during the expansion phase, which may be unrealistic for natural populations. This example illustrates that certain demographic regimes may result in a marked increase in the substitution rate compared to the one expected for a population with overlapping generations but no demographic fluctuations. Population size fluctuations coupled with overlapping generations could in principle also decrease the substitution rate. However, under biologically realistic situations, when survival is higher when the population is growing and lower during population contractions, the effect will generally translate into an acceleration of the standardized substitution rate relative to that observed under constant demography.
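The growth-then-bottleneck cycle can be evaluated from first principles. The sketch below assumes a cyclic chain in which the population stays in its current state with probability p or moves to the next state with probability 1 − p (returning from state c to state 1), so that the stationary distribution is uniform. It further assumes n_ji = N_j − s_ij N_i and a fixation probability of 1/N_j. The function names and the baseline, in which all sizes are set equal while survival probabilities are kept unchanged, are illustrative assumptions.

```python
def cyclic_rate(mu, sizes, p, s_stay, s_move):
    """Substitution rate for a cyclic chain of len(sizes) demographic states.

    s_stay[j]: survival probability when the population remains in state j.
    s_move[j]: survival probability when it moves from state j to the next
               state (the last state moves back to the first).
    """
    c = len(sizes)
    total = 0.0
    for j in range(c):
        prev = (j - 1) % c
        # With a uniform stationary distribution, the backward probabilities
        # equal the forward ones: stayed in j (p) or arrived from prev (1 - p).
        n_j = (p * (sizes[j] - s_stay[j] * sizes[j])
               + (1 - p) * (sizes[j] - s_move[prev] * sizes[prev]))
        total += n_j / sizes[j] / c
    return mu * total


def cyclic_standardized_rate(sizes, p, s_stay, s_move):
    """Rate with fluctuating sizes relative to the same chain at constant size."""
    constant = [sizes[0]] * len(sizes)
    return (cyclic_rate(1.0, sizes, p, s_stay, s_move)
            / cyclic_rate(1.0, constant, p, s_stay, s_move))


# Tenfold growth at each step, then a bottleneck in which no adult survives.
sizes = [10, 100, 1000, 10000]
s_stay = [1, 1, 1, 1]
s_move = [1, 1, 1, 0]
ks = cyclic_standardized_rate(sizes, 0.1, s_stay, s_move)
closed_form = len(sizes) - sum(sizes[i] / sizes[i + 1] for i in range(len(sizes) - 1))

if __name__ == "__main__":
    print(ks, closed_form)
```

For these values both quantities equal 3.7, close to the number of states c = 4. Under these assumptions the result does not depend on p, because the factor 1 − p cancels in the ratio.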

Explicit demographic model

To gain some intuition about the extent to which the substitution rate is affected by varying population size in a natural system, we evaluate the standardized substitution rate (ks) by assuming a standard serial demographic process consisting of population expansions with density-dependent competition followed by a bottleneck once the population has reached carrying capacity. To that aim, we further assume that the parameter inline image tends to zero in equation (13), so that the process moves in cycles of growth and bottleneck in a quasi-deterministic way, and that the model describing the growth phase is given by

image(15)

for inline image. Hence, during the phase of growth, a proportion s of adult individuals survive to the next generation, and produce new offspring according to the Beverton–Holt model of density-dependent competition (discrete time analogue of the logistic model; Begon et al. 1996; Brannstrom and Sumpter 2005) with parameters r, which describes the fecundity of an individual in the absence of competition, and η, which captures the effect of density-dependent competition. If there were no bottlenecks, then the population would eventually settle at the carrying capacity inline image.

To evaluate the standardized substitution rate under the growth law described above (eq. 15) and in the presence of bottlenecks, we fix values for the parameters s, r, and η, let the population grow from its initial size inline image until it first reaches a size in the interval inline image, and then let the bottleneck occur. The number of time steps it takes to reach this interval defines the number c of demographic classes in the model. In Figure 1, we graph the standardized substitution rate and associated demography as a function of s for given values of r and η. Figure 1 shows that $k_s$ increases with s and that, for instance, demographic fluctuations increase its value by 50% under the demography depicted in panel C.

Figure 1.

Substitution rate and demographies under growth law eq. (15) for r = 1.2, η = 0.001, and various values of survival s. (A–C) Population demography for s = 0.1, s = 0.5, and s = 0.9, respectively. Time steps (t) are represented on the x-axis and population census size on the y-axis. (D) The standardized substitution rate as a function of increasing values of s, with s = 0.1, s = 0.5, and s = 0.9 corresponding to the demographies represented in A–C highlighted on the curve.

Discussion

In this article, we have shown that the substitution rate at neutral genes may strongly depend on population size fluctuations. This effect does not require unusual assumptions: it is sufficient to consider overlapping generations jointly with fluctuating demography. Overlapping generations are the norm rather than the exception in natural populations. Fluctuating population sizes seem equally common in nature, even if the amplitude of population size fluctuations is expected to be generally greater in prokaryotes than in higher eukaryotes.

Our model suggests that the dependence on demography of the substitution rate at neutral markers should be strongest in populations characterized by strong fluctuations in population size and a reasonably long life span (i.e., a high probability of survival from one time step to the next). Deviations from a strict molecular clock are not uncommon in empirical datasets (Drummond et al. 2006). Moreover, empirically observed mutation rates over short periods of time are often larger than substitution rates estimated over longer evolutionary timescales (Lambert et al. 2002; Howell et al. 2003). There has been considerable discussion in the literature on the causes that may underlie this difference, which has sometimes been ascribed to the purging of slightly deleterious mutations being a slow process (Ho et al. 2005; Rocha et al. 2006; Soares et al. 2009). Our results suggest that in some cases, population size fluctuations should also be taken into account.

The dependence of the substitution rate on demography in our model is a distinct effect from the one previously described for loci under selection (Kimura and Ohta 1974; Pollak 1982; Otto and Whitlock 1997; Rousset 2004). Under the assumptions of population size fluctuations and overlapping generations, the substitution rate is affected by population size irrespective of the selection regime. Indeed, the effect described in this article is driven by the relative balance between the number of new mutants entering the system and the fixation probability of neutral alleles at any time step (μnj and πj, respectively, in eq. 3). The number of mutants that enter the population is a function of census and not effective population size for both neutral and selected alleles. The eventual fixation probability of neutral alleles is also a function of census population sizes and not effective population size (Appendix 1 of Leturque and Rousset 2002).

As such, the number of new mutants per time step (μnj) will remain unaffected by the regime of natural selection acting on the locus under scrutiny (unless mutation and demography are themselves affected by selection). However, the fixation probability (πj) of a mutant in a given demographic state will be affected by natural selection. Under the deterministic demography described in the “Explicit demographic model” section, πj may be replaced by the approximation πj ≈ 2S Ne/Nj, where S is the selection coefficient and Ne the effective population size (Ewens 1967; Kimura 1970; Otto and Whitlock 1997). This shows that the substitution rate at selected loci is not a direct function of effective population size, and will depend explicitly on the number of new mutants entering the system in each demographic state.

We evaluated the substitution rate under very simplistic demographic scenarios, and more realistic cases would have to be analyzed in the future with further theoretical work or with experimental evolution setups. However, our results already allow us to make some general predictions on how population size fluctuations and overlapping generations are likely to affect variation in substitution rates within and between lineages.

Within-lineage variation in the substitution rate due to demography may be difficult to assess. Most animal and some plant species (in temperate regions at least) tend to follow seasonal population size fluctuations. Fairly regular cycles are also expected in many obligate pathogens, which alternate population bottlenecks during host-to-host transmission with exponential population growth after infection of the host. A good example would be influenza, which is characterized by a fairly constant host-to-host transmission time interval of 2–3 days (Fraser et al. 2009), and cannot survive outside the human host. Under such regular cycles, the substitution rate is simply expected to accelerate. It should in principle be possible to run a comparative analysis of substitution rates among various organisms to test for an effect of demographic fluctuations. The difficulty stems from the large number of possible confounding variables. As such, the best strategy would be to compare sister taxa with highly contrasting demographies.

Variation of substitution rates between lineages undergoing different demographic trajectories should be easier to detect. A good example may be the plague (Y. pestis), which is an endemic bacterial disease in rodents and can also survive in the soil for some time. Local outbreaks and epidemics can flare up, some of which led to the historical pandemics and more recently to the global pandemic at the end of the 19th century. In line with the predictions of our model, there seems to be an excess of new mutations reaching fixation in lineages having generated new epidemics (Morelli et al. 2010). Unpublished simulations (Cui et al. unpubl. ms.) show that lineages having generated a larger number of outbreaks in the past accumulate an excess of neutral single nucleotide polymorphisms (SNPs). This pattern seems highly robust and could be replicated over a very wide range of parameter values considered.

HIV lineages infecting communities with different risk-taking behaviors may provide another interesting empirical test case. From our results, we would predict higher substitution rates in lineages of HIV circulating in groups at higher risk of infection than in lineages found primarily in less risk-prone communities. Such an effect would be driven by the variation in time intervals between host-to-host transmissions, which will lead to lineages circulating in high-risk communities experiencing more frequent population expansions and bottlenecks corresponding to recurrent infections of new hosts along the transmission chain.

In this article, we have shown analytically that substitution rates at neutral loci are affected by population size under fluctuating population sizes and overlapping generations, which are both expected to be the norm rather than the exception in natural populations. We hope that our model will motivate future theoretical and empirical work to assess the importance of this effect in natural populations.


Associate Editor: J. Wares

ACKNOWLEDGMENTS

We are grateful to M. Achtman, D. Balding, B. Charlesworth, T. Jombart, and L. Weinert for useful discussions and comments on earlier drafts of the manuscript. This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) grant BB/H008802/1, the European Research Council (ERC) grant 260801-BIG_IDEA, and the Swiss National Science Foundation (SNSF) grant PP00P3–123344.
