A SELECTION MODEL OF MOLECULAR EVOLUTION INCORPORATING THE EFFECTIVE POPULATION SIZE

Evolution implies changes in allele frequencies over generations. Eventually, unless subject to balancing selection pressures, alleles segregating in a population are destined to arrive at either of two endpoints, loss or fixation. The probability of fixation is influenced by several factors that may vary over space and time, and also interact with each other. Population structure (spatial separation coupled with nonrandom mating among demes) and immigration are examples of such factors (Whitlock and Gomulkiewicz 2005). In addition, selection and drift also drive the fixation process. Directional Darwinian selection favors certain genotypes at the expense of others and will enhance the spread of advantageous alleles in the population so that the likelihood of their fixation is increased. Random genetic drift, on the other hand, is a phenomenon that arises from a stochastic process in which sampling in finite populations results in allele frequency change purely due to chance events. Selection and drift act in concert, and the stochastic nature of drift means that variant alleles can change in frequency in a direction opposed to that incurred by their selective advantage or disadvantage. Generally, we should expect the importance of drift to increase with decreasing selection coefficients (s, the relative fitness advantage or disadvantage of individuals carrying a variant allele) of nonneutral alleles.
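The interplay between drift and selection described above can be illustrated with a minimal Wright-Fisher simulation. This is only a sketch under simplifying assumptions that are not taken from the text: a haploid population of constant size, non-overlapping generations, and carriers of the variant allele having relative fitness 1 + s. The function names, population size, and selection coefficients are hypothetical.

```python
import random

def simulate_fixation(pop_size, s, start_count=1, rng=None):
    """Follow one allele until loss (returns False) or fixation (returns True).

    Assumes a haploid Wright-Fisher population of constant size `pop_size`
    in which carriers of the allele have relative fitness 1 + s (s > -1).
    """
    rng = rng or random.Random()
    count = start_count
    while 0 < count < pop_size:
        freq = count / pop_size
        # Selection: deterministic shift in the expected allele frequency.
        expected = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Drift: binomial sampling of the next generation.
        count = sum(rng.random() < expected for _ in range(pop_size))
    return count == pop_size

def fixation_fraction(pop_size, s, replicates, seed=1):
    """Fraction of replicate new mutations that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(pop_size, s, rng=rng) for _ in range(replicates))
    return fixed / replicates

# Illustrative comparison of a neutral and a mildly advantageous allele.
for s in (0.0, 0.05):
    print(s, fixation_fraction(pop_size=50, s=s, replicates=1000))
```

Most replicates end in loss even when the allele is advantageous, which is the sense in which drift can move an allele against the direction favored by selection.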

Although the theoretical framework associated with selection and drift in population genetics is well established (Fisher 1922; Wright 1931), empirical data on their relative importance across lineages generally remain sparse. By the middle of the last century, the Modern Synthesis put strong emphasis on selection (Huxley 1942). However, surprisingly, when molecular data started to appear soon thereafter, the observations of vast amounts of genetic diversity within populations (Shaw 1965) and an approximately linear rate of protein evolution over time (Zuckerkandl and Pauling 1965) gave fuel to the idea of drift as a potent force, notably as postulated in the Neutral Theory of Molecular Evolution of Motoo Kimura (Kimura 1968; King and Jukes 1969). The neutralist–selectionist debate (Gillespie 1991; Nei 2005) has been around since then and is still on the agenda of evolutionary biologists, with cases made both for selection (Gillespie 2001; Hahn 2008) and nonadaptive processes (Lynch 2005; 2007a,b).

Whole-genome analyses provide the necessary broad-scale perspective on molecular evolution; they go beyond the heterogeneous nature of the evolution of individual genes or genomic regions and allow for an overall assessment of the factors shaping molecular evolution in the lineages under study. A number of such studies have recently given indisputable evidence for positive selection affecting many genes in vertebrate as well as invertebrate genomes (e.g., Clark et al. 2003; Bustamante et al. 2005; Nielsen et al. 2005; Begun et al. 2007; Kosiol et al. 2008). However, although selection is thus gaining increasing empirical support as a key driver of molecular evolution, there are accumulating data to suggest that the role or strength of selection varies predictably among lineages in at least two different respects, in both cases relating to the effective population size (Ne). For one of them, natural selection is challenged by genetic drift. The purpose of this Commentary is to shed light on these new results because, in my opinion, they argue that models of molecular evolution have to take realistic account of the demographic characteristics of individual lineages.

Slightly Deleterious Mutations and the Effective Population Size

The first observation comes from multiple species comparisons of the mean rate of functional divergence in protein sequences. This rate is typically quantified by the ratio of rates of nonsynonymous to synonymous substitution (dN/dS). When comparing the rate of protein divergence among lineages, a higher dN/dS means either more adaptive evolution or relaxed selective constraints (as long as dN/dS < 1). Large-scale analyses of orthologous gene sets in several mammalian genomes suggest that lineage-specific mean dN/dS varies among clades following a distinct pattern (Table 1). The emerging picture includes an increase in mean dN/dS in the primate lineage, particularly so in hominids. The highest dN/dS ratios so far are seen in the human and chimpanzee lineages since their divergence about 6 million years ago, and in the platypus lineage (monotremes) since its split from opossum (marsupials). The lowest ratios are recorded in the mouse and rat lineages, whereas intermediate ratios are seen in the dog and opossum lineages.

Table 1. Comparisons of mean dN/dS in different lineages. Each analysis is based on the same set of genes in the different lineages.

Lineage (mean dN/dS) | Reference
Human (0.249) > Chimpanzee (0.245) > Macaque (0.191) > Dog (0.140) > Mouse (0.127) > Rat (0.121) | Kosiol et al. 2008
Human (0.132) = Platypus (0.132) > Dog (0.128) > Opossum (0.125) > Mouse (0.105) | Warren et al. 2008
Human (0.112) > Dog (0.095) > Mouse (0.088) | Lindblad-Toh et al. 2005
Chimpanzee (0.175) > Human (0.169) > Dog (0.128) > Macaque (0.124) > Mouse (0.104) | Gibbs et al. 2007
Human and Chimpanzee (0.20) > Mouse and Rat (0.14) | Mikkelsen et al. 2005

Seemingly, these data indicate a negative correlation between effective population size and mean dN/dS; small rodents such as mice and rats are usually abundant whereas human (until recently) and chimpanzee populations are relatively small (as is the platypus population). (Of course, census number can widely exceed the effective population size, but the two parameters should correlate well within a particular class of organisms, like mammals.) The accumulation of nucleotide substitutions in coding sequence is a slow process, and lineage-specific estimates of dN/dS are the result of fixation of mutations over evolutionary time scales. For this reason, the effective size of contemporary populations may not necessarily be representative of the effective population size of past populations along the lineage in which dN/dS is estimated. However, coalescence models can be used to infer ancestral effective population sizes (Burgess and Yang 2008). Figure 1 shows the relationship between estimated effective population size and mean dN/dS in different internal as well as terminal lineages of primates and mouse. There is a strong negative correlation (r = −0.990, P = 0.0013, N = 5). Another way of showing this relationship is to use generation time as a proxy for effective population size, as these two variables tend to be (negatively) correlated (Chao and Carr 1993). Mean dN/dS is indeed positively correlated with generation time (r = 0.814, P = 0.0487, N = 6) in mammals.
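The P values quoted for these correlations can be checked approximately from the correlation coefficient and sample size alone. The sketch below assumes, since the text does not say which test was used, a Pearson correlation with a two-sided t test on N − 2 degrees of freedom. The function names are hypothetical, and the t distribution is integrated numerically with Simpson's rule to keep the example free of external libraries. The sign of r does not affect the result.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_p_value(r, n, steps=20000):
    """Two-sided P value for a Pearson correlation r observed in n data points.

    Assumes |r| < 1 and n > 2; `steps` must be even for Simpson's rule.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the probability mass between 0 and |t|.
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = total * h / 3
    return 1 - 2 * central_mass

print(correlation_p_value(-0.990, 5))  # Ne versus mean dN/dS
print(correlation_p_value(0.814, 6))   # generation time versus mean dN/dS
```

Under these assumptions the results come out at roughly 0.001 and 0.049, close to the reported values. Small differences are to be expected because r is given to only three decimals.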

Figure 1.

The relationship between estimates of effective population size and lineage-specific mean dN/dS (data from Kosiol et al. 2008). Datapoints in ascending dN/dS order are: the terminal mouse clade following the split from rat (Ne from Ideraabdullah et al. 2004), the internal primate clade prior to the split of Old and New World monkeys (Ne from Burgess and Yang 2008), the hominid clade prior to the split of human and chimpanzee (Ne from Burgess and Yang 2008), the terminal chimpanzee clade (Ne from Caswell et al. 2008), and the terminal human clade (Ne from Eyre-Walker et al. 2002).

An increasing dN/dS ratio with decreasing effective population size is not compatible with an increasing role of adaptive evolution. Rather, and as mentioned by several authors (Eyre-Walker et al. 2002; Lindblad-Toh et al. 2005; Kosiol et al. 2008; Warren et al. 2008), these observations are consistent with the prediction from population genetics theory of reduced efficiency of purifying selection in small populations. This was originally formulated by Tomoko Ohta in the Nearly Neutral Theory of Molecular Evolution (Ohta 1973, 1992). She showed that mutations would be strongly selected only if |s| >> 1/(4Ne), in practice meaning that drift should be expected to overwhelm the effects of selection on mildly deleterious mutations in small populations. As a consequence, such mutations can accumulate and contribute to amino acid divergence in lineages with small Ne.
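The reduced efficiency of purifying selection in small populations can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation, which is not given in the text and is introduced here only for illustration. The sketch assumes a diploid population whose census size equals Ne, and a semidominant mutation with selection coefficient s in heterozygotes. The function names and the values of s and Ne are hypothetical.

```python
import math

def fixation_probability(s, ne):
    """Kimura's diffusion approximation for a new semidominant mutation.

    Assumes a diploid population whose census size equals its effective
    size `ne`, so the initial frequency of the mutation is 1 / (2 * ne).
    """
    if s == 0:
        return 1 / (2 * ne)
    exponent = -4 * ne * s
    if exponent > 700:
        # Strongly deleterious relative to drift: the probability is
        # effectively zero, and exp() would overflow.
        return 0.0
    return math.expm1(-2 * s) / math.expm1(exponent)

def relative_to_neutral(s, ne):
    """Fixation probability expressed relative to that of a neutral mutation."""
    return fixation_probability(s, ne) * 2 * ne

# Illustrative values only: a mildly deleterious mutation in a small
# and in a large population.
for ne in (10_000, 1_000_000):
    print(ne, relative_to_neutral(-1e-5, ne))
```

With these illustrative numbers the mutation fixes at about 80% of the neutral rate when Ne = 10,000 but has essentially no chance of fixation when Ne = 1,000,000, which is the pattern the nearly neutral argument predicts.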

Although I here focus on a qualitative rather than quantitative perspective on how the role of selection varies among lineages, an idea of the relative magnitude of the accumulation of slightly deleterious mutations in small populations can be obtained from the analysis of ≈16,000 1:1 orthologues in six mammalian genomes made by Kosiol et al. (2008). They found that mean dN/dS is about twice as high in the human (0.249) and chimpanzee (0.245) lineages as in the mouse (0.127) and rat (0.121) lineages. If one makes the assumption that the majority of nonsynonymous substitutions in rodents are either neutral or advantageous, then about half of all nonsynonymous substitutions in humans and chimpanzees should represent unfavorable alleles that would have been expected to be removed by purifying selection, had population sizes been much larger. Clearly, there are a number of confounding factors associated with this estimate, for example, related to the distribution of fitness effects of new mutations in different lineages (Eyre-Walker and Keightley 2007) and the neutrality of synonymous sites (Chamary et al. 2006). However, there is at least one reason to consider this estimate conservative, as there are both theoretical arguments and empirical data to suggest that adaptive evolution has not been as prevalent in small primate populations as it has in rodents, which will be discussed below. In summary, although natural selection is clearly an important force in protein evolution, a selection model of molecular evolution has to incorporate the contribution of genetic drift as effective population size decreases.
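The "about half" figure follows from simple arithmetic on the Kosiol et al. (2008) values. The sketch below treats the rodent mean dN/dS as a baseline containing no slightly deleterious substitutions, as the text assumes. Averaging mouse and rat into a single baseline is an additional simplification made here, and the function name is hypothetical.

```python
# Lineage-specific mean dN/dS values reported by Kosiol et al. (2008).
mean_dnds = {"human": 0.249, "chimpanzee": 0.245, "mouse": 0.127, "rat": 0.121}

def excess_fraction(small_ne_ratio, baseline_ratio):
    """Share of nonsynonymous substitutions in excess of the baseline ratio."""
    return 1 - baseline_ratio / small_ne_ratio

# Assumption: the rodent mean serves as the baseline for large populations.
rodent_baseline = (mean_dnds["mouse"] + mean_dnds["rat"]) / 2

for lineage in ("human", "chimpanzee"):
    print(lineage, round(excess_fraction(mean_dnds[lineage], rodent_baseline), 2))
```

Both primate lineages come out at close to 0.5 under these assumptions.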

Adaptive Evolution and the Effective Population Size

Population genetics theory stipulates that selection is not only more efficient in removing deleterious alleles from large populations, but the fixation of advantageous alleles is also facilitated. Estimating the extent to which molecular evolution is driven by adaptive processes has until recently proved difficult owing to a shortage of sequence information on both interspecific divergence and intraspecific diversity. However, thanks to resequencing efforts aimed at the retrieval of large-scale data from multiple individuals, this is now possible. The proportion of nonsynonymous substitutions that represents fixation of advantageous mutations can be estimated by contrasting dN/dS from divergence data with the corresponding ratio from diversity data, where it is denoted pN/pS (McDonald and Kreitman 1991; Eyre-Walker 2006). Given that the fixation time of beneficial mutations is expected to be much shorter than that of neutral or slightly deleterious alleles, the former category, by its transient nature, should make a limited contribution to polymorphism data. If dN/dS > pN/pS, then protein evolution is at least partly due to the fixation of favorable alleles. This can be quantified using the simple expression

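\[ \alpha = 1 - \frac{p_N/p_S}{d_N/d_S} \]

where \alpha denotes the estimated proportion of nonsynonymous substitutions fixed by positive selection.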

Several factors are known to influence this estimate, including the presence of slightly deleterious mutations segregating in the population, in which case pN/pS cannot be taken as a strict neutral reference (Charlesworth and Eyre-Walker 2008). This is usually dealt with by estimating pN/pS from common alleles only, given that slightly deleterious alleles are less likely to drift to high frequencies (Fay et al. 2001). Moreover, as the incidence of adaptive evolution is likely to vary among genes and gene categories (Welch 2006), pN/pS and dN/dS should be estimated from a random and sufficiently large set of genes.
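As a worked illustration of the contrast between divergence and diversity, the sketch below computes the proportion of adaptive substitutions as 1 − (pN/pS)/(dN/dS), a commonly used form of the McDonald-Kreitman-based estimator that is assumed here. It works from raw counts of substitutions and polymorphisms, which gives the same ratio as per-site rates provided the same sites are used for both data types. The function name and the counts are hypothetical, chosen only to show the calculation.

```python
def alpha_from_counts(dn, ds, pn, ps):
    """Estimate the proportion of adaptive nonsynonymous substitutions.

    dn, ds: counts of nonsynonymous and synonymous substitutions (divergence).
    pn, ps: counts of nonsynonymous and synonymous polymorphisms (diversity).
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero")
    return 1 - (ds * pn) / (dn * ps)

# Hypothetical counts, not taken from any dataset.
print(alpha_from_counts(dn=300, ds=1000, pn=60, ps=400))  # prints 0.5
```

The estimate becomes negative when pN/pS exceeds dN/dS, which is one symptom of the segregating slightly deleterious mutations mentioned above.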

Levels of adaptive evolution have in this way been quantified in humans and Drosophila, and to some extent also in Arabidopsis, birds, and bacteria. Estimates of the proportion of amino acid substitutions driven by positive selection range from zero in the selfing Arabidopsis thaliana (Bustamante et al. 2002), through zero (Fay et al. 2001; Mikkelsen et al. 2005; Zhang and Li 2005) up to 10–20% in humans (Gojobori et al. 2007; Boyko et al. 2008), 20% in chicken (E. Axelsson and H. Ellegren, unpubl. data), and 30–55% in Drosophila melanogaster and D. simulans (Fay et al. 2002; Smith and Eyre-Walker 2002; Sawyer et al. 2003; Bierne and Eyre-Walker 2004; Welch 2006; Begun et al. 2007; Shapiro et al. 2007; but see Sawyer et al. 2007), to >50% for Escherichia coli and Salmonella enterica (Charlesworth and Eyre-Walker 2006). These estimates are sensitive to a number of assumptions, such as the neutrality of synonymous sites, the constancy of selective constraint, and a stable demographic history. That said, there is an apparent trend of the level of adaptive evolution increasing with increasing population size, lowest in selfers and hominids and highest in bacteria, as predicted.

Conclusions

Mutations can be considered effectively neutral when |4Nes| < 1. This creates an interval of (negative and positive) selection coefficients, different for different populations, in which drift largely determines the fate of new mutations (Nei 2005). Mutations that are effectively neutral in a small population, and thus run the risk of loss due to drift, may be positively selected and reach fixation in a large population. Conversely, mutations that are harmful and selected against in large populations will behave as neutral in small populations and may eventually be fixed. Empirical data from limited datasets supported a negative correlation between the fraction of nonsynonymous substitutions eliminated by selection and generation time (which covaries with effective population size) already before the first mammalian genome sequences became available (Keightley and Eyre-Walker 2000). Moreover, human population geneticists do acknowledge that the evolution of our own species has been heavily influenced by demography. However, it is my impression that the idea that genetic drift plays an important role in the molecular evolution of protein-coding sequences has not been acknowledged in full by all evolutionary biologists. The new data from comparative genomics are thus important in providing a more nuanced picture of the molecular evolutionary process.
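The population-specific interval of effectively neutral selection coefficients can be written down directly from the condition given above. The sketch takes the absolute value of s so that the test covers both negative and positive coefficients. The function names and the values of s and Ne are hypothetical.

```python
def is_effectively_neutral(s, ne):
    """True when s falls inside the interval where drift dominates (4*Ne*|s| < 1)."""
    return 4 * ne * abs(s) < 1

def neutral_interval(ne):
    """Lower and upper bounds on s within which drift largely decides the outcome."""
    bound = 1 / (4 * ne)
    return (-bound, bound)

# Illustrative values only: the same mutation in a small and a large population.
s = -1e-5
for ne in (10_000, 1_000_000):
    print(ne, neutral_interval(ne), is_effectively_neutral(s, ne))
```

The same coefficient is classified as effectively neutral at Ne = 10,000 but not at Ne = 1,000,000, because the interval narrows in proportion to 1/Ne.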

One implication of these results is that there will not be just one model of molecular evolution applicable across species. This is also emphasized by Hahn (2008) in his selectionist view of molecular evolution, which usefully provides compelling evidence against many of the assumptions of the Neutral Theory. However, Hahn concludes that “In fact, whatever the proximate causes of deviations from neutrality, the ultimate results are likely to be the retardation of adaptation and the fixation of mildly deleterious mutations (Hill and Robertson 1966).” It is thus important to be clear that a selection model of molecular evolution incorporating the effective population size does not necessarily give credit to the Neutral Theory; it simply suggests that adaptive evolution is more prevalent in large populations and that natural selection is less efficient in removing slightly deleterious mutations in small populations. What we now need is both theoretical work that can predict the molecular evolutionary process under various demographic and ecological scenarios, and genomic data from a wide variety of organisms to test these predictions.

Following from the observations discussed here, lineages characterized by large ancestral effective population sizes could be considered more adapted due to higher rates of fixation of beneficial alleles and lower rates of fixation of slightly deleterious alleles (Eyre-Walker and Keightley 1999; Theodorou and Couvet 2006). In the long run, this may ultimately place small populations in genetic peril and increase the risk of extinction (Popadin et al. 2007), an issue that should be taken into consideration in relation to an increasing anthropogenic influence on biodiversity.

Associate Editor: M. Rausher

ACKNOWLEDGMENTS

Financial support from the Swedish Research Council is acknowledged. I am grateful for comments made by M. Jakobsson, M. Lascoux, J. Mank, and J. Wolf.
