Patterns of epistasis in RNA viruses: a review of the evidence from vaccine design


Christina L. Burch, Department of Biology, CB No. 3280, University of North Carolina, Chapel Hill, NC 27599, USA.
Tel.: 919-843-2691; fax: 919-962-1625;


Epistasis results when the fitness effects of a mutation change depending on the presence or absence of other mutations in the genome. The predictions of many influential evolutionary hypotheses are determined by the existence and form of epistasis. One rich source of data on the interactions among deleterious mutations that has gone untapped by evolutionary biologists is the literature on the design of live, attenuated vaccine viruses. Rational vaccine design depends upon the measurement of individual and combined effects of deleterious mutations. In the current study, we have reviewed data from 29 vaccine-oriented studies using 14 different RNA viruses. Our analyses indicate that (1) no consistent tendency towards a particular form of epistasis exists across RNA viruses and (2) significant interactions among groups of mutations within individual viruses occur but are not common. RNA viruses are significant pathogens of human disease, and are tractable model systems for evolutionary studies – we discuss the relevance of our findings in both contexts.


Resolving the way in which deleterious mutations interact in their effects on fitness is crucial to many problems in evolutionary biology and population genetics (Lynch & Gabriel, 1990; Kondrashov & Crow, 1991; Kondrashov, 1993; Partridge & Barton, 1993; Barton & Charlesworth, 1998; Wolf et al., 2000; Kondrashov & Kondrashov, 2001). Under the multiplicative model, mutations act independently and the total fitness of a genotype equals the product of the fitness effects of its component mutations. Independent effects of deleterious mutations can be expressed graphically as log fitness that decreases linearly with increasing mutation number (Fig. 1). When mutations do not act independently, defined as epistasis, the fitness effect of a mutation can change depending on the total abundance of deleterious mutations in the genome. Epistasis changes the relationship between log fitness and mutation number, making it curvilinear (Fig. 1). Synergistic (negative) epistasis results if deleterious mutations are more harmful together than would be expected from their separate effects, and produces accelerated fitness loss with increasing mutation number. In contrast, antagonistic (positive) epistasis produces decelerated loss of fitness as deleterious mutations accumulate.

Figure 1.

Hypothetical fitness effects of increasing deleterious mutation number. Each curve is based on the general model described in the text. The solid line shows multiplicative effects, where each additional mutation causes a comparable reduction in fitness. The dashed line shows synergistic epistasis; each additional mutation causes a disproportionate fitness reduction. The dotted line represents antagonistic epistasis; each additional mutation causes a diminished reduction in fitness.
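
The three curves in Fig. 1 can be sketched numerically. The following is a minimal illustration, assuming the quadratic form ln W = αk + βk² that is used for the regression analyses later in the text; the function name and the values of α and β are hypothetical and were chosen only to show the direction of curvature.

```python
def log_fitness(k, alpha, beta):
    """Log fitness of a genotype carrying k deleterious mutations: alpha*k + beta*k**2."""
    return alpha * k + beta * k ** 2

ALPHA = -0.5  # hypothetical independent effect per mutation (negative = deleterious)
MODELS = {
    "multiplicative": 0.0,   # beta = 0: log fitness declines linearly
    "synergistic": -0.1,     # beta < 0: fitness loss accelerates
    "antagonistic": 0.05,    # beta > 0: fitness loss decelerates
}

for name, beta in MODELS.items():
    curve = [round(log_fitness(k, ALPHA, beta), 2) for k in range(6)]
    print(f"{name:>14}: {curve}")
```

For any mutation number above zero, the synergistic curve lies below the multiplicative line and the antagonistic curve lies above it, which is the pattern the caption describes.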

The predictions of several prominent theories in evolutionary biology depend critically on the nature of epistasis. Synergistic epistasis allows sexual reproduction to be favoured over asexual reproduction because sex creates mutational combinations of very low fitness that are more quickly eliminated by selection (the mutational deterministic hypothesis; Kondrashov, 1988; but see Agrawal, 2001; Siller, 2001). Thus, synergistic epistasis provides a possible explanation for the widespread occurrence of sex despite its associated costs (Maynard Smith, 1978). Synergistic epistasis also can explain how outcrossing allows species with relatively small local or inbred populations (for instance, mammals and trees) to survive the accumulation of deleterious mutations through genetic drift (Peck et al., 1997). Compensatory adaptation (restoration of fitness) may occur faster under synergistic epistasis because it allows the marginal effect of each compensatory mutation to be greater (Moore et al., 2000). In contrast, if epistasis is nonexistent or antagonistic, a number of the above predictions can be reversed. For example, antagonistic epistasis favours the evolution of asexual over sexual reproduction (Charlesworth, 1990; Otto, 1997). Although the existence of epistasis has been firmly established through population genetic studies (Fenster et al., 1997), much less is known about the extent and form of epistatic interactions (Otto, 1997; Barton & Charlesworth, 1998; West et al., 1999).

Theory has suggested mechanisms for generating both forms of epistasis, but empirical studies have yet to reveal any consistent pattern (Szathmary, 1993; Fenster et al., 1997; Peck & Waxman, 2000). It is not known whether the lack of pattern is real or simply the result of experimental and statistical limitations. Limitations arise both from difficulties inherent in conducting and interpreting empirical studies of epistasis (West et al., 1999), and from an inability to measure fitness in the natural environment of study organisms (Waxman & Peck, 1999). As reviewed elsewhere (West et al., 1999; Peters & Keightley, 2000), studies of epistasis in most organisms have taken two general forms: (i) examining the expected distribution of fitnesses among sexually produced offspring of a pair of parents, or (ii) directly determining whether or not log fitness is a linear function of mutation number. The first approach can determine the average form of epistasis, but does not estimate the average magnitude of epistasis between pairs of mutations, or whether the form and magnitude of epistasis varies among mutation pairs. The second approach, when pursued by examining fitness effects of known mutations alone and in combination, provides a more rigorous test of epistasis because the average magnitude and variance of epistasis can be determined.

Empirical studies of epistasis

The earliest examination of log fitness as a function of mutation number comes from a mutation accumulation experiment in Drosophila melanogaster (Mukai, 1969). In that experiment, competitive viability of a wild type chromosome (relative to a nonrecombining balancer chromosome) was shown to decrease faster with increasing generations of mutation accumulation, suggesting a nonlinear decline in log fitness. However, generation number provides only an indirect estimate of mutation number and allows alternative explanations for the apparent decline in viability, such as selective improvement of balancer chromosomes or nonlinear increases in mutation rates caused by instability of transposon copy number (Keightley, 1996; Nuzhdin et al., 1996). More recent studies allowed mutation number to be determined directly. Elena & Lenski (1997) used transposon mutagenesis to generate 225 genotypes of Escherichia coli with varying numbers of mutations, and then measured genotype fitness. No overall pattern in epistasis was found; rather, some mutation combinations showed synergistic epistasis whereas others displayed antagonistic epistasis. Similarly, de Visser et al. (1997) created lines of Aspergillus niger with different combinations of known deleterious mutations and found no epistasis on average. However, the latter data cannot exclude synergistic epistasis because all the possible mutational combinations were not generated, implying that the worst combinations could have been removed by selection prior to testing.

Whitlock & Bourguet (2000) tested the action of epistasis using a set of deleterious visible mutations in D. melanogaster. Of two fitness components, male mating success and productivity (combination of female fecundity and offspring survivorship), only the latter showed a pattern of strong average synergistic epistasis. However, both fitness measures featured some antagonistic and some synergistic interactions between specific sets of mutations. These experiments provide strong evidence that epistasis between individual pairs of mutations is common and that the form and magnitude of epistasis varies between mutation pairs. They provide little, if any, evidence that the average form of epistasis is synergistic, and do not suggest that the form of epistasis can be generalized across organisms.

Vaccine design and interactions among deleterious mutations

One rich source of data on the interactions among deleterious mutations that has gone untapped by evolutionary biologists is the literature on the design of live, attenuated vaccine viruses. Rational vaccine design depends upon the measurement of individual and combined effects of deleterious (attenuating) mutations. Appropriate mutations can then be sequentially added to a vaccine virus to minimize reactogenicity and the likelihood of reversion to virulence, while maintaining satisfactory immunogenicity (Murphy & Chanock, 2001). These manipulations generally utilize modern molecular biology techniques such as reverse genetics and site-directed mutagenesis, and most have been conducted in the last decade. In addition, the rigorous requirements of vaccine development necessitate in vivo (naturalistic) measures of viral fitness. Thus, in creating vaccines for viruses such as dengue, influenza and human immunodeficiency virus (HIV), virologists often inadvertently conduct an ideal test of epistasis. Indeed, reports of such studies have specifically noted that the effect of combined mutations could not always be predicted by the effect of single mutations (e.g. Skiadopoulos et al., 1998), implicitly acknowledging the action of epistasis. Here we compile and analyse data from vaccine-related studies involving RNA viruses, to determine the occurrence and nature of epistatic mutations in these pathogens.


Review of the literature

In order to analyse the evidence for epistasis in RNA viruses, we first reviewed the biomedical literature using Medline (National Library of Medicine, National Institutes of Health, USA) to identify studies directly or indirectly related to development of live-attenuated RNA virus vaccines. Additionally, we traced relevant references included in the studies identified on Medline. We included only those studies which focused on deleterious mutations and that measured: (i) fitness of a ‘wild type’ virus (although in some cases this virus contained other mutations which were not manipulated during the study), (ii) fitness effects of individual mutations, groups of mutations, or segments bearing mutations in the otherwise ‘wild type’ background, and (iii) fitness effects of some combination of paired mutations in the same ‘wild type’ background. In some cases, the measurement of the effect of combined mutations preceded the measurement of the effect of individual mutations. We included studies that featured a variety of types of mutation, including substitutions and deletions in coding and noncoding regions of the genome, but we excluded studies of whole-gene deletions. Although we do not claim to have included every relevant study on RNA viruses, we have collected a representative and unbiased sample of the literature.

Although our literature search was unbiased, the possibility exists that the mutations targeted in studies of vaccine design might comprise a biased set. The mutations in our literature sample can be grouped into three categories: (i) site-directed mutations in genes of known function, (ii) site-directed mutations in conserved regions of the viral genome, and (iii) mutations acquired during selection experiments, such as adaptation of viruses to cold temperature or to nonhuman tissue culture. Studies of mutations in categories (i) and (ii) rarely had an a priori expectation as to whether the mutations would be of large effect, and in fact many were of minor effect (Brown et al., 1999). In category (iii), the studies often sought to identify mutations responsible for attenuation in humans and, therefore, screened mutations of both major and minor effect. Even so, it seems likely that our literature sample will contain an excess of mutations with obvious effects on viral fitness, and consequently, a deficiency of mutations with barely detectable effects. However, mutations with catastrophic effects on viral fitness will also be under-represented in our data sample because viruses that were not viable or that replicated to very low titres could not be used in the phenotypic assays presented in the compiled studies. Thus, bias in our data set is expected to be against both extremes of fitness effects, making our sample strongly representative only of mutations with intermediate effects on fitness.

We collected data from 29 studies using 14 different viruses (Appendix 1). From each study we recorded the type of each individual mutation, the fitness measures used and the fitnesses of the ‘wild type’ virus, viruses containing individual mutations, and viruses containing combinations of two and (in some studies) three individual mutations. A few studies reported performance measures for viruses containing four or more mutations. These data are described in their entirety in Fig. 2; however, for brevity only the first three mutations are included in Appendix 1.

Figure 2.

Observed effects of deleterious mutation number on fitness in eight pathogenic RNA viruses. Each point represents the fitness, relative to the wild type virus, of an individual mutant carrying the specified number of mutations. The curves represent the best fitting model of the form ln W = αk + βk², where estimates of α and β were obtained from a bootstrap analysis.
*For the data shown in this figure, it was possible to count the number of nucleotide (or amino acid) substitutions that comprised the mutations listed in Appendix 1. Therefore, this modified count was used for the x-axis shown here.
†There are mutations included in this figure that do not appear in Appendix 1.

Compiling fitness data

As studies of human viruses are limited to measuring performance in vitro or in an animal model, there is no standard method for measuring fitness. From each study, we recorded all the reported measures of viral performance. These measures included, but were not limited to, growth rate, peak titre, lethal dose, infectious dose, percent mortality in animal model, difference in titre between low and high sodium bicarbonate concentrations, difference in titre between low and high temperatures, plaque size and polymerase activity. In some studies, viral titres were measured at a permissive temperature and over a large range of higher (restrictive) temperatures. These cases provide the only exception to our reporting of complete data: for them, we report viral titre at the permissive (baseline) temperature, and the reduction in titre at one higher temperature, 39 °C if mutants showed growth at this temperature, or 38 °C otherwise.

We made an attempt to consider the relevance of the various performance measures to fitness under natural conditions. Performance measures such as replication rate and infectious dose have a very clear relationship to viral fitness. Even measures such as shut off temperature and titre reduction from low to high temperature are clearly related to fitness because they serve as an indicator for the extent to which the virus can spread to warmer areas of the body (e.g. into the warmer lower respiratory tract from the cooler upper respiratory tract) or survive elevated temperatures during host fever. However, the relevance of some performance measures, such as plaque size, to fitness under natural conditions is questionable. Therefore, measures of in vivo replication in an animal model were analysed separately from other performance data, as in vivo replication is the most widely accepted estimator of fitness under natural conditions.

For each performance measure, the natural log of fitness is expressed relative to that of the wild type virus. Log fitness of the single mutants is determined using equation 1:

$$\ln W_i = \ln\left(\frac{r_i}{r_{wt}}\right) \qquad (1)$$

where rwt and ri are the performance measures (e.g. growth rates), respectively, of the wild type virus and of the virus containing mutation i, and Wi is the relative fitness of the virus containing mutation i. Observed log fitness values for double mutants were calculated in the same manner:

$$\ln W_{ij} = \ln\left(\frac{r_{ij}}{r_{wt}}\right) \qquad (2)$$

where rij and Wij are, respectively, the performance measure and relative fitness of the virus containing both mutations i and j.

In cases where fitness was too low to be accurately measured, i.e. not distinguishable from zero, the natural log of fitness is undefined. Therefore, we do not report log fitnesses in these cases. If zero values most often occurred in double or triple mutants, our exclusion of these data would bias against a finding of synergistic epistasis by creating a threshold below which fitness cannot fall. However, in our data set zero values occur in single mutants nearly as often as they occur in double mutants (seven vs. nine instances, respectively). In addition, the exclusion of zero values might affect our findings if it produced a tendency to exclude a particular type of performance measure. We examined whether in vivo performance measures were more often excluded because of zero fitness measures than were in vitro performance measures. Thirty percent of the in vivo measures (17 of 56) were excluded, compared with 19% of the in vitro measures (12 of 63). Although these numbers suggest that proportionally more in vivo measures are lost because of zero fitness values, the difference is not statistically significant (χ2 = 2.346, d.f. = 1, P = 0.1256). Thus, we have no a priori expectation of how excluding these data should bias our findings.
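
The calculation of relative log fitness, together with the exclusion of zero values, can be sketched as follows. This is a minimal illustration that assumes relative fitness is the ratio of the mutant performance measure to the wild type performance measure, and that larger values of the measure indicate higher fitness; a measure for which larger values indicate lower fitness (for example, lethal dose) would need to be handled differently. The function name and all titres are hypothetical and are not taken from any study in the review.

```python
import math

def log_relative_fitness(r_mutant, r_wildtype):
    """Return ln(r_mutant / r_wildtype), or None when log fitness is undefined."""
    if r_wildtype <= 0:
        raise ValueError("wild type performance must be positive")
    if r_mutant <= 0:
        return None  # fitness indistinguishable from zero: excluded, not imputed
    return math.log(r_mutant / r_wildtype)

# Hypothetical titres in arbitrary units.
wild_type = 1.0e6
titres = {"mutant 1": 5.0e5, "mutant 2": 1.0e4, "mutant 1+2": 0.0}

log_w = {name: log_relative_fitness(r, wild_type) for name, r in titres.items()}
# Only defined values are carried forward into later analyses.
reported = {name: value for name, value in log_w.items() if value is not None}
print(reported)
```

Returning None rather than a very large negative number mirrors the choice described above: undefined values are dropped from the data set instead of being replaced by an arbitrary floor.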


Statistical tests for epistasis: the sign test

To test for epistasis using the multiplicative model, we first compared the observed log fitness of viruses containing two mutations with that expected based on the effect of each mutation in isolation. Expected log fitness values for viruses bearing two mutations were generated under the assumption of no epistasis, and were calculated using equation 3 (Elena & Lenski, 1997; Wade et al., 2001):

$$\ln W_{ij,\,\mathrm{expected}} = \ln W_i + \ln W_j \qquad (3)$$

Each combination of mutations yielded one comparison of observed and expected log fitnesses. For example, consider the first entry in Appendix 1 of mutations in dengue virus (Men et al., 1996). The observed, or measured, log fitness of a virus containing both mutations 1 and 2 was −1.3. The expected log fitness, determined by summing the log fitness of each of the single mutants, was −1.4 because ln W1 + ln W2 = −0.1 + (−1.3) = −1.4. In this case the observed log fitness of −1.3 is greater than the expected log fitness of −1.4, so the interaction between mutations 1 and 2 is antagonistic.

For studies containing combinations of more than two mutations, we analysed only nonintersecting pairs of those mutations, so that no mutation was contained in more than one analysed pair. In this manner, we ensured that the mutation pairs analysed were independent and avoided pseudo-replication. Each comparison was assigned a value of 1, 0 or −1 if the virus carrying two mutations was, respectively, more fit, equally fit (exactly the same value as expected), or less fit than expected. We had no means to distinguish performance measures that accurately reflected viral fitness from those that did not; therefore, our first pass analysis examined all the reported performance measures for each mutation pair. As the different performance measures cannot be considered independent, we used the median comparison value across all fitness measures for each virus. Thus, for each mutation pair examined, synergistic epistasis was concluded only if a majority of performance measures indicated that the observed ln Wij was less than the expected ln Wi + ln Wj, and antagonistic epistasis was concluded only if a majority of performance measures indicated the opposite (i.e. ln Wij > ln Wi + ln Wj).
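
The scoring of a single mutation pair can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, and treating a median of exactly 1 or −1 as a "majority" is an assumption about how mixed results were resolved. The first tuple uses the dengue values quoted above; the other two are invented to show a pair with several performance measures.

```python
import statistics

def comparison_value(observed, ln_w_i, ln_w_j):
    """Score one performance measure: 1 if more fit than expected, -1 if less, 0 if equal."""
    expected = ln_w_i + ln_w_j  # no-epistasis expectation on the log scale
    if observed > expected:
        return 1
    if observed < expected:
        return -1
    return 0  # exact floating-point equality; reported values are rounded, so ties are rare

def classify_pair(measures):
    """measures: list of (observed ln W_ij, ln W_i, ln W_j), one tuple per performance measure."""
    median = statistics.median(comparison_value(*m) for m in measures)
    if median == 1:
        return "antagonistic"
    if median == -1:
        return "synergistic"
    return "no majority"

measures = [
    (-1.3, -0.1, -1.3),  # dengue example from the text: expected -1.4, observed -1.3
    (-2.0, -0.5, -0.5),  # hypothetical measure: observed below expected
    (-0.2, -0.3, -0.3),  # hypothetical measure: observed above expected
]
print(classify_pair(measures))
```

Because the scores take only the values 1, 0 and −1, the median equals 1 (or −1) exactly when more than half of the measures point in that direction, so the median rule and the majority rule coincide under this reading.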

We then conducted a sign test to determine whether observed and predicted fitness differed in a consistent direction across studies. The sign test was used instead of a more powerful parametric test because the nature of the data, collected by different researchers and using different fitness measures, ensured that measurement errors were not normally distributed. A comparison of the number of mutation pairs that yielded a higher fitness than expected to the number that yielded a lower fitness than expected revealed no significant difference [sign test, all fitness measures: N(1) = 17, N(0) = 3, N(−1) = 11; P = 0.1725]. To determine whether the results of the first analysis were influenced by the inclusion of performance measures that had little relevance to the natural environment, we also conducted a sign test using only the in vivo performance measures. This test also failed to reveal a significant predominance of one type of epistasis [N(1) = 9, N(0) = 1, N(−1) = 4; P = 0.1334].
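
The sign test on these counts can be reproduced with an exact binomial calculation. The sketch below discards ties, N(0), and computes a one-tailed probability; the text does not say whether the test was one- or two-tailed, so the one-tailed form is an assumption, adopted here because it matches the reported P-values. The function name is hypothetical.

```python
from math import comb

def sign_test_one_tailed(n_pos, n_neg):
    """Exact one-tailed sign test: P(X >= max count) for X ~ Binomial(n_pos + n_neg, 0.5)."""
    n = n_pos + n_neg
    k = max(n_pos, n_neg)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p_all = sign_test_one_tailed(17, 11)     # all fitness measures
p_in_vivo = sign_test_one_tailed(9, 4)   # in vivo measures only
print(round(p_all, 4), round(p_in_vivo, 4))  # prints 0.1725 0.1334
```

A two-tailed version would double both probabilities, which would not change the conclusion that neither comparison is significant.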

Statistical tests for epistasis: regression analysis

From studies in which three or more mutations were combined into a single virus, we additionally attempted to detect epistasis by investigating the linearity of the relationship between log fitness and mutation number (Elena & Lenski, 1997; Whitlock & Bourguet, 2000). Linear regressions were performed using log fitness as the dependent variable and mutation number (1, 2, 3 mutations, etc.) as the independent variable. In this regression, log fitness is described by an equation of the form

$$\ln W_k = \alpha k + \beta k^2 \qquad (4)$$

where Wk is the average fitness of genotypes with k mutations, α < 0 for deleterious mutations, and β defines the interaction between mutations: β < 0 for synergistic interaction, β > 0 for antagonistic interaction (Elena & Lenski, 1997).

We conducted regression analyses using data from 10 studies in which viruses contained at least three deleterious mutations [DEN (two studies), FLU (two studies), HIV, PIV (two studies), REO, RSV, VEEV; Appendix 1]. In most cases, the data used for regression analyses were derived directly from Appendix 1. However, in some cases the mutations described in the appendix actually consist of blocks of nucleotide or amino acid substitutions. In these cases, we used the number of nucleotide substitutions as our measure of mutation number where they were reported, and used the number of amino acid substitutions otherwise. It turned out that all the studies containing at least three deleterious mutations investigated the effects of nucleotide substitutions, and none investigated deletions, so we did not have to decide whether large deletions should be counted as single or multiple mutations. The only other deviation from the data reported in Appendix 1 resulted because some studies reported data for many more than three mutations. Although there was not space to include these additional mutations in Appendix 1, we included data for the additional mutations in the regression analyses.

As the mutants within these studies do not contain independent sets of mutations, use of a least squares linear regression is inappropriate. Therefore, we conducted a bootstrap analysis (Sokal & Rohlf, 1995) in which the data were resampled 1000 times, and estimates of α and β were obtained using least squares regression for each resampled data set. The bootstrap approach produces unbiased estimates of the regression parameters α and β, and their standard errors. This approach maximizes statistical power because it allows us to use all the fitness measures provided by each study, and it is robust to variance heterogeneities which exist in these data. We report here the results of regression analyses performed on only one fitness measure for each virus per study. We did conduct regression analyses using the other reported fitness measures; however, the results did not differ qualitatively from those reported here.
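
A bootstrap of the quadratic regression can be sketched as follows. This is a simplified illustration under stated assumptions, not a reconstruction of the authors' analysis: it fits the no-intercept form ln W = αk + βk² (the fitted models reported in Table 1 also carry a constant term), it resamples individual (k, ln W) observations with replacement, and it skips any resample that contains a single distinct mutation number because the fit is then undefined. The function names and the data are hypothetical.

```python
import random
import statistics

def fit_quadratic(ks, ys):
    """Least-squares estimates (alpha, beta) for ln W = alpha*k + beta*k**2, or None if singular."""
    s2 = sum(k ** 2 for k in ks)
    s3 = sum(k ** 3 for k in ks)
    s4 = sum(k ** 4 for k in ks)
    sy1 = sum(k * y for k, y in zip(ks, ys))
    sy2 = sum(k ** 2 * y for k, y in zip(ks, ys))
    det = s2 * s4 - s3 ** 2  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        return None
    alpha = (sy1 * s4 - sy2 * s3) / det
    beta = (s2 * sy2 - s3 * sy1) / det
    return alpha, beta

def bootstrap_fit(ks, ys, n_boot=1000, seed=0):
    """Resample observations with replacement and summarise the alpha and beta estimates."""
    rng = random.Random(seed)
    pairs = list(zip(ks, ys))
    alphas, betas = [], []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        fit = fit_quadratic([k for k, _ in sample], [y for _, y in sample])
        if fit is None:
            continue  # degenerate resample, e.g. every draw had the same k
        alphas.append(fit[0])
        betas.append(fit[1])
    return {
        "alpha": statistics.mean(alphas), "alpha_se": statistics.stdev(alphas),
        "beta": statistics.mean(betas), "beta_se": statistics.stdev(betas),
        "n_used": len(alphas),
    }

# Hypothetical log fitness values for mutants carrying 1 to 4 mutations.
ks = [1, 1, 2, 2, 3, 3, 4]
ys = [-0.6, -0.4, -1.1, -0.9, -1.3, -1.5, -1.6]
print(bootstrap_fit(ks, ys))
```

The standard deviation of the bootstrap estimates serves as the standard error of each parameter, so an estimate of β whose bootstrap interval spans zero would be read as no detectable interaction.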

The α and β estimated from each of the 10 regressions are reported in Table 1. Although seven of the 10 regressions yielded positive estimates of the quadratic (interaction) term β, none of the 10 regressions yielded a β that differed significantly from zero. A power analysis (included in Table 1) demonstrates that few data sets had the power to detect interaction effects (β) smaller than 20% of the independent effects (α) of mutations. Thus, the regression analysis was not sufficiently powerful to detect weak epistasis. In sum, the results of the regression analyses parallel the results of the sign test in that they provide no conclusive evidence for directional epistasis.

Table 1. Regression analysis.

| Virus | Model parameters | Best fit model | P-value* | Smallest detectable interaction effect (% of α)† |
|---|---|---|---|---|
| DEN (1) | α | ln W = 0.08 − 0.553k | <0.0001 | |
| | α & β | ln W = 0.083 − 0.512k − 0.022k² | 0.3368 | 23.1 |
| DEN (2) | α | ln W = −0.615 − 0.340k | 0.0100 | |
| | α & β | ln W = −0.394 − 0.541k + 0.032k² | 0.3522 | 34.9 |
| FLU (1) | α | ln W = 1.352 − 2.064k | <0.0001 | |
| | α & β | ln W = 0.591 − 0.548k − 0.494k² | 0.1513 | 195.8 |
| FLU (2) | α | ln W = 0.963 − 0.647k | 0.0108 | |
| | α & β | ln W = 2.21 − 0.987k + 0.010k² | 0.4836 | 52.7 |
| HIV | α | ln W = −0.361 − 0.101k | 0.0631 | |
| | α & β | ln W = −1.296 + 0.117k − 0.008k² | 0.1591 | 15.9 |
| PIV (1) | α | ln W = −0.361 − 0.101k | 0.0631 | |
| | α & β | ln W = 1.359 − 3.553k + 0.345k² | 0.1238 | 18.8 |
| PIV (2) | α | ln W = −3.193 − 0.305k | 0.1721 | |
| | α & β | ln W = −2.768 − 0.755k + 0.040k² | 0.3845 | 40.0 |
| REO | α | ln W = 1.401 − 1.184k | <0.0001 | |
| | α & β | ln W = 1.612 − 1.282k + 0.008k² | 0.4530 | 12.0 |
| RSV | α | ln W = −1.036 − 1.256k | 0.0098 | |
| | α & β | ln W = −0.531 − 2.469k + 0.449k² | 0.2179 | 52.3 |
| VEEV | α | ln W = −0.031 − 0.076k | <0.0001 | |
| | α & β | ln W = 0.117 − 0.138k + 0.005k² | 0.3353 | 20.2 |

*P-values are associated with addition of the highest order term to the model. Thus, for models containing only α, P < 0.05 denote a significant fit of the linear parameter α. For models containing both α and β, P-values denote the fit of the quadratic parameter β.

†To determine the power of our analyses, we calculated the smallest detectable interaction effect by dividing the 95% confidence interval for β by α. Thus, the smallest detectable interaction effect is reported as a percentage of the independent (linear) effect of mutations. Any interaction effect weaker than that reported here will not achieve significance in this analysis.


The negative results of the sign tests and the regression analyses can be interpreted in a number of ways. As is the intention, both statistical tests will yield negative results if interactions are not sufficiently common or sufficiently strong to be detected. However, interactions could be common, but equally divided between antagonism and synergism, and sign tests and regression analyses would still be unable to detect epistasis.

We are unable to differentiate between these two possibilities using the existing data. Further, we have only limited power even to suggest that the net effect of epistasis is small. The sign test is especially weak. A power analysis shows that, given 28 observations that deviate from the null expectation (as in the ‘all fitness measures’ data set; see Results), a sign test could reject the null hypothesis only if 68% or more of the 28 interactions were of one type (1 or −1). Thus, epistatic effects could be fairly strong and mostly of one type and still be undetectable by a sign test. Although the regression analyses also showed an inability to detect epistasis, they are more descriptive than the sign test because they allow us to estimate the net effect of epistasis. When we consider the magnitude of β (the interaction effect) relative to the magnitude of α (the independent effect), the median value of β/α is 0.059. Thus, the net effect of epistasis is small, only 5.9% of the independent effect of mutations. This number is in agreement with findings of other studies of RNA viruses, in which measures of epistatic effects ranged between 5 and 9% (Elena, 1999; Crotty et al., 2001). Detection of epistatic effects this small in magnitude would require both the use of numerous mutation combinations, and an extremely accurate measure of fitness. Few, if any, of the studies in our data set meet both these criteria.
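
The power figure quoted for the sign test can be checked with a short calculation. The sketch below finds the smallest number of same-direction interactions, out of 28, for which an exact one-tailed sign test reaches P ≤ 0.05; the one-tailed form and the 0.05 threshold are assumptions, since the text does not state them. The function names are hypothetical.

```python
from math import comb

def upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def min_count_to_reject(n, alpha=0.05):
    """Smallest count of one sign, out of n non-tied pairs, giving a one-tailed P <= alpha."""
    for k in range(n + 1):
        if upper_tail(n, k) <= alpha:
            return k
    return None

k_min = min_count_to_reject(28)
print(k_min, round(k_min / 28, 2))  # prints 19 0.68
```

Under these assumptions, 19 of the 28 interactions (68%) would have to share a sign before the null hypothesis could be rejected, in line with the proportion given above.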

Although we could detect no overall trend, it is clear that there are interactions between particular pairs of mutations within our data set. Some studies reported both mean values and standard errors for each of their performance measures. In these cases it is possible to determine the existence of a significant interaction effect on a case by case basis, for example, the first combination of mutations examined in one of the DEN data sets (Hanley et al., 2002). This mutation combination yielded an observed log fitness of −0.46 ± 0.21 and an expected log fitness of −4.14 ± 0.48 (titre in mouse brain). The large difference between these values is statistically significant (t20 = 7.01, P < 0.0001) and indicates antagonistic epistasis. Likewise, the first mutation combination examined in RSV (Whitehead et al., 1998) yielded an observed log fitness of −4.14 ± 0.55 and an expected log fitness of −6.91 ± 0.74 (titre in mouse nasal turbinates). These values also differ significantly (t16 = 3.0, P = 0.0071), indicating antagonistic epistasis. These mutation pairs are just two of the many pairs that exhibit significant interactions.
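
The two case-by-case comparisons can be approximated with a simple t statistic. This sketch assumes that the ± values are standard errors and that the standard error of the difference is obtained by combining them in quadrature; the text does not describe how the statistics were computed, so this is one plausible reading, and the degrees of freedom are not derived here. The function name is hypothetical.

```python
import math

def interaction_t(observed, se_observed, expected, se_expected):
    """t statistic for observed minus expected log fitness, with standard errors combined in quadrature."""
    return (observed - expected) / math.sqrt(se_observed ** 2 + se_expected ** 2)

t_den = interaction_t(-0.46, 0.21, -4.14, 0.48)  # DEN, titre in mouse brain
t_rsv = interaction_t(-4.14, 0.55, -6.91, 0.74)  # RSV, titre in mouse nasal turbinates
print(round(t_den, 2), round(t_rsv, 2))
```

Both statistics come out close to the reported values of 7.01 and 3.0, and both are positive, meaning that observed log fitness exceeds the expectation, which is the signature of antagonistic epistasis.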

Overall, our findings are consistent with other empirical studies in which no evidence, or mixed evidence for epistasis was detected (de Visser et al., 1997; Elena & Lenski, 1997; Elena, 1999; Whitlock & Bourguet, 2000). However, the limitations of the data should be considered. First, and most importantly, the mutations analysed were not chosen randomly. Many studies chose mutations in particular genes of known function, so that mutations were not distributed randomly across the genome. This nonrandom sampling of mutations would bias the results, for example, if many mutation combinations were within individual genes, and if such combinations were more likely to interact than mutations between genes. As the mutation combinations included in our study only rarely consisted of mutations within a single gene, we have no expectation that these mutations are more likely to show interactions than a random sample. Secondly, many studies used multiple measures of fitness, and for a given pair of mutations these different measures sometimes yielded inconsistencies in terms of expected fitness. This pattern is not uncommon. Whitlock & Bourguet (2000) detected epistasis in a set of mutations affecting productivity in Drosophila, but the same mutations were additive in their effects on male mating success. These observations suggest that studies of epistasis should incorporate multiple measures of fitness. Thirdly, the studies of human pathogens did not measure fitness in the natural host or, where relevant, in the natural vector used for transmission. Rather, the studies used a combination of proxy measures of fitness (e.g. replication in vitro) and replication in animal models. However, these measures represent the best available approximation of attenuation in humans and are used to guide vaccine design. Moreover, measures of replication are measures of Darwinian fitness, albeit in an experimentally circumscribed environment. 
Nonetheless, the data analysed here suggest that no consistent tendency towards a particular form of epistasis between deleterious mutations exists across RNA viruses.

RNA viruses are important to biomedical scientists as pathogens, and to evolutionary biologists as tractable model systems (Morse, 1994; Turner, 2003). Our inability to find a consistent pattern of epistasis between deleterious mutations, despite the existence of interactions between particular mutation pairs, has implications for both medicine and evolutionary biology. From a biomedical perspective, evolutionary models can be used to inform vaccine design regardless of the average form of epistasis, as long as interactions of both forms exist. Vaccine design differs from other applications because researchers can choose the particular mutations used to create the vaccine virus. Thus, evolutionary theory can be used to predict the consequences of choosing to incorporate a greater number of antagonistically or synergistically interacting mutations into the vaccine virus. Specifically, evolutionary models can inform vaccine design by predicting the effects of epistasis on the persistence of vaccine-derived viruses and their reversion to virulence. That such reversion is a significant threat was highlighted by the recent cases of paralytic polio in Haiti and the Dominican Republic. These infections were caused by vaccine-derived virulent poliovirus that evolved after vaccination strategies failed to confine the attenuated virus (CDC, 2000).

Incorporation of mutations with antagonistic interactions into candidate vaccines could reduce persistence and reversion to virulence in a number of ways. Recombination or co-infection with other components of a multiple-component vaccine, with endogenous virus, or with a circulating wild-type virus could contribute to persistence of the vaccine virus genome and to reversion to virulence. However, the use of antagonistically interacting mutations would reduce these risks because the fitness improvement resulting from recombination or co-infection is smaller when interactions between attenuating mutations are antagonistic (Kondrashov, 1988). Reversion to virulence can also occur through mutations that compensate for the effects of attenuating mutations. The use of antagonistically interacting mutations will reduce this risk as well, because compensatory mutations have been shown to yield smaller fitness improvements when interactions between attenuating mutations are antagonistic (Moore et al., 2000). Finally, the use of antagonistically interacting mutations in vaccine design makes intuitive sense: because the fitness effect of each additional mutation diminishes as mutation number increases, more mutations can be incorporated into the viral genome, thereby safeguarding attenuation (Murphy & Chanock, 2001). Thus, further studies of epistasis in RNA viruses will not only clarify evolutionary patterns but also generate relevant information for the creation of vaccines.

Unlike vaccine design, applications in evolutionary biology (such as species conservation) require knowledge of the general form of epistasis rather than just the form of epistasis between selected mutation pairs. In conservation biology, epistasis plays a role in determining the rate of fitness loss via genetic drift in small populations (Kondrashov, 1994; Schultz & Lynch, 1997). It is difficult to determine the form of epistasis accurately in most endangered populations because they are typically composed of organisms with long generation times; instead, several attempts have been made to determine the form of epistasis in model systems. Unfortunately, the absence of a predictable pattern for epistasis in tractable models (e.g. this study; Elena & Lenski, 1997; de Visser et al., 1997) makes it difficult for these studies to inform conservation policy and other applications. More generally, evolutionary biologists should reconsider hypotheses that assume any overall form of epistasis; at least among RNA viruses there is little support for such assumptions.


We thank Christian Mandl, Thomas Kinney and Alan Engelman for kindly providing numerical data sets that served as the basis for published graphical figures. We thank Timothy Wright for careful review of the manuscript. P.E.T. acknowledges financial support from the US National Science Foundation.


Table Appendix 1.
Columns headed 'Type' give the type of mutation‡ for mutations 1 to 3; the seven rightmost columns give the ln relative fitness of virus with mutation§ 1, 2, 3 or the combination shown.

| Virus* [wild type]† and reference | Type 1 | Type 2 | Type 3 | Fitness measure | 1 | 2 | 3 | 1 + 2 | 2 + 3 | 1 + 3 | 1 + 2 + 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DEN [WtDEN4]; Men et al. (1996) | LD | LD |  | Plaque size [C6/36 cells] | −0.10 | −1.30 | na¶ | −1.30 | na | na | na |
| DEN [16681]; Butrapet et al. (2000) | SS | SS | SS | % Mouse mortality | −6.14 | −2.64 | −0.15 | undef** | −2.64 | −1.54 | undef |
|  |  |  |  | Peak titre [LLC-MK2 cells] | −0.36 | −0.26 | −0.47 | 0.07 | −0.68 | 0.01 | −0.81 |
|  |  |  |  | Peak titre [C6/36 cells] | −6.40 | −5.93 | −0.54 | −10.60 | −4.87 | −3.03 | −9.81 |
|  |  |  |  | Plaque size [LLC-MK2 cells] | −0.39 | −0.48 | −0.44 | −1.04 | −0.96 | −1.13 | −1.58 |
| DEN4 [rDEN4]; Blaney et al. (2001) | SS | LD |  | Titre at 35 °C [Vero cells] | 0.00 | −2.99 | na | −2.53 | na | na | na |
|  |  |  |  | Titre at 38 °C/titre at 35 °C [Vero cells] | −0.92 | −3.91 | na | −3.68 | na | na | na |
|  |  |  |  | Titre at 35 °C [Huh-7 cells] | 0.00 | −2.53 | na | −2.53 | na | na | na |
|  |  |  |  | Titre at 38 °C/titre at 35 °C [Huh-7 cells] | −0.46 | −3.68 | na | −2.76 | na | na | na |
|  |  |  |  | Titre in mouse brain | −0.46 | −6.68 | na | −7.14 | na | na | na |
| DEN4 [rDEN4]; Hanley et al. (2002) | SS | LS | LS | Titre at 35 °C [Vero cells] | −3.22 | −6.45 | −2.53 | −2.30 | na | −4.14 | na |
|  |  |  |  | Titre at 38 °C/titre at 35 °C [Vero cells] | −1.15 | 0.46 | −0.69 | −0.69 | na | undef | na |
|  |  |  |  | Titre at 35 °C [HuH-7 cells] | −2.76 | −6.22 | −1.84 | −1.15 | na | −2.76 | na |
|  |  |  |  | Titre at 38 °C/titre at 35 °C [HuH-7 cells] | −1.61 | −2.53 | 1.84 | undef | na | −1.61 | na |
|  |  |  |  | Titre in mouse brain | −2.99 | −1.15 | −1.38 | −0.46 | na | −5.30 | na |
| FLU [A/WSN/33]; Basler et al. (1999) | SS | SS |  | Neuraminidase (NA) activity at 33 °C [COS-1 cells] | 0.26 | −1.29 | na | −1.53 | na | na | na |
|  |  |  |  | NA activity at 39.5 °C/NA activity at 33 °C | −0.05 | −0.46 | na | −0.75 | na | na | na |
|  | SS | SS |  | NA activity at 33 °C [COS-1 cells] | −3.77 | −3.73 | na | undef | na | na | na |
|  |  |  |  | NA activity at 39.5 °C/NA activity at 33 °C | −2.04 | −2.48 | na | undef | na | na | na |
| FLU [A/WSN/33]; Fodor et al. (1998) | SS | SS |  | Titre at 40 h p.i. [MDCK cells] | −0.23 | −5.76 | na | −8.06 | na | na | na |
| FLU [A/LA/2/87 AA wtPB2]; Subbarao et al. (1995) | SS | SS | SS | Shut off temperature [MDCK cells] | −0.03 | −0.03 | −0.03 | −0.05 | −0.03 | nm†† | −0.08 |
|  |  |  |  | Titre at 32 °C [MDCK cells] | −0.46 | −0.92 | −1.15 | −0.92 | −1.38 | nm | −2.76 |
|  |  |  |  | Titre at 39 °C/titre at 32 °C [MDCK cells] | −2.53 | −2.99 | −5.76 | −8.98 | undef | nm | undef |
|  |  |  |  | Peak titre in hamster nasal turbinates | −0.46 | −0.69 | 0.23 | −3.68 | −1.38 | nm | −4.61 |
|  |  |  |  | % Hamsters with detectable virus in turbinates | 0.02 | 0.02 | 0.02 | −0.05 | 0.02 | nm | −0.17 |
|  |  |  |  | Peak titre in hamster lungs | −1.38 | −1.84 | −0.92 | undef | undef | nm | undef |
|  |  |  |  | % Hamsters with detectable virus in lungs | −0.63 | −0.34 | −0.15 | undef | undef | nm | undef |
| FLU [LA wt]; Parkin et al. (1997) | LS | LS |  | Titre at 34 °C [MDCK cells] | −0.28 | −0.62 | na | −2.95 | na | na | na |
|  |  |  |  | Titre at 39 °C/titre at 34 °C [MDCK cells] | −2.00 | −1.63 | na | −10.48 | na | na | na |
|  |  |  |  | Titre in mouse nasal turbinates | −0.46 | −1.13 | na | −5.66 | na | na | na |
|  |  |  |  | Titre in mouse lungs | −4.79 | undef | na | undef | na | na | na |
| HN+A101 [clone 1]; Ebihara et al. (2000) | SS | SS |  | % Mortality in newborn mice (s.c. inoculation) | −0.04 | −1.62 | na | undef | na | na | na |
|  |  |  |  | 1/mean survival time (i.c. inoculation) | −0.13 | −0.16 | na | −0.20 | na | na | na |
|  |  |  |  | Virus titre at day 2 p.i. [MBMEC cells] | −0.69 | −1.38 | na | −0.46 | na | na | na |
|  |  |  |  | Virus titre at day 8 p.i. [MBMEC cells] | 0.23 | −1.15 | na | −1.61 | na | na | na |
| HIV [pNL43/Xma1]; Brown et al. (1999) | SS | SS |  | Peak RT activity [Jurkat cells] | −0.58 | −1.68 | na | −7.27 | na | na | na |
|  | SS | SS |  | Peak RT activity [Jurkat cells] | −0.34 | −0.50 | na | −0.01 | na | na | na |
|  | SS | SS‡‡ |  | Peak RT activity [Jurkat cells] | −0.05 | 0.07 | na | −0.43 | na | na | na |
|  | LS‡‡ | LS |  | Peak RT activity [Jurkat cells] | −0.03 | −0.05 | na | −0.52 | na | na | na |
|  | LS‡‡ | LS‡‡ |  | Peak RT activity [Jurkat cells] | −0.02 | −0.02 | na | −0.94 | na | na | na |
|  | LS‡‡ | LS‡‡ |  | Peak RT activity [Jurkat cells] | −1.31 | −2.16 | na | −6.02 | na | na | na |
|  | LS | LS |  | Peak RT activity [Jurkat cells] | −0.76 | −0.11 | na | −1.82 | na | na | na |
|  | LS‡‡ | LS‡‡ |  | Peak RT activity [Jurkat cells] | −0.96 | −1.02 | na | −6.28 | na | na | na |
| IBDV [rCEF94]; Boot et al. (2001) | LS | LS |  | Mean titre after transfection [QM5 cells] | −4.61 | −0.69 | na | −9.21 | na | na | na |
|  |  |  |  | Peak titre after 3 passages [QM5 cells] | −2.07 | −0.23 | na | −2.30 | na | na | na |
| PIC [S18L18]; Zhang et al. (2001) | SS | SS |  | Days of fever [guinea-pig] | −0.32 | −1.30 | na | −1.70 | na | na | na |
|  |  |  |  | % Original body weight at day 13 [guinea-pig] | −0.30 | −0.44 | na | −0.36 | na | na | na |
|  |  |  |  | Titre in guinea-pig serum day 12 p.i. | −4.61 | −9.44 | na | −9.21 | na | na | na |
|  |  |  |  | Titre in guinea-pig spleen day 12 p.i. | −4.84 | −11.97 | na | −11.51 | na | na | na |
| PIV [rPIV3]; Skiadopoulos et al. (1998) | SS | SS | SS | Titre at 32 °C [LL-MK2 cells] | −3.45 | −2.99 | −3.68 | −3.45 | −3.91 | −6.91 | −4.14 |
|  |  |  |  | Titre at 39 °C/titre at 32 °C [LL-MK2 cells] | −0.92 | −2.07 | −6.45 | undef | −3.22 | undef | undef |
|  |  |  |  | Titre in hamster nasal turbinates 4 days p.i. | −1.84 | −5.76 | −7.14 | undef | −6.10 | −9.21 | −6.10 |
|  |  |  |  | Titre in hamster lungs 4 days p.i. | −3.68 | −5.41 | −2.65 | undef | −4.03 | −8.40 | undef |
| PIV [rPIV3]; Skiadopoulos et al. (1999) | LS | SS |  | Titre at 32 °C [LL-MK2 cells] | −0.92 | −0.69 | na | −0.46 | na | na | na |
|  |  |  |  | Titre at 39 °C/titre at 32 °C [LL-MK2 cells] | −3.45 | −9.44 | na | undef | na | na | na |
|  |  |  |  | Titre in hamster nasal turbinates | −0.23 | −5.99 | na | −4.84 | na | na | na |
|  |  |  |  | Titre in hamster lungs | −0.23 | −8.98 | na | −8.98 | na | na | na |
| PIV [rPIV3]; Tao et al. (2000) | LS | LS |  | Titre at 32 °C [LLC-MK2 cells] | −0.23 | −2.30 | na | 0.23 | na | na | na |
|  |  |  |  | Titre at 38 °C/titre at 32 °C [LLC-MK2 cells] | −8.52 | −0.46 | na | −10.59 | na | na | na |
|  |  |  |  | Titre in hamster nasal turbinate day 4 p.i. | −3.22 | −3.22 | na | −8.06 | na | na | na |
|  |  |  |  | Titre in hamster lung day 4 p.i. | −10.82 | −7.83 | na | −11.51 | na | na | na |
| PV [PV1(M)pDS306]; Omata et al. (1986) | LS | LS |  | Lesion score [monkey] | −3.90 | −0.20 | na | −3.57 | na | na | na |
|  |  |  |  | Spread value [monkey] | −3.21 | 0.01 | na | −2.92 | na | na | na |
|  |  |  |  | % Monkey paralysis | undef | undef | na | undef | na | na | na |
|  |  |  |  | Titre at high [Na(CO3)2]/titre at low [Na(CO3)2] | −9.65 | −1.36 | na | −8.96 | na | na | na |
|  |  |  |  | Titre at 40 °C/titre at 36 °C | −8.13 | −5.09 | na | undef | na | na | na |
|  |  |  |  | Plaque size [monkey kidney cells] | −0.63 | −0.10 | na | −0.83 | na | na | na |
|  | LS‡‡ | LS‡‡ |  | Lesion score [monkey] | −1.78 | −3.12 | na | −3.57 | na | na | na |
|  |  |  |  | Spread value [monkey] | −1.20 | −2.15 | na | −2.92 | na | na | na |
|  |  |  |  | % Monkey paralysis | −1.79 | undef | na | undef | na | na | na |
|  |  |  |  | Titre at high [Na(CO3)2]/titre at low [Na(CO3)2] | −0.46 | −10.50 | na | −8.96 | na | na | na |
|  |  |  |  | Titre at 40 °C/titre at 36 °C | −4.14 | −9.60 | na | undef | na | na | na |
|  |  |  |  | Plaque size [monkey kidney cells] | −0.17 | −0.90 | na | −0.83 | na | na | na |
|  | LS‡‡ | LS‡‡ |  | Lesion score [monkey] | −1.24 | −1.13 | na | −3.57 | na | na | na |
|  |  |  |  | Spread value [monkey] | −0.70 | −0.61 | na | −2.92 | na | na | na |
|  |  |  |  | % Monkey paralysis | −1.79 | undef | na | undef | na | na | na |
|  |  |  |  | Titre at high [Na(CO3)2]/titre at low [Na(CO3)2] | −2.28 | −7.16 | na | −8.96 | na | na | na |
|  |  |  |  | Titre at 40 °C/titre at 36 °C | −6.88 | −10.43 | na | undef | na | na | na |
|  |  |  |  | Plaque size [monkey kidney cells] | −0.13 | −1.27 | na | −0.83 | na | na | na |
| PV [S1F/480A/6203U]; McGoldrick et al. (1995) | SS | SS |  | Fraction macaque paralysis | 0.29 | 0.00 | na | undef | na | na | na |
|  |  |  |  | Lesion score [macaque] | −0.09 | −0.02 | na | −0.40 | na | na | na |
|  |  |  |  | Titre at 38 °C/titre at 35 °C [BGM cells] | −0.46 | −0.23 | na | −0.46 | na | na | na |
| REO [wt REO st3]; Roner et al. (1997) | SS | U |  | Plaquing efficiency at 39 °C | −4.99 | −7.17 | na | −12.90 | na | na | na |
|  |  |  |  | Plaquing efficiency at 30 °C [L929 cells] |  |  |  |  |  |  |  |
| REO [T1L/tsA279 hybrids]; Hazelton & Coombs (1995) | U | U |  | Plaquing efficiency at 39 °C [L929 cells] | −0.35 | −0.33 | −1.65 | nm | nm | −0.21 | −0.15 |
|  |  |  |  | Plaquing efficiency at 40 °C [L929 cells] | −0.60 | 1.78 | 0.43 | nm | nm | 0.79 | 0.60 |
| RSV [rA2]; Cheng et al. (2001) | LD | LS |  | Titre in cotton rat nasal turbinate | undef | undef | na | undef | na | na | na |
|  |  |  |  | Titre in cotton rat lung | −3.57 | −3.75 | na | undef | na | na | na |
|  |  |  |  | Virus shedding from infected monkey | −0.75 | −0.75 | na | −1.04 | na | na | na |
|  |  |  |  | Peak titre in monkey nasopharyngeal swab | −8.01 | −5.32 | na | undef | na | na | na |
|  |  |  |  | Peak titre in monkey tracheal lavage fluid | −7.91 | −5.35 | na | −8.85 | na | na | na |
| RSV [rB/HPIV3]; Schmidt et al. (2002) | LI | LI |  | Peak titre in monkey nasopharyngeal swab | −2.76 | −1.84 | na | −3.91 | na | na | na |
|  |  |  |  | Peak titre in monkey tracheal lavage fluid | 0.92 | 0.23 | na | −0.23 | na | na | na |
|  | LI | LI |  | Peak titre in monkey nasopharyngeal swab | −2.99 | −3.91 | na | −3.45 | na | na | na |
|  |  |  |  | Peak titre in monkey tracheal lavage fluid | 1.38 | 0.46 | na | 0.23 | na | na | na |
| RSV [rA2]; Tang et al. (2002) | SS | LS | SS | Titre at 33 °C [HEp-2 cells] | 0.60 | 0.00 | −2.12 | −3.50 | na | na | −0.23 |
|  |  |  |  | Titre at 39 °C/titre at 33 °C [HEp-2 cells] | −0.18 | −0.92 | undef | 0.76 | na | na | undef |
|  |  |  |  | Titre in cotton rat lung | nd | 0.23 | −7.53 | nd | na | na | nd |
|  | LS | SS |  | Titre at 33 °C [HEp-2 cells] | −2.30 | −0.51 | na | −2.76 | na | na | na |
|  |  |  |  | Titre at 39 °C/titre at 33 °C [HEp-2 cells] | 0.53 | undef | na | undef | na | na | na |
|  |  |  |  | Titre in cotton rat lung | nd | 0.23 | na | −5.78 | na | na | na |
| RSV [rA2CP]; Whitehead et al. (1998) | SS | SS | SS | Titre in mouse nasal turbinates | −2.76 | −4.14 | −0.92 | −4.14 | nm | −2.99 | −4.14 |
|  |  |  |  | Titre in mouse lungs | −1.38 | −3.45 | 1.38 | −4.61 | nm | −0.69 | −4.61 |
| RSV [rA2CP]; Whitehead et al. (1999) |  |  |  | Titre in mouse nasal turbinates | −1.40 | −1.40 | −1.80 | undef | na | undef | na |
|  |  |  |  | Titre in mouse lungs | −0.20 | −1.20 | −2.30 | undef | na | undef | na |
| SeV [rSeV]; Latorre et al. (1998) | SC | SC |  | Titre in chicken eggs (day 9) | −2.30 | −2.30 | na | −15.61 | na | na | na |
| SVDV [vSVLS201M00]; Kanno et al. (2001) | SS | SS |  | Fraction of pigs showing vesicles | 0.00 | −0.92 | na | −1.61 | na | na | na |
|  |  |  |  | Lesion score of pigs at 10 days p.i. | −0.24 | −2.30 | na | −2.48 | na | na | na |
|  |  |  |  | Serum titre of pigs at 14 days p.i. | −0.74 | −0.83 | na | 0.16 | na | na | na |
| TBEV [wtNeudoerfl]; Mandl et al. (2000) | SS | SS |  | Titre at 12 h p.i. [CE cells] | −0.69 | 0.62 | 3.26 | −0.36 | 0.30 | nm | nm |
|  |  |  |  | Titre at 21 h p.i. [CE cells] | 0.65 | 0.25 | 1.08 | −1.95 | 0.63 | nm | nm |
|  |  |  |  | 1/Lethal dose 50 [5-week-old mice] | −2.86 | 0.80 | −10.36 | −7.30 | −4.10 | nm | nm |
|  |  |  |  | 1/Infectious dose 50 [5-week-old mice] | −3.32 | 0.00 | −4.80 | −3.07 | −2.59 | nm | nm |
| TBEV [Vs-c]; Gritsun et al. (2001) | SS | SS |  | Plaque size [PS cells] | −0.30 | 0.05 | na | −0.23 | na | na | na |
|  |  |  |  | CPE (% cell destruction after 48 h; PS cells) | −0.41 | 0.00 | na | −1.10 | na | na | na |
|  |  |  |  | Kill rate = 1/average mouse survival time | −0.06 | 0.01 | na | −0.19 | na | na | na |
|  |  |  |  | % Mouse mortality | −0.56 | −0.15 | na | −0.75 | na | na | na |
| VEEV [VE/IC-115]; Kinney et al. (1993) | SS | SS | LS | % Mouse mortality following i.c. challenge | nm | −1.39 | 0.00 | −0.47 | −0.69 | 0.00 | −0.13 |
|  |  |  |  | % Mouse mortality following i.p. challenge | 0.00 | undef | 0.00 | undef | undef | −0.47 | −0.98 |
|  |  |  |  | Plaque size [Vero cells] | −0.32 | −1.04 | −0.15 | −0.89 | −0.53 | −0.35 | −0.41 |

*DEN, dengue virus; FLU, influenza virus type A; HN, Hanta virus; HIV, human immunodeficiency virus; IBDV, infectious bursal disease virus; PIC, Pichinde virus; PIV, parainfluenza virus; PV, poliovirus; REO, Reovirus; RSV, respiratory syncytial virus; SeV, Sendai virus; SVDV, swine vesicular disease virus; TBEV, tick-borne encephalitis virus; VEEV, Venezuelan equine encephalitis virus.
†‘Wild type virus’: designation of virus strain to which mutations were added; not necessarily a true wild type.
‡Individual mutations are reported exactly as they are reported in the primary literature, thus large deletions and substitutions are often counted as a single mutation. SS, small (≤3 nt) substitution; SD, small (≤3 nt) deletion; LS, large (>3 nt) substitution; LD, large (>3 nt) deletion; LI, large (>3 nt) insertion; SC, addition of stop codon; U, unknown number of changes.
§Natural log of the ratio of the fitness of the mutated virus to the fitness of the ‘wild type’ virus.
¶na, not applicable.
**undef, undefined because fitness of the mutated virus was not distinguishable from zero.
††nm, not measured.
‡‡These mutations are not independent of other mutation pairs in the study. Nonindependent mutation pairs were not included in the sign test, but were included in regression analyses.