The search for the alleles that matter, the quantitative trait nucleotides (QTNs) that underlie heritable variation within populations and divergence among them, is a popular pursuit. But what is the question to which QTNs are the answer? Although their pursuit is often invoked as a means of addressing the molecular basis of phenotypic evolution or of estimating the roles of evolutionary forces, the QTNs that are accessible to experimentalists, QTNs of relatively large effect, may be uninformative about these issues if large-effect variants are unrepresentative of the alleles that matter. Although 20th century evolutionary biology generally viewed large-effect variants as atypical, the field has recently undergone a quiet realignment toward a view of readily discoverable large-effect alleles as the primary molecular substrates for evolution. I argue that neither theory nor data justify this realignment. Models and experimental findings covering broad swaths of evolutionary phenomena suggest that evolution often acts via large numbers of small-effect polygenes, individually undetectable. Moreover, these small-effect variants are different in kind, at the molecular level, from the large-effect alleles accessible to experimentalists. Although discoverable QTNs address some fundamental evolutionary questions, they are essentially misleading about many others.

Many lines of inquiry in evolutionary biology share the goal of identifying the allelic variants that underlie phenotypic variation and divergence. In fields from evo-devo to population genetics, the hope is that the identities of the functional variants will reveal the position of nature in the parameter space defined by the extremes of our models: additivity versus pervasive epistasis, pleiotropy versus modularity, oligogenic versus polygenic adaptation, micro- versus macromutation, common versus rare alleles, protein coding versus cis-regulatory, balancing selection versus mutation-selection balance. If only we could put our hands on the actual causal variants, the quantitative trait nucleotides (QTNs), maybe we could put these tired old debates to bed (Tanksley 1993; Orr 1999; Barton and Keightley 2002; Phillips 2005; Mitchell-Olds et al. 2007; Stern and Orgogozo 2008). This is the QTN program, and its admirable commitment to empiricism so dominates research in molecular evolutionary genetics that its premises are rarely questioned. By broad consensus, all we need to do to answer our questions is to identify the alleles that affect phenotypes. At some point, the catalog of QTNs will be sufficiently large that patterns and their interpretations will be obvious to all. The major debate currently seems to be over the question of whether we have already arrived at that point or whether we need to collect more QTNs (Pennisi 2008, 2009).

For the QTN program to succeed, the allelic variants it discovers must be representative examples of the underlying pool of QTNs. I argue that this condition is rarely met, and, perhaps, cannot be met. Progress requires that we carefully distinguish between questions answerable by the QTN program and those that demand alternative approaches.

Statement of the Problem

In January 1848, James Marshall found gold flakes in the millrace of John Sutter's saw mill. Within months, news of the discovery leaked and the rush was on. Thousands left home, rounding the Cape, crossing the Isthmus, or joining the wagon trains headed west. Soon the easy pickings were gone, and consortia of miners banded together to blast more flakes from the hills. Extraction technologies proliferated: first rockers and long toms, then gravel dredges, and finally hydraulic mining, which washed whole mountains through giant sluices to recover dense gold flakes from the riffles.

Modern day QTN prospecting is the Sierra Nevada of the 1850s. The shiny (Mendelian) nuggets are rapidly being collected, and ever larger teams of researchers with ever more powerful technologies are now probing whole genomes to find their quarry. But visible flakes of placer gold represent a small fraction of the global gold reserve; most gold is in microscopic particles concealed in low-grade ore (Mudd 2007). These particles are immune to mechanical separation. If the stuff of evolution is often alleles of microscopic effect, large-effect nuggets can tell us little about the material basis for evolution. All of the questions that the QTN program promises to answer are confounded by a more basic question: what is the phenotypic effect-size distribution of evolutionarily relevant mutations?

Although our current catalog of QTNs has provided insights into both the evolutionary forces and the functional mechanisms by which alleles shape phenotypic variation and divergence (Stern and Orgogozo 2009), it represents a biased sample of evolutionary causes and molecular functions. The answers the catalog provides may not be germane to many of the questions we asked in the first place. More general answers about ultimate and proximal causes of phenotypic variation and evolution may be resting undiscovered in the piles of waste rock tailings recklessly strewn by our QTN-mining machinery.

The problem of ascertainment bias is not a new one: it was a focus of Lewontin's 1974 book, The Genetic Basis of Evolutionary Change. The problem, in its basic formulation, is what Lewontin termed an epistemological paradox: “What we can measure is by definition uninteresting and what we are interested in is by definition unmeasurable” (p. 23). The difficulty through much of the 20th century was that the genes underlying phenotypic variation and divergence were detectable only when they had such dramatic effects as to behave as Mendelian genes, with genotypes inferable from phenotypes. These genes were believed to be of little consequence for evolution, according to Lewontin: “the substance of evolutionary change at the phenotypic level is precisely in those characters for which individual gene substitutions make only slight differences as compared with variation produced by the genetic background and the environment.” This micromutationist perspective, with its dismissal of large-effect alleles, was hardly unique to Lewontin (e.g., Charlesworth et al. 1982). It was based on the preceding half century of evolutionary biology, built on the synthesis forged between biometrical and Mendelian genetics. The critical model underlying this synthesis is the infinitesimal, derived from Fisher's polygenic model of inheritance (Fisher 1918), a simple abstraction that attributes continuous variation to a very large number of mutations of infinitesimal effect. Although infinitesimal theory has always been technically wrong (there are after all a finite number of nucleotides in a genome), its simplicity facilitated the development of a vast and empirically successful body of quantitative genetics theory (Crow 2008; Hill 2010).
And although it was proposed for the sake of its mathematical properties, the infinitesimal model fit well with the genetic interpretation of Fisher's (1930) geometric model of adaptation, which held that mutations that influence many traits are likely to influence some for the worse, so that alleles of small effect are most likely to be net beneficial (see Note 1 in Supporting information). The synthesis, with its infinitesimal model of quantitative genetics and its geometrical model of adaptation, had no room for macromutationist theories that attributed evolution to the spontaneous appearance of mutants or “sports” (Charlesworth et al. 1982).

Cracks appeared in the micromutationist synthesis around 20 years ago (Orr and Coyne 1992). Systematic genome-wide approaches to mapping phenotypically relevant alleles (Lander and Botstein 1989) promised to reveal the quantitative trait loci (QTLs), the elusive genes whose substitutions make slight differences. Over the last two decades, geneticists have discovered one large-effect QTL after another (see Note 2 in Supporting information), refuting the infinitesimal theory (Orr 1999, 2005a). Gradually, the success of QTL mapping has led to a new consensus, one that views alleles of detectably large effects as the norm and not the exception (e.g., Farrall 2004; Bell 2009).

Why has the pendulum swung so far? Part of the answer lies in the development of a theoretical model that seems to anticipate and justify the importance of large-effect QTLs. The now-standard history of adaptation genetics (Orr 2005a) begins with Fisher's geometric model, which, as noted above, predicts that most beneficial mutations will be of small effect. Kimura (1983) built on Fisher's result, recognizing that small-effect mutations are at great risk of being lost by genetic drift when rare, and consequently mutations of intermediate effect are likely to predominate in adaptive fixation (see Note 3 in Supporting information). Orr (1998a) extended Kimura's result to derive the distribution of effect sizes for a complete adaptive walk. The result is now the textbook model for the genetics of adaptive fixation: the effect-size distribution of adaptive substitutions is approximately exponential, with a few large- and many small-effect mutations, the former typically substituting before the latter.
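The step from Fisher's result to Kimura's can be made concrete with a short numerical sketch. It assumes the usual textbook parameterization, which is not spelled out above: a standardized mutation size x, Fisher's approximation that a random mutation of size x is beneficial with probability 1 - Phi(x) (Phi being the standard normal distribution function), and a fixation probability taken as simply proportional to x. The function names are illustrative only.

```python
import math

def prob_beneficial(x):
    """Fisher's approximation: chance that a random mutation of
    standardized size x is beneficial, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def relative_substitution_rate(x):
    """Kimura-style weighting. ASSUMPTION: the fixation probability of a
    beneficial mutation is proportional to its size x."""
    return x * prob_beneficial(x)

# Scan standardized sizes from 0.01 to 4.00 and find the size most likely
# to contribute an adaptive substitution under these assumptions.
sizes = [i / 100 for i in range(1, 401)]
most_likely_size = max(sizes, key=relative_substitution_rate)

for x in (0.05, 0.5, most_likely_size, 2.0, 3.0):
    print(f"x={x:.2f}  P(beneficial)={prob_beneficial(x):.3f}  "
          f"relative rate={relative_substitution_rate(x):.3f}")
```

Under these assumptions the smallest mutations are the most likely to be beneficial, but the weighted rate peaks at an intermediate size, which is the qualitative content of Kimura's correction. The sketch says nothing about absolute effect sizes.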

Orr's model has been widely embraced. The model is elegant and its geometry provides a visual intuition for how adaptation might work. Moreover, empirical data corroborate it: the suite of mutations fixed during adaptation in microbial experimental evolution often follows the predicted distribution and sequence (Bell 2009). Orr's model removes the Mendelian stigma from large-effect QTLs: these are not anomalies—these are part of the stuff of evolution.

The reassessment of large-effect alleles and the revitalization of the geometric model have had salutary effects on the evolutionary genetics research program, and these contributions had the potential to usher in an age of effect-size empiricism and pluralism, as endorsed by Orr and Coyne (1992). Instead, in some research communities, the conclusion that the infinitesimal model does not always hold has been taken to mean that the geometric model always does (see Note 4 in Supporting information). Yet Orr's version of the geometric model, as he has explained at length (Orr 1998a, 2005b), deals with a very specific genetic scenario: a single bout of adaptive evolution to a fixed optimum with no standing variation. Even in that limited case, its claims about the effect-size distribution say little about absolute effect sizes; these are dependent on the dimensionality of phenotype space and on the actual effect sizes of the realizable molecular mutations (Orr 2005b). Nevertheless, the apparent convergence of theory and data behind an exponential effect-size distribution has resulted in a sort of gold fever, where QTL mappers expect every shovelful of data to yield a large-effect nugget.

Below I provide a critique of the large-effect consensus. I first argue that QTNs mapped to date are effectively Mendelian, not simply samples from an exponential distribution, and I describe biases that undermine the utility of QTL data for characterizing the effect-size distribution. Second, I show that theory neither requires nor predicts an abundance of large-effect alleles in most cases, and I introduce several lines of evidence—eQTLs, genome-wide association studies (GWASs), genomic selection results from agriculture, and ubiquitous weak selection—documenting the nearly infinitesimal basis of many quantitative traits. Next, I show that small-effect alleles are different in kind from large-effect alleles at the molecular level, underscoring the challenge of using QTN data to understand the relationship between molecular function and phenotype evolution. None of the arguments presented here is novel. I aim to introduce readers in some evolutionary subfields to results that are well known in others.

Contemporary evolutionary genetics has matured past the point of simplistic dichotomies. Extreme models, holding that evolution is always and everywhere a matter of waiting for the desired sport, or invariably a matter of reshuffling limitless infinitesimals, have no adherents. Evolution contains multitudes, and genetic architectures in nature span the range: mono-, oligo-, polygenic. My goal is not to make sweeping claims about the effect-size distributions that underlie evolutionary change, but to articulate the limits of the QTN program.

Known Causal Variants are Not Typical QTNs

Despite the surprise that initially met large-effect QTLs, the segregation of large-effect loci was not a new discovery. Such loci were familiar to Mendel and Fisher. Indeed, the question has never been whether large-effect loci segregate and contribute to divergence; it has been whether such loci are typical or aberrant. The opinion of much of the postsynthesis evolutionary genetics community of the 20th century landed on the side of aberrant: these loci are uninformative about most variation and adaptation. Lewontin wrote:

“visibles are neither a random sample of allelic substitution nor a random sample of loci since they are of such drastic effect. The same objection applies to the classic visible polymorphisms such as banding in snails, pattern polymorphism in Lepidoptera and ladybirds, or to strongly selected biochemical polymorphisms such as sickle-cell anemia or thalassemia in man.” (Lewontin 1974, pages 97–98, citations removed).

Are the loci that we are able to map today merely the molecular alleles corresponding to atypical visible polymorphisms?

Recently, Stern and Orgogozo (2008) compiled a valuable catalog of the evolutionarily relevant mutations characterized to date. (As the QTN program seeks all causal variants that affect phenotypes in nature, I consider the catalog of Stern and Orgogozo our best current understanding of the nature of QTNs, and I use “QTN” interchangeably with “evolutionarily relevant mutation” or “causal variant.”) Very few of the cataloged loci were found by QTL mapping; the majority derive from candidate gene studies or linkage mapping of Mendelian genes. Roughly a third of the cataloged QTNs are from domesticated organisms, including, for example, six independent null mutations in myostatin in different breeds of cattle, mutations that confer completely penetrant recessive “double-muscling” in the affected animals. Many other QTNs have a known and highly specific mechanistic relationship with the segregating phenotype. In dipterans, resistance to acetylcholinesterase-targeting insecticides maps to variants in the targeted gene, AChE. Mutations in this gene comprise 15/331 (4.5%) of the QTNs in the catalog. Pyrethroid-resistant dipterans have mutations in the Vssc1 sodium channel, mutations that confer up to 11,300-fold resistance relative to wild type (Guerrero et al. 1997); these mutations are 6.9% of the QTNs (see Note 5 in Supporting information). Many other QTNs underlie discrete pigmentation phenotypes and fall in the small suite of reliable candidate genes for such traits: tan, yellow, ebony in flies, MC1R, agouti, and OCA2 in vertebrates, and genes in the anthocyanin pathway in flowers (see Note 6 in Supporting information).

These are old-fashioned Mendelian genes; their identification does little to ameliorate the concern that the alleles we can discover are not those that typify complex trait evolution.

Where are the QTNs for complex traits? Where, for example, are the QTNs for wing shape in Drosophila (Weber et al. 1999; Mezey and Houle 2005; Mezey et al. 2005; Palsson et al. 2005)? This is a model complex phenotype. It influences performance and responds to selection in the lab, drawing on perhaps hundreds of underlying loci (Weber 1990). It differs among species (Houle et al. 2003). Dozens of QTLs have been mapped (Weber et al. 1999; Zimmerman et al. 2000; Mezey et al. 2005). Despite hundreds of person years of effort and all the resources available for the preeminent model insect, there are no mapped QTNs. The best candidate is a noncoding single nucleotide polymorphism (SNP) in the promoter of the Egfr gene, mapped by association (Palsson and Gibson 2004; Palsson et al. 2005). This SNP explains less than 1% of the trait variance in one population and none in another and may simply be a marker in linkage disequilibrium (LD) with the causal variant. What if D. melanogaster wing shape is a typical complex trait?

None of this is to call into question the reality and importance of large-effect alleles. These are bona fide instances of evolution, real molecular variants that contribute to diversity and divergence. Their characteristics may accurately reflect those of the larger universe of mutations with similar characteristics. The question is, what universe is that? Stern and Orgogozo embrace that question by documenting multiple universes, with important differences among QTNs derived from different kinds of evolution: natural versus artificial selection, intraspecific variation versus interspecific divergence, physiological traits versus morphological traits. But it remains possible that the attributes of large-effect alleles are not the same ones that characterize small-effect alleles. If large-effect QTNs are the only ones we are capable of mapping, their discovery is mute about their generality.

Our QTN mapping capabilities are very limited. The path from QTL to QTN typically requires functional assays, and in most species the perturbations induced by experimental manipulations are likely to dwarf the effects of all but the largest effect QTNs. Only in yeast, in which true allelic replacements are feasible in otherwise isogenic lines, have QTNs with modest effect been validated (Deutschbauer and Davis 2005; Gerke et al. 2010). In Caenorhabditis elegans, which has experimental resources only slightly less powerful than yeast, all of the QTNs mapped to date (mutations in npr-1, mab-23, zeel-1, tra-3, plg-1, scd-2, glb-5, tyra-3, and ppw-1) are effectively Mendelian (i.e., there are discrete phenotypic classes and individuals can be accurately assigned to classes based on genotype), and in three of these cases, the mutations actually arose in the laboratory and were unknowingly selected (Hodgkin and Doniach 1997; de Bono and Bargmann 1998; Lints and Emmons 2002; Tijsterman et al. 2002; Kammenga et al. 2007; Palopoli et al. 2008; Reiner et al. 2008; Seidel et al. 2008; McGrath et al. 2009; Rockman and Kruglyak 2009; Bendesky et al. 2011) (see Note 7 in Supporting information).

In short, our ability to collect gold nuggets may not be informative about the nature of gold ore. Truly glittering nuggets, such as lactase in humans (Tishkoff et al. 2007) and couch potato in American D. melanogaster (Schmidt et al. 2008), are nuggets nonetheless (see Note 8 in Supporting information). Even if we embrace the geometric model's prediction of exponential effect-size distribution, our QTN successes have only sampled the most extreme outliers in the distribution's tail. If the largest effect alleles are in any respect unusual, the QTN program will fail to learn it.

The LOD that Failed: QTLs are Uninformative

Orr and Coyne (1992) noted that “no model—however sophisticated—can answer the question of the relative importance of major versus minor genes in evolution. This is an empirical question that can only be settled with data.” As typically practiced, however, QTL analysis does not even address the question.

Why are so many of the QTNs effectively Mendelian? The simplest explanations are trait selection and publication bias. We tend to study traits that exhibit dramatic and discrete differences between populations or species, we invest in genetic studies when we have some reason to think the genetics will be tractable, and we publish our results when we have identified QTLs, or, ideally, QTNs (Orr 1998b). It is therefore fair to ask, when we find QTNs, is it because we are lucky in the traits we study, or because we are choosy (Phillips 2005)? Below, I discuss data from studies that do not suffer these biases, studies of arbitrary traits. But first, the methodological biases of QTL mapping must be addressed.

QTL mapping studies of inbred line crosses typically discover a skewed distribution of effect sizes, with a small number of large-effect loci accounting for the majority of explained variance. As Bell (2009) notes, “there may be several uninteresting reasons for this.” Unfortunately, the uninteresting reasons render the empirically determined distribution uninformative; the estimated QTL effect-size distribution is expected to be L shaped even when the underlying loci have identical effect sizes (Beavis 1998). Indeed, the same is true for any underlying effect-size distribution (Bost et al. 2001).
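A small simulation illustrates how selective detection alone distorts estimated effect sizes when every locus has the same true effect. This is a toy sketch, not a reconstruction of the cited analyses: the cross design, the single-marker scan, the t-statistic cutoff standing in for a genome-wide LOD threshold, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics

TRUE_EFFECT = 0.5    # identical additive effect at every locus (hypothetical units)
N_LOCI = 20          # unlinked loci, all equally important
N_IND = 100          # individuals per mapping cross
NOISE_SD = 1.0       # environmental standard deviation
T_THRESHOLD = 3.0    # stringent cutoff standing in for a genome-wide threshold
N_CROSSES = 100      # replicate experiments

def simulate_cross(rng):
    """Genotypes (0/1 per locus) and phenotypes for one backcross-like population."""
    genos = [[rng.randint(0, 1) for _ in range(N_LOCI)] for _ in range(N_IND)]
    phenos = [TRUE_EFFECT * sum(g) + rng.gauss(0.0, NOISE_SD) for g in genos]
    return genos, phenos

def detected_effects(genos, phenos):
    """Single-marker scan: keep the estimated effect of each locus passing the cutoff."""
    hits = []
    for j in range(N_LOCI):
        y1 = [y for g, y in zip(genos, phenos) if g[j] == 1]
        y0 = [y for g, y in zip(genos, phenos) if g[j] == 0]
        if len(y1) < 2 or len(y0) < 2:
            continue
        diff = statistics.fmean(y1) - statistics.fmean(y0)
        se = (statistics.variance(y1) / len(y1)
              + statistics.variance(y0) / len(y0)) ** 0.5
        if abs(diff / se) > T_THRESHOLD:
            hits.append(diff)
    return hits

rng = random.Random(1)
all_hits = []
for _ in range(N_CROSSES):
    all_hits.extend(detected_effects(*simulate_cross(rng)))

n_tests = N_CROSSES * N_LOCI
print(f"loci detected: {len(all_hits)} of {n_tests}")
print(f"true effect: {TRUE_EFFECT}; mean estimated effect of detected loci: "
      f"{statistics.fmean(all_hits):.2f}")
```

With these settings most loci go undetected, and a locus is detected only when sampling error has pushed its estimate well above the true value. The apparent "large-effect QTLs" are therefore a product of the threshold rather than of any heterogeneity among the loci.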

Many of the sources of QTL bias are very familiar: limited power means that we cannot detect loci of very small effect, we misestimate the effect sizes of the QTLs we do detect due to LD between QTNs and to environmental and sampling variance, and the maximum number of detectable QTLs is set by the size of the genetic map (McMillan and Robertson 1974; Beavis 1998; Otto and Jones 2000; Bost et al. 2001; Barton and Keightley 2002; Steinmetz et al. 2002; Cornforth and Long 2003; Johnson and Barton 2005; Palsson et al. 2005; Phillips 2005; Hermisson and McGregor 2008; Mackay et al. 2009; Huang et al. 2010). The problem of multiple tightly linked QTNs within a single gene is also widely recognized, following the landmark work of Stam and Laurie (1996) on Drosophila ADH activity and McGregor et al. (2007) on Drosophila trichome patterning, among others (see Note 9 in Supporting information).

But a deeper problem is underappreciated: the null hypothesis for most QTL mapping is the absence of a QTL, not an abundance of infinitesimal QTLs (Lander and Botstein 1989; Churchill and Doerge 1994). In simulation studies that take the infinitesimal model seriously (Visscher and Haley 1996; Noor et al. 2001; Cornforth and Long 2003), inference of a small number of large-effect QTLs is a common result. One explanation for this pattern is that chance spatial clustering of infinitesimals with effects in the same direction will appear to be a large-effect locus. The problem is compounded by nonuniform recombination rates and gene densities (Noor et al. 2001), which can facilitate such clustering in regions with elevated gene:centiMorgan ratios.
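The clustering explanation can be illustrated with a deliberately crude sketch. It assumes a hypothetical chromosome of 100 cM carrying 1000 loci of identical tiny effect, each differing between the parental lines in a random direction, and treats 10 cM windows as the resolution of the cross. It ignores recombination, mapping statistics, and variation in gene density, so it shows only the arithmetic of chance clustering.

```python
import random

N_LOCI = 1000        # many loci of identical, tiny effect (one unit each)
CHROM_CM = 100.0     # hypothetical chromosome length in centiMorgans
WINDOW_CM = 10.0     # assumed resolution of a mapping cross

rng = random.Random(7)
positions = [rng.uniform(0.0, CHROM_CM) for _ in range(N_LOCI)]
# Direction of the parental difference at each locus: +1 or -1 unit.
signs = [rng.choice((-1, 1)) for _ in range(N_LOCI)]

n_windows = int(CHROM_CM / WINDOW_CM)
window_net = [0] * n_windows
for pos, sign in zip(positions, signs):
    idx = min(int(pos / WINDOW_CM), n_windows - 1)
    window_net[idx] += sign

largest = max(window_net, key=abs)
print("net effect per window (in single-locus units):", window_net)
print("largest apparent 'locus':", largest)
```

Because roughly a hundred small effects fall in each window, their net sum is typically many times the effect of any single locus, and a mapping study would report the most extreme windows as a handful of sizable QTLs.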

In one empirical study that directly compared a QTL model to an infinitesimal model, a large analysis of the contribution of D. melanogaster chromosome 3 to wing shape in a cross between divergently selected lines, the data could not distinguish between them (Weber et al. 1999).

In the absence of functional, QTN-resolution validation, inference from distributions of QTL numbers and sizes is, at best, fraught. The history of QTL mapping replicates the experience 30 years ago with estimating heterozygosities: the experiments could be done, so they were done, and the failure of the data to test any hypothesis was no objection (Phillips 2005). The difference is that heterozygosities were estimated accurately.

This view may seem nihilistic, but the failings of the research program are now widely acknowledged. As Mackay et al. (2009) recently concluded, “Despite two decades of intensive effort, we have fallen short of our long-term goal of explaining genetic variation for quantitative traits in terms of the underlying genes, the effects of segregating alleles in different genetic backgrounds and in a range of ecologically relevant environments as well as on other traits, the molecular basis of functional allelic effects and the population frequency of causal variants.” They continue: “The inescapable conclusions from the past two decades of studies are that QTL alleles with large effects are rare and that the bulk of genetic variation for quantitative traits is due to many loci with effects that were individually or in aggregate (owing to tight linkage of QTLs with opposite effects) too small to detect because previous studies were underpowered.”

Theory does not Require a Preponderance of Large-Effect QTNs

QTN mappers have sometimes viewed the geometric model of adaptive fixation as casting a mathematical penumbra of legitimacy over the large-effect QTLs and QTNs that predominate in the literature. In many cases, however, the connection between theory and data is tenuous, as the theory pertains only to a limited domain of evolutionary phenomena: discrete bouts of adaptation from new mutation (Orr 1998a, 2005b). Below I discuss three contexts in which the theoretical expectation of large-effect QTNs simply does not apply: adaptation to a moving optimum, adaptation from standing variation, and nonadaptive evolution. To the extent that the QTN program proposes to answer questions in these evolutionary contexts, the geometric model is uninformative.

In Fisher's geometrical model, phenotypic changes that improve the fit of organism and environment are necessarily smaller than the diameter of the hypersphere in phenotype space centered on the optimal fit and passing through the phenotype's current position (Fisher, 1930, p. 39). Theoretical models of genetic evolution based on Fisher's geometry parameterize mutational size in terms of the proportional distance to the optimum. For the steps to be large in absolute terms, the diameter of the sphere must be large; that is, a bout of adaptation involves a genotype suddenly transposed into a new environment to which it is badly matched. A basic question for models of adaptation from new mutations, then, is how far do we expect populations to be from the optimal phenotype? For Fisher, the sphere was typically quite small, as populations would adapt by continuous pursuit of a receding optimum, which, like a carrot on a stick, draws the population forward but remains forever just out of reach (Fisher 1930; Frank and Slatkin 1992). This scenario has recently gained theoretical attention in the context of the moving optimum model (Kopp and Hermisson 2009a, 2009b), which suggests indeed that small mutational steps predominate rather than the large steps found for discrete adaptive bouts. Empirical work in Chlamydomonas corroborates these claims: the effect-size distribution of mutations fixed is shifted toward small effects under a moving optimum (Collins and de Meaux 2009). In addition, Martin and Lenormand (2008) recently found that for versions of the geometric model in which the phenotypic space has relatively few evolutionarily independent dimensions, the effect-size distribution of alleles fixed by selection for an optimum is better described by a beta distribution, which is truncated on the right and lacks the long tail of large-effect mutations, than by an exponential (at higher dimensionality, their results converge on those of Fisher, Kimura, and Orr).
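The geometric constraint described above can be checked with a minimal Monte Carlo sketch. It assumes an idealized version of the model: the optimum sits at the origin, fitness depends only on Euclidean distance from it, and mutations are vectors of fixed length r pointing in a uniformly random direction. The function name and parameter values are illustrative.

```python
import math
import random

def fraction_beneficial(r, n_dims, z, trials, rng):
    """Fraction of random mutations of length r that move a phenotype lying at
    distance z from the optimum closer to it, in n_dims dimensions."""
    start = [z] + [0.0] * (n_dims - 1)          # optimum is at the origin
    better = 0
    for _ in range(trials):
        # A normalized Gaussian vector gives a uniformly random direction.
        step = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(c * c for c in step))
        new = [p + r * c / norm for p, c in zip(start, step)]
        if math.sqrt(sum(c * c for c in new)) < z:
            better += 1
    return better / trials

rng = random.Random(3)
for r in (0.01, 0.5, 1.0, 2.5):
    frac = fraction_beneficial(r, n_dims=10, z=1.0, trials=5000, rng=rng)
    print(f"step length {r:>4} (distance to optimum 1.0): "
          f"fraction beneficial = {frac:.3f}")
```

Steps longer than the diameter of the sphere (here 2.0) can never be beneficial, and very small steps are beneficial about half the time. The absolute size of an acceptable step is therefore set by how far the population sits from its optimum.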

As Orr has pointed out, a second context in which the exponential effect-size model does not apply is adaptation from standing variation, whether it occurs by adaptive fixations or merely by shifts in allele frequencies. Despite the centrality of standing variation to the evolutionary synthesis and the widely recognized ubiquity of heritable variation for most traits in most populations, recent models of the genetics of adaptive evolution have tended to focus on new-mutation models, which treat evolution as a series of sequential selective sweeps dependent on the appearance of new beneficial mutations. Only in the past few years have phenotypic and molecular population genetic models begun to treat adaptation from standing variation seriously (Orr and Betancourt 2001; Innan and Kim 2004; Hermisson and Pennings 2005; Przeworski et al. 2005; Barrett and Schluter 2008; Chevin and Hospital 2008). The results are clear: adaptive fixation from standing variation implicates alleles of small effect.

Standing variation has several advantages over new mutations. First, the alleles have already avoided stochastic loss immediately after arising, the process that distinguishes Kimura's result from Fisher's. Second, intermediate allele frequencies allow alleles to explain a substantial fraction of a trait's heritable (selectable) variation, even if their effects are small. Third, they have a head start toward fixation relative to new alleles. Even deleterious mutations maintained at relatively low frequencies by mutation-selection balance alter the effect-size distribution of adaptive fixations (Orr and Betancourt 2001). Although analyses of adaptation from new mutation have been successful within the domain of phenomena they cover, analyses of standing variation yield qualitatively different predictions. Figure 1 of Hermisson and Pennings (2005), which compares the fixation probabilities of alleles of different effect sizes according to their origins as new mutations or segregating variants, should be as central to discussions of adaptation genetics as any version of Fisher's spheres.
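The head start enjoyed by segregating alleles can be sketched with Kimura's classical diffusion approximation for the fixation probability of an additive allele. This is a swapped-in simplification, not the analysis of Hermisson and Pennings (2005), which also accounts for the frequency distribution of standing variants. The population size and the starting frequency of 0.05 are hypothetical.

```python
import math

def fixation_probability(p, n_e, s):
    """Kimura's diffusion approximation for an additive allele at frequency p
    in a diploid population of effective size n_e with selective advantage s."""
    if s == 0:
        return p  # neutral allele: fixation probability equals its frequency
    return math.expm1(-4 * n_e * s * p) / math.expm1(-4 * n_e * s)

N_E = 10_000           # hypothetical effective population size
NEW = 1 / (2 * N_E)    # a new mutation starts as a single copy
STANDING = 0.05        # hypothetical frequency of a segregating variant

def standing_advantage(s):
    """How many times more likely a standing variant is to fix than a new copy."""
    return fixation_probability(STANDING, N_E, s) / fixation_probability(NEW, N_E, s)

for s in (0.0001, 0.001, 0.01):
    print(f"s={s}: new={fixation_probability(NEW, N_E, s):.5f}  "
          f"standing={fixation_probability(STANDING, N_E, s):.5f}  "
          f"ratio={standing_advantage(s):.0f}")
```

Under these assumptions the relative advantage of starting from standing variation is greatest for alleles of small selective effect, because a new mutation of small effect is almost always lost while rare.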

The results of recent analyses of human population genomic data reinforce the idea that much adaptation simply involves subtle shifts in the frequencies of alleles at many loci (Hancock et al. 2010a, 2010b; Pritchard et al. 2010; Hernandez et al. 2011). The genetic signatures of the last quarter million years of our species’ evolution and of the subsequent extensive local adaptation of human populations bear few of the hallmarks of selective sweeps (Hernandez et al. 2011), and fixed differences among human populations are exceptionally rare, despite sufficient time for new mutations to have fixed (Pritchard et al. 2010). Instead, much local adaptation appears to result from modest changes in the frequencies of multiple alleles of modest effect, and these alleles tend to exhibit haplotypic characteristics consistent with long histories as standing variation (Hancock et al. 2010a).

Recent empirical and theoretical studies of standing variation have also rediscovered the prevalence and potential importance of cryptic genetic variants, alleles whose effects are exposed only under environmental or genetic stresses (Badano and Katsanis 2002; Gibson and Dworkin 2004; Hermisson and Wagner 2004; Hansen 2006; Gibson 2009; Frankel et al. 2010). Conveniently, such stresses can be created by environmental changes (including changes to the genetic composition of the population) to which the organisms must adapt: a change in the selective regime can generate new additive genetic variance from existing cryptic variation. Furthermore, cryptic variation is enriched for potentially beneficial alleles relative to new mutations, because the alleles are definitively not unconditionally deleterious (Masel 2006).

One line of data often cited in support of the community's shift toward a focus on large-effect mutations comes from studies of microbial experimental evolution. The vast majority of experimental evolution studies that track genotypes begin without standing variation and instead conform exactly to the requirements of Orr's model: a bout of adaptation to a new fixed optimum entirely dependent on new mutations. One important exception is the work of Teotonio et al. (2009) studying experimental evolution of Drosophila populations. Their study found modest replicable shifts in allele frequencies after more than 100 generations of directional selection, followed by replicable partial reversion of the frequencies after 50 generations of reverse directional selection. The phenotypic responses to selection in both the forward and reverse phases were dramatic. Overall levels of molecular and additive genetic diversity showed no change during the experiment. In short, the responses to selection were driven by standing variation. This study provides molecular confirmation of what quantitative geneticists have long known: the breeder's equation works (Crow 2008; Weiss 2008; Hill 2010; Houle 2010).
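A few lines of arithmetic show why modest frequency shifts at many loci are enough to produce a dramatic phenotypic response. The sketch assumes a purely additive trait, loci of equal effect at equal starting frequency, Hardy-Weinberg proportions, and linkage equilibrium. None of these numbers comes from the Teotonio et al. study; they are illustrative only.

```python
import math

def breeders_response(h2, selection_differential):
    """Breeder's equation: response R = h^2 * S."""
    return h2 * selection_differential

def polygenic_shift(n_loci, effect, freq, delta_freq):
    """Change in the trait mean, in units of the starting additive genetic
    standard deviation, when every locus shifts in frequency by delta_freq."""
    mean_shift = 2 * effect * n_loci * delta_freq
    genetic_sd = math.sqrt(n_loci * 2 * freq * (1 - freq) * effect ** 2)
    return mean_shift / genetic_sd

# Heritability 0.5 and a selection differential of 2 trait units.
print("response per generation:", breeders_response(0.5, 2.0))

# 1000 loci at frequency 0.5, each shifting by only 0.02.
print(f"shift in genetic SD units: {polygenic_shift(1000, 1.0, 0.5, 0.02):.2f}")
```

In this toy case a frequency change of two percentage points per locus, far too small to resemble a selective sweep at any one site, moves the trait mean by nearly two genetic standard deviations.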

The third disconnect between the exponential-effects model and the QTN program is simple: adaptive fixation is not the whole of natural selection, and as Fisher said in the first sentence of his (1930) book, “natural selection is not evolution.” For many evolutionary questions—about the origin and maintenance of variation in traits under stabilizing selection, for example, or about the molecular basis and evolutionary causes of robustness, or about the mechanisms of morphological and developmental evolution (often mistaken for questions about adaptation, e.g., by Hoekstra and Coyne [2007]; see Stern and Orgogozo [2008]), or about the genetics of trait loss under relaxed selection, or about genetic network evolution in the absence of phenotypic change (Weiss and Fullerton 2000; True and Haag 2001; Haag 2007)—the geometric model of adaptive fixation provides little or no guidance (nor is it intended to). The large-effect QTNs mapped in contexts outside of adaptive change are therefore unable to rely on the geometric theory to support claims that they are typical.

The geometric model is not the only theoretical context in which a roughly exponential distribution of effect sizes has been proposed. A thoughtful essay by Alan Robertson (1967) is now often cited to support the evolutionary relevance of large-effect QTLs. Robertson proposed, as an untested prediction, that “the distribution of gene effects will probably be of an exponential kind.” Although Robertson's point was less that we might expect some large-effect alleles than that we should expect enormous numbers of small-effect alleles, and that consequently questions about the number of loci underlying a trait are meaningless, his basic prediction is not contentious. The QTN effect-size distribution is certainly neither constant nor uniform; some effects are large and many are small and their proportions depend on all of the details of the species and traits and selection regimes under study (Orr and Coyne 1992; Stern and Orgogozo 2008). But none of the theory and data generated over the last 20 years force us to conclude that QTNs of detectably large effect are representative of the alleles that shape most variation and divergence.

Fisher Redivivus: Unbiased QTNs are Often Small-Effect Polygenes

Recent work has revealed the exceptionally polygenic basis of standing variation for many complex traits, implying that evolution from such raw material is likely to employ a different class of alleles than evolution from new mutations. I briefly review four lines of inquiry that make the case: genetics of gene expression, genome-wide association mapping in humans, genomic selection in agriculture, and population genetic studies of weak selection.


Brem and Kruglyak (2005) characterized the genetic architectures of 5727 arbitrary traits, the abundances of each gene's transcript in a cross of two Saccharomyces cerevisiae strains. Genetic analysis of transcript abundance has three major virtues. First, there are many traits, allowing for generalizations. Second, the traits are not preselected on the basis of intuitions about ecological genetics or evolutionary regime. And third, the traits integrate over the entire phenotypic state space of the organism; that is, variation in organismal phenotypes is likely reflected in transcript abundances, whether the transcripts are causes of the organismal phenotypes or effects. In their sample of 112 haploid recombinant strains, Brem and Kruglyak found that most traits (62%) exhibited very high heritabilities, but more than 40% of these highly heritable traits exhibited no genetic linkage. A modeling approach to estimate the number of undetected QTLs (given the study's power) suggested that their highest complexity model, 30 additive loci of equal effect, explained the data better than less polygenic models for 45% of the highly heritable traits. Transcript abundances are, if anything, a conservative test bed for genetic complexity, because line crosses invariably treat the transcript's locus as a single QTL, although the locus may harbor many cis-regulatory and trans-acting autoregulatory variants in LD (Stam and Laurie 1996; McGregor et al. 2007); transcript abundance traits often have large-effect QTLs that map to the genomic locations of their transcripts (Rockman and Kruglyak 2006). Nevertheless, the yeast data imply that a substantial fraction of transcript abundance traits have as many QTLs as there are segregating regions of genome in the mapping cross.
The prevalence of transgressive segregation in the cross, combined with the absence of detectable linkage for many of the traits, may point to a large number of small-effect genes linked in repulsion, with their effects masked by LD in the cross. Such a genetic architecture may be characteristic of traits under stabilizing selection and suggests an abundance of allelic variation available for a response to directional selection (Mather 1941; Hansen 2006).
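A rough back-of-the-envelope calculation shows why an equal-effect polygenic architecture can leave a highly heritable trait with no detectable linkage. The sketch below splits the heritable variance evenly among unlinked loci and converts each locus's share into an approximate single-marker LOD score. The sample size of 112 and the 30-locus model come from the text; the heritability of 0.7, the assumption of unlinked loci, and the function name are illustrative assumptions, not values or methods from Brem and Kruglyak.

```python
import math

def expected_lod(n_strains, h2, n_loci):
    """Approximate single-marker LOD when n_loci unlinked loci share h2 equally.

    Uses LOD ~ (n / 2) * log10(1 / (1 - r2)), where r2 is the fraction of
    phenotypic variance explained by one locus. This ignores sampling noise.
    """
    r2 = h2 / n_loci
    return (n_strains / 2.0) * math.log10(1.0 / (1.0 - r2))

# Hypothetical heritability of 0.7; 112 segregants as in the yeast cross.
per_locus_lod = expected_lod(n_strains=112, h2=0.7, n_loci=30)
single_locus_lod = expected_lod(n_strains=112, h2=0.7, n_loci=1)

if __name__ == "__main__":
    print("30 equal loci, LOD per locus:", round(per_locus_lod, 2))
    print("one Mendelian locus, LOD:", round(single_locus_lod, 1))
```

Under these assumptions each of the 30 loci yields an expected LOD well below 1, far short of the genome-wide thresholds (commonly around 3) used in linkage mapping, whereas the same heritable variance concentrated at one locus would be unmistakable.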

Despite the demonstrated presence of thousands of detectable QTLs in the yeast cross, and many more that are too small to be detected, relatively few have been mapped to QTN resolution. Those that have are, to a large extent, Mendelian (Ehrenreich et al. 2009). The quantitative trait genes include LEU2, which segregates in the cross due to an engineered deletion that renders one of the parental strains auxotrophic; MAT, which controls mating type; and AMN1, a gene whose derived allele abolishes clumpy growth and was evidently selected de novo—a sport—during the laboratory domestication of yeast (Ronald and Akey 2007).


One of the objections to QTL mapping is that QTLs are typically physically large genomic regions that may harbor many linked variants. An alternative approach to mapping, genome-wide association, relies not on LD between QTNs and markers generated by controlled crosses but instead on LD generated by population history. Association mapping, like linkage mapping, pinpoints genomic regions rather than individual QTNs. However, the size of the associated region (the amount of genome in LD with an associated variant) is typically orders of magnitude smaller than the regions spanned by linkage-mapped QTLs.

The recent growth of GWAS in humans has cast serious doubt on the ubiquity of large-effect variants. Although early successes raised hopes that the QTNs underlying quantitative variation would quickly emerge, the last several years of GWAS have left the community heartbroken: although significant associations are routinely found, their individual effects are typically minute, and they cumulatively explain only a tiny fraction of the heritable variation in most traits. For example, the 10 strongest associations in a recent blood pressure GWAS with 29,000 individuals jointly explain 1% of the trait variation after taking known nongenetic determinants into account (Levy et al. 2009), leaving more than 90% of the heritable variance unexplained. The failure of enormous, multinational consortia to find phenotypically important associations has led to the great pseudocontroversy of contemporary human genetics: the case of the missing heritability (Maher 2008; Manolio et al. 2009).

The simplest explanation for the missing heritability is that it resides in small-effect alleles that GWAS are underpowered to detect. Strong support for this model comes from a remarkable and important study of the genetics of schizophrenia, a condition with heritability of approximately 80% (Purcell et al. 2009). This case-control study of more than 3000 individuals with schizophrenia and 3500 controls identified multiple, replicable associations with SNPs in the major histocompatibility complex (MHC); this result is typical of many human diseases. But the authors then asked whether a Fisherian model of a large number of alleles of very small effect could account for the remaining heritability. They used genotypes to predict risk of case status in a test population based on genotypic risk scores estimated in another population. They found that risk scores become better and better predictors as more and more independent SNPs with less and less nominal significance were included in the risk calculation. Careful model-based analysis of the fraction of population variance explained by these nonsignificant SNPs yielded an estimate of ∼34% for the genotyped SNPs, implying a much larger fraction (∼80%) for the actual causal SNPs which are likely to be in imperfect LD with the marker SNPs. Biologically realistic models that take imperfect LD into account suggest that more than 10% (and plausibly 100%) of the 74,000 predictor SNPs tag real causal variants. Moreover, models that invoke rare alleles of large effect could not account for the pattern of risk score predictive success. Importantly, the risk score proved to be predictive of schizophrenia and bipolar disorder, but not of a suite of nonpsychiatric conditions. That is, the LD-tagged causal variants represent a huge pool of common alleles of small effect that is specific to particular phenotypic traits, and not merely a signature of systemic unwellness or of population stratification. 
The International Schizophrenia Consortium paper is among the most important papers in empirical evolutionary genetics in years: Fisher (1918) is vindicated.
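The logic of the risk-score analysis can be illustrated with a small simulation. The sketch below is not the Consortium's method or data: it simulates a quantitative trait rather than case-control status, and every parameter (200 unlinked SNPs of equal effect at allele frequency 0.5, heritability 0.5, sample sizes, z-score thresholds) and every function name is a hypothetical choice made for illustration. It estimates marginal SNP effects in a training sample, builds scores in an independent test sample using progressively more lenient significance thresholds, and reports how well each score predicts the phenotype.

```python
import math
import random

def simulate(n, effects, h2, rng):
    """Simulate n individuals: 0/1/2 genotypes (allele frequency 0.5) and phenotypes."""
    m = len(effects)
    geno = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(m)]
            for _ in range(n)]
    var_g = 0.5 * sum(b * b for b in effects)   # genotype variance is 0.5 at p = 0.5
    sd_e = math.sqrt(var_g * (1.0 - h2) / h2)   # noise scaled to give heritability h2
    pheno = [sum(b * x for b, x in zip(effects, row)) + rng.gauss(0.0, sd_e)
             for row in geno]
    return geno, pheno

def marginal_effects(geno, pheno):
    """Per-SNP regression slope and an approximate z statistic (r * sqrt(n))."""
    n = len(pheno)
    y_bar = sum(pheno) / n
    syy = sum((y - y_bar) ** 2 for y in pheno)
    estimates = []
    for j in range(len(geno[0])):
        col = [row[j] for row in geno]
        x_bar = sum(col) / n
        sxx = sum((x - x_bar) ** 2 for x in col)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(col, pheno))
        estimates.append((sxy / sxx, sxy / math.sqrt(sxx * syy) * math.sqrt(n)))
    return estimates

def correlation(a, b):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def prediction_accuracy(seed=1, m=200, n_train=1000, n_test=500, h2=0.5,
                        z_thresholds=(3.0, 1.5, 0.0)):
    """Return (threshold, SNPs used, test-set correlation) for each threshold."""
    rng = random.Random(seed)
    effects = [rng.choice((-1.0, 1.0)) for _ in range(m)]   # all SNPs causal, tiny effects
    train_g, train_y = simulate(n_train, effects, h2, rng)
    test_g, test_y = simulate(n_test, effects, h2, rng)
    estimates = marginal_effects(train_g, train_y)
    results = []
    for z_min in z_thresholds:
        # SNPs failing the threshold get weight zero; the rest keep their estimated slope.
        weights = [b if abs(z) >= z_min else 0.0 for b, z in estimates]
        n_used = sum(1 for _, z in estimates if abs(z) >= z_min)
        scores = [sum(w * x for w, x in zip(weights, row)) for row in test_g]
        results.append((z_min, n_used, correlation(scores, test_y)))
    return results

if __name__ == "__main__":
    for z_min, n_used, r in prediction_accuracy():
        print("z >=", z_min, "SNPs used:", n_used, "test correlation:", round(r, 2))
```

In this toy setting, the score built from every SNP, including those far from nominal significance, predicts the test phenotype better than the score restricted to the few SNPs passing the strictest threshold. That is the qualitative pattern expected when the trait is shaped by many alleles of individually undetectable effect.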

The schizophrenia study is not alone. The poster trait for missing heritability is height. Its heritability was established along with the very concept of heritability by the biometricians more than a century ago, and modern data support a heritability (within generation and nation) of about 80% (Visscher 2008). GWAS on an exceptional scale, involving upwards of 90,000 individuals, identified 44 separate height-associated SNPs across the genome (reviewed in McEvoy and Visscher 2009). The largest allelic substitution effect is about 0.06 phenotypic standard deviations, and all of the associated variants together account for much less than 10% of the heritable variation in height (McEvoy and Visscher 2009). Yang et al. (2010), applying methods similar to those of the Schizophrenia Consortium study to a panel of 3,925 unrelated individuals, found that the common alleles surveyed by GWAS, although individually insignificant, could collectively account for the entire heritability of human height.
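The arithmetic behind these figures is worth making explicit. Under a purely additive model, a biallelic locus with allele frequency p and substitution effect a (in phenotypic standard deviations) contributes 2p(1 - p)a² of the phenotypic variance. The sketch below applies this standard expression to the largest effect quoted above (0.06 standard deviations) and to the heritability of 0.8. The allele frequency of 0.5 is an assumption chosen because it maximizes 2p(1 - p), so the result is an upper bound for an effect of that size; the function name is hypothetical.

```python
def variance_explained(effect_sd, allele_freq):
    """Fraction of phenotypic variance due to one additive biallelic locus."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * effect_sd ** 2

# Largest height effect cited in the text, at the variance-maximizing frequency.
largest = variance_explained(effect_sd=0.06, allele_freq=0.5)

# Number of loci of this (maximal) size needed to fill a heritability of 0.8.
min_loci_needed = 0.8 / largest

if __name__ == "__main__":
    print("variance explained by largest locus:", largest)
    print("loci of that size needed:", round(min_loci_needed))
```

Even the strongest association accounts for less than 0.2% of phenotypic variance under these assumptions, so more than 400 loci of that maximal size, and many more of typical size, would be needed to account for the heritability of height.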

Many of the GWAS associations with the largest effects map to loci characterized by strong recent selection, both geographically restricted positive selection (e.g., OCA2: Sulem et al. 2007) and global balancing selection (e.g., the ABO blood group locus: Stajich and Hahn 2005; Amundadottir et al. 2009). The best known among these is the MHC region, which is a major-effect locus for an enormous range of conditions (Johnson and O’Donnell 2009), including schizophrenia risk, as discussed above. Variation in MHC is incapable of contributing to phenotypic divergence in these traits, however, because its allelic variation is maintained by tremendously powerful balancing selection (Hedrick 1999) and the disease alleles may be unconditionally deleterious sheltered load (van Oosterhout 2009). The finding that many large-effect disease loci exhibit strong recent selection is consistent with the worry that the largest effect loci, the ones that we are capable of mapping, are atypical: they exhibit strong selection despite their deleterious pleiotropic side effects, like the myostatin sports selected in cattle breeding, mutations that do not contribute to long-term evolution (Stern and Orgogozo 2009). As Lewontin wrote (albeit in a slightly different context), “it is no use trotting out that tired old Bucephalus, sickle-cell anemia” (1974, p. 199). Sickling beta globin, our very first QTN (Ingram 1956), now leads a cavalry of large-effect QTNs whose relevance to our more general questions is no clearer today than it was in 1974 (see Note 10 in Supporting information).

It might be argued that human diseases are particularly poor models for adaptive genetic variation. But there are many reasons not to dismiss their relevance, not the least that the nature of genetic variation maintained by mutation-selection balance is an important topic in itself, and consequential for adaptation (Orr and Betancourt 2001). Another is that most diseases represent the tails of continuous phenotypic distributions (Dendrou et al. 2009; Plomin et al. 2009); alleles that contribute to hypertension, for example, also shape variation in the normotensive range. Moreover, many of the alleles that contribute to disease may be ancestral, suggesting that loci shaping disease risk are exactly those that contribute to ongoing adaptation to modern conditions (Di Rienzo and Hudson 2005; Gibson 2009; Hindorff et al. 2009; Hancock et al. 2010a; Pritchard et al. 2010). Finally, objections to disease models clearly do not apply to studies of human height: body size and shape are iconic examples of local adaptation, in the form of Bergmann's rule and Allen's rule, and natural selection on size is well documented in wild animal populations (Grant and Grant 2002).

A final word about GWAS: even among the significant hits in GWAS, the conversion of associations to QTNs remains problematic (Altshuler et al. 2008; Ioannidis et al. 2009), and some individual associations may themselves reflect multiple QTNs in LD with one another (Maller et al. 2006; Graham et al. 2007). Functional analyses necessary to validate small-effect variants will require methods and levels of replication not typical (and perhaps not feasible) for human or model-system genetics (Donnelly 2008). Although the mapping resolution of GWAS is an improvement over conventional QTL mapping, GWAS remains incapable of pinpointing the causal variants, which is the sine qua non of the QTN program.


The polygenic models described above for schizophrenia and height genetics are part of a broader movement in human genetics toward incorporation of all markers, not merely the statistically significant ones, into estimation of genetic architectures and prediction of trait values (Schork 2001; Wray et al. 2007; Visscher 2009; Wei et al. 2009; de los Campos et al. 2010); Province and Borecki (2008) describe an analogous method as “gathering the gold dust.” This approach is becoming a mainstay in livestock and crop genetics in the form of genomic selection (VanRaden et al. 2009). A successor to marker-assisted selection, which used QTL-linked markers to facilitate breed improvement (with little success), genomic selection skips the QTL estimation step by allowing every marker to have an effect on the trait, in keeping with the traditional (and very successful) methods of estimating breeding values from pedigrees under an infinitesimal model (Meuwissen et al. 2001; Goddard 2009). The method requires assumptions about the distribution of allelic effect size, and although implementation of genomic selection is in its early days, data from Holstein cattle offer strong support to models assuming that thousands of markers, almost evenly distributed across the genome, have nonzero effects (Cole et al. 2009; VanRaden et al. 2009). For most traits, simple models in which every marker is weighted equally perform as well as those with nonconstant effect-size distributions. Simulation studies suggest that the equal-weighting approach should only perform as well as more complex methods if there are no major loci but a very large number of loci of small effect, hundreds per Morgan of the genetic map (Daetwyler et al. 2010; Meuwissen and Goddard 2010). These results “may explain why the infinitesimal model and standard quantitative genetic theories have worked well,” note VanRaden and colleagues, in an understatement that recalls the last line of Watson and Crick (1953).


A final line of evidence for the ubiquity of small-effect alleles comes from the limiting case of a quantitative genetics experiment, natural selection. In a quantitative genetic mapping experiment, we attempt to detect alleles that affect traits by estimating the additive effect of an allelic substitution averaged across a large number of randomized genetic backgrounds. This sort of randomized, replicated, multifactorial perturbation is exactly how natural selection operates, discriminating among segregating alleles on the basis of their additive effects on fitness. In both quantitative genetics and natural selection, the randomization is mediated by meiosis and therefore results in poor discrimination among—and interference between—closely linked sites. But more important for our purposes, the power of both quantitative genetics experiments and natural selection to detect the additive effects of an allele is determined by effect size and sample size. Sites under weak selection are fitness QTNs whose effect sizes are on the order of the reciprocal of the sample size, that is, the effective population size (Ne).

Effects of weak selection differ among species as a function of effective population size (Kimura 1968; Eyre-Walker and Keightley 2007). Sites that exhibit signatures of weak selection in species with large population sizes but do not in species with small populations are, in the latter species, fitness QTNs with effect sizes beneath the detection limit of natural selection. These are fitness infinitesimals, and they are numerous; they may even govern the evolution of genome architecture (Lynch 2007).
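The dependence of selection's "detection limit" on population size can be made concrete with the classical diffusion approximation for the fixation probability of a new mutation, usually attributed to Kimura. The sketch below is an illustration under stated assumptions, not an analysis from the studies cited: it assumes additive (genic) selection with coefficient s, a diploid population whose census size equals Ne, and an initial frequency of 1/(2Ne). The selection coefficient and the two population sizes are hypothetical values.

```python
import math

def fixation_probability(s, n_e):
    """Diffusion approximation for a new additive mutation, assuming N = Ne (diploid)."""
    if s == 0.0:
        return 1.0 / (2.0 * n_e)        # neutral expectation
    exponent = -4.0 * n_e * s
    if exponent > 700.0:                # strongly deleterious: avoid float overflow
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(s, n_e):
    """Fixation probability expressed as a multiple of the neutral value 1/(2 Ne)."""
    return fixation_probability(s, n_e) * 2.0 * n_e

# Hypothetical weakly deleterious variant in a small and a large population.
small_pop = relative_to_neutral(s=-1e-5, n_e=1e4)
large_pop = relative_to_neutral(s=-1e-5, n_e=1e6)

if __name__ == "__main__":
    print("Ne = 1e4, relative fixation probability:", round(small_pop, 2))
    print("Ne = 1e6, relative fixation probability:", large_pop)
```

With these values the same variant fixes at close to the neutral rate in the smaller population but is almost never fixed in the larger one, which is the sense in which its effect lies beneath the detection limit of selection in one species and above it in the other.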

To impact fitness, a variant must affect some aspect of performance or life history. For example, a mutation that generates a weakly advantageous preferred codon in D. melanogaster is not a bare fitness allele; it alters the translational efficiency and accuracy (among other things) of the gene in which it occurs. These effects in turn influence the functional dynamics of the network in which the gene is embedded. The phenotypic specificity of such a variant can be detected, in some cases, by mapping in a sensitized background (Dworkin et al. 2003).

The weak selection data imply that a large fraction of all segregating variants have fitness consequences, even if their magnitudes are too small for selection to detect in typical plant or animal populations (Lynch 2007). Insofar as these fitness infinitesimals segregate in most genes in most populations, and variation in specific genes impacts specific phenotypes, we may conclude that specific phenotypes have at least as many underlying QTNs as they have underlying molecular genes. For most continuous traits, this number is large. More than a third of ∼2000 viable gene knockouts in mice affect body size, for example (Reed et al. 2008). The findings of the International Schizophrenia Consortium study, discussed above, reinforce the notion that the genetic basis for a trait can be both genomically diffuse and phenotypically specific. For many human traits studied by GWAS, sample sizes already exceed estimated human Ne, implying that many of the causal SNPs underlying the associations evolve as infinitesimals even if the studied trait is perfectly correlated with fitness.

There is a Relationship Between Phenotypic Effect Size and Molecular Function

The arguments above make the case that large-effect QTNs are unusual, that QTL effect sizes are uninformative, that theory does not support claims that large-effect QTNs are typically expected or exclusively important, and that empirical data point to a nearly infinitesimal genetic basis for many traits (although by no means all). If large-effect QTNs are a random sample of QTN molecular function, however, then none of these claims matters for questions about the molecular basis for phenotypic evolution.

The evidence is unequivocal, however, that mutations at sites with different molecular functions have different distributions of effect sizes. Below, I adduce the evidence to support the hypothesis of Ayala and McDonald (1980): “there may often be a profound relationship between the regulatory genes of the molecular geneticist and the minor genes of the quantitative geneticist” (p. 2; see also Mukai and Cockerham 1977). There are many other evolutionarily important differences, aside from effect size, among QTNs from different categories of molecular function, but I will not treat these here as they have been reviewed elsewhere at length (Mitchison 1997; Stern 2000; Ohta 2003; Wray 2007; Carroll 2008; Lynch and Wagner 2008; Stern and Orgogozo 2008, 2009). Here I aim only to demonstrate that QTN effect size and molecular function are not independent classifications.

Most Mendelian mutations alter protein sequences (Mattick 2009). Summarizing the lessons learned from the 27,000 Mendelian mutations discovered in humans by clinical geneticists through mid-2002, Botstein and Risch (2003) found that fewer than 1% were regulatory mutations; 59% were missense or nonsense point mutations. Based on the present catalog of more than 85,000 mutations, these numbers are 1.6% and 66%, respectively (Stenson et al. 2009).

Botstein and Risch acknowledged that regulatory mutations might be underrepresented in part because they are hard to identify. No such discovery bias is possible in C. elegans, however, in which a catalog of point-mutation alleles generated by mutagenesis experiments (Sarin et al. 2008; see their supplementary table 2) reveals a profound paucity of noncoding variants (<5%). C. elegans researchers work primarily with a single isogenic reference strain, and recessive Mendelian mutations generated in this background are typically fine mapped by transgenic complementation with large-insert clones followed by targeted sequencing of the mutant strain. When the causal mutations are noncoding, they are found; the power of positional cloning in C. elegans is what led to the discovery of noncoding microRNAs (Lee et al. 1993; Wightman et al. 1993) (see Note 11 in Supporting information).

The nearly exclusive occurrence of protein-coding variants among Mendelian point mutations in humans and worms implies one of two things: either protein-coding variants represent the vast majority of all functional sites, or noncoding variants tend not to have Mendelian effects. Comparative genomics data refute the first possibility. Genomes contain large numbers of evolutionarily conserved—and hence functionally important—noncoding sites. In humans, there are several times as many conserved noncoding sites as coding, and in C. elegans 45% of conserved sites are noncoding (Siepel et al. 2005; Asthana et al. 2007; Oldmeadow et al. 2010; Meader et al. 2010).

Moreover, evolutionary conservation underestimates the extent of functional noncoding sequence. The ENCODE project found that roughly half the functional noncoding elements in humans exhibit no detectable evolutionary constraint across mammals (Birney et al. 2007). Other targeted studies have also shown rapid evolutionary turnover of functional regulatory elements (Ludwig et al. 2000; Moses et al. 2006; McGaughey et al. 2008).

In addition, DNA sequences with no function are susceptible to phenotypically relevant mutations. Mutations in noncoding sequence can create new functional elements de novo, and completely functionless sequence can be under selection to maintain functionlessness (Hahn et al. 2003). Transcription factor binding sites, miRNA binding sites, and miRNA genes arise spontaneously at quite considerable rates (Stone and Wray 2001; Dermitzakis and Clark 2002; MacArthur and Brookfield 2004; Chen and Rajewsky 2007; Lu et al. 2008). Function also resides in the higher level organization of DNA sequences, including intrinsic DNA topology (Parker et al. 2009). Even the humble microsatellite, a classic neutral marker, is no such thing: short-motif tandem repeat polymorphisms are widely implicated in transcriptional regulation (Rockman and Wray 2002; Vinces et al. 2009; Hannan 2010). In short, noncoding DNA is packed with functional sequences that fail to yield Mendelian effects when mutated.

If Mendelian mutations are disproportionately protein coding, does that imply that QTLs tend to be noncoding? A common theme in reviews of human GWAS findings is that a substantial fraction of associations fall in noncoding regions (Altshuler et al. 2008; McCarthy et al. 2008; Frazer et al. 2009; Manolio et al. 2009). Although GWAS does not permit direct identification of QTNs, the physical interval containing a QTN is restricted by the pattern of LD around the associated marker. Visel and colleagues (2009) conducted a meta-analysis of 1200 SNPs called significant in GWAS papers published through February 2009. Using what they describe as conservative parameters for LD, they concluded that protein-coding sites could be excluded as potential causative variants for 40% of the associations. This represents a lower bound on the contribution of noncoding sites to quantitative-effect alleles (see Note 12 in Supporting information).

Molecular population genetics provides us with access to another specific slice of the effect-size spectrum, nearly neutral mutations. Several studies have cataloged, genome-wide, mutations that are neutral in species with small populations but deleterious in species with large populations. For any given set of species, these genomic comparisons provide an unbiased glimpse at the functional nature of small-effect mutations, with the specific slice of effect-size distribution defined by the difference in long-term population sizes among the studied species. In comparisons of hominids and murids, for example, the smaller populations of the former leave a signature in higher rates of substitution in classes of functional sequence that are preserved by negative selection in murids. The signature is stronger in noncoding than in coding sequence (Keightley et al. 2005a, 2005b; Kryukov et al. 2005; Eyre-Walker and Keightley 2007), pointing to enrichment of noncoding mutations within this nearly neutral sliver of effect sizes.

An exciting recent addition to the discussion is the work of Goode et al. (2010), who analyzed the derived-allele frequency spectrum of polymorphisms falling within evolutionarily conserved sequences in humans. Their definition of evolutionary conservation is quantitative; these sites are not invariant across mammalian phylogeny, but they exhibit lower rates of substitution than freely evolving sites. Goode et al. found that the vast majority of variants segregating at evolutionarily conserved positions in a global panel of 432 individuals were common in the population and noncoding. In another analysis, treating the whole genomes of three individuals, they found that roughly 90% of the inferred-functional polymorphisms were noncoding; this high proportion is due to both the greater number and higher average heterozygosities of noncoding relative to coding polymorphisms. These data reinforce the notion that the population holds an enormous store of common small-effect QTNs whose molecular function is disproportionately noncoding.

What Now?

Many possible ways forward—explicit tests of polygene effects (Le Rouzic et al. 2010; Yang et al. 2010), molecular population genetic studies (Goode et al. 2010), new approaches to higher order systems genetics (Chen et al. 2008; Rockman 2008; Mackay et al. 2009)—constitute abandonment of the QTN program in favor of statistical genetics. Although it is clear that statistical genetics, particularly in the remarkable new era of inexpensive population genomic data, has the potential to reanimate long dormant questions in evolutionary genetics, giving up on QTNs is difficult. It feels like a cop-out, as Bateson (1909) argued a century ago, approvingly quoted by Orr (2005a): “By suggesting that the steps through which an adaptive mechanism arises are indefinite and insensible, all further trouble is spared. While it could be said that species arise by an insensible and imperceptible process of variation, there was clearly no use in tiring ourselves by trying to perceive that process. This labor-saving counsel found great favor” (see Note 13 in Supporting information). We have now, following Bateson's advice, tired ourselves trying to perceive the genetic basis of trait variation, and I have argued here that our exertions availed us little. The epistemological paradox is real. But still, should we really throw up our hands?

My claim is not so desperate: we should employ experimental methods suited to our research questions. A less desirable alternative is to tailor our research questions to our experimental methods. The least desirable outcome is the present one, in which we ignore the mismatch between question and method.

At the first instance, it must be reiterated that the large-effect QTNs that are amenable to discovery are informative about the genetics of evolution (Stern and Orgogozo 2008; Streisfeld and Rausher 2011). My critique of their generality in no way diminishes their reality. Such alleles may or may not underlie complex trait evolution, but they do segregate and fix and legitimately demand our attention (Watt 1994). Nevertheless, it is as clear today as it was in 1974 that large-effect QTNs cannot answer general questions about the maintenance of variation, the evolution of form, adaptation from standing variation, or the mechanisms of speciation.

One appealing solution to our dilemma is that proposed by Stern and Orgogozo (2008): the answers to our questions depend on biological context, so we must develop a more meaningful classification of those contexts. We have different predictions about QTNs depending on the details of our study species: their population sizes and ecological histories, their mutation rates and spectra, and their genome sizes, and of our study traits: their mutational target sizes, the patterns of selection they experience, and the histories of those patterns. The effects of variation in these parameters are vast, and in many regions of this parameter space, large-effect alleles will typify the genetic variation we seek. In those cases, and after explicit assessment of QTN ascertainment biases, the QTN program may flourish and provide genuine insights—about those specific and constrained regions of evolutionary parameter space (Stern and Orgogozo 2008).

A second possible approach is to devote resources to the discovery of QTNs of smaller and smaller effect. Fine mapping arbitrary traits within small, arbitrary regions of genomes holds promise for revealing “typical” landscapes of genetic variation (Kroymann and Mitchell-Olds 2005). In fast-breeding model organisms, recombination mapping within near isogenic lines (NILs) is a natural approach, eliminating the genetic variance contributed by all but a tiny region of the genome, which can then be dissected to any desired resolution (Eshed and Zamir 1995; Darvasi 1998; Cicila et al. 2001; Monforte et al. 2001; Shao et al. 2010). The subtlest allelic effects, which may be quite important for evolution, are those revealed only under perturbation, and these can be mapped in sensitizing backgrounds or environments (Dworkin et al. 2003; Carbone et al. 2006; Paaby and Schmidt 2008). The combination of sensitization and NIL-based fine mapping with high levels of replication can, in principle, answer the question of whether a particular molecular variant is capable of influencing a particular phenotype. In a fortunate few model organisms, precise allele replacements could achieve the same aims more simply (Deutschbauer and Davis 2005). However, these NIL-based methods are not true solutions to the problems of the QTN program: they face all the same problems of detection power, merely to a lesser degree (Darvasi 1998; Keurentjes et al. 2007; Jeuken et al. 2008). Truly small-effect variants will continue to demand levels of replication not currently practiced, and perhaps not practicable. Even if NIL-based fine mapping could be made to work at this scale, it would come at a cost: the focal genomic interval of a NIL is isolated in a foreign genetic background, preventing any understanding of its potential epistatic relationship to its native genome. As Lewontin wrote in the book that defined our epistemological paradox, “context and interaction are not simply second-order effects to be superimposed on a primary monadic analysis. Context and interaction are of the essence” (1974, p. 318).
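The point about detection power can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions, not an analysis from any of the cited studies: it uses the standard normal-approximation sample size for a two-sided comparison of two line means (for example, a NIL and its recurrent parent), with the allelic effect expressed in units of the within-line phenotypic standard deviation. The function name `replicates_per_line` and the default significance level and power are hypothetical choices made for illustration.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_line(effect_sd, alpha=0.05, power=0.8):
    """Approximate replicates needed per line to detect a mean difference.

    Assumes a two-sided, two-sample z-test with equal variances and equal
    sample sizes. effect_sd is the difference between line means in units
    of the within-line phenotypic standard deviation (illustrative input).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)  # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_sd ** 2)


if __name__ == "__main__":
    # Required replication grows with the inverse square of the effect size.
    for d in (1.0, 0.5, 0.1, 0.05):
        n = replicates_per_line(d)
        print(f"effect = {d:4.2f} SD -> about {n} replicates per line")
```

Because the required replication scales with the inverse square of the effect size, halving the effect roughly quadruples the number of individuals that must be phenotyped per line, before any correction for multiple testing.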

Lewontin described the crisis of evolutionary genetics in terms of theoretical machinery incapable of dealing with emerging data:

“For many years population genetics was an immensely rich and powerful theory with virtually no suitable facts on which to operate. It was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining. Occasionally some unusually clever or lucky prospector would come upon a natural outcrop of high-grade ore, and part of the machinery would be started to prove to its backers that it really would work. But for the most part the machine was left to the engineers, forever tinkering, forever making improvements, in anticipation of the day when it would be called upon to carry out full production.

“Quite suddenly the situation has changed. The mother-lode has been tapped and facts in profusion have been poured into the hoppers of this theory machine. And from the other end has issued — nothing. It is not that the machinery does not work, for a great clashing of gears is clearly audible, if not deafening, but it somehow cannot transform into a finished product the great volume of raw material that has been provided.” (1974, p. 189)

The mother-lode of Lewontin's metaphor was allozyme data, and 25 years later such data had gone from milestone to millstone (Lewontin 1991). It seemed that allozymes were not the ore that the machine was designed to accommodate. The goal of the QTN program has been to generate the appropriate data, the identities of the alleles that underlie variation and divergence. I have argued that, 22 years into the interval-mapping era, the QTN program is incapable of collecting these data in the systematic manner that its questions require.

In the end, an embrace of polygenic evolution, whose molecular basis we cannot describe in particulate detail, is not the cop-out that Bateson suggested: it is cold empiricism. We should recognize the data that depart from our current narratives and we should unashamedly adopt macroscopic, statistical descriptions of genetic architectures when they are required. The QTN program is motivated by a commitment to collecting the empirical data that test alternative models, and we should abide by that commitment even when the empirical data do not take the form of QTNs.

Associate Editor: J. Hermisson


I thank many critical readers, including H. Seidel, A. Paaby, S. Rankin, D. Pollard, P. Phillips, and B. Gaertner. I am very grateful to the editors and referees at Evolution for their valuable advice; the manuscript has benefited greatly from their input. I thank the National Institutes of Health (R01GM089972), the Ellison Medical Foundation (AG-NS-0615), and the Human Frontier Science Program (RPG0045/2010) for supporting my research.