Seeing the forest through the gene-trees

What is the pattern in the human genome and what does it mean?

The human genome is a dense forest of biological information for us to find our way through. In the past, we could view it as a forest, comfortably assuming the nature of its unseen trees. But new technologies have generated masses of genomic data that raise unexpected challenges to a prevailing view that grew from a theory that melded Darwinian selection and Mendelian genetic causation. Both rested on direct, largely deterministic, and highly simplified concepts of the relationship between genes and what they do.

Darwin believed that natural selection was a fine-tuning mechanism that screened competing individuals to detect even the smallest difference among them.1, 2 The causal elements weren't known, but one could assume their existence, as Darwin did, and study the organisms they produced. If selection were universal, then biological functions must have adaptive explanations.

Meanwhile, the inheritance that Mendel documented was probabilistic, but in a very limited and rigid way, with fixed probabilities and a few genetically determined outcome states. The discovery of Mendelian determinism led to an extremely effective genetic research program that discovered the nature, location, and arrangement of genes and their protein-coding function, whose legacy we are reaping today.

Although Darwin's and Mendel's work developed independently and seemed to be addressing separate questions, by the 1930s these had been connected into a single, simple genetic understanding of life. However, the new data are revealing how complex life's genetic underpinnings actually are. Within each cell, more than six billion nucleotides of DNA encode countless thousands of functional elements. Each of our hundreds of types of cells uses these elements differently, in different contexts; each element can vary among individuals and even among cells, because mutations occur in them during life.

The challenge to understand this complexity is daunting, and some rethinking is in order. The data are revealing deep but subtle unity of genetic and evolutionary causation. Some of these similarities are summarized in Figure 1. For example, most alleles (variants at a given position in the genome) have low frequency in the population, a relatively local geographic distribution, and small effects on phenotypes, while common, geographically widespread, large-effect alleles are rarer.

Figure 1.

General trail map of the genome. Schematic distribution of characteristics discussed in the text and their general functional or epistemological nature. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

If we digest this new knowledge, findings that have been seen as mysterious are easily explained. But sometimes what we can detect contributes less, and what we can't detect contributes more to our understanding of life than has generally been thought.

INFERRING ANCESTRY: THE CONCEPT OF COALESCENCE

To both Darwin and Mendel, a single concept was fundamental: common ancestry. To Darwin, common ancestry was the cornerstone of evolutionary theory. If all life comes from a common ancestor, its diversity today is due to divergence from that ancestry. In that sense, a species is a unitary phenomenon. This is clear in Darwin's sole figure in The Origin of Species, where he spoke of the image we still use, the Tree of Life. Mendel made a similar implicit assumption: that all the individuals in his pea strains were from a common ancestor. All of the “yellow” or “wrinkled” elements were identical as a result of their inbred history.

It is because of evolution that, looking backward in time, today's variation appears to coalesce to a common ancestor. We may think of evolution in terms of change but, in fact, conservation, or homology, is essential to understanding life, and Darwin used patterns of conservation of traits as vital support for his theory. It might be seen as a strikingly satisfying confirmation that although Darwin had no understanding of genetics, when we now look at sequence data we see the kind of conservation he would have predicted. Indeed, Darwin's ideas were about the functionally adaptive nature of traits, but we confirm his theory with nonfunctional, nonadaptive DNA regions, such as introns, pseudogenes, and intergenic sequence, using the clock-like degradation of conservation due to mutations to reconstruct trees of ancestry.

According to textbook treatments of evolution, which still repeat classical theory, natural selection is seen as so specific in picking favored variation that it reduces variation between populations and would not produce reliable phylogenies. But that's wrong. Species phylogenies, if not the timing of their branching, can be constructed from data in adaptive regions of DNA, like protein-coding genes. This is because selection picks on locally extant variation, which diverges between populations in tree-like ways.

This goes further in an important way. Sequence data have unambiguously confirmed a 40-year-old idea3 that genome architecture—the nature and arrangement of its functional units—is the result of duplication events. This finding is vital to understanding the evolution of new functions.4 Periodic duplication creates gene families. The numbers and arrangement of gene family members also reflect phylogeny and, since after duplication, mutations accumulate in the individual genes, their sequence divergence also confirms the tree structure in a clocklike way. Thus, in multiple ways the evidence from DNA sequence provides independent, indeed striking confirmation of Darwin's ideas of divergence from common ancestry. Independent confirmation is among the most convincing evidence in support of a theory in science.

Furthermore, we find exceptions that prove the rule. Phylogenetic relationships can be problematic in some gene families, such as the antibody genes, olfactory receptors, or the SCPP biomineralization genes.5 In these families, it can be difficult to identify specific homologues (that is, differentiating duplicate paralogs from directly descended orthologs). These families exist in adjacent multigene clusters that rapidly accumulate sequence variation because of their function: to recognize as many different pathogens or odorants as possible, or simply to capture calcium ions. These functions do not depend on high degrees of sequence conservation, so the genes accumulate variation randomly or even aided by diversifying natural selection. Also, in dense tandem clusters misalignment during meiosis leads to frequent deletion or duplication. The result of this variation fogs phylogenetic, tree-like relationships.

Phylogenetic signal is altered in another way that we had long understood in principle, but that has become much clearer with sequence data. At the nucleotide level, evolution guarantees there must exist a coalescent, of which all copies today are descendants. A perhaps surprising, but fundamental, implication is that while each nucleotide has a coalescent, a single path back to a single common ancestor, this is not so simply true of the genes that make a human or a pancreas.6 At each time, each segment of each gene has passed through differing genomic environments. Each gene has its own complex path to common ancestry, and the coalescent times, places, and individuals differ greatly. There was never a single ancestral human or pancreas.
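The claim that each locus has its own coalescent time can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: it uses the standard neutral coalescent, treats loci as unlinked and independent, and measures time in units of 2N generations for a diploid population of size N. The function name and sample sizes are hypothetical.

```python
import random

def simulate_tmrca(n_copies, rng):
    """Time back to the most recent common ancestor of n_copies sampled copies.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2 (standard neutral coalescent, assumed here).
    """
    time_back = 0.0
    for k in range(n_copies, 1, -1):
        time_back += rng.expovariate(k * (k - 1) / 2)
    return time_back

rng = random.Random(1)
# One simulated coalescent time per independent locus, 10 sampled copies each.
tmrcas = [simulate_tmrca(10, rng) for _ in range(5000)]
mean_tmrca = sum(tmrcas) / len(tmrcas)

print(f"mean {mean_tmrca:.2f}, shortest {min(tmrcas):.2f}, longest {max(tmrcas):.2f}")
```

Under these assumptions the average is close to the theoretical value 2(1 - 1/n), but individual loci scatter widely around it, which is the sense in which different genes trace back to different ancestors at different times.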

There was never a ‘mitochondrial Eve’ or ‘Y-chromosome Adam’, either. That cute marketing device has led countless students and professionals to misperceive how evolution works. For nonrecombining sequences such as these, we presume there really was a single ancestral copy, but copies of the other genes in those individuals are unlikely to be here today. In any case, the coalescent mtDNA and Y-chromosomes would have been at very different times and locations.7 There were always populations.

HOW CLOSE ARE OUR EVOLUTIONARY COUSINS?

Although human genetic data have been accumulating for many years, we have only recently had whole genome sequences from specific individuals. Table 1 summarizes some of what we've seen so far.8 Probably the most important single finding is the huge number of unique or at least very rare sequence variants (single nucleotide polymorphisms, or SNPs). Representative individuals from three continents were found to have 736,261 previously known SNPs, but an average of 754,443 variants were unique to each individual.

Table 1. Variation in individuals whose whole genome has been sequenced. Numbers of known and newly discovered variants and protein-changing variants found in sequenced individuals. Khoisan and Bantu are from southern Africa. Source: Schuster and coworkers.9

| Individual | Genomic SNPs | Novel SNPs | Coding SNPs |
| --- | --- | --- | --- |
| Khoisan | 4,053,781 | 743,714 | 22,119 |
| | 1,181,663 | 181,427 | 19,593 |
| | 125,848 | 25,485 | 17,739 |
| | 136,985 | 30,963 | 19,226 |
| Bantu | 3,624,334 | 412,754 | 17,342 |
| Nigerian | 2,639,169 | 115,843 | 16,431 |
| | 3,586,490 | 216,968 | 17,268 |
| European | 2,060,544 | 98,926 | 11,868 |
| | 3,074,574 | 160,370 | 15,079 |
| | 2,968,312 | 33,575 | 13,375 |
| | 2,972,120 | 36,120 | 13,317 |
| Asian | 3,074,061 | 84,786 | 15,759 |
| | 3,439,097 | 130,566 | 16,637 |

This must be so: A mutation arises about once every 40 million nucleotides per parent-offspring transmission, so each newborn infant carries roughly 155 new mutations in its 6.2 billion nucleotides. New mutations start out as single copies, but even successful ones will take many generations to reach substantial frequency in our slow-reproducing species. As Figure 1 suggests, there will always be vast numbers at low frequency, recently arisen, geographically local, and unlikely to be included more than once, if at all, in random samples of individuals. These countless sites reflect new or very recent mutations, which should be roughly similar in amount and uniqueness anywhere in the world. Hence, they are not so useful in reconstructing population history. They are the leaf litter on the genomic forest floor.
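The arithmetic behind the figure of roughly 155 new mutations per newborn can be checked directly from the two numbers quoted in the text. The variable names are only illustrative.

```python
# Figures quoted in the text
GENOME_SIZE = 6_200_000_000             # nucleotides in a diploid genome
NUCLEOTIDES_PER_MUTATION = 40_000_000   # about one mutation per 40 million nucleotides per transmission

new_mutations_per_newborn = GENOME_SIZE / NUCLEOTIDES_PER_MUTATION
print(new_mutations_per_newborn)  # 155.0
```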

Among the other noteworthy findings from human whole genome sequences are the many thousands of protein-coding variants found in each person. The donors have been healthy people, usually middle-aged. This suggests that amino acid changes are not as uniformly or strongly deleterious as is often assumed in textbook Darwinian theory.

To reconstruct population history, we often rely on older alleles, because the older an allele is, the more geographically widespread it is. SNP alleles found on multiple continents reflect mutations that occurred before the human expansion out of Africa some 100,000 years ago. These are useful in reconstructing our species' global as well as local history. Because humans typically exchange mates from neighboring groups, this gene flow means that allele frequencies have geographic coherence; that is, they change gradually, if sometimes irregularly, over space. Genetic analysis shows that genetic similarities roughly correspond to trees of language and cultural evidence from the same populations because culture also reflects population history. But genetic variation is subtle.

We can use the frequencies of globally present alleles to examine genetic differences within and between sampled groups. For a person with a given genotype, say AA, at some locus, the probability may be substantial, or even greater, that a random individual from a different continent, rather than from the same continent, has the same AA genotype. For example, if the A allele frequency is, say, 0.1 in the first continent, but 0.6 in the second, the probability of an AA genotype is only 0.01 in the person's same continent, but 0.36 on the other continent. This might seem to suggest that we're all the same worldwide, except for a few genes like those responsible for skin color. However, if many loci are considered genome-wide, the multi-locus genotype similarities are much greater among people from the same continent than among those from other continents.10–12 The continent of indigenous origin is unambiguous, even if no two people from the same continent have exactly the same genome-wide genotype. Genome-wide, humans carry polygenic genotypes that differ probabilistically much as many phenotypes are polygenic.
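The contrast between one locus and many loci can be sketched numerically. The example below first reproduces the AA probabilities from the text, then simulates assignment of individuals to one of two populations by comparing multi-locus genotype likelihoods. Everything beyond the 0.1 and 0.6 allele frequencies is an assumption made for illustration: Hardy-Weinberg proportions, independent loci, two hypothetical populations whose allele frequencies differ by 0.1 at every locus, and the function names themselves.

```python
import math
import random

def genotype_prob(copies, freq):
    """Hardy-Weinberg probability of carrying 0, 1, or 2 copies of an allele."""
    return ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)[copies]

def draw_genotype(freq, rng):
    """Draw the number of copies (0, 1, or 2) carried by a random individual."""
    return (rng.random() < freq) + (rng.random() < freq)

def log_likelihood(genotypes, freqs):
    return sum(math.log(genotype_prob(g, p)) for g, p in zip(genotypes, freqs))

def assignment_accuracy(n_loci, n_people, rng, shift=0.1):
    """Share of simulated people assigned to their true population."""
    freqs_a = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]         # hypothetical
    freqs_b = [p + rng.choice((-shift, shift)) for p in freqs_a]     # hypothetical
    correct = 0
    for true_freqs, other_freqs in ((freqs_a, freqs_b), (freqs_b, freqs_a)):
        for _ in range(n_people):
            person = [draw_genotype(p, rng) for p in true_freqs]
            correct += log_likelihood(person, true_freqs) > log_likelihood(person, other_freqs)
    return correct / (2 * n_people)

rng = random.Random(42)
same_continent = genotype_prob(2, 0.1)    # AA probability where A has frequency 0.1
other_continent = genotype_prob(2, 0.6)   # AA probability where A has frequency 0.6
accuracy_one_locus = assignment_accuracy(1, 400, rng)
accuracy_many_loci = assignment_accuracy(500, 400, rng)

print(f"P(AA): {same_continent:.2f} vs {other_continent:.2f}")
print(f"assignment accuracy: {accuracy_one_locus:.2f} (1 locus), {accuracy_many_loci:.2f} (500 loci)")
```

With a single locus the assignment is little better than a coin toss, while hundreds of loci with individually modest frequency differences make it nearly certain, even though no two simulated individuals share the same multi-locus genotype.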

Genome-wide geographic affinity is even stronger at loci that have been affected by natural selection. This is because selection affects the frequencies of alleles that are found locally, and they usually differ from place to place. The picture becomes more complex, but ancestry is clear in the expected ways in populations, such as that of the United States, in which there has been recent admixture among peoples moving there from distant continents.

These geographic relationships must be so if our understanding of evolution as a phenomenon of population history is accurate.13, 14 But the ability to use such data for unambiguous identification of individuals' place of origin depends on how much data are included and the location of the samples one chooses to analyze. At a more detailed local level, continent of origin may be clear, but local group affinity less so.10 Also, nothing in genetic data suggests categorical “race” divisions. It is obvious that individuals from the same geographic area are far from identical.12, 15, 16

This may seem strange. If races exist according to the usual notion, mustn't there be genetic variation common on one continent but absent elsewhere? In fact, few variants are highly common in one continent yet absent elsewhere. That's what we know to expect from human population history. Alleles not essentially fixed within one continent but absent elsewhere cannot be the basis of a categorical “race” in that continent.

The flood of DNA sequence data provides excellent information for reconstructing human history in an increasingly fine-grained way, using a variety of analytic approaches.17–20 But probably the most important point is that these new data raise no conceptual challenges to our understanding of human history. That hasn't changed substantially for decades with perhaps one major exception.

Genetic data increasingly suggest that anatomically modern humans expanded out of eastern Africa around 100,000 years ago and somehow replaced the hominins who had been resident across the Old World, adapted to all its ecological diversity, for roughly a million years. That challenges the alternative “multiregional” hypothesis. There is still active debate over when or whether, later on, Neandertals admixed with contemporary “modern” humans. Extensive sequence data from fossil specimens are now available, and although they are somewhat ambiguous, they currently suggest that there may have been some admixture before Neandertals disappeared.21 At least as interesting as detecting such admixture from ancient DNA is the challenge to develop a convincing explanation of how the replacements actually happened and whether they were based on cultural differences alone or involved genetic differences.

MAPPING GENETIC CAUSATION

As we wander through the thicket of our genome, it is natural to ask what all that DNA is doing. What are the phenogenetic connections between genes and traits? There are many ways to identify genetic causation. The easiest cases for us are the same as Mendel's. When there are two very different true-breeding states, such as a serious disease involving a known protein, we can identify and sequence the gene to find the responsible variants. Hundreds of such traits are known (see, for example, www.ncbi.nlm.nih.gov/omim), though once the gene is identified, much more allelic and phenogenetic complexity is usually discovered. There can be many alleles; their penetrance, or the probability of manifesting the trait, can be low.

More interesting and more challenging are the complex traits having variation that is of primary interest to both evolutionary anthropology and public health. When the underlying biology is largely unknown, as is the case for many psychiatric or behavioral traits, or too complex to understand from physiological studies alone, as in diabetes or obesity, various approaches are used. These are known as mapping methods. Their objective is to search the entire genome to find genetic variation that is statistically associated with variation in the trait.

The favored mapping approaches today are called genome-wide association studies (GWAS). Sampled individuals, such as cases and controls for some disease state, are genotyped at large numbers of genetic markers, that is, variable sites of known location spanning all the trees in our genome forest at regular intervals. The idea is that the gene or genes having variation that is responsible for our trait's variation must lie chromosomally near to, and thus be statistically associated with, at least one of the markers. The chromosomal region can then be explored to identify the causal elements.

A remarkable feature of mapping is that it can be done for any trait, normal or otherwise, even if nothing is known about its biology. In this sense, genome-wide mapping is free of specific hypotheses about the nature of the trait, except that genes somehow affect it.

Mapping involves only present-day variation, but it is actually an evolutionary approach because it relies on the assumption of identity by descent. It assumes that specific nucleotide changes rarely recur within the same population, so that copies of a marker allele found in two different individuals (say a G rather than an A at a given genome position) are descendant copies of the same ancestral mutation; that is, today's copies coalesce to that event. The same assumption is made regarding the unseen sought-for causal variant responsible for the phenotype (for example, affected versus unaffected status) that is chromosomally near the typed marker.

The history of joint transmission of marker and causal allele generates a statistical association between them, which is why the typed marker allele points to the unknown causal one. Fortunately, humans are a young species, with major recent expansion from small ancestral populations. Rapid, recent expansion preserves association among chromosomally nearby alleles. We are now awash in mapping results. For obvious funding reasons, most of the data are from studies of human disease, though the picture is the same for variation in normal traits that have been studied. The results are rather telling.
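Why a short history preserves association between nearby sites can be sketched with the textbook result that, under random mating, the association (linkage disequilibrium) between two sites decays by a factor of (1 - r) per generation, where r is the recombination fraction between them. The generation count below is an assumption for illustration only (about 100,000 years at an assumed 25 years per generation), as are the values of r.

```python
def ld_remaining(recomb_fraction, generations):
    """Fraction of the initial association left after the given number of generations."""
    return (1.0 - recomb_fraction) ** generations

GENERATIONS = 4000  # hypothetical: ~100,000 years at an assumed 25 years per generation

for r in (1e-5, 1e-4, 1e-3, 1e-2):
    print(f"r = {r:g}: {ld_remaining(r, GENERATIONS):.3g} of the initial association remains")
```

On these assumptions only tightly linked sites retain much of their original association, which is why a typed marker can point to a causal variant only when the two lie chromosomally close together.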

Assessments of the success of extensive GWAS vary. Some, especially those with the greatest vested interest in the approach, give a very positive assessment,22, 23 while others are more circumspect.24, 25 Nobody disputes the typical findings: a few chromosomal locations generate statistically believable evidence of effect (Fig. 2), but each such effect typically accounts for only a small fraction of the overall genetic effects as measured by the trait's heritability; that is, by the degree to which the traits cluster in families.24, 26 What is disputed is how well various technological adjustments and augmentations might raise the explained fraction, or whether the small fractions generally accounted for to date are “important,” as in potential clinical applications. For example, it is argued that even if a low-penetrance gene's contribution is too weak to be directly important, it may at least identify unsuspected causal gene networks that can be investigated.

Figure 2.

Representative sample of GWAS results. Large case-control study of around 2,000 cases for each of seven major chronic diseases in Britain and about 3,000 controls. Each row portrays the aligned entire genome (except the Y chromosome); the chromosomes are numbered and identified by alternating dark and light bands. For a given trait, moving along the genome, each dot reflects, by its vertical position, the statistical significance of marker alleles at its location; the plot looks mainly solid because a total of ∼500,000 markers spaced across the genome are crowded into each row, and most sites generate very low significance. Only a few statistically significant “hits” are found for any trait (three examples indicated by arrows). Reprinted by permission from The Wellcome Trust Case Control Consortium.26

There are evolutionary issues here. Much of the infrastructure for GWAS was based on the assertion that, in general, common variants would account for common disease.27 Predictably, this was wishful thinking for evolutionary reasons that anthropologists, if not biomedical geneticists, should have recognized.28, 29 Given the heterogeneous, stochastic nature of evolution, which generates the kind of causal spectrum illustrated in Figure 1, there will be traits for which a few relatively common variants do account for much of at least the pathologically interesting variation. Age-dependent macular degeneration and Factor V Leiden clotting factor are examples. For most traits, however, many genes, even hundreds, appear to contribute in aggregate, but individually only very slightly.30–33 Yet to date, after many studies with dense markers, hundreds of these minor contributing genes typically remain unidentified.34

Complicating this picture is that in searching a chromosomal region implicated by GWAS mapping, we are drawn to genes because it is easy to identify alleles that change an amino acid or disrupt the protein code. But for most normal traits and most complex diseases, with which individuals can live normally for decades, altered timing and level of gene expression may be more important than altered gene structure. GWAS hits to date generally support this expectation. Unfortunately, identifying regulatory sites is still an art form. It remains a major challenge to identify the specific causal variants in regions implicated by mapping.

Given the statistical vagaries of complex effects, usually rare and weak, mapping hits can be quixotic, appearing to have an effect in one study but not in the next.25 The statistical significance criteria for identifying hits in GWAS typically lead to upward bias in estimates of effect strength.35 Even if there is no genetic effect, if you search hundreds of thousands of markers you will find many that seem significantly associated with your trait. To account for this, replication is critical. Because of the cost and difficulty of GWAS, meta-analyses are undertaken, pooling data from existing studies to attempt to increase sample size and find the genuine effects.36, 37 The idea is that a real effect should be found in different samples.
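The upward bias in estimated effects among significant hits can be seen in a minimal simulation. The sketch assumes many repeated studies of one marker with a small true effect, normally distributed estimation error, and a one-sided significance cutoff. All numerical values are hypothetical, and the cutoff is far more lenient than a genome-wide threshold.

```python
import random
import statistics

rng = random.Random(3)
TRUE_EFFECT = 0.05      # hypothetical true effect of the marker
STANDARD_ERROR = 0.05   # hypothetical sampling error of each study's estimate
Z_CUTOFF = 1.96         # hypothetical significance cutoff on estimate / standard error
N_STUDIES = 20000

# Each study returns the true effect plus sampling noise.
estimates = [rng.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(N_STUDIES)]
# Only the studies that cross the cutoff would be reported as hits.
significant = [e for e in estimates if e / STANDARD_ERROR > Z_CUTOFF]

mean_all = statistics.mean(estimates)
mean_significant = statistics.mean(significant)

print(f"mean estimate, all studies:       {mean_all:.3f}")
print(f"mean estimate, significant only:  {mean_significant:.3f}")
print(f"share of studies reaching cutoff: {len(significant) / N_STUDIES:.2f}")
```

Averaged over all studies the estimate is unbiased, but the subset that reaches significance overstates the effect, and most studies of the same real effect do not reach the cutoff at all, which is one reason a true hit may fail to replicate.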

However, an allele's effect will be consistent only to the extent that the background of environmental and other genomic effects is reasonably similar among study samples. Referring again to Figure 1, what we know of evolution warns that this is a problematic assumption except for major effects with an allelic cause that is old enough to be present with sufficient frequency in different samples or populations, and strong enough to be visible against locally specific genome-wide variation in other contributing genes, not to mention environmental exposure differences. This means that even true findings from one study need not be replicable in other studies.

At least as important as the fact that most mapping hits are not replicated is that the few that are, even in total, usually account for only a fraction of the variation. Human stature is perhaps an archetype, because it is one of the most highly heritable traits known. At least 80%, and in many estimates over 90%, of the variation in height, adjusted for cohort, is genetic as measured by various data such as parent-offspring correlations.28, 38 Large GWAS have found that the roughly 100–200 most statistically significant genome locations, out of hundreds of thousands tested, account in aggregate for only 10% of stature variation, less than 15% of the overall genetic contribution.38–40
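The relation between the two percentages quoted for stature is a simple ratio: the share of total variation explained by the significant hits divided by the heritability. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
VARIANCE_EXPLAINED = 0.10  # share of stature variation explained by significant hits (from the text)

# Heritability estimates quoted in the text: at least 80%, in many estimates over 90%.
share_of_genetic_contribution = {
    heritability: VARIANCE_EXPLAINED / heritability
    for heritability in (0.80, 0.90)
}

for heritability, share in share_of_genetic_contribution.items():
    print(f"heritability {heritability:.0%}: hits explain {share:.1%} of the genetic contribution")
```

Either heritability estimate puts the explained share of the genetic contribution below 15%.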

These results frustrate the often-claimed hopes of a bonanza of easily identified genes with major impact.41 But what we have seen so far is absolutely expected on evolutionary grounds, and it is not difficult to see why. The multilocus nature of complex traits has been known for decades from statistical studies of phenotype correlations among relatives and measures like heritability.42, 43 Complex traits have been assembled bit by bit over millions of years, involving a highly intricate fabric of cooperation among many different developmental signaling and homeostatic gene networks.44 Regulation of even a single gene involves tens of transcription factor proteins, which are coded by other genes that themselves need to be regulated. Such regulation also involves comparable numbers but more complex DNA-based transcription factor-binding sites flanking the regulated gene. Alteration of the coding or regulatory sequence in any of the participating genes can generate phenotypic variation. Mapping approaches are designed to detect those effects. However, when there are tens, hundreds, or even thousands of contributing genes, as some estimates from various mapping approaches suggest, it is no surprise that we are not finding much, even when a trait really is highly genetically controlled.

We know from protein and gene-regulatory structure that mutations have a distribution of relative effect in the genotypic ecology of traits. There are exceptions to almost every generalization about life but, as shown in Figure 1, the relative effects of known alleles are usually inversely related to their frequency in the population.24, 45 Most nonlethal mutations have minimal effect, muted by complexity, and are contextually dependent on the environmental and genomic background of individuals carrying them. These contextual effects can be of the same order of magnitude as that of the allele under consideration. Indeed, recent estimates are that around 10% of known serious-disease-causing alleles in humans are the normal allele in other mammals.46, 47 That effects found by mapping depend on the genomic background is also routinely shown in laboratory mice, where an allele with a major effect in humans has similar effects only in some strains, and sometimes no effect at all.

A consequence of the very low, rather than high, frequency of alleles that do have independently strong effects is multiple unilocus control, in which each individual or family with an unusual trait value has it because of a different rare mutation. There are many examples of this, such as hereditary deafness and retinitis pigmentosa (an eye disease). Such case-specific effects are naturally difficult to replicate. Things may be even cloudier if, as seems likely, instances of unusual trait values are due not to single rare alleles, but to combinations of them, which means that each case will be a unique genotype.25, 48–50 This is just what we expect evolution to generate: major effects will be rare and eliminated if harmful, or quickly raised to high frequency if helpful. But most will be recent and rare (Fig. 1).

From a biomedical point of view, these issues are important to those who believe that the future major advances in health depend on personalized genomic medicine, in which the idea is to predict a trait, especially a disease, from an individual's genotype. And if it works for disease, designer children will be next. But the complexity of genetic causation, as well as its evolutionary explanation, is clear. Genes do not act alone. Thus, there is more in the forest to make our way through than just individual genes.

DNA is inert by itself, and the effect of a gene depends on its context, which includes the rest of the genome, the cells in the organism, and the external environment.51–53 The environment even includes the genomes of other species, such as symbiotic bacteria in our gut. In utero gestational conditions can affect an individual's lifetime phenotypes, including level of body fat, diabetes, cancer, and aging.51, 54 These can in turn be imprinted by means including epigenetic modification of the DNA that affects gene expression but not DNA sequence, and then inherited by the subsequent generation.51, 55

Complicating all of this environmental underbrush is a serious but unappreciated fact, that estimating phenogenetic effects is necessarily retrospective: We observe phenotypes of individuals today and relate those to the sampled individuals' genotypes. Yet what we want in the drive for personalized genomic medicine is to make prospective phenotypic predictions for genotypes for individuals in the next generation. Selection only works on the manifestations of genotypic effects in the environments at any given time; the past is not always prologue. Nonetheless, we may be able to do better at clearing the path than we have done so far.

SIGNIFICANCE BEYOND “SIGNIFICANCE”

There may be few giant oaks in our genomic forest, but we should also be able to find the smaller trees. More intense and clever mapping approaches will help, but it seems clear that this will largely yield more, even smaller effects than we already know of. But we can gain a better understanding of genetic causation in a different way, taking a hint from the observation that criteria such as parental trait values—Francis Galton's original criteria for the heritable effects of quantitative traits—currently yield better predictions of offspring trait values than do genes identified by conventional GWAS.56 This is easy to understand. Correlations among relatives aggregate all genetic effects without the need for them to be enumerated.

The problem is simple. We have been rooted by tradition into using statistical significance tests as the criterion for discovery. But if we test hundreds of thousands of markers at the usual p-value of 5% as the significance cutoff for a marker's effects, we may detect not only real effects, but also thousands of false positives (5% of 100,000 means 5,000 false-positive tests). Such numbers would be impossibly costly to follow up. So a typical approach has been to insist on a more stringent cutoff criterion, such that there is only a 5% chance of falsely finding any genome-wide signal. Such a revised significance cutoff, called the Bonferroni correction, is often applied essentially by dividing 5% by the number of tests. So for 10 tests one would only accept an individual test having a p-value of 0.5%. However, when thousands of tests are done, such a correction is so stringent that minor truths are almost inevitably missed. Attempts to ameliorate this problem adjust in the opposite direction, using weaker cutoff criteria for “suggestive” significance or a more forgiving false discovery rate (FDR) criterion.57 But any significance cutoff criterion is not only subjective, but intentionally tolerates the omission of weak but true effects. What if we just ask what the data tell us overall?
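The cutoffs discussed here can be made concrete in a few lines. The sketch reproduces the two calculations in the text (5,000 expected false positives among 100,000 tests, and a 0.5% per-test cutoff for 10 tests) and then compares a Bonferroni cutoff with a false discovery rate cutoff on a small set of made-up p-values. The Benjamini-Hochberg step-up procedure is used as one common FDR method; the text does not say which FDR method is meant, and the p-values and function names are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results when no marker has any real effect."""
    return alpha * n_tests

def bonferroni_cutoff(alpha, n_tests):
    """Per-test cutoff that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests accepted by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_accepted = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            n_accepted = rank  # keep the largest rank that passes its own threshold
    return sorted(order[:n_accepted])

# Hypothetical p-values from 10 marker tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

cutoff = bonferroni_cutoff(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= cutoff]
fdr_hits = benjamini_hochberg(p_values, fdr=0.05)

print(expected_false_positives(0.05, 100_000))
print(cutoff)
print(bonferroni_hits, fdr_hits)
```

On these made-up values the Bonferroni cutoff accepts only the smallest p-value, while the FDR criterion also accepts the second, showing how the more forgiving criterion admits weaker signals. Both still discard the remaining tests, whatever their true status.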

In fact, if the stringency of hypothesis testing is relaxed, it is possible to be more inclusive and to identify many more of the contributing genes, and even to use them to predict phenotypes of individuals much as classical parent-offspring regression analysis does. Instead of concentrating on the few “significant” results by an a priori standard, one can apply a well-established statistical classification approach, called a receiver operating characteristic, or ROC.58 That approach gradually relaxes a cutoff criterion such as the p-value or some other measure of effect, which is applied to each tested marker site across the genome, and asks how well the set of sites included by the relaxed criterion predicts the sampled individuals' phenotypes. At some cutoff level, the accuracy of prediction, or fewest misclassifications, will be optimized, greatly increasing the predictive power of a GWAS sample.59–62 Similar inclusive approaches can help GWAS results identify gene networks that contribute to a tested trait.25, 32, 63–65
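The idea of relaxing the cutoff and asking how well the included sites predict phenotypes can be sketched with simulated data. This is a simplified threshold sweep rather than a full ROC analysis: it uses a quantitative trait, measures accuracy as the correlation between a summed marker score and the phenotype, and evaluates prediction in a separate held-out sample. Those choices, and every number below (sample sizes, marker counts, effect size, noise level), are assumptions made for illustration and are not taken from the studies cited.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(7)
N_TRAIN, N_TEST = 500, 300          # hypothetical sample sizes
N_MARKERS, N_CAUSAL = 600, 50       # hypothetical: 50 of 600 markers have a real effect
EFFECT, NOISE_SD = 0.15, 0.75       # hypothetical per-allele effect and non-genetic noise

def correlation(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def simulate(n_people):
    """Genotypes (0, 1, 2 copies per marker) and an additive phenotype plus noise."""
    genotypes = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(N_MARKERS)]
                 for _ in range(n_people)]
    phenotypes = [EFFECT * sum(row[:N_CAUSAL]) + rng.gauss(0.0, NOISE_SD) for row in genotypes]
    return genotypes, phenotypes

train_g, train_y = simulate(N_TRAIN)
test_g, test_y = simulate(N_TEST)

# Per-marker association in the training sample; z is an approximate test statistic.
weights = [correlation([row[j] for row in train_g], train_y) for j in range(N_MARKERS)]
z_scores = [w * math.sqrt(N_TRAIN) for w in weights]

def prediction_accuracy(z_cutoff):
    """Correlation between the marker score and phenotype in the held-out sample."""
    included = [j for j in range(N_MARKERS) if abs(z_scores[j]) >= z_cutoff]
    if not included:
        return 0.0
    scores = [sum(weights[j] * row[j] for j in included) for row in test_g]
    return correlation(scores, test_y)

# Strict cutoff: Bonferroni-corrected 5% level (two-sided) across all markers.
strict_cutoff = NormalDist().inv_cdf(1 - 0.05 / (2 * N_MARKERS))
cutoffs = [strict_cutoff, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
accuracy = {c: prediction_accuracy(c) for c in cutoffs}
best_cutoff = max(accuracy, key=accuracy.get)

for c in cutoffs:
    print(f"|z| >= {c:.2f}: held-out correlation {accuracy[c]:.2f}")
```

In a simulation set up this way, the strict cutoff admits only a handful of the contributing markers, and a more relaxed cutoff usually predicts the held-out phenotypes better, at the price of including markers with no real effect.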

This approach has been applied to human stature. As noted earlier, statistically significant stature-mapping hits account for only about 10% of the heritability. An inclusive approach did much better, accounting in the same data for much more of the heritability.40 Many different kinds of genes were in the mix of contributing genes, but there was some statistical clustering of hits in genes related to skeletal biology.39

This directly confirms the classical model of polygenic inheritance as articulated by Fisher in 1918.66 A key feature of Fisher's model is phenogenetic equivalence, according to which, when many genes contribute to a trait, different genotypes can produce the same phenotype, such as a given height. However, the fact that we can confirm this classical theory does not lessen the problems we face, which are both practical and evolutionary. For example, the authors of the largest stature study to date39 estimate that it would require a sample of nearly 500,000 people to identify an estimated 700 loci that could account for 15% of the total variation. However, even that is only about 20% of the overall genetic contribution as measured by the heritability. Only a tiny fraction of these loci have individual significance, much less useful predictive effects. The rest have predictive value only in combination, which is unique for each individual.

Although the data confirm classical polygenic theory, that theory, combined with what we know of human population history, implies that the genotype cannot be inferred from the phenotype. The set of contributing variants and their frequencies will vary from sample to sample and from population to population. Many, if not most, of these genes will have many alleles.39 Also, phenotypes cannot reliably be predicted from genotypes. GWAS-based estimates of an allele's effects may help account for variation in the sample from which they were estimated, but will do so to a lesser and unknown extent for other samples, even from the same population. This knowledge turns our attention back to evolution, because if we cannot infer individual genetic causation with all our genotyping technology, natural selection cannot work directly at the individual gene level either.

THE SEARCH FOR EVOLUTIONARY MEANING

If the genomic data have shown us the problems in inferring gene function in contemporary samples, what can we say about the role of natural selection in molding that function, especially as it applies to our own species? In principle, selection leaves various kinds of signatures in DNA sequence.67–70 Each has an optimally informative time depth, as shown in Figure 3. For example, adaptive functional changes are expected to be few relative to all changes, so that time must elapse before enough of them accumulate to be detected. Heterozygosity (sequence diversity) in and around the favored gene will be reduced by selection. There may also be more derived (new) alleles relative to the ancestral alleles if selection has been favoring those new alleles.

Figure 3.

Some DNA sequence-based tests for selection and the approximate time-depth for which they are informative relative to human settlement history. Human geographic history is shown on the bottom, based on 1 generation = 20 years. Laid onto that history above are the optimally informative time depths of various aspects of sequence data that may reflect a history of natural selection. For examples, see text. Redrawn after Sabeti and coworkers.72 [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 3 also shows that, under selection, populations will diverge in the region of an allele favored in one population but not another, and that the evidence of this differential selection can persist for a considerable time. When selection increases the frequency of an allele, the haplotype the allele is on (the particular sequence variants in the surrounding chromosome region) will be longer and increased in frequency. That is a signature of recent selection, because over time recombination and mutation will erase the evidence of this hitchhiking effect in the sequence flanking the favored allele. Related to this, the coalescent of a gene's sequence will be unusually recent if it has been affected by selection.71
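How quickly recombination erases a hitchhiking signal can be sketched with the standard two-locus result that an association between sites decays by a factor of (1 - r) per generation, where r is the recombination rate between them. This sketch assumes random mating and ignores mutation, drift, and continuing selection. The value of r is hypothetical, and the 20-year generation time is taken from the Figure 3 caption.

```python
import math


def ld_remaining(recomb_rate, generations):
    """Fraction of the initial association left after the given number of generations."""
    return (1.0 - recomb_rate) ** generations


def half_life_generations(recomb_rate):
    """Generations needed for the association to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - recomb_rate)


YEARS_PER_GENERATION = 20  # as in the Figure 3 caption
r = 0.001                  # hypothetical recombination rate between two sites
half_life_years = half_life_generations(r) * YEARS_PER_GENERATION
```

For this assumed r, the association halves in about 690 generations, or roughly 14,000 years. Sites farther from the favored allele recombine more often and lose the signal sooner, which is why long haplotypes mark only recent selection.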

Natural selection generally reduces variation in affected genome regions relative to neutrally evolving regions. The easiest reduction to detect is from purifying selection, which rejects harmful mutations, presumably because it is easier for mutation to disrupt well-established function than to improve it. Purifying selection is reflected in the evolutionary conservation of protein-coding (exon) or known regulatory regions. But such conservation is generic, affecting all genes. What we most want to find are the fewer nonconservative changes that reflect positive, adaptive selection that has built new or modified function.

In this context, an obvious question to ask of our new genome-scale data is what are the genetic changes that made us human? We can address that question by comparing our genome sequence to those of our closest ape relatives. Aligning the two sequences is easy, but the interpretation is not. We are only 6-7 million years apart from our closest ape relatives and our genomes are 95% or more identical in any pairwise comparison (in nonrepetitive DNA). That's a lot of similarity, but 5% of 3+ billion nucleotides is a typical difference of over 150 million. Under the slow pace of most natural selection, it is difficult to detect the additional divergence in functionally adaptive DNA relative to neutrally evolving divergence.

In genes affected by adaptive selection, we can expect relatively more amino-acid-changing mutations than synonymous mutations. A standard statistical test called the McDonald-Kreitman or MK test73 can be applied, but even with selection the number of altered amino acids required for adaptive change in a given gene without causing more harm than good would likely be one or a very few. These can be very difficult to detect statistically relative to the few synonymous changes in the same gene. Moreover, most adaptive changes have probably involved gene regulation (level, timing, cell-specific location) rather than protein structure, which is consistent with the GWAS findings in contemporary variation. The reason is that most genes are pleiotropic; that is, they have many functions, which are often unrelated. An amino acid change is unlikely to be helpful for all of these functions and might usually be rejected by selection. But expression is controlled by short modular regulatory transcription-factor binding sites flanking a gene, which partition a gene's use in context-specific ways and can be easily altered by mutation.
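The logic of the MK test can be sketched as a 2 x 2 comparison: nonsynonymous versus synonymous changes, among fixed differences between species versus polymorphisms within a species. The sketch below is an illustration under assumptions. It uses a one-sided Fisher exact test as the test statistic, which is one common choice, and the counts are hypothetical.

```python
from math import comb


def mk_one_sided_p(dn, ds, pn, ps):
    """One-sided Fisher exact p-value for an excess of nonsynonymous fixed differences.

    dn, ds: nonsynonymous and synonymous fixed differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    fixed = dn + ds
    polymorphic = pn + ps
    nonsyn = dn + pn
    total = fixed + polymorphic
    upper = min(fixed, nonsyn)
    # Hypergeometric tail: probability of dn or more nonsynonymous changes
    # among the fixed differences, holding the table margins constant.
    tail = sum(
        comb(fixed, x) * comb(polymorphic, nonsyn - x)
        for x in range(dn, upper + 1)
    )
    return tail / comb(total, nonsyn)


# Hypothetical counts for a single gene (illustration only).
p_value = mk_one_sided_p(dn=3, ds=1, pn=1, ps=3)
```

Here the ratio of nonsynonymous to synonymous changes is 3:1 among fixed differences and 1:3 among polymorphisms, yet the one-sided p-value is 17/70, about 0.24. That illustrates why a handful of adaptive substitutions in one gene is so hard to detect statistically.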

Unfortunately, detecting signatures of selection in short regulatory regions is much more difficult than in coding regions. We simply are not yet good enough at identifying regulatory regions, which are complex and not located in fixed positions, to achieve effective identification and comparison.

However, it is possible to slide a “window” along the aligned chimpanzee and human genomes to search for regions that are much more divergent than average, regardless of known function, hence freeing our attention from the restriction of protein-coding regions. Such regions have been found (Wikipedia: human-accelerated regions). Attention has naturally concentrated on genes with brain-related function, but the proposed explanations to date have been speculative at best.
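A sliding-window scan of this kind can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: it assumes two gap-free aligned sequences of equal length, non-overlapping or stepped windows, and a hypothetical rule that flags windows whose divergence exceeds a chosen multiple of the overall average. The sequences are toy strings.

```python
def window_divergence(seq_a, seq_b, window, step):
    """Return (start, fraction of mismatched sites) for each full window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        mismatches = sum(x != y for x, y in zip(a, b))
        results.append((start, mismatches / window))
    return results


def accelerated_windows(seq_a, seq_b, window, step, fold):
    """Window starts whose divergence exceeds `fold` times the overall average."""
    overall = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    return [
        start
        for start, divergence in window_divergence(seq_a, seq_b, window, step)
        if divergence > fold * overall
    ]


human = "AAAAAAAAAAAAAAAAAAAA"  # toy sequences, not real data
chimp = "AAAAAAAAAACCCCAAAAAA"
profile = window_divergence(human, chimp, window=5, step=5)
hits = accelerated_windows(human, chimp, window=5, step=5, fold=2)
```

In the toy alignment only the window starting at position 10 stands out. A real scan would also need to handle alignment gaps and local variation in mutation rate before treating a divergent window as evidence of selection.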

What about adaptive changes that may have occurred within humans since our separation from other primates? Although genetic data reveal the vagueness of racial classification, obvious human phenotypic differences such as skin color are geographically patterned and are often attributed to selection. Can we find the genetic evidence for that?

The easiest examples to find are adaptive responses involving only one or a few genes. The classical example is globin gene variation that provides resistance to malaria. Selection has been recent and very strong, although even here many mutations in different components of hemoglobin, differing within and among continents, have been found. Many genes related to skin pigmentation are known. Signatures of selection in these genes have been found, most likely reflecting geographic variation in exposure to ultraviolet light, but again with different genes involved on different continents.74 Another classic case, perhaps the simplest, involves the adult ability to drink milk, which seems to have resulted from selection involving expression of the lactase (LCT) gene, independently in European and African populations with a long history of dairying.75, 76 Also, recent evidence implicates genes in the HIF oxygen-response system in adaptation to high altitude.77, 78, 90

More problematic are the results of general genome-wide searches for selection in which the objective was analogous to GWAS mapping: to let genome-wide data show us where selection has occurred so we can then identify the gene and try to understand the reason. One example is a change in the frequencies of existing alleles in response to environments altered by, for example, climate.79, 80 Despite many searches, I think it is fair to say that only a modest number of convincing signatures of selection have been identified.69, 81–83 Most studies have involved comparison of only a few samples, usually representative of only a part of a continent (for example, one sample each from west Africa, northern Europe, and east Asia). The results are similar to those of GWAS in that few hits were found, and they did not always include the known cases such as those mentioned earlier. There are many reasons for this. Even when selection is presumably clear, samples from west or south Africa cannot detect evidence of selection for adult lactase persistence at LCT that occurred in eastern Africa. But the problem is worse than this.

Figure 4 shows some of the results of a more fine-grained geographic sampling. About 80 positive signals were found scattered across the genome. A few were found globally, but most were detected only in samples from a restricted geographic region. This is an improvement, but the result still seems strange. Even including just six world regions, with our 23,000 protein-coding genes, that's over 120,000 tests, not counting the many times more tests that were done over other functional regions, like regulatory sequences, which the genome-spanning markers also queried. Yet from Tierra del Fuego to Cape Town, we vary in almost every trait inside and out, from lowland to highland, wetland to dry land, continent to island, and tropics to ice-land. If life is as relentlessly Darwinian as its popular image, where is the evidence?

Figure 4.

Geographic patterning of statistical evidence for selection. Each row represents a chromosome location labeled left and a candidate gene labeled right (where known). Columns are geographic regions: Middle East, Europe, Central/South Asia, East Asia, Oceania, Americas. The gray scale denotes relative statistical significance. The identity of the genes is unimportant for the points being made here. See Lopez Herraez and coworkers69 for details.

The answer is that the same problems challenge selection mapping that challenge the GWAS trait-mapping discussed earlier, and for the same reason. Most traits are affected by variation in large numbers of genes. Different genotypes at these loci can generate the same phenotype, and they will be selectively equivalent to each other. Selection is usually weak, only trimming away the worst or favoring the best tip of the tail of the phenotypic distribution. The net selective coefficient favoring the individual alleles at a given locus will be weak to very weak. Under these conditions, the fate of most individual alleles is largely determined by drift rather than selection.4 A few have stronger effect and respond faster to selection, and these are the ones we detect. Selection can push a trait in some direction, just as Darwinian models posit, but we would still not expect to identify most of the contributing genes.
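The claim that drift largely decides the fate of weakly selected alleles can be explored with a small simulation. This is a sketch under assumptions that are not in the text: a Wright-Fisher population with a fixed number of gene copies, selection acting directly on the allele with advantage s, and hypothetical parameter values chosen only to run quickly.

```python
import random


def simulate_allele(n_copies, s, p0, generations, rng):
    """Final frequency of an allele with selective advantage s under drift plus selection."""
    p = p0
    for _ in range(generations):
        if p in (0.0, 1.0):
            break  # allele already lost or fixed
        # Selection: deterministic shift in the expected frequency.
        expected = p * (1.0 + s) / (1.0 + s * p)
        # Drift: binomial sampling of n_copies gene copies for the next generation.
        count = sum(rng.random() < expected for _ in range(n_copies))
        p = count / n_copies
    return p


def fixation_fraction(n_copies, s, p0, generations, replicates, seed):
    """Fraction of replicate populations in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(
        simulate_allele(n_copies, s, p0, generations, rng) == 1.0
        for _ in range(replicates)
    )
    return fixed / replicates


# Hypothetical settings: 100 gene copies, allele starting at frequency 0.1.
neutral = fixation_fraction(100, 0.0, 0.1, 500, 50, seed=1)
weak = fixation_fraction(100, 0.001, 0.1, 500, 50, seed=1)
```

When s is small relative to 1 / n_copies, one would expect the weakly favored allele to fix about as rarely as a neutral one. Only a much larger s should shift the outcome noticeably, which mirrors the argument that the few alleles with stronger effect are the ones we detect.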

This is just what we find. There is a high correlation between the frequency of selectively favored alleles and geography, as would be expected under drift, and as we find in the data described earlier.84 This has recently been described as “soft” selection rather than strong selective “sweeps.”84, 85 But we don't need these artificial terms because we're just observing the kind of directional adaptive selection on polygenic traits that is what we should expect. There is no more surprise here than in the widely proclaimed mystery of the failure of GWAS to account for the heritability of complex traits. The faulty expectation was not in the stars, but in ourselves, that we have been understating the problem.

Following again the trail-guide in Figure 1, most alleles are rare and, if viable, have small effect, if any, on a trait, and hence small individual effects on fitness. We expect occasional alleles with nontrivial effects and/or higher frequency to be present at any given time. If it is an old allele, it can be frequent and widely dispersed enough to be replicated in different studies. But the signature of a local selection history is detected only in the appropriately geographically restricted sample.69, 70, 72, 84 All of this is just what we see.

The genetics of stature illustrates the connections between trait mapping and selection mapping in another revealing way. Stature was measured in the Swiss canton of Schaffhausen in the 1880s and again in the 1980s.86 Because of dietary and other life-style changes, the distribution shifted to the right, toward taller mean stature, over this century. Just as selection for increased stature would favor alleles with a strong effect in that direction, environmentally induced stature increase should lead to a greater contribution by the most responsive alleles. If causation were simple, with only a few such genes, we should find them as major mapping signals today.45 But we don't.

As with genome mapping, searches for selection have tended to rely on statistical cutoff criteria. But this is a subjective decision, an artifact that need not be applied to the evidence. In the same way that relaxing statistical cutoff criteria identifies more genome regions that contribute to phenotypes, relaxing significance criteria can also identify more of the regions contributing to adaptive change (unpublished work in progress). But its power to assess fitness will lie in the aggregate rather than individual genes.

THIS IS THE FOREST PRIMEVAL

We have all been trained to a gene-centered view of life. Mendel's experiments provided a powerful research approach to identify and understand aspects of genes and their function under clear-cut conditions. But that lured us into expecting that simple control and adaptive evolution were more general characteristics of traits. “Mendelian” diseases were carefully chosen instances of tractable inheritance to study genetic causation in which there were few strong effects. Traits like sickle-cell hemoglobin and malarial resistance gave a similarly simplistic impression of Darwinian evolution by a few very strong adaptive effects. But these were always illusory simplifications.

Evolution is a flow-through of variation added to by mutation, recombination, and gene flow, and lost to selection and drift. But causal and hence evolutionary specificity are far more fluid than we had thought, and hence less tightly connected. Evolution works by phenotype, not genotype.87 Even when evolution is affected by selection, if many genes are involved in a trait there can be phenogenetic drift.88 The trait can persist while its underlying genetic basis changes. Among populations and over time, the same trait can come to be produced by different genotypes, with different relative contributions from different variants at the same genes or even from entirely different genes.89 Phenogenetic drift is the evolutionary equivalent to the multiple genotypes that generate the same phenotype in complex traits, and that means many paths to the same fitness. To a considerable extent, natural selection may rule the phenotypes, but drift rules the underlying genotypes. Even if there were no environmental effects and every instance of every trait were strictly controlled by genes, the connection between specific genes and specific phenotypes would be quite fluid.

As hundreds of known “Mendelian” diseases show, some mutations in critical genes can cause serious diseases, but most genetic variation has small, subtle, contingent effects on traits. This is why GWAS do not find them. For the same reason, alleles with major effects usually reduce fitness greatly, so that it is the variation with minor effect that may be the basis of most adaptive evolution. This is why searches for selection can't find them either. This is no surprise, but is quite different from the usual image of natural selection.

Overall, complex genetic architecture with the general attributes shown on the left of Figure 1 is a common or even predominant characteristic of life. That means that some signals will be found, but may be over-interpreted as being more important than they are because so much of the signal is undetected or changeable. Searches that find little will under-interpret that as no evidence because of a lack of single genes that, in a given study, happen to have statistically detectable effect. Interpretations of GWAS and searches for signatures of selection alike have tended to overstate the few positive findings and to wring hands over the common failure to find more.

As things look today, these are facts of nature, not reflections of inadequate technology or sample sizes. Pleiotropy and multilocus causation are, in a sense, fundamental to the way nature has assembled complex traits over the eons of history. Even if the screening eye of natural selection is ever-present, it is not all-seeing in gene-specific terms. And if, as the evidence suggests and as makes theoretical sense, drift vies with selection in determining the fates of alleles, a very different picture of evolution emerges at the phenotype versus genotype levels. That picture requires some rethinking on our part. Our simple Mendelian-Darwinian world view is wearing thin as a theoretical basis for evolution and for interpreting the causal forest that is our genome.

The challenge to rethink may apply nowhere so much as it does to anthropology. This is because of the complexity of our cultural environments, resulting in behavior that is not transmitted as genes are, has only loose relationships to the “objective” environment, and cannot be predicted by wiring diagrams or brain scans. But anthropology has, as a rule, not been very deeply aware of modern genetics or even evolutionary theory. It has been easier, and acceptable, for us to live in a land of speculative story-telling. But the new data are showing us that telling stories is not enough.

Instead, we're learning the limitations of a focus on the genetic trees rather than the organismal forest. This is the legacy of the relatively little genetic knowledge that was available in the past and the research history that was enabled by Mendel's discoveries and Darwin's simple ‘law’ of natural selection, both of which led us to focus on the tail of the causal distribution that easily fits those expectations. But that leaves the rest of the distribution, the bulk of what the genome does and how we evolve, poorly understood and sometimes hardly even acknowledged. Even with a trail map such as that given in Figure 1, the gene trees are elusive and rapidly changing. They may not even be enumerable as we try to grasp the nature of the forest that is our genome.

NOTES

I welcome comments on this column: kenweiss@psu.edu. I co-author a blog on relevant topics at EcoDevoEvo.blogspot.com. I thank Anne Buchanan and John Fleagle for critically reading this manuscript.
