What has QTL mapping taught us about plant domestication?


  • Andrew H. Paterson

    Center for Applied Genetic Technologies; and Departments of Crop and Soil Science; Botany; and Genetics; University of Georgia, Athens GA, USA
    Author for correspondence: Andrew H. Paterson. Email: paterson@dogwood.botany.uga.edu



The aim of this paper is to survey the general area of quantitative trait locus (QTL) mapping, and its specific impact on current understanding of plant domestication. Plant domestication is not only of historical interest, but is also of ongoing importance as changing human needs and availability of nonrenewable resources impel continuing (and perhaps even accelerated) investigation of prospective new crops. New genomic tools applied in conjunction with now-established approaches such as QTL mapping are opening new doors into searches for the ‘footprints’ of domestication, and promise to accelerate and streamline the identification of specific genes integral to domestication(s), building on early successes. Better understanding of plant domestication promises to enhance knowledge about the developmental basis of some of the more striking evolutionary events known, to guide efforts to catalog plant biodiversity, and to accelerate progress in improving existing and new crops to sustain humanity.


Summary

I. Introduction

II. A backdrop: QTL mapping basics

III. The tempo of domestication

IV. Domestication and polyploidy

V. New approaches to identifying the footprints of domestication

VI. Perspectives

Acknowledgements

References

I. Introduction

‘… how great is the power of man in accumulating by his Selection successive slight variations.’ (The Origin of Species, Charles Darwin 1859).

Although it is a very recent aspect of our history, improvement of crops is inexorably tied to human social evolution (Raven et al., 1992). Plant domestication surely ranks among the key events in determining our ability to sustain modern human populations. It is a testament to the productivity of agriculture and agriculturalists that the importance of agriculture to human well-being too often goes unnoticed, with only infrequent reminders of its central role in our existence when it occasionally fails. Humans sequester an astonishing 40% of the entire terrestrial primary production of the earth for their own use, and this percentage is increasing (Vitousek et al., 1986; Tilman, 2000). Global population growth (for example, see http://www.census.gov/cgi-bin/ipc/popclockw) impels continuing research into improvement of agricultural productivity under both current and alternative (Jackson & Jackson, 1999) production systems. It confers a false sense of security that the urgent need for invigorated agricultural research is currently masked, in the nations best positioned to offer technological leadership, by historically low inflation-adjusted commodity prices and growing dependence on imported crops.

A good working definition of domestication is ‘… a coevolutionary process by which human selection on the phenotypes of … plant populations results in changes in the population’s genotypes that makes them more useful to humans and better adapted to human intervention’ (Clement, 1999). Varying degrees of change can be recognized among the world’s crops, ranging from ‘incidentally coevolved’ to ‘modern cultivars’, reflecting the duration, intensity and genetic gain realized from human selection. Genetic change is an enduring consequence of plant domestication – therefore, the genomic revolution offers new insights into its history and new opportunities for crop improvement, in much the same manner that it offers new insights into human history and new opportunities for biomedicine.

The past two decades have witnessed unprecedented growth in our understanding of the structure and function of plant genomes, our ability to dissect the heredity of specific traits (especially those that are controlled by many independent genes), and our facility to manipulate genes and genomes to produce crops better suited to human needs. The first complete sequence of a flowering plant genome, that of Arabidopsis thaliana, has recently been published (Arabidopsis Genome Initiative 2000), and rapid progress (both public and private) is being made toward the first sequence of a crop genome, Oryza sativa (rice). While much has been learned in the past decade about plant heredity, this paper is written with the tacit assumption that rapid progress in the next few years will impel another author to take up the (silicon) quill.

A tool that has contributed singularly to advances in our understanding of domestication in the past decade has been the ‘genetic map.’ The notion of using discrete traits as ‘genetic markers’ to determine the number, chromosomal locations, and phenotypic effects of genes that determine either simple or complex traits is nearly a century old (Sax, 1923). However, outside of a few favorable models such as Drosophila (Thoday, 1961), the comprehensive ‘molecular dissection’ of the genetic control of phenotypes only became feasible with the advent of DNA-based genetic markers in the late 1970s (Botstein et al., 1980). Application of such methods to plants has been energetically pursued, and there now exist detailed molecular maps for most of the world’s major crops as well as selected wild relatives and botanical models. Fortuitously for students of domestication, plant genome mapping was often most efficient if one first studied the progeny of crosses between crops and their wild relatives, due to the tendency of such crosses to segregate for large numbers of DNA polymorphisms. This led to the detailed genetic analysis of many traits that distinguish cultigens from their wild ancestors. While many of these crosses are interspecific, at least in the manner that species have been defined in many plant taxa, most are euploid and give rise to relatively normal (but occasionally transgressive) progeny. In some cases, domestication itself may have involved a sufficiently large suite of phenotypic changes that morphological classifications suggest specific status for the cultigen.

As a DNA-level demonstration of Vavilov’s (1922) ‘law of homologous series in variation’, molecular markers have also provided a direct conduit for comparative genetic analysis of taxa that have been reproductively isolated for millions of years. ‘Comparative mapping’, the study of similarities and differences in gene order along the chromosomes of taxa that cannot be hybridized, has shown that surprisingly few major chromosomal rearrangements (inversions and/or translocations) often distinguish among many major crops and/or botanical models. Comparative maps permit information gathered during study of one taxon to be quickly applied to related taxa. This capability has added new dimensions to comparative biology, with special significance for studying the genetic consequences of independent domestication(s) of diverse crops, by different peoples and on different continents, for similar products.

Recent progress notwithstanding, much about the genetics of domestication remains to be demonstrated, and caution remains warranted regarding many of the inferences that are possible based upon present data and tools. This is largely in view of two factors: the statistical power of genetic maps to precisely locate genes, and the extent to which multigenic traits and the effects of environmental factors plus measurement errors reduce precision of genetic mapping; and the demonstration that localized genomic rearrangements are much more frequent than we realized, even distinguishing among taxa that are closely related. The latter problem will be resolved as more and more of the specific DNA sequences that are directly responsible for key aspects of domestication are identified, obviating our need to rely upon diagnostic (i.e. indirect) DNA markers, and permitting us to build on the messages that are emerging from early successes in the identification of domestication genes (Doebley et al., 1997; Frary et al., 2000). The former problem will endure, but may be ameliorated by new tools such as comprehensive ‘gene maps’, and new high-throughput approaches to scanning gene pools for the ‘footprints of selection’.

And finally, a brief disclaimer. It was not my intention, nor is it the goal of the Tansley series, to provide an exhaustive review of the literature on QTL mapping, plant domestication, or the intersection thereof. There exists much excellent research of potential relevance to the subject area that I have not covered due to limited space, time, and energy. It is my opinion, and hope, that the sampling of literature I have presented, albeit biased in favor of the taxa with which I am most familiar, provides a representative picture of some new messages emerging from the intersection of QTL mapping and plant domestication. If your work was among that which I failed to cover, please forgive my omission.

II. A backdrop: QTL mapping basics

Quantitative trait locus (QTL) mapping is a means to estimate the locations, numbers, magnitudes of phenotypic effects, and modes of gene action of individual determinants that contribute to the inheritance of continuously variable traits. The fundamental task is to extricate the genetic ‘signal’ at an individual locus from the ‘noise’ that results from the collective effects of nongenetic factors such as edaphic and microclimatic variations, as well as measurement error in assessment of continuous traits. Although, sensu stricto, a ‘quantitative trait’ can be under monogenic control, in most cases the task of QTL mapping is further complicated by the need to resolve the effects of one genetic locus from the noise created by segregation at other genetic loci that also influence the trait, together with the effects of nonlinear interactions among loci.

Although it is broadly applicable to research across the life sciences, QTL mapping is of singular importance in agriculture. For lack of more precise measures, many aspects of agricultural quality and productivity are ‘end-point’ measurements such as ‘yield’ that integrate many aspects of growth and development throughout an organism’s life cycle. It is highly desirable, and an area of much effort, to dissect such end-point measurements into more specific components such as disease resistance, stress tolerance, photosynthetic efficiency, etc., that can be directly related to specific biochemical steps. Nonetheless, one can readily apply QTL mapping to such complex end-point measurements in the absence of biochemical data, to identify DNA markers that are diagnostic of traits that are difficult to measure either due to fundamentally low heritability or due to the cost and complexity of the assay. Thus, QTL mapping is a means to render more tractable phenotypes for which little biochemical/molecular information exists, requiring only that: (i) genotypes containing different alleles at phenotypically important loci can be crossed to produce populations of progeny that are in linkage disequilibrium; and (ii) there be sufficient divergence among the parents to identify discrete genetic markers that sample the genome at sufficiently dense intervals to infer the parental origin of the intervening chromatin with a high degree of confidence. This second requirement is met to varying degrees by different crosses, and through the use of different types of tools. Partial failures to meet this requirement, for example ‘gaps’ in the genetic map as a result of low levels of DNA-level divergence in particular regions of the genome, are not crippling but do reduce accordingly the thoroughness with which the genome is searched for possible determinants.

1. Tools and informativeness between genotypes

Three general types of molecular tools feature prominently in the genetic toolbox for QTL mapping, each including a number of variations on a common theme. These are described below.

Evolutionarily conserved loci suitable for establishing orthology across taxa. The most widely used type of DNA marker in plant genomics has been the ‘restriction fragment length polymorphism’ (RFLP; Botstein et al., 1980). Detailed RFLP maps have taught us much about genome organization, and were used in most of the early QTL mapping efforts in major crops. Many of these involved interspecific crosses between crops and their wild ancestors, and have made important contributions to understanding the genetics of domestication. RFLPs remain especially important for learning about the comparative arrangement(s) of genes along the chromosomes of divergent plant taxa, as they are amenable to direct use of cDNAs or hypomethylated genomic DNA clones that tend to retain sufficient sequence similarity to hybridize to orthologous loci in different taxa. However, RFLPs are cumbersome, requiring large amounts of DNA and tedious blot hybridization and autoradiographic methods, and detecting only a small fraction of the DNA polymorphisms that occur between genotypes. A modification, the ‘cleaved amplified polymorphic sequence’ (CAPS) method (Konieczny & Ausubel, 1993), uses locus-specific PCR primers to amplify corresponding loci in different genotypes, then cuts the amplification product with a restriction enzyme (often with a 4-nt recognition sequence), retaining the locus-specificity of RFLP but reducing the amount of genomic DNA needed and eliminating autoradiography.

Rapidly evolving loci suitable for discerning variation among closely related genotypes. ‘Simple-sequence repeat’ (SSR) markers have become popular in the past decade. SSRs are tandemly repeated arrays of simple sequence motifs typically 2–4 nt in length (for example, …CACACA…), often comprising 5–50 tandem units of the motif. These elements rapidly evolve new ‘alleles’ in the form of arrays with different numbers of units. By amplification of genomic DNA with PCR primers that immediately flank the arrays, alleles are visualized as size variants in acrylamide or agarose gels. By attaching fluorescent dyes to the primers, several loci can simultaneously be assayed on vertical acrylamide gels. Apparatus for rapid size separation and laser-detection of data is commercially available. SSRs have been essential in the primary molecular mapping of taxa such as soybean (Akkaya et al., 1995) that have few wild relatives with which they can be crossed, and have been especially important in the detailed characterization of elite crop gene pools comprised of closely related individuals. However, SSRs are relatively costly to develop, and have proven to detect only modest levels of DNA polymorphism in some recently formed polyploids such as groundnut (Hopkins et al., 1999). SSRs are typically found in genomic DNA that is rapidly evolving and therefore not well-conserved between taxa, hindering the ability to align the genetic maps of divergent taxa. Nonetheless, SSRs do also occur at low levels within coding sequences and more frequently in nearby untranslated regions, and populations of publicly available cDNAs are growing sufficiently large to provide a valuable resource for the discovery of SSRs that may be more amenable to establishment of cross-taxon orthology.
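To make the definition of an SSR concrete, the following minimal sketch scans a DNA sequence for tandem arrays of 2–4 nt motifs with at least five units, the lower end of the range quoted above. The function name `find_ssrs`, its parameters, and the example sequence are hypothetical and chosen only for illustration; real SSR discovery pipelines typically apply further filters (for example on flanking sequence suitable for primer design) that are not modelled here.

```python
import re

def find_ssrs(sequence, min_units=5, motif_lengths=(2, 3, 4)):
    """Return a sorted list of (start, motif, units) for tandem arrays of short motifs."""
    seq = sequence.upper()
    hits = []
    for k in motif_lengths:
        # Group 1 captures a k-nt motif; \1{...} demands at least min_units - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA", or "CACA", which is reported under the motif "CA").
            if motif in (motif + motif)[1:-1]:
                continue
            units = len(match.group(0)) // k
            hits.append((match.start(), motif, units))
    return sorted(hits)

# Hypothetical sequence: a (CA)10 array flanked by two bases on each side.
print(find_ssrs("TT" + "CA" * 10 + "GG"))
```

Alleles at such a locus would differ in the number of units reported, which is what the flanking PCR primers reveal as size variants on a gel.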

Arbitrary or semiarbitrary-sequence-based PCR, suitable for rapid assembly of data with a minimum of a priori information. Large quantities of genomic segregation data can be accumulated with a minimum of a priori information, using any of several methods that rely upon PCR primers that are sufficiently short to occur by chance in a large number of locations in a higher plant genome. While several such methods have been described, including AP-PCR (Welsh & McClelland, 1990) and RAPD (Williams et al., 1990), the most widely used at present is the AFLP method (Vos et al., 1995). By this approach, very large numbers of genetic loci can be monitored for segregation at relatively low cost and in a short timeframe. Some difficulties are that mapped loci tend to be unevenly distributed across the chromosomes (requiring more effort to obtain complete genome coverage); that alleles tend to be population-specific (making comparative analysis difficult for different crosses and nearly impossible for different taxa); and that the vast majority of alleles are ‘dominant’ (failing to discern homozygotes from heterozygotes).

A recent development, ‘transposon display’, uses dispersed repetitive DNA to provide one primer, and an arbitrary sequence as the second primer (Van den Broeck et al., 1998). A priori determination of the genomic distribution of individual transposable element (TE) families enables one to choose a sample that is likely to represent near-complete coverage of the genome, or possibly even to target specific chromosomes if warranted (Casa et al., 2000). This method has special appeal in that the priming sequence itself may be an important agent of genetic change, and in some cases is preferentially associated with genes (Zhang et al., 2000).

2. Population structures and experimental designs

A wide range of population structures can be used for QTL mapping, the minimal requirement being the establishment of linkage disequilibrium between defined genotypes. Plants that tolerate inbreeding enjoy especially great latitude for different population structures. Some of these are listed below.

Backcross. Traditionally, genetic linkage mapping relied heavily upon intercrossing two homozygous genotypes to produce a heterozygous F1, and crossing the F1 ‘back to’ one of its inbred parents to create 1 : 1 segregation for polymorphic alleles from the donor parent. This population structure remains useful for a number of purposes, especially introgression of exotic germplasm from a wild relative into a domesticate. Recently, the analysis of ‘advanced-backcross’ or ‘backcross-self’ populations in combination with molecular mapping has revealed cryptic alleles from wild relatives that exert unexpected desirable effects when uncoupled from background factors that suppress their expression (Paterson et al., 1988; Tanksley et al., 1996; Xiao et al., 1996; Ming et al., 2001).

F2. Selfing or intercrossing of heterozygous F1s creates populations that segregate in the traditional 1 : 2 : 1 ratio, and enjoy the advantage of permitting the geneticist to see the consequences of all possible ‘dosages’ of an allele (assuming bivalent pairing). This permits estimation of the mode of gene action (dominant, recessive, additive, or most frequently somewhere in between). A traditional argument against the use of F2 populations in basic genetic studies is the difficulty in distinguishing whether heterozygotes at consecutive marker loci represent double-parentals or double-recombinants, but the implementation of maximum likelihood algorithms in a number of excellent software packages obviates this.
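The 1 : 1 (backcross) and 1 : 2 : 1 (F2) expectations can be checked against marker genotype counts with a chi-square goodness-of-fit statistic. The sketch below is illustrative only: the function name `chi_square_stat` and the genotype counts are hypothetical, not data from any study cited here.

```python
def chi_square_stat(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for observed counts against a Mendelian ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    stat = 0.0
    for obs, part in zip(observed, expected_ratio):
        expected = total * part / ratio_sum  # expected count for this genotype class
        stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical F2 marker: 22 AA, 55 Aa, 23 aa tested against 1 : 2 : 1.
f2_stat = chi_square_stat([22, 55, 23], [1, 2, 1])

# Hypothetical backcross marker: 60 Aa, 40 aa tested against 1 : 1.
bc_stat = chi_square_stat([60, 40], [1, 1])

# Critical values at the 5% level: 5.99 (2 d.f., F2) and 3.84 (1 d.f., backcross).
print(f"F2: {f2_stat:.2f}  BC1: {bc_stat:.2f}")
```

A statistic above the critical value would flag segregation distortion at that marker, which is common in wide crosses and worth noting before linkage analysis.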

‘Recombinant Inbred (RI)’ or ‘Single-Seed Descent (SSD)’. These population structures are important in that they are comprised of homozygous genotypes which can be replicated in different environments or experimental treatments, either to ask specific experimental questions, or simply to quantify the ‘sensitivity’ of individual QTLs to year-to-year fluctuations in climatic conditions. In plants, most RI or SSD populations are produced by selfing, and individuals are derived from two gametes that are identical by descent, and therefore offer only the information content of one gamete in (for example) an F2 individual. However, this gamete has been through several (usually 6–8) generations of selfing, so there have been many additional opportunities for recombination in those regions of the genome that remained heterozygous. Since there is less heterozygosity with each passing generation, the result of this tradeoff is that RI or SSD individuals derived by selfing have similar ‘information content’ to F2 individuals (that are comprised of two different gametes that have been through only one meiotic cycle). By contrast, in animal populations, RI populations must be produced by sib-mating, with the result that heterozygosity persists longer and RI individuals offer more ‘information content’ for resolving linkage than F2 individuals.
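The decline in heterozygosity under selfing follows a simple rule: at any locus heterozygous in the F1, each generation of selfing is expected to halve the chance that the locus is still heterozygous. The short sketch below tabulates that expectation, assuming strict selfing with no selection favoring heterozygotes; the function name is hypothetical.

```python
def expected_heterozygosity(generations_of_selfing):
    """Expected fraction of F1-heterozygous loci still heterozygous after selfing."""
    return 0.5 ** generations_of_selfing

# One generation of selfing gives the F2, two give the F3, and so on.
for generations in range(1, 9):
    fraction = expected_heterozygosity(generations)
    print(f"F{generations + 1}: {fraction:.4f}")
```

After the 6–8 generations typical of RI or SSD populations, less than 2% of the genome is expected to remain heterozygous, which is why effective recombination in these lines accrues mostly in the first few generations.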

A variation on these methods is the interjection of several generations of random intermating into the breeding scheme, followed by selfing, that permits one to obtain plant populations with higher information content but at a cost of time (Liu et al., 1996).

Doubled haploid (DH). This population structure is an attempt to combine the advantages of homozygosity with the speed at which an early generation population can be made. A heterozygous F1 is used to produce gametes that are artificially doubled in chromosome number by means such as colchicine, and cultured to yield plants. The necessary culture conditions, and therefore the utility of the approach, are highly taxon-specific – nonetheless, DH populations exist for barley, rice, Brassica oleracea, and a number of other major crops. Genetically, individual plants within a DH population contain two identical gametes, and therefore each contains only one uniquely informative gamete; DH individuals are thus equivalent to first-generation backcross individuals in terms of their information content about recombination. Because DH individuals are homozygous, one cannot determine the mode of gene action that accounts for a particular QTL, just as is true in a backcross. As in a backcross, one can see the effects of additive or dominant alleles in a DH population – but one cannot see overdominant alleles (which can be seen in a backcross), and one can see recessive alleles (which cannot be seen in a backcross, if from the donor genotype).

In taxa that are not tolerant of inbreeding, fewer options are available. This includes many autopolyploid plants that are produced for biomass-related products, such as potato, sugarcane, alfalfa, and many forage and turf grasses. The most common approach in mapping of these taxa is the production of F1 hybrids from crosses between highly heterozygous, unrelated individuals. By selection of the subset of DNA polymorphisms that exhibit ‘simplex’ segregation in the progeny, one can infer that there must have been only a single copy of the polymorphic allele in the source parent, and construct a map in exactly the same manner as in a backcross (Wu et al., 1992). By this approach, one is mapping the segregation and recombination that occurred simultaneously in each of two heterozygous nuclei (one from each parent), and therefore the result is two maps, one of each parent. The use of at least a framework of RFLP or SSR markers permits one to align the different members of a homologous series of chromosomes within each map (see below).

3. Ability to detect QTLs (phenotypic effects)

An inherent limitation of QTL mapping is that not all QTLs can be detected with statistical significance. The ability to resolve a QTL depends upon the magnitude of its phenotypic effect, the extent to which its effect is modified by nonlinear interactions with other QTLs, the degree to which environmental factors and/or measurement errors obscure the resulting phenotype, and the density of genetic markers available to scan the genome for QTLs. The experimenter can influence these factors by choice of a population structure best suited to the goals of the experiment (see above), the number of unique plant genotypes (i.e. segregating progenies) used in the study, and the type (dominant vs codominant), and spacing of genetic markers used. In a wide range of studies of different taxa and phenotypes, typically using 200–300 individuals, BC1 or F2 populations, and markers spaced at 10–20 cM intervals, detectable QTLs are often those that explain more than 4% of the phenotypic variance.

The experimenter can improve the resolution of specific QTLs by breeding efforts, especially using backcrossing. For example, a QTL identified in an early generation study (BC1, for example), can be subjected to a series of additional backcrosses to reduce the genetic component of variance, while maintaining the QTL by marker-assisted selection (Paterson et al., 1990). In principle, one can develop a near-isogenic line that contains donor chromatin only in the region containing the QTL. This reduces the genetic component of variance to solely that which is contributed by the introgressed chromosome segment. Such an approach contributed (Dorweiler et al., 1993; Alpert & Tanksley, 1996) to each of the two early successes in cloning of genes related to crop domestication (Dorweiler & Doebley, 1997; Frary et al., 2000).

QTLs that cannot be resolved in early generation studies may be detected by using large-scale backcrossing strategies. For example, production of a complete set of near-isogenic introgression lines in tomato revealed a substantial number of QTLs that had escaped detection in earlier studies (Eshed & Zamir, 1995).

Significance tests for declaring QTLs are an area of some controversy. While most agree on the need for statistical thresholds that account for ‘multiple comparisons’ (searches of hundreds of independent loci for associations with a trait), different approaches have evolved for setting significance thresholds. The approach of Lander & Botstein (1989) and also Lander & Kruglyak (1995), establishes a common threshold for all normally distributed traits, based on the length of genetic map and density of markers (above). Nonnormal traits need to be subjected to normalizing transformations. An alternative, permutation testing, utilizes each parameter itself to establish trait-specific thresholds that are arguably independent of the distribution of the trait (Churchill & Doerge, 1994). However, different journals, and indeed different editors and reviewers, prefer different significance thresholds, and different methods for establishing the thresholds.
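The logic of permutation testing can be sketched in a few lines: phenotypes are repeatedly shuffled relative to genotypes, the genome-wide maximum of a test statistic is recorded for each shuffle, and an upper percentile of those maxima serves as the trait-specific threshold. The example below is a simplified illustration, not the published procedure in detail. It assumes a backcross-like population with genotypes coded 0/1, and it substitutes the absolute difference in class means at each marker for the LOD or likelihood-ratio statistic a mapping package would use. All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean

def max_marker_effect(genotypes, phenotypes):
    """Largest absolute difference in mean phenotype between genotype classes, over all markers."""
    best = 0.0
    for j in range(len(genotypes[0])):
        class0 = [p for g, p in zip(genotypes, phenotypes) if g[j] == 0]
        class1 = [p for g, p in zip(genotypes, phenotypes) if g[j] == 1]
        if class0 and class1:  # skip markers that do not segregate in this sample
            best = max(best, abs(mean(class1) - mean(class0)))
    return best

def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from the null distribution of the maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_maxima.append(max_marker_effect(genotypes, shuffled))
    null_maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_maxima[index]

# Simulated data (hypothetical): 100 individuals, 5 unlinked markers, a QTL at marker 0.
sim = random.Random(1)
genos = [[sim.randint(0, 1) for _ in range(5)] for _ in range(100)]
phenos = [10.0 * g[0] + sim.gauss(0, 1) for g in genos]

threshold = permutation_threshold(genos, phenos, n_perm=200)
observed = max_marker_effect(genos, phenos)
print(f"observed = {observed:.2f}, 5% threshold = {threshold:.2f}")
```

Because the threshold is derived from the data themselves, no normalizing transformation of the trait is required, which is the practical appeal of the approach.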

4. Map resolution and its consequences – evaluating correspondence among QTLs

A QTL is described as a ‘likelihood interval’, similar to the more familiar statistical concept of a confidence interval, a chromosomal segment in which one or more determinants of a trait can be asserted to locate with a high likelihood. Since cloning of individual QTLs remains a difficult undertaking that to date has been successful in very few cases, most inferences about properties of QTLs derive from evaluation of such likelihood intervals. A single such interval typically spans 10–30% of the length of a chromosome. Using the estimated gene number of c. 25 000 for Arabidopsis thaliana (Arabidopsis Genome Initiative 2000) as a conservative estimate of the number of genes in a typical crop genome of 1000–2000 cM, an average interval of 20 cM would be expected to contain c. 250–500 genes. While a QTL may be caused by as little as a single gene, clearly a QTL likelihood interval does not equate to a single gene or even a tractable number of genes.
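The arithmetic behind the estimate of 250–500 genes per interval is reproduced below, under the simplifying assumption (implicit in the text) that genes are spread evenly along the genetic map. The function name is hypothetical.

```python
def genes_in_interval(total_genes, map_length_cm, interval_cm):
    """Average number of genes in an interval, assuming uniform gene density per cM."""
    return total_genes * interval_cm / map_length_cm

# c. 25 000 genes, a 20 cM likelihood interval, and maps of 1000 or 2000 cM.
for map_length in (1000, 2000):
    count = genes_in_interval(25_000, map_length, 20)
    print(f"{map_length} cM map: about {count:.0f} genes per 20 cM interval")
```

Since gene density and recombination rate both vary along real chromosomes, any individual interval may hold considerably more or fewer genes than this average.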

One formal approach has been described for the comparison of different sets of QTLs, whether they are QTLs in the same population that affect different traits, or QTLs in different populations (including different taxa). This approach uses the hypergeometric probability distribution function (‘sampling without replacement’; Larsen & Marx, 1985), as follows:

$$p = \frac{\binom{l}{m}\binom{n-l}{s-m}}{\binom{n}{s}} \qquad \text{(Eqn 1)}$$

where n = the number of intervals that can be compared (we usually define an interval as 30 cM, conservatively approximating a QTL likelihood interval); m = the number of matches declared between QTLs (a match being declared when the 1-LOD likelihood intervals for two taxa overlapped); l = the total number of QTLs found in the larger sample; and s = the number of QTLs found in the smaller sample.

Conceptually, this approach divides the genome (or genomes) into ‘bins’, and asks the question ‘Given one sample of l QTLs and a second sample of s QTLs, each distributed over n intervals, what is the likelihood that m matches would occur by chance?’ For example, the likelihood that chance could account for the observed number of matches among seed mass QTLs found in sorghum, maize and rice ranged from 0.1% (rice vs maize, l = s = 8, m = 5, n = 50) to 0.8% (maize vs sorghum, l = 8, s = 7, m = 4, n = 50) (Paterson et al., 1995).
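The two probabilities quoted above can be reproduced with a few lines of code, assuming that Eqn 1 is the standard hypergeometric point probability of exactly m matches, C(l, m) C(n − l, s − m) / C(n, s). That form is an assumption on my part, although it does recover the published values. The function name is hypothetical.

```python
from math import comb

def match_probability(n, l, s, m):
    """Hypergeometric probability of exactly m matches between samples of l and s QTLs over n intervals."""
    return comb(l, m) * comb(n - l, s - m) / comb(n, s)

# Seed mass QTLs, with parameters as given in the text.
rice_vs_maize = match_probability(n=50, l=8, s=8, m=5)
maize_vs_sorghum = match_probability(n=50, l=8, s=7, m=4)

print(f"rice vs maize:    {rice_vs_maize:.2%}")     # about 0.12%
print(f"maize vs sorghum: {maize_vs_sorghum:.2%}")  # about 0.80%
```

Note that the result depends strongly on n: a narrower interval definition increases the number of bins and makes any given number of matches appear less likely by chance, which is why a conservative (wide) interval is used.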

III. The tempo of domestication

New data from QTL mapping are having an especially large impact on views about the tempo of domestication. Earlier thinking about this issue suffered from the sorts of handicaps that have fostered debates about the tempo and mode of other aspects of evolution, specifically the lack of fine-scale historical/archaeological data. Classical thinking on the tempo of domestication closely tracks Neo-Darwinian evolutionary theory, summarized (Harlan, 1975) as follows:

‘Domestication is an evolutionary process operating under the influence of human activities. Since it is evolutionary, we would expect a relatively slow and gradual progression from the wild state to a state of incipient domestication …’.

With all respect to the late Dr Harlan (who, incidentally, this author numbers among the leading agricultural scientists of the 20th century), a clear message emerging from QTL mapping is that many domestication events need not necessarily have been either slow or gradual. Several lines of evidence point to the possibility that an individual domestication event may have been rapid, including the relatively small number of genes that appear to control many domestication-related traits, the identification of tightly linked ‘clusters’ of genes that may enjoy a strong selective advantage once favorable alleles occur in coupling phase, the levels and patterns of DNA-level variation in contemporary gene pools of crops and their ancestors, and the high degree of correspondence among the locations of genes related to independent domestication(s) of diverse cereal crops on different continents.

1. How many genes are involved in domestication events?

Many QTL data shed light on the number of genes that distinguish crops from wild or weedy relatives. Among many excellent studies conducted in a wide range of important taxa, I have limited my consideration to a small sampling, most in the cereals, that either exemplify particular principles or are studies of which I have detailed personal knowledge. A much more extensive review of this topic alone, by energetic and knowledgeable colleagues in each of the individual taxa, and also across taxa, would be well worthwhile.

The broad question of ‘how many genes are involved in a domestication’ will be looked at from three perspectives. First, in two detailed examples, how many genes are involved in the various morphological transformations that contribute to a domestication? This is explored in sorghum and rice, in each case focusing on crosses between the cultigen and a wild relative.

Second, to what extent do genes associated with domestication coincide in different populations and congeneric species? While studies of individual crosses (such as in sorghum and rice) are informative, many such investigations use similar population structures and similar numbers of individuals, and therefore have similar statistical power to resolve QTLs. Comparative evaluation of multiple populations grown in multiple environments helps to discern whether relatively simple genetic control is an inherent property of domestication events or an artifact of similar experimental designs. Maize and tomato provide excellent examples.

Third, to what extent are domestication events reversible? If indeed domestication can happen quickly, it should also be true that naturalization can happen with similar speed. One example has been investigated by QTL mapping, in rice, and is illustrative (although more examples would be well warranted).

1(a) Two examples of domestication events and their genetic control

Rice Rice is arguably the world’s most important food crop, and many studies have shed light on the genetic control of traits related to its domestication. Especially informative about domestication is a recent study (Xiong et al., 1999) of a cross between an Asian cultivated rice (O. sativa ssp. indica) and an accession of the wild rice species (O. rufipogon) that is thought to have given rise to the Asian cultigen.

Some of the most important differences between wild and cultivated rices relate to panicle and spikelet structure. In the chosen study, four QTLs explained 36.2% of phenotypic variation in panicle neck length, one of which was in the same interval as the largest QTL for tillers per plant. Two QTLs explained 18% and 7.8% of phenotypic variation in the number of secondary branches per panicle, one of which coincided with QTLs for plant height and panicle length. Three QTLs explained small portions (6.9–8.0%) of variation in spikelets per panicle, each in chromosomal regions that had previously been associated with this trait in crosses between more closely related genotypes. Two QTLs accounted for a total of 15.9% of phenotypic variation in spikelet density, one in an interval that had previously been associated with variation in an elite rice hybrid. Five QTLs were detected for shattering, three of which each explained more than 15% of phenotypic variation. The presence vs absence of extruded stigmas, lax panicles, and awns, were each accounted for by single genetic loci.

Wild rice tended to have a larger and bulkier plant than cultivated rice, reflected in the size and number of most plant organs. Four QTLs explained 72.4% of variation in plant height, with one of these alone accounting for 59.8% of variation, and coinciding with the sd1 locus that has previously been reported for dwarfism (Cho et al., 1994). Two QTLs explained 21.2% of variation in the number of tillers per plant, neither of which had been previously reported. Two QTLs were detected for panicle length, one of which overlapped with the major locus for plant height, suggesting the possibility of a pleiotropic effect. Seven QTLs explained a total of 47.3% of phenotypic variation in anther length, each with relatively small effects. Five QTLs explained a total of 52.9% of variation in culm circumference, one of which explained 28.6% of the variation. Three of the QTLs for culm circumference comapped with QTLs for plant height.

Wild rice tended to be photoperiod-sensitive, flowering much later than cultivated rice in the test environment. Four QTLs accounted for 67.5% of phenotypic variation in flowering time, three of which fell in regions that had previously been associated with variation in heading date in elite or subspecific crosses. One of the QTLs alone explained 52.3% of variation in the trait, suggesting that this locus was primarily responsible for photoperiod sensitivity. Curiously, this was the one QTL that had not been discovered previously.

Sorghum The Sorghum genus includes a major crop (Sorghum bicolor L. Moench.); one of the world’s most aggressive weeds, ‘Johnsongrass’ (S. halepense), a polyploid descendant of S. bicolor, of African origin; and S. propinquum (Kunth.) Hitchc., 2n = 2x = 20, native to south-east Asia, Indonesia, and the Philippines. S. bicolor is cultivated primarily as a grain crop, while S. propinquum exemplifies many attributes of wild grasses, including small seeds, abundant tillers, narrow leaf shape, and well-developed rhizomes. The common ploidy of S. bicolor and S. propinquum, and normal chromosome pairing in their F1 hybrids (Doggett, 1976), suggested genetic study of interspecific S. bicolor × S. propinquum crosses. Detailed analysis of 370 F2 individuals has yielded much data relevant to sorghum domestication (Chittenden et al., 1994; Lin et al., 1995; Paterson et al., 1995; Paterson et al., 1995; A. H. Paterson et al., in preparation).

As was true of rice, inflorescence morphology is a primary means of classifying Sorghum bicolor into five basic ‘races’ or ‘subspecies’ that form informative groupings of intraspecific diversity (Harlan & deWet, 1972). Panicle length (PL) was associated with four QTLs that collectively explained 29.2% of phenotypic variance. Branch length (BL) was associated with three QTLs that collectively explained 25.0% of phenotypic variance. Whorl number (WN) was associated with five QTLs that collectively explained 25.6% of phenotypic variance. Seed weight (SW) was associated with nine QTLs that collectively explained 51.7% of phenotypic variance. Spikelet number (SN) was associated with four QTLs that collectively explained 19.1% of phenotypic variance. Disarticulation (shattering) mapped to a single locus that has subsequently been mapped at fine scale and circumscribed to a small genomic region that is under intensive study (M. Wise & A. Paterson, pers. comm.).

S. propinquum and ‘Johnsongrass’ both have abundant tillers and aggressive rhizomes, traits that are not compatible with row crop cultivation. The loss of rhizomatousness may predate domestication, as no wild S. bicolor genotypes are known to be rhizomatous. Nonetheless, several different measures of rhizomatousness have been evaluated, and 21.8% of phenotypic variance in the primary measure was ascribed to three QTLs, with six additional QTLs shown to contribute to other measures of rhizomatousness. Tillering, by contrast, varies widely not only among Sorghum species, but between wild and cultivated forms of S. bicolor; four QTLs account for 23.7% of phenotypic variation in tiller number at 8 wk after seeding. One tillering QTL coincided with a QTL influencing the number of rhizomes, and has been postulated (Paterson et al., 1995) to influence the abundance of axillary buds available to form tillers or rhizomes (respectively), while additional QTLs are involved in the commitment of a particular bud to one developmental path or the other.

Like rice and other cereals, most wild sorghums are photoperiod-sensitive, flowering only under ‘short’ days, typically 12–13 h or less. In the S. bicolor × S. propinquum cross, one QTL alone explained 85.7% of phenotypic variance, with two additional QTLs discernible after adjustment for the effects of this major QTL. Plant stature (‘height’) was closely correlated with flowering time (r = 0.79), explicable in that the single QTL of largest effect (explaining 54.8% of phenotypic variation alone, with additive effect of 87.9 cm) mapped to the same chromosomal region as the major flowering QTL. One of the additional flowering QTLs also corresponded to one of the five additional height QTLs. The six height QTLs collectively accounted for 71.0% of phenotypic variation in height.

To summarise briefly, domestication of both rice and sorghum appears to have involved selection against major genes that confer photoperiodic flowering, a mixture of major genes and small-effect QTL that collectively confer substantial reductions in height, and smaller but significant changes in tillering, inflorescence architecture, and other aspects of plant architecture. A curious discrepancy is the complexity of ‘shattering’ (disarticulation) of the mature inflorescence – while inheritance of shattering of rice and many other grasses (Young, 1986) is complex, the discrete inheritance of the trait in sorghum together with its small genome size makes sorghum an especially favorable taxon in which to clone the ‘shattering gene.’ Such efforts are in progress.

Having reviewed this sampling of QTL data regarding numbers of genes that distinguish cereals from their wild or weedy relatives, one is tempted to conclude that relatively large portions of the phenotypic variance (typically 50% or more) are often explained by relatively small numbers of genes (perhaps five). However, this conclusion must be tempered with the reminder that these and many other such investigations used similar population structures and similar numbers of individuals, and therefore had similar statistical power to resolve QTLs. That many studies suggest that only a few QTLs of large effect explain many phenotypes could be at least partly an artifact of the experimental designs, rather than an inherent property of domestication events. Stronger support for the notion of simple genetic control might be garnered by evaluation of the extent to which different genotypes, populations, and environments lead one to the same conclusion. The next subsection investigates this in two exemplary cases.

1(b) To what extent do domestication genes coincide in different populations and congeneric species?

Maize Comparison of QTL mapping results in two crosses involving different races of maize and different subspecies of teosinte sheds light on the inheritance of changes to inflorescence morphology and other domestication-related traits (Doebley et al., 1990; Doebley & Stec, 1991, 1993). The teosinte parents of each cross were phenotypically similar in that each had fully disarticulating (‘shattering’) ears, with only a single spikelet per cupule, 4–5 cupules along each of the two ranks, highly indurate (hard) glumes, long (17–22 mm) internodes in the lateral branches, 8–9 small secondary ears along each lateral branch, and primary lateral inflorescences that are staminate (male). By contrast, both maize parents possessed typical nondisarticulating maize ears with two spikelets per cupule, numerous (> 37) cupules per rank, five or more ranks, soft glumes, very short (< 1 cm) internodes in the lateral branches, few or no secondary ears along the lateral branch, and primary lateral inflorescences that are pistillate (female).

For these key traits that distinguish the inflorescences of maize and teosinte, a total of 50 independent significant associations were found in one population. Each trait showed between four and seven QTLs, with individual QTLs explaining up to 49% of phenotypic variation, and the sets of QTLs for a trait typically explaining 50% or more of phenotypic variation. Most of the QTLs coincided in five regions of the genome that appeared to control most of the differences between maize and teosinte.

The five genomic regions of primary importance were the same in each of the two populations, accounting for 48% and 58%, respectively, of all QTLs, and 65% and 80%, respectively, of the QTLs explaining more than 10% of phenotypic variance. Further, QTLs of largest effect agreed very well across the two populations – 13 (81%) of 16 QTLs explaining 20% or more of phenotypic variance were detected in the same genomic regions in both populations. QTLs of smaller effect were less congruent – for those explaining 10–20% of variation, 16 (55%) of 29 cases coincided, and for those explaining less than 10% of variation, 15 (28%) of 53 cases coincided.

Tomato Perhaps the most extensive available test of the coincidence of QTLs among different genotypes within a genus is provided by a recent summary of efforts to study changes in the size and shape of tomato fruits (Grandillo et al., 1999), especially important hallmarks of domestication in this taxon. Fruit size or ‘weight’ (mass per fruit) has been mapped in seven wild species of tomato, with DNA or isozyme markers providing genome coverage ranging from 18% to virtually 100% (Grandillo et al., 1999). The various studies also differed in population structure (seven different types), progeny number (50–1200), and significance thresholds (several using thresholds too lenient to provide adequate experiment-wide error control); it is therefore little surprise that the results ranged from 1 to 18 QTLs. A total of 28 nonoverlapping genomic regions were associated with genetic differences in fruit size in at least two independent studies, providing a minimal estimate of the number of QTLs affecting the trait (but remember that QTL likelihood intervals are large, contain many genes, and are compared to one another only roughly). It is at least interesting, and perhaps significant, that this number is roughly consistent with suggestions based on quantitative genetic theory that the number of genes influencing tomato fruit size may be ‘possibly as many as 20’ (Ibarbia & Lambeth, 1969); and that estimates of 10–11 genes using the method of Wright (1968) may actually be on the order of 1/3 of the actual number of genes (Zeng et al., 1990). Among the 28 ‘conserved’ QTLs, six (21%) explained 20% or more of the phenotypic variance in at least one study.

Fruit shape was also measured in six of these studies, with the estimated number of QTLs detected ranging from 2 to 16. A set of 11 fruit shape QTLs could be identified that were segregating in at least two independent studies. Six of these explained more than 20% of phenotypic variation (ranging up to 45%) in at least one study.

To summarise briefly, comparative studies of maize and tomato both continue to support the notion suggested above, that there exists a population of about 6 QTLs that explain a disproportionate share of phenotypic variance in the morphological transformations that were central to domestication. More such information is needed about the distributions of individual QTL alleles across gene pools and congeneric species; however, support from two such divergent taxa, and genetic control of different organs, permits one to assert with some confidence that the notion of a few QTLs of large effect playing a predominant role in domestication events is a biological reality and not merely a statistical artifact.

1(c) To what extent are domestication events reversible?

Many crops are associated with related weedy forms, and in a growing number of examples these weeds appear in regions where no naturally occurring crop relatives are known. Some examples, such as ‘Johnsongrass’ (S. halepense), have been dispersed as a result of intentional human introduction, possibly combined with unintentional introductions as contaminants in seedlots of other species. However, in other cases the weedy forms do not closely resemble naturally occurring forms, but rather appear to be of recent descent.

A well-studied example is ‘red rice’, a newly emerging form of Oryza in many different areas of the world where no wild relatives of rice occur (for example, the USA, Brazil, and Europe). ‘Red rice’ shows many characteristics intermediate between wild rice, Oryza rufipogon, and cultivated indica or japonica forms of O. sativa (the cultigen), and is especially well adapted to disturbed habitats (Oka, 1988). A variety of molecular data have shown that these weedy forms are not O. rufipogon, but are differentiated into indica and japonica types that correspond to the two cultivated subspecies of rice (Cho et al., 1995; Suh et al., 1997).

Bres-Patry et al. (2001) have investigated the genetic control of weediness traits in 151 DH lines derived from a cross between an O. sativa cultivar and such a ‘red rice’ collected in France. The 104 loci mapped in the cross provided fairly good coverage of the genome, although the small population of only 151 lines is something of a weakness. A total of 29 QTLs were identified at appropriate LOD thresholds, explaining from 9.2% to 85.9% of phenotypic variation, but with most accounting for less than 20%. Comparison with the QTLs found in the study of Xiong et al. (1999) that was described above, of a cross between an Asian cultivated rice (O. sativa ssp. indica) and an accession of the wild rice species (O. rufipogon), showed that 12 of the 29 QTLs were common to the two crosses and the remaining 17 were new to this cross.

An especially important finding was that only 1–3 QTLs could be detected for most traits, and further that these QTLs were largely concentrated in four regions of the genome. In other words, the genetic control of phenotypic differences between the cultivated and weedy forms was very simple. Comparing this study to some of the QTL studies cited above, the number of QTLs per trait in this study is remarkably low, but the small number of individuals in the study may partly account for this.
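The influence of population size on the number of QTLs detected can be illustrated with a rough calculation. The Python sketch below uses the standard single-marker regression relationship LOD = −(n/2) log10(1 − R²) to estimate the smallest share of phenotypic variance that a QTL must explain in order to reach a given LOD threshold. The threshold of 3.0, the function name, and the use of simple regression in place of interval mapping are assumptions made purely for illustration; they are not taken from the studies cited.

```python
def min_detectable_pve(n_individuals: int, lod_threshold: float = 3.0) -> float:
    """Smallest fraction of phenotypic variance (R^2) that reaches the LOD threshold.

    Inverts LOD = -(n / 2) * log10(1 - R^2), the single-marker regression
    approximation. Illustrative only: real thresholds depend on the mapping
    method, marker density, and population type (assumed values, not the
    settings of any cited study).
    """
    return 1.0 - 10.0 ** (-2.0 * lod_threshold / n_individuals)


if __name__ == "__main__":
    # Population sizes mentioned in the text: 151 DH lines and 370 F2 individuals.
    for n in (151, 370):
        print(f"n = {n}: minimum detectable share of variance ~ {min_detectable_pve(n):.1%}")
```

Under these assumptions, a population of 151 lines would be expected to miss most QTLs explaining less than about 9% of phenotypic variance, whereas a population of 370 could resolve QTLs explaining about 4%. Smaller populations thus tend to report fewer QTLs per trait, independent of the underlying biology.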

The origins of ‘red rice’ and similar weedy forms, in regions far beyond the natural range of rice relatives, remain unknown. A fascinating and testable possibility is that these forms might represent ‘revertants’ to wild-type alleles at a small number of key domestication loci. Proof that these weedy forms are revertants would probably require access to the specific genes that were responsible for the QTLs; however, a number of alternative hypotheses could be tested immediately. For example, detailed phylogenetic analysis of DNA marker alleles along chromatin segments in the QTL regions could be used to test the extent of affinity of these specific DNA segments with local cultigens, and perhaps to falsify the possibility that these segments were introgressed from weedy forms and had arrived in these geographical regions as a result of long-distance dispersal (whether intentional or not). The emergence of these weedy types, seemingly in association with rice cultivation, may perhaps suggest that the genetic differences leading to domestication might be lost very quickly when they are no longer preserved by selection.

2. Parallels in domestication of diverse crops

One means by which we can avoid some of the problems associated with using QTL mapping per se to estimate the number of genes involved in domestication events, is to take a comparative approach. Consider, for example, a fictitious case in which domestication as measured by some parameter (such as seed size) involved a very large number of genes each with equal effects. QTL mapping in one taxon would be expected to reveal a subset of five or so QTL likelihood intervals for a trait (depending on the experimental design, and population size used). If one were to compare the QTL maps of two divergent taxa, such as sorghum and rice, there would be no reason to expect the QTL maps to be similar – even if the populations of genes that could potentially be involved in domestication were orthologous, with many possible genes involved the two samples of five detectable QTLs would be likely to be different just due to sampling variation.

By contrast, what if only a very small number of genes (perhaps 10 or fewer) explained most of the phenotypic variation in traits related to domestication? Then, the mapped QTLs in a taxon would be expected to yield a much more comprehensive picture of the genetic variation implicated in a domestication-related trait. If orthologous genes were implicated in the domestication of different taxa, then the locations of QTLs for common traits in diverse taxa might coincide more often than could be reasonably explained by chance (testing this by using methods illustrated above).
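The contrast between these two scenarios can be made concrete with a simple sampling model. The Python sketch below assumes that each of two taxa has the same hypothetical pool of candidate (orthologous) genes, that a QTL study in each taxon detects a random subset of them, and that the number of shared detections therefore follows a hypergeometric distribution. The pool sizes, the figure of five detected QTLs per taxon, and the function name are illustrative assumptions, and the model ignores the width of QTL likelihood intervals.

```python
from math import comb


def prob_at_least_k_shared(pool: int, detected_a: int, detected_b: int, k: int) -> float:
    """P(at least k QTLs coincide) under random sampling from a common gene pool.

    Taxon A detects `detected_a` of `pool` candidate genes; taxon B independently
    detects `detected_b`. The overlap is hypergeometric. Hypothetical model for
    illustration; not the procedure of any particular study.
    """
    total = comb(pool, detected_b)
    upper = min(detected_a, detected_b)
    favourable = sum(
        comb(detected_a, i) * comb(pool - detected_a, detected_b - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Chance that two studies, each detecting 5 QTLs, share at least 2 of them.
    for pool in (10, 100, 1000):
        p = prob_at_least_k_shared(pool, 5, 5, 2)
        print(f"pool of {pool} genes: P(>= 2 shared) = {p:.4f}")
```

With a pool of only 10 genes, two or more coincident QTLs are the expected outcome; with a pool of 100 or more, the same degree of coincidence becomes improbable under random sampling, which is why frequent correspondence among taxa argues for a small number of genes.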

This notion, of ‘comparative QTL mapping’, has been especially well-utilized in the Poaceae crops. This is due to the finding that DNA probes from one Poaceae taxon often hybridize to other taxa in the family (Hulbert et al., 1990) – coupled with the independent but convergent domestication(s) of many Poaceae crops for the production of carbohydrate-rich seeds (grains). We have investigated this notion in some detail, using crosses between cultivated and wild sorghums, maize and teosinte, and divergent subspecies of rice as the basis for our comparison (Paterson et al., 1995). The clear and recurring message from several different traits, including seed size, shattering (disarticulation of the inflorescence), flowering, and plant height, was that QTLs for both simple (shattering, day-neutral flowering) and complex traits (seed size) were found in corresponding locations in different taxa far more frequently than could be explained by chance.

Convergent domestication of sorghum, rice, and maize appears to have resulted from mutations at corresponding genetic loci, suggesting that few genes with large effects determine the phenotypes studied. This result supports punctuational evolutionary models proposed for other taxonomic lineages, such as transformation of the berry-like ovary of wild nightshades into the tomato ‘fruit’ (Grandillo et al., 1999). Once again, however, I must qualify the interpretation of this QTL-based result. Correspondence in location of QTLs in different taxa does not prove identity of the underlying genes; it merely shows that they occur in orthologous genomic regions more often than would be expected by chance. Correspondence does, however, suggest the possible identity of some of the domestication-related genes, a suggestion reinforced by the tendency of corresponding QTLs to show similar gene action.

3. Levels and patterns of allelic diversity

The cloning of teosinte branched1 (tb1) provides the first clues about levels and patterns of allelic diversity in domestication genes (Doebley et al., 1997). This gene is especially important in accounting for the difference between the morphology of maize and teosinte, acting as a repressor of lateral branch growth in maize that results in the formation of short branches tipped by ears. Across a 2.9-kb region of the gene that includes most of the predicted transcriptional units, differences in DNA sequence diversity yielded important clues as to the nature of the mutation(s) that contributed to domestication (Wang et al., 1999). Specifically, in the transcriptional units, a sampling of 17 maize genotypes showed 39% of the diversity found among 22 teosinte genotypes – but in the 5′ nontranscribed region, the maize genotypes were much more uniform showing only 3% of the diversity found in teosinte. This evidence from polymorphism analysis, together with previous work on tb1 mRNA levels, strongly suggests that the short ear-tipped lateral branches of maize evolved from the long, tassel-tipped branches of teosinte by human selection for novel regulatory elements in the 5′ nontranscribed region of the gene. However, no fixed differences between maize and teosinte were found in the region studied – suggesting that either the selected site lies further upstream, or that the differences between maize and teosinte are complex and not fully explained by a single site.

The striking difference in allelic diversity between the transcriptional unit and the nontranscribed region of the tb1 gene, together with prior data on recombination rates in maize, permitted a rough estimate of the selection coefficient associated with fitness of the mutant genotype during maize domestication, and in turn the duration of selection (TF) needed to bring the maize allele to fixation. Considering population sizes representative of ‘garden’ (1000) and ‘agricultural’ (100 000) cultivation, it was estimated that fixation of tb1 may have required 315 and 1023 yr, respectively.

Other lines of evidence, while not necessarily contra-indicating this tempo, point to the possibility that domestication events need not have been either slow or gradual (Eyre-Walker et al., 1998). Based on the levels and patterns of DNA sequence variation in the Adh1 locus (and also glb1; H. Hilton & B. S. Gaut, unpublished), the breadth of diversity in maize is consistent with a founding population of only 20 individuals if the domestication event was as short as 10 generations in length. The authors do not assert that maize domestication was actually limited to 20 individuals and 10 generations, noting that the data also fit alternative models ranging up to 5600 individuals and 2800 yr. However, the notion of very rapid domestication events in remarkably small populations appears plausible.
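Why such different scenarios can fit the same data may be seen from a textbook approximation. Under a bottleneck of constant size N lasting t generations, the expected fraction of heterozygosity retained is (1 − 1/(2N))^t, which depends mainly on the ratio t/N. The Python sketch below applies this approximation to the two scenarios mentioned above. It is a deliberate simplification and not the coalescent analysis used by Eyre-Walker et al. (1998); it assumes a diploid population of constant size and one generation per year, and the function name is hypothetical.

```python
def heterozygosity_retained(n_individuals: int, generations: int) -> float:
    """Expected fraction of heterozygosity remaining after a bottleneck.

    Uses the classical drift approximation (1 - 1/(2N))^t for a diploid
    population of constant size N over t generations. Simplified illustration;
    assumes one generation per year when years are supplied as generations.
    """
    return (1.0 - 1.0 / (2.0 * n_individuals)) ** generations


if __name__ == "__main__":
    scenarios = {
        "20 individuals, 10 generations": (20, 10),
        "5600 individuals, 2800 generations": (5600, 2800),
    }
    for label, (n, t) in scenarios.items():
        print(f"{label}: fraction retained ~ {heterozygosity_retained(n, t):.3f}")
```

Both scenarios share the same ratio of population size to duration (2 individuals per generation) and therefore predict nearly the same loss of diversity, so diversity data alone cannot readily distinguish a brief, severe bottleneck from a long, mild one.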

4. Domestication and linkage

A recurring theme in the QTL mapping of domestication (and other) traits in many taxa has been the observation of concentrations of QTLs affecting apparently different traits in common genomic regions (see examples above). Computer simulations have suggested that domestication may confer a selective advantage to the de novo evolution of tightly linked combinations of genes or ‘supergenes’ (D’Ennequin et al., 1999). One can envision a number of possible explanations of this suggestion at the molecular level. In the past few years it has become clear from many fine-scale synteny comparisons and analyses of long stretches of genomic sequence (Tikhonov et al., 1999), that there is more fluidity of individual genes and small genomic regions than was envisioned by most geneticists, including myself. However, if such recent and localized reshuffling of genes were to account for the formation of ‘domestication supergenes’, then the genomic locations of these complexes in the various crops that were independently domesticated in the past 10 000 yr or so would be likely to be different. It has already been observed that the locations of individual domestication traits (Paterson et al., 1995), and in some cases ‘domestication complexes’ (Lin et al., 1995), appear to correspond to a greater degree than could be explained by chance in divergent taxa.

Several alternatives to the de novo evolution of tightly linked combinations of domestication-related ‘supergenes’ may be plausible. First, the notion of genetically linked coadapted gene complexes is not limited to domestication, but is often associated with the balancing selection that is thought to predominate in large natural populations living in environments to which they are well-adapted. The evolution of gene clusters that regulate, for example, both plant height and flowering time (Lin et al., 1995), could be an ancient event, and subsequent domestication could provide the impetus to fix mutations in each of two (or more) closely linked genes that then appeared to be a domestication-related ‘supergene’.

From a different perspective, a growing body of data show that allelic variation is by no means uniformly distributed across plant chromosomes. In a wide range of organisms, it is now clear that levels and patterns of DNA variation are markedly influenced by the chromosomal context of the underlying genes; that is, some chromosomal regions are more likely than others to harbor allelic variation (Begun & Aquadro, 1992; Dvorak et al., 1998; Hamblin & Aquadro, 1999; Draye et al., 2001). Apparent concentrations of QTLs might therefore simply represent mutations in different genes that fall in allele-rich chromosomal regions, and cannot presently be distinguished because of the relatively coarse resolution of QTL mapping.

5. Section summary

This section has examined, from several standpoints, the tempo of domestication, with emphasis on the number of genes that may be necessary to account for the key morphological and physiological (flowering, fruit size) transitions associated with domestication. Virtually all of the data herein, including QTL analyses within taxa, comparisons of QTL populations within and among taxa, levels and patterns of allelic diversity in wild and cultivated gene pools, and linkage relationships among domestication genes, indicate that the ‘… relatively slow and gradual progression from the wild state to a state of incipient domestication …’ envisioned by Harlan and others is not necessary to account for domestication(s). Rather, domestications may be quite rapid, and perhaps even rapidly reversible. More generally, a growing body of QTL data based on progressively more detailed analysis of genomes (Sax, 1923; Thoday, 1961; Paterson et al., 1988; Tanksley, 1993) cumulatively points to the conclusion that the tempo of quantitative trait evolution may be much more rapid than was anticipated.

IV. Domestication and polyploidy

Prominent among plants in general, and crops in particular, are polyploids, that combine in a common nucleus genomes with independent selection histories. Molecular genetic studies in the past decade have blurred the line between ‘diploid’ and ‘polyploid’, with the discovery of nonrandom patterns of arrangement of duplicated genes that are best explained by ancient duplication or polyploidization events even in the simple genome of Arabidopsis (McGrath et al., 1993; Kowalski et al., 1994; Blanc et al., 2000; Paterson et al., 2000; Vision et al., 2000), long argued to be an ideal botanical model in part due to a lack of gene duplication. These findings tend to support earlier assertions that most angiosperm genomes have incurred one or more polyploidization events (Stebbins, 1966; Masterson, 1994). It is important to note, however, that taxa such as Arabidopsis are in an advanced state of ‘diploidization’, with only c. 50% of genes retaining a discernible paralog and 19% in a location that is consistent with a chromosomal or segmental duplication event (Paterson et al., 2000).

Geneticists have long debated whether the prominence of polyploidy in plants simply reflects ‘promiscuity’, or if a selective advantage is conferred by polyploid formation. Among the best-studied polyploids are many of the world’s leading crops, including cotton, wheat, oat, soybean, peanut, canola, tobacco, coffee, and banana, each of which evolved by the joining of divergent genomes in a common nucleus. In many of these, only polyploid taxa are cultivated, although an abundance of diploid taxa remain extant.

Recent QTL mapping efforts suggest that there may exist a direct relationship between polyploidy and crop productivity; that is, nonlinear changes in phenotype may be made possible as a direct result of polyploid formation. While other aspects of the consequences of polyploidy for genome structure and organization have been studied in many taxa, the relationship between polyploid formation and the genetic control of key domestication phenotypes has been especially well studied in two taxa, cotton and sugarcane. I will consider these two examples in detail.

1. Cotton as an example of the consequences of allopolyploid formation

The evolution of the genus Gossypium (cotton) has included a very successful experiment in polyploid formation. World cotton commerce of about $20 billion annually is dominated by improved forms of two (among five extant) ‘AD’ tetraploid (2n = 4x = 52) species, G. hirsutum L. and G. barbadense L. Tetraploid cottons are thought to have formed about 1–2 million years ago, in the New World, by hybridization between a maternal Old World ‘A’ genome taxon resembling G. herbaceum (2n = 2x = 26), and paternal New World ‘D’ genome taxon resembling G. raimondii (Wendel, 1989) or G. gossypioides (Zhao et al., 1998), both 2n = 2x = 26. The antiquity of this New World event precludes human involvement in polyploid formation.

A-genome diploid and AD-tetraploid Gossypium taxa each produce spinnable fibers. Although the seeds of D-genome diploids are pubescent, none produce spinnable fibers. There is no evidence that domestication of D-genome Gossypium taxa has ever been attempted, although their geographic distribution overlaps that of most tetraploids.

Intense directional selection by humans has consistently produced AD-tetraploid genotypes whose yield and/or quality characteristics are superior to those of A-genome diploid cultivars. Selective breeding of G. hirsutum (AADD) has emphasized maximum yield, while G. barbadense (AADD) is prized for its fibers of superior length, strength, and fineness. Side-by-side trials of 13 elite G. hirsutum tetraploids and 21 G. arboreum diploids (AA) adapted to a common production region (India) show average seed cotton yield of 1135 (±90) kg/ha for the tetraploids, a 30% advantage over the 903 (±78) kg/ha of the diploids, at similar quality levels (Anonymous, 1997). Such an equitable comparison cannot be made for G. barbadense and G. arboreum, as they are bred for adaptation to different production regions. However, the fiber of ‘extra-long-staple’ G. barbadense tetraploids, representing c. 5% of the world’s cotton, commands a premium price due to c. 40% higher fiber length (c. 35 mm), strength (c. 30 g per tex or more), and fineness over leading A-genome cultivars, at similar yield levels. Obsolete G. barbadense cultivars reportedly had up to 100% longer fibers (50.8 mm; Niles & Feaster, 1984) than modern G. arboreum (25.5 ± 1.6 mm; Anonymous, 1997).

To further investigate cotton fiber evolution, a detailed RFLP map was used to determine the chromosomal locations and subgenomic (A vs D) distributions of QTLs segregating in a cross between a high-fiber-quality G. barbadense cultivar, and a high-yielding G. hirsutum cultivar (both AADD) (Jiang et al., 1998). Among the 14 QTLs affecting fiber-related traits that met the stringent LOD 3.0 threshold, 10 (71%) fell on D-subgenome linkage groups, from the nonfiber-producing ancestor. The D-subgenome bias of QTLs is not explained by the subgenomic composition of the overall genetic map, levels of genetic variation in the two subgenomes, or patterns of distribution of chromatin introgressed from G. hirsutum into G. barbadense.

Lack of correspondence between QTLs in the A and D subgenomes, diverged by c. 10 million yr, is in striking contrast to the extensive correspondence among QTLs in other genomes diverged by as much as 65 million yr (Paterson et al., 1995).

The joining in a common nucleus of A- and D-genomes, with very different evolutionary histories, appears to have created novel avenues for response to selection in AD-tetraploid cottons. The presence of fiber on wild A-genome diploids suggests that when polyploid formation occurred, many A-genome loci relevant to fiber development may already have contained ‘favorable’ alleles as a result of natural selection. When human selection was ‘recently’ imposed, there may have been little selective advantage to new mutations at ‘major’ fiber loci in the A-subgenome of tetraploid cotton. By contrast, human selection may have conferred a new fitness advantage to mutations at D-subgenome loci, which had presumably rarely if ever been under selection for seed-borne fiber, as its diploid progenitors show inadequate promise to warrant domestication. Mutations that enhanced fiber development may have become favorable only after the D-genome was joined in the same nucleus with the fiber-producing A-(sub)genome.

Nonrandom distribution of QTLs across the subgenomes of cotton has also been discovered for other traits such as bacterial blight resistance, in which the two subgenomes have very different selection histories (Wright et al., 1998). Again, the D subgenome, which shows little evidence of favorable variation at the diploid level, is responsible for the majority of novel alleles at the tetraploid level.

2. Sugarcane as an example of the consequences of autopolyploid formation

Among the world’s leading crops, with annual production projected at a record 97 million metric tons in 1999/2000 (http://www.fas.usda.gov/htp2/sugar/1999/november/world.html), sugarcane is a classical example of a complex autopolyploid. Cultivated sugarcane varieties have about 80–140 chromosomes, comprising 8–18 copies of a basic set thought to be x = 8 or x = 10 (Irvine, 1999). Most chromosomes of cultivated sugarcane appear to be largely derived from Saccharum officinarum (Irvine, 1999); however, in situ hybridization data suggest that c. 10% may be derived from S. spontaneum (D’Hont et al., 1996). S. officinarum commonly has high sucrose content, low fiber content, thick stalks, little pubescence, rare flowering, and limited tillering. S. spontaneum does not accumulate sucrose, and is fibrous, thin-stalked, pubescent, profusely flowering, and abundantly tillering.
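
The relationship between chromosome number and copy number quoted above is simple division, and can be checked with a short sketch. It assumes the chromosome range is read as roughly 80 to 140 and uses the two candidate basic numbers (8 and 10) from the text; the function name is hypothetical and chosen only for illustration.

```python
def copy_range(min_chrom, max_chrom, basic_numbers):
    """Return the (lowest, highest) number of copies of the basic chromosome
    set implied by a range of chromosome counts and candidate basic numbers."""
    values = [count / x
              for count in (min_chrom, max_chrom)
              for x in basic_numbers]
    return min(values), max(values)


# Assumed inputs: about 80 to 140 chromosomes, basic number x = 8 or x = 10.
low, high = copy_range(80, 140, (8, 10))
print(low, high)
```

Under these assumptions the implied range runs from 8 copies (80 chromosomes, x = 10) to 17.5 copies (140 chromosomes, x = 8), in line with the 8–18 copies cited.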

The recent development of a detailed map for sugarcane, drawing heavily on the molecular tools available for its close relative sorghum (Ming et al., 1998), has fostered much activity in the dissection of the genetic control of complex traits in sugarcane. The efforts we described focused on two different F1 populations, each derived from a cross between a high-sugar genotype of S. officinarum, and a low-sugar S. spontaneum (Ming et al., 2001). By necessity, the simplex or ‘single-dose restriction fragment’ mapping approach was used.

The autopolyploidy of sugarcane was reflected in a high level of apparent duplication of QTLs, evident both in correspondence of QTLs from different genotypes and in segregation for QTLs at multiple, apparently homologous locations in individual genotypes. For example, the 36 genomic regions that showed significant association with variation in sugar content correspond to only eight nonoverlapping regions of the sorghum genome. This suggests that the observed QTLs may be accounted for by a much smaller number of ancestral genes that have been multiplied by the rapid duplication of chromosomes that has characterized sugarcane genome evolution since its divergence from a common ancestor shared with sorghum (Ming et al., 1998).

In four cases, two or more independently segregating loci detected by the same DNA probe were each associated with QTL alleles for sugar content from the same parent. Such associations may reflect multiplex segregation at orthologous genetic loci, or perhaps just coincidence of different QTL alleles at nearby loci, but in either case permit us to evaluate the net consequences of stacking multiple copies of a genomic region each associated with common phenotypic effects. Multiple doses of chromosomal segments containing favorable QTLs consistently yielded diminishing effects on phenotype, especially in cases where high-order duplication could be tested (Ming et al., 2001). This is similar to the results reported from stacking unlinked QTLs in a diploid, tomato, which were attributed to epistasis (Eshed & Zamir, 1996).

Nonadditive gene action in multiple dose QTLs may confer evolutionary opportunities. If a single copy of a gene/QTL is physiologically sufficient, the extra copies are free to collect mutations, often becoming nonfunctional, but perhaps occasionally resulting in a distinctive new function which improves fitness.

An important future investigation regards the contribution of multilocus QTL genotypes to stability of performance across different environments. Sugar content is a trait of relatively high heritability (Kang et al., 1983) – however, a role of multiple-dose QTLs in enhancing environmental stability would be of potentially great importance for less heritable traits. Detecting this type of phenotypic buffering provides strategic information for marker-assisted selection in autopolyploid crops. Although diagnostic DNA markers enable us to pyramid multiple QTLs in a polyploid, incorporating any one copy of the multiple alleles may obtain most of the desired effect in the breeding population.

In both cases, cotton and sugarcane, we find evidence in support of the notion that polyploid formation creates unique avenues for response to directional selection toward the breeding of genotypes that better suit human purposes. This represents a ‘potential’ for improvement rather than a de novo superior phenotype – consistent with Clement’s (1999) view that domestication includes development of genotypes that are ‘… better adapted to human intervention.’ It remains to be demonstrated whether such a potential actually contributes to the prominence of polyploids among cultigens.

V. New approaches to identifying the footprints of domestication

Genetic change is an enduring consequence of plant domestication – therefore, genomics offers new insights into its history and new opportunities for crop improvement, in much the same manner that it offers new insights into human history and new opportunities for biomedicine. The few-thousand-generation (in terms of annual plants) history of agriculture is sufficiently short that discernible linkage disequilibrium may very well persist in genomic regions under selection. The recent discovery of linkage disequilibrium in human populations thought to trace to 27 000–53 000 yr ago (Reich et al., 2001) bodes very well for the possibility that, by using high-throughput genomic approaches, we can identify ‘footprints’ of domestication in crop gene pools.

1. Linkage disequilibrium in crop gene pools

High-density DNA marker maps, and their resolution by genetic and/or physical approaches into ordered sets of markers, open new doors into searches for the ‘footprints’ of domestication based on analysis of levels and patterns of linkage disequilibrium. For many crops, molecular maps are now well beyond the density of markers that can be resolved by the number of recombinants that comprise the primary mapping populations. Growing integration of genetic and physical maps provides a means by which to resolve the order(s) of closely linked markers, to a resolution of perhaps 10–20 kb, a fraction of the length of a BAC clone. Using sorghum as an example, a map of > 2600 loci (J. Bowers et al., in preparation), together with high-coverage BAC libraries that have been completely fingerprinted and are being interleaved with the genetic map using a high-throughput hybridization approach (Draye et al., 2001), provides a valuable framework to scan well-defined gene pools for regions that show unusual features such as depletions of genetic variation that might be consistent with a recent epoch of selection.

Such approaches may be especially powerful in predominantly self-pollinating crops. For example, wild sorghums are largely (> 90%) self-pollinating, and improved types have traditionally been bred by selfing methods (hybrids being used only recently). Under breeding methods tailored to ‘selfing’, a breeding cycle usually only permits the accumulation of about two meiotic cycles worth of recombination. Crop breeding by selfing methods typically involves an initial cross between ostensibly divergent genotypes, producing a heterozygous F1 that is then selfed to generate divergent lineages that are repeatedly selfed. One cycle of recombination is realized in selfing of the F1, and additional recombination accumulates at a progressively diminishing level with each additional generation of selfing, as heterozygosity diminishes (Liu et al., 1996). At most, in one ‘breeding cycle’ using selfing methods, the equivalent of two meiotic cycles of recombination is realized, and such a cycle has typically involved 10 yr or more (notwithstanding the very recent use of measures such as DH development or acceleration by glasshouses, winter nurseries, or other methods). Only a few hundred ‘effective meiotic cycles’ may have passed during the few-thousand-year history of sorghum domestication, or that of many cereals. One can estimate from this (Wright, 1968) that 2–3 cM (c. 1 Mb) chromosomal regions are likely to remain in linkage disequilibrium. This suggests that mutations with large effects on domestication-related traits, that may have been quickly fixed early in the domestication process, may be surrounded by small genomic regions which are characterized by very low levels of DNA-level diversity as a result of ‘linkage drag.’ Given an average 0.4 cM density of DNA markers along the sorghum map, and the possibility of identifying additional markers using BACs as a template, there exists the means at least in principle to scan the genome for such genomic regions. 
While recent findings have suggested that the levels of linkage disequilibrium that persist in some predominantly outcrossing crops such as maize are much less than these predictions for selfing plants, indeed much less than was expected even for maize, it still appears possible to relate at least some sequences to phenotypes (Remington et al., 2001; Thornsberry et al., 2001). Indeed, low levels of linkage disequilibrium may permit more precise delineation of the target(s) of selection.
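
The statement above that one selfing-based breeding cycle realizes at most the equivalent of two meiotic cycles of recombination can be illustrated with a minimal sketch. It assumes the simplest model, in which heterozygosity halves with each generation of selfing and a meiosis contributes effective recombination only in proportion to remaining heterozygosity. The function names, and the 3000 yr history used in the example, are hypothetical illustration values rather than figures from the cited sources.

```python
def effective_meioses(generations_of_selfing):
    """Heterozygosity-weighted count of meioses in one breeding cycle.

    Assumes the F1 is fully heterozygous at loci that differ between the
    parents, and that heterozygosity halves with every selfing generation.
    """
    total = 0.0
    heterozygosity = 1.0  # selfing of the F1 counts as one full cycle
    for _ in range(generations_of_selfing):
        total += heterozygosity
        heterozygosity /= 2.0
    return total


def max_effective_cycles(history_years, years_per_breeding_cycle=10.0):
    """Upper bound on effective meiotic cycles over a domestication history,
    taking two effective meioses per breeding cycle (the limit of the series)."""
    return 2.0 * history_years / years_per_breeding_cycle


for generations in (1, 2, 5, 10):
    print(generations, effective_meioses(generations))

# Assumed example: a 3000 yr history with 10 yr per breeding cycle.
print(max_effective_cycles(3000))
```

The series 1 + 1/2 + 1/4 + … approaches but never exceeds 2, which is why lengthening the selfing phase adds little recombination. Under the assumed 3000 yr history the bound is 600 effective cycles, of the order of the ‘few hundred’ mentioned in the text; this sketch does not attempt to reproduce the linkage disequilibrium estimate attributed to Wright (1968).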

2. SNPs and association approaches

Searching for the footprints of selection is an effort that derives much benefit from the attributes of SSR markers (see above) – but these attributes may eventually be countered by the sheer numbers of single-nucleotide polymorphisms (SNPs) that can be discovered using contemporary techniques (Sachidanandam et al., 2001). SSR loci, with their high mutation rates and large numbers of alleles per locus, are a more sensitive marker for genome scans of selection (Schlotterer et al., 1999), in which one must reveal the recent development of population structure (Remington et al., 2001). The fixation of a particular mutation leads to a dramatic reduction in the number of alleles at linked loci (as a function of the strength of selection and the rate of recombination between the selected site and the SSR locus being surveyed). This can lead to a ‘locus-specific bottleneck’, which can be detected by a skew in the frequencies of SSR alleles compared to those expected under an equilibrium neutral model. By contrast, nearly all SNPs are bi-allelic – most variant SNP alleles are relatively rare, and a single SNP alone generally provides less insight into the recent evolutionary history of a gene region than does an SSR.
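
The reduction in allelic variation at an SSR locus linked to a selected site can be summarized with a simple diversity statistic. The sketch below computes the number of alleles and an unbiased gene diversity (expected heterozygosity) for two loci. The allele sizes are invented purely for illustration, and this summary is only a first-pass screen, not the test against an equilibrium neutral model referred to above.

```python
from collections import Counter


def gene_diversity(alleles):
    """Unbiased gene diversity: (n / (n - 1)) * (1 - sum of squared
    allele frequencies), for a sample of n allele observations."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    sum_squared = sum((count / n) ** 2 for count in counts.values())
    return (n / (n - 1)) * (1.0 - sum_squared)


# Hypothetical SSR allele sizes (bp) sampled from ten inbred lines.
background_locus = [150, 152, 152, 154, 156, 158, 158, 160, 162, 164]
linked_locus = [152] * 9 + [154]  # nearly fixed, as expected near a sweep

for name, sample in (("background", background_locus),
                     ("linked", linked_locus)):
    print(name, len(set(sample)), round(gene_diversity(sample), 3))
```

In a genome scan, loci whose diversity falls well below that of the genomic background would be flagged as candidates for closer study, since population history alone can also reduce diversity.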

The possibility of scanning vast numbers of loci for SNPs, especially complete ‘unigene’ sets of virtually all exons in known physical arrangement along a genome (Deloukas et al., 1998), may soon make it possible to directly identify genes under positive adaptive selection. Early examples are drawn from searches of well-defined gene pools for levels and patterns of variation in populations of candidate genes. For example, functional candidate genes from Arabidopsis (Kempin et al., 1995) have been used to identify a ‘stop’ mutation that is strongly associated with curding (formation of a cauliflower-like inflorescence) in the domesticated Brassica oleracea gene pool (Smith & King, 2000).

Because of the relatively strong linkage disequilibrium that may persist in some crop gene pools, SNP-based approaches may be most productively employed in partnership with background data from SSRs to identify ‘functional nucleotide polymorphisms’ in plants. While ‘association approaches’ that directly relate SNP diversity to phenotypes such as genetic diseases have been widely embraced in human genetics, the structuring of genetic variation in crop plants due to founder effects, linkage disequilibrium, and other factors creates opportunities for many false positive associations. Recently, methods have been described to obviate these problems, using SSR markers to develop a ‘genetic background matrix’ against which the distributions of SNPs could be compared to identify the subset that might be implicated in selection. These methods have been applied to sequence data for 92 maize inbreds to implicate a previously cloned gene, ‘dwarf8’, in the genetic control of plant height and flowering time (Thornsberry et al., 2001).

Advances in ‘re-sequencing technologies’, methods to investigate variation across taxa or gene pools in the prototype sequences that result from a large-scale genomics project (often from just one genotype), promise to bring the cost of assaying sequence variation down dramatically in the next few years. One could envision the exploration of core collections of a few hundred individuals chosen from the gene pool of a major crop, for variation in large numbers of genes and analysis of the relationship of individual genes to phenotypes. In the long view, such approaches might obviate the need for ‘proxy’ markers such as SSRs and RFLPs – if one can search all the genes, one no longer needs such proxies. A curious corollary to this notion is that the limiting factor in relating genes to their functions would no longer be at the molecular level – but would be the amount and quality of phenotypic data available for large germplasm collections. For many taxa, present investment in genetic resources is inadequate to do anything more than maintain the germplasm, and much phenotypic characterization will be needed to ‘catch up’ to the information levels that would empower this new wave of functional genomics.

3. Identifying genes directly responsible for domestication

To date, few specific DNA sequences can be directly implicated in the domestication of a crop. The two clearest cases, the tb1 gene of maize (Dorweiler & Doebley, 1997) and fw2.2 gene of tomato (Frary et al., 2000), each were initially located on a genetic map using QTL methods, and were subsequently subjected to genetic dissection using well-established methods (Paterson et al., 1990) followed by transposon mutagenesis (tb1; Dorweiler & Doebley, 1997) and mutant complementation (fw2.2; Frary et al., 2000), respectively. These approaches are being pursued in the study of a number of other genes, aided by ever more detailed genetic and physical maps, and burgeoning EST databases. A growing number of examples exist in which possible domestication-related functions such as ‘shattering’ are now well-understood in botanical models such as Arabidopsis (Liljegren et al., 2000), providing valuable clues, and in some cases high-likelihood candidates (Smith & King, 2000), for the corresponding genes in crops.

What is the ongoing role of QTL mapping in genetic dissection of plant domestication? Many basic functions that are shared by many plants will be more quickly and efficiently elucidated in botanical models. However, virtually all plants that are domesticated are themselves models for some key aspect of plant growth and development – for example, this author knows of no Arabidopsis mutant with a structure that resembles the 1–2 inch-long single celled fiber of cotton, the leaf-enveloped ear of maize, or the enlarged ‘berry’ (fruit) of tomato. A particularly powerful advantage of QTL mapping is that it requires no a priori information other than the existence of variation and a means to quantify it – as such it is likely to continue to provide an important starting point for dissection of the many unique features of domesticated plants (and for that matter, animals). Detailed ‘gene encyclopedias’ built up from EST and genomic sequence data for major crops and botanical models will empower the application of association approaches in scale. In taxa with a level of linkage disequilibrium that is ‘just right’, this may lead to the direct identification of phenotypes for some genes, obviating the need for marker-based approaches – however, it seems likely that QTL mapping will remain an efficient prologue to association approaches even in well-studied genomes (Thornsberry et al., 2001). And, since crops represent only a tiny fraction of botanical diversity, most of which remains unexplored at the genomic level, positional data about interesting phenotypes is likely to remain an early line of investigation for a long time to come.

VI. Perspectives

The message that many plant domestications need not have been either slow or gradual is highlighted by several independent results of QTL mapping and related studies, including the finding that small numbers of QTLs often account for large portions of phenotypic variation in domestication traits, that these QTLs tend to be clustered in the genome (or represent pleiotropic effects of a smaller number of genes), that these QTLs tend to lie at corresponding locations in different taxa (further attesting to their singular importance), and that the levels and patterns of variation in crop genes and gene pools may be explained at least in principle by brief periods of intense selection in small populations.

The continuing importance of new plant domestication(s) seems inevitable in view of growing human populations, loss of arable lands (Pimentel et al., 1995), and the ever-changing climate of agricultural production economics. Simmonds (1979) asserts that domestications have proceeded at a relatively constant rate for thousands of years, yet modern humans rely on domesticates of only a small sampling of plant biodiversity. Simmonds (1976) lists 230 major and minor crops, representing only 6% (180) of genera and 20% (64) of angiosperm taxonomic families roughly estimated to exist. Broader exploration of the collective outcomes of approx. 200 million yr of flowering plant evolution is sure to impel new domestications. High-throughput methods such as EST and genomic shotgun sequencing provide one approach to rapidly explore plant biodiversity – but many valuable discoveries may lie in rapidly evolving ‘orphan genes’ (Yang, 1998, 2000; Swanson et al., 2001a, 2001b), the function of which is not apparent based on their sequence alone.

Together with greater breadth of exploration of plant biodiversity, recent results have hinted at the near-term benefits that may accrue to greater depth of exploration of taxa that include cultigens. For example, the ‘re-synthesis’ of allopolyploid crop genomes, or sexually compatible relatives, has long been an area of interest for the extraction of valuable traits from exotic germplasm. However, like other introgression efforts, such enterprises are often hindered by the need to eliminate mal-adaptive traits that are often linked to desirable genes from exotic sources, and also by the aberrant transmission genetics that often occurs in wide hybrids. The tools of QTL mapping permit one to identify diagnostic markers useful for deterministic gene transfer, retaining valuable chromatin even if it transmits only infrequently (Jiang et al., 2000), and eliminating extraneous chromatin more quickly than would occur by traditional means (Paterson et al., 1988). Especially great benefits may be realized by this approach in recently formed polyploids such as wheat and peanut, in which there are low levels of variation among naturally occurring genotypes.

The recent discovery of favorable transgressive alleles, from apparently unfavorable genotypes, is a second avenue that warrants invigorated exploration of nondomesticated genotypes. The prospects for finding valuable genes from seemingly unlikely sources gain strong support from discoveries in other taxa, such as genes conferring salt tolerance from intolerant genotypes (Breto et al., 1994), genes conferring red fruit from green-fruited tomatoes (Tanksley et al., 1996), genes conferring high yield from low-yielding rices (Xiao et al., 1996), and genes conferring high sugar content from low-sugar relatives of sugarcane (Ming et al., 2001). The use of backcross-self approaches in conjunction with DNA markers promises to provide not only basic information about quantitative inheritance, but improved germplasm suitable for incorporating new variation into crop gene pools.

Recent successes in isolation of the specific genes that partially underlie domestication events (Doebley et al., 1997; Frary et al., 2000) provide the first glimpses into the developmental steps that have been modified by domestication(s), and the types of mutations that distinguish cultigens from their wild relatives. Accelerating growth in genomic tools and methods available for relating genes to their functions promises that a few years hence, the views expressed herein will require re-examination in light of much more data. Truly, it is an exciting time to be a plant biologist!


Acknowledgements

The author thanks many coworkers, colleagues, and collaborators who have contributed much to his research and views on domestication, and the US Department of Agriculture, National Science Foundation, International Consortium for Sugarcane Biotechnology, and Georgia and Texas Agricultural Experiment Stations for financial support of research in his lab.