II. A brief history of domestication in grasses
III. Domestication genes
IV. Models of the domestication process
V. Evolutionary consequences of domestication for grass genomes
VI. Mating systems and the evolutionary dynamics of domestication in grasses
VII. Conclusion
Crop grasses were among the first plants to be domesticated, c. 12 000 yr ago, and they still represent the main staple crops for humans. Like many other crops, grasses underwent dramatic genetic and phenotypic changes during domestication. The recent massive increase in genomic data has provided new tools to investigate the genetic basis and consequences of domestication. Beyond the genetics of domestication, many aspects of grass biology, including their phylogeny and developmental biology, are also increasingly well studied, offering a unique opportunity to analyse the domestication process in a comparative way. Taking such a comparative point of view, we review the history of domesticated grasses and how domestication affected their phenotypic and genomic diversity. Considering recent theoretical developments and the accumulation of genetic data, we revisit in particular the role of mating systems in the domestication process. We close by suggesting future directions for the study of domestication in grasses.
Plant domestication dramatically affected the fate of human societies by playing a crucial role in the shift from hunting and gathering to agriculture. Domestication is an evolutionary process whereby a population adapts, through selection, to new environments created by human cultivation; most of these adaptations are deleterious in the wild. A number of domesticated plants, at least among seed-propagated species, experienced dramatic genetic and phenotypic changes through many reproduction–selection cycles (Zohary, 2004). Domestication is a unique example of rapid evolution by selection, and was a central metaphor in Darwin's theory (Darwin, 1859, 1882). Indeed, as an evolutionary process, domestication has specific characteristics that make it especially suitable for studying adaptation on a short timescale (Gepts, 2004). Over the course of a few hundred or a few thousand years, drastic environmental changes driven by human cultivation selected for striking new adaptations, which are easy to recognize and to study at the phenotypic level (Purugganan & Fuller, 2009).
Among plant crops, grasses are by far the most important species. Wheats (Triticum sp.) and barley (Hordeum vulgare) were among the first domesticated plants (Zohary & Hopf, 2000), and major cereals, namely bread wheat (Triticum aestivum), rice (Oryza sativa), maize (Zea mays), sorghum (Sorghum bicolor) and pearl millet (Pennisetum glaucum), are the main staple crops for humans today. Many other grasses were domesticated as cereals, although most of them are now minor crops (Simmonds, 1976) (Figs 1, 2, Supporting Information Table S1). Many grasses are also used as fodder, such as fescue (Festuca sp.) and ryegrass (Lolium sp.), or for specific industrial production, such as sugarcane (Saccharum officinarum). The domestication process led to a suite of striking phenotypic changes, collectively referred to as the domestication syndrome, which many cereal grasses share through convergent evolution (Harlan et al., 1973). In this review we focus on major cereal crops, for which recent genomic data provide new tools to investigate the genetic basis of domestication changes and, more generally, how crop genomes were moulded by the strong selective pressures involved in domestication. After briefly sketching the history of domestication in grasses, we review and discuss how domestication affected phenotypic and genomic evolution in these species. Previous reviews mainly addressed specific aspects of plant domestication (e.g. Paterson, 2002; Ross-Ibarra, 2005; Doebley et al., 2006; Doust, 2007; Ross-Ibarra et al., 2007; Burger et al., 2008) or detailed the domestication process in one or a few species (e.g. Salamini et al., 2002; Doebley, 2004; Kovach et al., 2007). In this review, we use grasses to illustrate key patterns and mechanisms in domestication.
We also use a comparative approach to discuss more specifically the role of genetic systems in the domestication process, comparing verbal arguments, predictions based on explicit population genetics models, and available data.
II. A brief history of domestication in grasses
1. Timescale of grass evolution and domestication
The grass family (Poaceae) includes approx. 10 000 extant species (Watson & Dallwitz, 1992). It originated in the late Cretaceous, c. 80 Mya, according to both recent fossil data (Prasad et al., 2005) and molecular calibrations (Janssen & Bremer, 2004). The core Poaceae split into two major clades, the BEP and the PACCMAD clades (Fig. 1; GPWG, 2001; Bouchenak-Khelladi et al., 2008), which diverged around, or even before, 55 Mya (Bremer, 2002; Leebens-Mack et al., 2005; Prasad et al., 2005). Domesticated and cultivated species belong to four subfamilies, two in each clade: Ehrhartoideae (rices) and Pooideae (wheats, barley, rye and oats, Avena sp.) in the BEP clade, and Panicoideae (maize, sorghum, pearl millet and sugarcane) and Chloridoideae (finger millet, Eleusine coracana, and tef, Eragrostis tef) in the PACCMAD clade (Fig. 1). All these species except rices come from the three most diverse of the 13 subfamilies of the Poaceae (Fig. 1). Pooideae diversified in cool temperate and boreal regions and include C3 species only, whereas Panicoideae and Chloridoideae diversified in the tropics and subtropics and include many C4 species. Within these subfamilies, domesticated and cultivated crops belong to a few genera or tribes, forming clusters of closely related species, such as the Triticeae (wheats, barley and rye), which emerged c. 12 Mya (Gaut, 2002; Huang et al., 2002), or the Andropogoneae (maize, sorghum and sugarcane), which emerged between 9 and 16.5 Mya (Gaut & Doebley, 1997; Gaut, 2002).
Compared with the evolution of the grass family, the domestication process is very recent. The transition from foraging to agriculture began c. 12 000 yr ago in the Fertile Crescent (a region that includes present-day Israel, Jordan and parts of Turkey) and later in other parts of the world. It was probably associated with the warmer and drier climatic conditions that followed the Younger Dryas, a brief cold period at the end of the Pleistocene, c. 13 000 to 11 500 BP (Wright, 1976), and with the hunting to extinction of large mammalian game (Diamond, 1999, 2002). Although rapid on an evolutionary timescale, domestication was still a gradual process, involving a stage of cultivation of wild plants that preceded morphological domestication (Weiss et al., 2006). The rate of transition from initial cultivation to full domestication is still debated. Domestication experiments in einkorn wheat (Triticum monococcum ssp. monococcum), the first domesticated wheat, and in barley, as well as population genetic simulations, suggest that a tough rachis, a key trait of domesticated cereals (see II.3), could evolve very rapidly, within 20 to 100 yr, even under unconscious selection (Hillman & Davies, 1990). However, archaeological data indicate that the establishment of this trait probably took over 1500 yr in wheat and 2000 yr or more in barley (Tanno & Willcox, 2006; Fuller, 2007). Similar timescales of c. 1000–1500 yr have also been proposed for Asian rice (Oryza sativa) (Fuller, 2007). The domestication process was thus probably slower than previously assumed, involving a rather long pre-domestication cultivation stage and a protracted transition towards full domestication (see IV.2) (Wright, 1976; Fuller, 2007; Allaby et al., 2008).
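The gap between the experimental (20–100 yr) and archaeological (over 1500 yr) estimates can be framed with a textbook deterministic selection model. The sketch below is a hypothetical illustration, not the simulation of Hillman & Davies (1990): it assumes a highly selfing annual cereal, so that genotypes behave roughly like haploid lineages, one generation per year, and a constant relative advantage `s` for the nonshattering (tough rachis) type. The function name and the starting and target frequencies are our own arbitrary choices.

```python
import math


def generations_to_spread(p0: float, p1: float, s: float) -> int:
    """Generations for a favoured type to rise from frequency p0 to p1.

    Assumes haploid-like selection, a rough approximation for a highly
    selfing cereal in which heterozygotes are rare: the favoured type has
    relative fitness 1 + s, so p' = p * (1 + s) / (1 + s * p) and the
    odds p / (1 - p) are multiplied by (1 + s) every generation.
    """
    if not (0.0 < p0 < p1 < 1.0) or s <= 0.0:
        raise ValueError("need 0 < p0 < p1 < 1 and s > 0")
    log_odds_gain = math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))
    return math.ceil(log_odds_gain / math.log1p(s))


# Hypothetical inputs: nonshattering starts rare (0.1%) and must reach 99%.
for s in (0.5, 0.05, 0.005):
    print(f"s = {s}: {generations_to_spread(0.001, 0.99, s)} generations")
```

With these inputs a strong advantage (s = 0.5) gives 29 generations and a weak one (s = 0.005) a little over 2300, so the experimental and archaeological timescales correspond to very different, but not implausible, selection intensities.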
2. Origin of domesticated species
A limited number of domestication centres have been described (Fig. 2; Diamond, 2002). Within these centres, the distribution of domesticated species can be linked to the biogeographical distribution of grass subfamilies: species from the Panicoideae and the Chloridoideae were domesticated in tropical and subtropical centres, whereas Pooideae species were domesticated in the more temperate Fertile Crescent (Simmonds, 1976; Frankel et al., 1995). Interestingly, in the Americas, domestication of cereal grasses was mainly confined to maize, and in two or three of the four American centres agriculture originated with staple crops other than cereal grasses. Some minor cereals were also probably domesticated in other areas, such as the Indian peninsula and Japan, which are usually not considered major domestication centres (Fig. 2).
For most domesticated species, the wild progenitor has been identified by combining archaeobotanical data with data on the morphology, biogeographical distribution and ecology of extant wild species (see Table S1). The origin of maize (Zea mays ssp. mays) was initially much debated because, unlike most crops, it has no morphologically similar wild form (reviewed in Doebley, 2004). Molecular evidence now clearly shows that teosinte (Z. mays ssp. parviglumis), although it differs strikingly from maize in lateral branching and female inflorescence structure, is its direct ancestor (Doebley et al., 1984; Matsuoka et al., 2002).
Knowledge of the wild progenitors offers a unique opportunity to assess the phenotypic and genetic consequences of the domestication process by contrasting initial and final states. For most species, extant wild populations are still known. Comparison of domesticated grasses with their wild relatives is the basis for identifying the set of traits adapted to human cultivation, collectively referred to as the domestication syndrome. Most of these traits are shared across species through convergent evolution in response to similar selective pressures. Below, we detail the domestication syndrome associated with cereal grasses; some of these traits are shared with other grain crops.
3. Morphological changes during domestication: the domestication syndrome
Archaeological evidence suggests that wild grasses were harvested before they were cultivated (Weiss et al., 2006). Under harvesting alone, selection favours wild-type traits, because only seeds escaping the harvester contribute to the next generation. The situation was reversed as soon as humans started to sow what they had harvested: harvested seeds then became those contributing to the next generation. Strong disruptive selection then occurred between wild and cultivated forms (see Fig. 3b). This transition was associated with selection on traits related to harvesting conditions, seed production and seedling competition (Harlan et al., 1973).
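The reversal of selection can be made explicit with a deliberately crude seed-accounting sketch; the model and its parameter names are our own illustrative assumptions, not taken from the cited studies. A shattering plant keeps only a fraction `q` of its seed on the ear until harvest, a nonshattering plant keeps all of it, and the harvester collects a fraction `c` of retained seed. Seed left in the field establishes with probability `a` and harvested seed with probability `b` (zero when the whole harvest is eaten, positive once part of it is sown). Treating all unharvested seed alike, whether shed or left on the ear, is a simplification.

```python
def seed_fitness(retained: float, c: float, a: float, b: float) -> float:
    """Expected recruits per seed for a plant keeping a fraction `retained`
    of its seed on the ear until harvest (hypothetical model).

    c: fraction of retained seed collected by the harvester
    a: establishment probability of seed left in the field
    b: establishment probability of harvested seed (0 if it is all eaten)
    """
    harvested = retained * c
    left_in_field = 1.0 - harvested
    return a * left_in_field + b * harvested


def nonshattering_advantage(q: float, c: float, a: float, b: float) -> float:
    """Fitness of a nonshattering plant (retains all seed) minus that of a
    shattering plant (retains a fraction q). Algebraically this equals
    c * (1 - q) * (b - a), so nonshattering wins only if b > a."""
    return seed_fitness(1.0, c, a, b) - seed_fitness(q, c, a, b)


# Gathering only: the harvest is consumed, none of it is sown (b = 0).
print(nonshattering_advantage(q=0.3, c=0.8, a=0.1, b=0.0))  # negative
# Cultivation: part of the harvest is sown on prepared ground (b > a).
print(nonshattering_advantage(q=0.3, c=0.8, a=0.1, b=0.5))  # positive
```

The sign of selection on nonshattering depends only on whether harvested seed establishes better than naturally dispersed seed, which is the switch that sowing introduces.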
Traits associated with harvesting
Nonshattering of seeds at maturity is often regarded as the most important domestication trait, and archaeologists often use it as diagnostic of domesticated forms. Most seeds that do not shatter are harvested, whereas shattering seeds are dispersed and lost. This makes crop propagation fully dependent on harvesting and sowing by farmers, so conscious selection is not needed to explain the evolution of this trait (Harlan et al., 1973; Zohary, 2004). Selection on other traits associated with seed dispersal was also relaxed. For instance, domesticated wheats are less hairy than wild forms, and their awns are reduced or absent.
Under cultivation, space became the limiting factor, whereas time had been limiting for foraging (Fuller, 2007). This selected for a more determinate and compact growth habit, which is especially striking in crops belonging to the PACCMAD clade (maize, sorghum and various millets). Vegetative branching differs between the PACCMAD and the BEP clades. In the BEP clade, tiller production is favoured over axillary branching, whereas in the PACCMAD clade plants produce axillary branches in addition to tillers when space and light are not limiting (Doust, 2007). During domestication, the outgrowth of multiple axillary branches was strongly selected against, as exemplified by the difference between teosinte and maize (Doebley et al., 1997).
Harvesting also selected for synchronized maturation. The life cycle of domesticated forms is less plastic than that of wild plants, and vernalization requirements and day-length control can be weakened or even lost. Selection for synchronization may also lead to a shift from perennial to annual habit, as in Asian rice (Cheng et al., 2003) or Kodo millet (Paspalum scrobiculatum) (de Wet et al., 1983).
Traits associated with seedling competition
Cultivation also selected for a general increase in seedling vigour, through larger seeds with a higher carbohydrate-to-protein ratio. The increase in seed size resulted mostly from an increase in the size of the endosperm, which is richer in carbohydrate but poorer in proteins than the embryo (Harlan et al., 1973). Rapid germination was also selected for, through the reduction or loss of dormancy and the reduction of glumes. In wild oats, einkorn and emmer (Triticum turgidum ssp. dicoccum), the erratic rainfall of the Mediterranean region selected for an interesting pre-adaptation to cultivation under competitive conditions: each spikelet contains a dormant seed and a much bigger, nondormant one, which germinates with the first autumn rains and can compete with dense stands of other annual plants (Harlan et al., 1973).
Traits associated with seed production and use
In addition to unconscious selection associated with seedling competition, seed size might have been selected consciously by early farmers, together with other traits increasing seed production. For instance, changes in inflorescence structure were selected to produce higher yields. In crops from the PACCMAD clade, where branching is reduced, larger inflorescences were selected for, especially in maize, sorghum and pearl millet. In crops from the BEP clade, such as wheats, barley and rice, a higher number of inflorescences was preferentially selected for through denser tillering (Doust, 2007). More compact spikes with more fertile flowers were also selected for. The female inflorescence differs strikingly between teosinte, with two ranks of single spikelets, and maize, with multiple ranks of paired spikelets (Doebley, 2004). The transition from two-rowed to six-rowed barleys resulted from similar selective pressures (Komatsuda et al., 2007).
Finally, a naked kernel became a desirable trait because it allowed free threshing. In the first domesticated cereals, grains were hulled, and naked grain was sometimes selected later. No naked form was selected in einkorn wheat. In emmer wheat, which is hulled, selection for naked grain yielded durum (pasta) wheat. Similarly, hulled spelt wheat led to naked bread wheat (Zohary & Hopf, 2000). However, hulled varieties of wheat and barley were sometimes preferred, perhaps for better storability or for specific uses such as beer brewing (Purugganan & Fuller, 2009).
Crops then spread out of their centres of origin. Whereas minor crops remained cultivated close to where they originated, major crops spread world-wide, mostly after the expansion of world travel in the 16th century. This spread was accompanied by diversifying selection and adaptation to new habitats, leading to locally and culturally adapted landraces that diverged phenotypically from the initial crop. Here, we do not discuss post-domestication evolution further (see, for instance, Doebley et al., 2006; Burger et al., 2008).
III. Domestication genes
Recent genetic and genomic studies allow identification of the genes underlying the phenotypic evolution described above. Here, we review strategies that have been used so far to identify domestication genes and how these studies shed new light on the genetic basis of the domestication process in grasses.
1. How to identify domestication genes: from phenotypes to genotypes, and back
The traditional method of identifying domestication genes is to examine co-segregation of genetic markers and phenotypes in crosses between cultivated and wild genotypes. Because crops are usually only modestly diverged from their wild relatives, such crossing designs are rarely precluded. This approach, combined with the wealth of molecular markers that have recently become available, has yielded a very large number of quantitative trait loci (QTLs), that is, regions harbouring genes controlling the variation of interest (reviewed in Paterson, 2002). Going from QTL mapping, which typically locates a QTL at best within 1–2 centiMorgans (cM), to actual genes is often a heroic task, especially in grasses such as the polyploid wheats, which may have huge genomes with extensive intergenic regions filled with repetitive sequences. In this respect, the availability of genome sequences for at least two rice varieties, representing the indica and japonica cultivated types, is a formidable tool for going from QTLs to genes. Combining traditional QTL studies with rice transformants has allowed rapid progress in identifying domestication genes (see Table 1). An alternative to strict linkage mapping is so-called ‘association mapping’, which relies on statistical associations between marker loci and QTLs in large populations. One theoretical advantage of this approach is that it exploits patterns of linkage disequilibrium generated by deeper genealogical links between individuals, along which many more recombination events have accumulated than in traditional crossing designs (Nordborg & Tavaré, 2002). This approach has so far mostly been employed in cultivated populations (for maize see Thornsberry et al., 2001) and remains to be implemented in heterogeneous populations mixing cultivated and wild ancestors.
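The resolution argument can be illustrated with the classical decay of linkage disequilibrium, D_t = D_0 (1 - r)^t, combined with Haldane's map function to convert centiMorgans into a recombination fraction r. This is a sketch under textbook assumptions (random mating, no selection or drift, no crossover interference); in largely selfing crops LD decays more slowly than shown. Taking t = 1 mimics a single round of meiosis in an F2-type cross, whereas larger t stands for the many generations separating individuals in an association panel; the function names, distances and generation counts are arbitrary.

```python
import math


def haldane_r(cm: float) -> float:
    """Recombination fraction r for a map distance in centiMorgans.

    Haldane's map function, which assumes no crossover interference.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def ld_remaining(cm: float, generations: int) -> float:
    """Fraction of the initial disequilibrium D_0 left after `generations`
    rounds of recombination under random mating: D_t / D_0 = (1 - r)^t."""
    return (1.0 - haldane_r(cm)) ** generations


# t = 1: a single meiosis, as in an F2-type cross.
# Larger t: the many meioses separating individuals in an association panel.
for t in (1, 10, 500):
    print(t, [round(ld_remaining(cm, t), 3) for cm in (1.0, 5.0, 20.0)])
```

LD between loci 1 cM apart is almost intact after one meiosis but has essentially vanished after 500 generations, which is why association panels can, in principle, localise QTLs more finely, provided marker density is high enough.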
Table 1. A selection of domestication genes in grasses
A radically different approach is to rely solely on patterns of polymorphism and to seek genomic regions exhibiting footprints of selection, in samples that contrast the extant diversity of crops with that of their wild ancestors (Wright et al., 2005). With the advent of cheap high-throughput sequencing and genotyping, this population genetics approach is an attractive alternative to forward genetic strategies. In practice, however, several factors may limit its efficacy. The first is the power of tests to detect selection footprints, together with the need to specify a plausible null distribution for these tests that incorporates a possibly complex demographic history (see below). The second is that, whenever a selection footprint is detected, it may be hard to date the selection event unambiguously and to determine whether domestication or subsequent artificial selection triggered it. To date, this approach has yielded only a few new candidate domestication genes and has mainly identified already known domestication genes or QTL segments (Wright et al., 2005). However, such genome scans may be a powerful way to narrow the search to candidate regions, whose phenotypic effects can then be investigated further (Ross-Ibarra et al., 2007).
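A minimal sketch of the idea, assuming aligned sequences per locus for a crop and a wild sample: compute nucleotide diversity (pi, mean pairwise differences per site) in each pool and flag loci whose crop/wild ratio falls in the lowest tail. This is a generic outlier scan, not the method of Wright et al. (2005); the function names and toy data are hypothetical, and the empirical quantile merely stands in for the threshold that, as the text stresses, should come from a demographic null model.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site (pi) among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def flag_low_diversity(crop: dict[str, list[str]],
                       wild: dict[str, list[str]],
                       tail: float = 0.05) -> list[str]:
    """Return loci whose crop/wild diversity ratio falls in the lowest `tail`.

    The empirical quantile is only a placeholder: a real scan would take its
    threshold from simulations under a fitted demographic (bottleneck) model.
    """
    ratios = {}
    for locus, crop_seqs in crop.items():
        pi_wild = nucleotide_diversity(wild[locus])
        if pi_wild > 0:  # ratio undefined for loci monomorphic in the wild
            ratios[locus] = nucleotide_diversity(crop_seqs) / pi_wild
    if not ratios:
        return []
    ranked = sorted(ratios.values())
    k = max(1, int(len(ranked) * tail))
    threshold = ranked[k - 1]
    return [locus for locus, r in ratios.items() if r <= threshold]


# Hypothetical toy samples of four sequences per locus and pool.
wild = {
    "locusA": ["ACGTAC", "ACGTTC", "TCGTAC", "ACCTAC"],
    "locusB": ["GGATCC", "GGATGC", "GCATCC", "GGTTCC"],
    "locusC": ["TTAGGC", "TAAGGC", "TTAGCC", "TTCGGC"],
}
crop = {
    "locusA": ["ACGTAC", "ACGTTC", "ACGTAC", "TCGTAC"],
    "locusB": ["GGATCC", "GGATGC", "GGATCC", "GGATCC"],
    "locusC": ["TTAGGC", "TTAGGC", "TTAGGC", "TTAGGC"],  # monomorphic in crop
}
print(flag_low_diversity(crop, wild))
```

Here only locusC, monomorphic in the hypothetical crop sample, is flagged; with real data the tail would contain many loci, and the candidates would still need phenotypic follow-up.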
2. The nature of domestication genes and the nature of selective changes
Inspection of the actual domestication genes and of the molecular basis of phenotypic changes reveals a wide variety of molecular changes, including almost ubiquitous changes in gene expression, amino acid changes and, so far, only two single ‘loss-of-function’ mutations (frameshift deletions in the Rc [brown pericarp and seed coat] and vrs1 [six-rowed spike 1] genes; see Table 1). This contrasts with earlier views of domestication proceeding mostly through recessive loss-of-function mutations. Patterns of dominance have been found to be quite variable at both the gene and the QTL levels (see Burger et al., 2008 for a detailed account at the QTL level). However, an interesting pattern emerges across the genes cloned so far: in Table 1, seven of the 10 genes are regulatory genes, mainly transcription factors. If this tendency is confirmed, it would support the view that morphological differences between wild and crop species can easily arise through changes in the regulation of developmental pathways. The pleiotropic nature of these regulatory genes could also explain how a few genetic changes can have such broad and strong phenotypic effects.
3. Number of domestication genes and convergent evolution
Theoretical (Le Thierry d’Ennequin et al., 1999) and QTL studies (Burger et al., 2008) suggest that few gene regions of large effect are likely to underlie trait changes under domestication. However, recent genomic scans in maize and rice suggest that many more genes could have been involved. Wright et al. (2005) estimated that 2–4% of genes, that is, c. 1200 genes, experienced artificial selection in maize. In rice, selection affecting polymorphism throughout the genome was also invoked to explain a peculiar pattern of polymorphism (Caicedo et al., 2007): many derived single nucleotide polymorphism (SNP) alleles segregate at high frequency, a molecular signature of positive selection.
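One classical way to summarize an excess of high-frequency derived alleles is Fay and Wu's H, computed from the unfolded site frequency spectrum (SFS); it is shown here only as an illustration of that signature and is not necessarily the statistic used by Caicedo et al. (2007). The sketch assumes derived states have been polarized with an outgroup and returns the unnormalized H = theta_pi - theta_H, which is zero in expectation under the neutral spectrum (xi_i proportional to 1/i) and negative when high-frequency derived alleles are in excess. Misassigned ancestral states and demographic history can also skew it, so a negative value is suggestive rather than conclusive.

```python
def fay_wu_h(sfs: list[float], n: int) -> float:
    """Unnormalized Fay and Wu's H from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of SNPs at which the derived allele is carried
    by i of the n sampled sequences (i = 1 .. n - 1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    norm = n * (n - 1)
    # theta_pi weights each class by i(n - i); theta_H weights it by i^2,
    # so derived alleles at high frequency inflate theta_H more than theta_pi.
    theta_pi = sum(2 * i * (n - i) * xi for i, xi in enumerate(sfs, start=1)) / norm
    theta_h = sum(2 * i * i * xi for i, xi in enumerate(sfs, start=1)) / norm
    return theta_pi - theta_h


n = 10
neutral = [12.0 / i for i in range(1, n)]      # expected neutral SFS, theta = 12
skewed = [5, 2, 1, 1, 0, 0, 1, 2, 6]           # hypothetical excess at i = n - 1
print(fay_wu_h(neutral, n), fay_wu_h(skewed, n))
```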
The issue of the number of genes needed to ‘build’ a crop is also related to the issue of convergent evolution at the molecular level. More than a decade ago, Paterson et al. (1995) proposed that the convergent phenotypic evolution observed in a variety of domesticated grasses had occurred through independent selection of mutations at orthologous loci. Since then, the accumulation of QTL data has not strongly supported this hypothesis, although the imprecision typically associated with QTL location, 5–10 cM and sometimes more, makes direct comparisons less straightforward. More precise comparative mapping analyses have not supported Paterson et al.'s hypothesis either. For instance, the comparative mapping of seed-shattering genes in wheats, barley, rice, maize and sorghum suggests multiple genetic routes to the nonshattering phenotype (Li & Gill, 2006). In barley, distinct loci control the nonbrittle rachis phenotype (Komatsuda et al., 2004), which suggests that different loci can be recruited even within a single species.
Recently, it has become possible to compare genes cloned in different species. So far, the findings of such comparisons also partly contradict Paterson et al.'s hypothesis. tb1 (teosinte branched1), the major gene controlling the branching phenotype in maize, has only a minor and variable effect in foxtail millet (Setaria italica). In this species, other candidate genes seem to be involved, which suggests that orthologous loci need not underlie phenotypic convergence (Doust et al., 2004). The evolution of the glutinous, or waxy, phenotype, associated with long-standing cooking practices, is another compelling example of convergent evolution. Evidence of convergent selection on the same waxy gene in rice (Olsen et al., 2006) and maize (Fan et al., 2008) supports Paterson et al.'s hypothesis. In foxtail millet, the same phenotype results from multiple insertions and deletions involving transposable elements in the granule-bound starch synthase 1 (GBSS1) gene (Kawase et al., 2005), which is the gene underlying the waxy phenotype in other cereals. Finally, analysis of vrs1, the gene controlling the number of rows in barley, shows that at least three independent mutations in this same gene underlie the convergent transition from the two-rowed to the six-rowed phenotype (Komatsuda et al., 2007).
IV. Models of the domestication process
In combination with archaeological studies, genomic data are powerful tools for unravelling domestication scenarios. Putatively neutral markers, such as microsatellites or anonymous gene fragments, were used initially, but tracing back the history of the very genes underlying domestication changes is pivotal to determining the tempo and mode of the domestication process. This helps to elucidate not only the demographic history of domesticated populations but also the dynamics of the selection acting on domestication traits.
1. Single versus multiple domestications
Whether wild progenitors were taken into cultivation only once or more than once is still debated for many species (for instance Zohary, 1999; Allaby et al., 2008; Ross-Ibarra & Gaut, 2008). A single origin is consistent with a rapid emergence of domesticated crops, associated with selection for nonshattering, commonly viewed as the very first domestication trait, followed by diffusion from their centre of origin. This was long considered the default scenario for most species. Maize is the most convincing example of a single domestication event, as clearly shown by microsatellite analyses (Matsuoka et al., 2002). Genetic studies also pointed to a precise location for the early diversification of maize in Mexico, in the highlands near Oaxaca, close to the oldest known archaeological remains (Piperno & Flannery, 2001). Pearl millet is another species that was probably domesticated only once, in an area ranging from the interior delta of the Niger to the Aïr mountains (Oumar et al., 2008). The domestication history of several major grass crops, such as rice and barley, is still a subject of debate and extensive investigation.
In rice (Oryza sativa), two main groups, currently called japonica and indica, were recognized as early as the Chinese Han dynasty (approx. AD 100) (reviewed in Sweeney & McCouch, 2007). Phylogenetic analyses of these groups suggest a polyphyletic origin from the wild species Oryza rufipogon (Cheng et al., 2003; Zhu & Ge, 2005; Londo et al., 2006), and the indica and japonica groups may have diverged 0.4 Mya, long before domestication (Zhu & Ge, 2005). However, indica and japonica genotypes share the same haplotype at two important domestication loci, controlling the nonshattering (Li et al., 2006) and white pericarp (Sweeney et al., 2006) phenotypes, suggesting a common origin for these alleles. Under a multiple-domestication scenario, these results imply a single origin of these domestication alleles, followed by their spread between groups through introgression driven by strong directional selection (Kovach et al., 2007; Sang & Ge, 2007; Sweeney et al., 2007). A genome-wide microsatellite survey also indicates that completely independent domestications are less likely than a scenario involving a partially shared ancestral population or recent gene flow (Gao & Innan, 2008).
Domesticated barleys are morphologically diverse, with spikes bearing either two rows of fertile spikelets (Hordeum vulgare ssp. distichum), like the wild form (Hordeum vulgare ssp. spontaneum), or six rows (Hordeum vulgare ssp. vulgare). A single domestication in the Jordan–Lebanon region was initially inferred from AFLP (amplified fragment length polymorphism) analyses (Badr et al., 2000). However, more recent analyses suggest that there may have been multiple domestication events, one of them perhaps located east of the Fertile Crescent (Morrell & Clegg, 2007; Saisho & Purugganan, 2007), in agreement with the distinct genetic control of the nonbrittle rachis phenotype in oriental and occidental lines (Komatsuda et al., 2004). The western/eastern separation is also compatible with the morphological distinction between two-rowed and six-rowed forms (Saisho & Purugganan, 2007), in agreement with their distinct genetic clustering (Kilian et al., 2006). Finally, the independent mutations in the vrs1 gene that led to the six-rowed phenotype (see III.3) also suggest a recurrent domestication process (Komatsuda et al., 2007).
2. Towards a protracted model of domestication
These examples suggest that multiple domestications and complex scenarios are more frequent than initially thought. The slow pace of domestication inferred from recent archaeological data (see II.1) also challenges the rapid transition model (Tanno & Willcox, 2006; Fuller, 2007). Under a protracted model with a long timescale, multiple local domestications with gene flow between areas of cultivation are likely, making the geographical origin of crops more diffuse and the origins of the different cultivated pools nonindependent (Allaby et al., 2008), as exemplified by rice and barley. In rice, the scenario proposed to reconcile multiple domestications with a shared nonshattering haplotype (Kovach et al., 2007; Sang & Ge, 2007) also implies that the domestication process began before nonshattering was acquired, at least in one of the two rice groups. This is consistent with new archaeological data suggesting that grain size and shape evolved before nonshattering, not only in rice but also in einkorn wheat and barley (Fuller, 2007).
Einkorn wheat (Triticum monococcum ssp. monococcum) also offers a good example of the evolving view of the domestication process. The first genetic analyses suggested a monophyletic origin from the wild subspecies Triticum monococcum ssp. boeoticum, localized in the Turkish Karaçadag mountains (Heun et al., 1997), close to some of the earliest sites of agricultural settlement in the Near East (Zohary & Hopf, 2000). However, a recent analysis based on a very wide sample of wild populations modified this scenario (Kilian et al., 2007). The wild subspecies was found to be genetically structured into three groups, one of which, called β, is the sister clade of the domesticated form. Because the domesticated form is as polymorphic as this β form, the authors proposed a protracted ‘dispersed-specific’ model to explain their genetic data: the wild β form would have been initially cultivated in the Karaçadag region, then dispersed and domesticated locally several times. Thus, although a single-origin scenario appeared to be a good approximation of the data at a coarse scale, more detailed genetic studies have recently shown that a more complex, protracted model is more appropriate.
V. Evolutionary consequences of domestication for grass genomes
1. Loss of diversity ...
The domestication scenarios discussed above imply that the genetic diversity of a crop is a variable fraction of the initial diversity present in its wild ancestor's gene pool. As expected, studies comparing nucleotide polymorphism in genes of crops vs wild relatives document a substantial loss of diversity during domestication (Table 2, Fig. 3). Bottleneck effects on crop genetic diversity have often been quantified using a simple demographic model featuring an instantaneous change in population size at the time of domestication. In our opinion, this model is not to be taken literally or meant to be historically correct; it aims merely to quantify the net cumulative impact of domestication (and subsequent selection) on current crop diversity. It can also serve as a null model to locate genome fragments that exhibit an atypical loss of diversity suggestive of selection. In this context, the intensity of the bottleneck can be quantified by the ratio of (long-term effective) population sizes before (Nwild) and after (Ndom) domestication (see Table 2), given that in principle only scaled mutation rates (θ = 4Neµ, where Ne is the effective population size and µ the mutation rate) can be estimated from the data (Table 2). Assuming µ is the same in both gene pools, it cancels in the ratio θdom/θwild = Ndom/Nwild, so the ratio is estimable even though Ne and µ cannot be separated. One exception is when patterns of polymorphism are surveyed at loci with known mutation rates (Thuillet et al., 2005). This allows direct estimation of the long-term effective size realized in each gene pool and may also give a more accurate picture of the intensity of successive bottlenecks during domestication and subsequent selection.
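The ratio argument can be made concrete with Watterson's estimator, one common estimator of θ per site, θW = S / (a_n L), where S is the number of segregating sites among n sequences of length L and a_n = sum of 1/i for i = 1 to n - 1. The studies in Table 2 may rely on other estimators (e.g. pi); the function names and the survey numbers below are hypothetical. The sketch assumes the same µ in both gene pools, and it converts θ into Ne only when µ is supplied independently, as in the known-mutation-rate case mentioned above.

```python
def watterson_theta(segregating_sites: int, n: int, length: int) -> float:
    """Watterson's estimator of theta = 4 Ne mu per site, from S segregating
    sites observed among n aligned sequences of `length` bp."""
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / (a_n * length)


def retained_fraction(theta_dom: float, theta_wild: float) -> float:
    """Ndom / Nwild, assuming the same mu in both gene pools so that it cancels."""
    return theta_dom / theta_wild


def effective_size(theta: float, mu: float) -> float:
    """Ne = theta / (4 mu); only possible when mu is known independently."""
    return theta / (4.0 * mu)


# Hypothetical survey: 20 sequences of the same 1000-bp gene in each pool.
theta_wild = watterson_theta(segregating_sites=40, n=20, length=1000)
theta_dom = watterson_theta(segregating_sites=12, n=20, length=1000)
ratio = retained_fraction(theta_dom, theta_wild)
print(f"Ndom/Nwild = {ratio:.2f}; bottleneck intensity Nwild/Ndom = {1 / ratio:.1f}")
```

Because the sample size and sequence length are identical in both pools, a_n and L cancel and the ratio reduces to the ratio of segregating sites; with unequal samples they would not.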
Table 2. Loss of genetic diversity in some grasses during domestication
Table 2 reveals a wide range of bottleneck intensities. Variation in this ratio can be caused by several nonmutually exclusive factors, including the intensity of selection associated with domestication and subsequent breeding, and the mating system. Although einkorn shows the lowest loss of diversity, consistent with its limited use and nonintensive breeding, the claim that einkorn underwent no reduction of diversity during domestication (Kilian et al., 2007) seems overstated: the domesticated form is as diverse as the β form from which it arose, but the β form contains only a fraction of the diversity of the whole wild gene pool (Table 2). With the exception of einkorn, it is worth noting that selfing species suffer the greatest loss of diversity. Although the number of species surveyed is too low for definitive conclusions, this finding is consistent with two predictions of population genetics theory. First, selection acting on domestication genes might affect a larger fraction of the genome in selfing species because of stronger genetic linkage (e.g. Caicedo et al., 2007). Secondly, diversity is more likely to be restored after domestication through wild-to-crop pollen flow in outcrossing than in selfing species.
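The first prediction can be given a rough quantitative form. A standard approximation from coalescent theory for partially selfing populations is that the effective recombination rate is reduced to about r(1 - F), where F = S / (2 - S) is the equilibrium inbreeding coefficient for a selfing rate S. Because the footprint of a selective sweep extends roughly over map distances at which recombination is weak relative to selection, its width should scale approximately as 1/(1 - F). This sketch ignores the concurrent reduction of Ne under selfing and every other detail; the function names and selfing rates are our own illustrative choices.

```python
def equilibrium_f(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient under partial selfing: F = S / (2 - S)."""
    if not 0.0 <= selfing_rate < 1.0:
        raise ValueError("selfing rate must be in [0, 1)")
    return selfing_rate / (2.0 - selfing_rate)


def effective_recombination(r: float, selfing_rate: float) -> float:
    """Approximate effective recombination rate, r * (1 - F)."""
    return r * (1.0 - equilibrium_f(selfing_rate))


def footprint_scale(selfing_rate: float) -> float:
    """Rough width of a hitchhiking footprint relative to a fully outcrossing
    population, ~ 1 / (1 - F), since the footprint widens as effective r shrinks."""
    return 1.0 / (1.0 - equilibrium_f(selfing_rate))


for s in (0.0, 0.5, 0.95, 0.99):
    print(f"selfing rate {s:.2f}: F = {equilibrium_f(s):.3f}, "
          f"footprint x{footprint_scale(s):.1f}")
```

Under this approximation a 95% selfing rate widens the footprint about tenfold relative to an outcrosser, which illustrates why linked diversity may be lost over much larger genomic regions in selfing crops.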
Although levels of diversity have often been analysed using coalescent simulations under the simple bottleneck model described above, a protracted model, featuring mild bottlenecks and recurrent gene flow that can mitigate the loss of diversity at a local or regional scale, seems closer to historical reality (Brown et al., 2009). Future research should explore whether the parameters of a coalescent model incorporating a realistic gene flow component can be estimated independently of the bottleneck intensity, and whether such gene flow contributes substantially to currently observed levels of diversity. Going further, a systematic comparison of scenarios ranging from a single short bottleneck to a fully protracted model, applied to current patterns of diversity in many species, would be enlightening.
2. ... and regeneration
Another open question concerns the amount of diversity regained after domestication. Not surprisingly, studies probing genetic diversity solely through nucleotide polymorphism in genic regions find that virtually all nucleotide variation currently segregating in crops is merely a subset of the variation present in the ancestral wild gene pool (see the expected theoretical patterns of differential regeneration in Fig. 3c). It is thus difficult to distinguish ancestral wild variation retained in the crop from alleles reintroduced through gene flow from wild populations. In barley landraces, photoperiod response follows a latitudinal cline: responsive forms, which flower early under long days, predominate in the Mediterranean region, and nonresponsive forms in northern Europe (Jones et al., 2008). The nonresponsive form is associated with a SNP present in some wild Iranian and Israeli populations. This suggests either that the nonresponsive form evolved after domestication through introgression of a wild haplotype during the spread of agriculture into Europe, or that two distinct domesticated pools spread through northern and southern Europe (Jones et al., 2008).
Studies of microsatellite variability (Thuillet et al., 2002; Vigouroux et al., 2002; Thuillet et al., 2005) or structural variation (Morgante et al., 2005) document the ongoing production of new mutations in both maize and wheat. Estimates of the genome-wide rate of mutation affecting quantitative traits are scarce in grasses, but the few available studies suggest that new mutations can increase heritable phenotypic variation by as much as 0.1–1% per generation (Sprague et al., 1960; Lynch, 1988; Houle et al., 1996; Bataillon, 2000). Whether such variation is attributable primarily to unconditionally deleterious alleles or can provide heritable variation for adaptation to a variety of conditions is still unresolved. We expect new beneficial mutations affecting phenotypes to be exceedingly rare in well-adapted populations. However, new mutations can have a sizeable probability of being beneficial when a population is adapting to a new phenotypic optimum (Martin & Lenormand, 2006). This offers hope of resolving the apparent paradox of variation in crops relative to their wild ancestors: at the molecular level, variation is typically lower in crops than in their wild relatives, whereas the reverse holds at the phenotypic level, as emphasized by Darwin (1878). A recent study suggests that this scenario may have occurred in sorghum (de Alencar Figueiredo et al., 2008): of six candidate genes analysed, two displayed novel variants derived from post-domestication mutations, suggesting that neo-diversity contributed to new adaptations for human uses.
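A back-of-the-envelope sketch of how quickly mutation could rebuild heritable variance after a bottleneck, reading the 0.1–1% figure as a mutational variance V_M of 0.001–0.01 V_E per generation. It assumes a neutral model in which drift removes a fraction 1/(2N_e) of the additive variance V_A each generation; selection is ignored and N_e is hypothetical, so the numbers are indicative only.

```python
def additive_variance(generations, v_m, n_e, v_a0=0.0):
    """Iterate V_A(t+1) = V_A(t) * (1 - 1/(2 N_e)) + V_M.

    Neutral model: drift erodes variance, mutation adds V_M per generation;
    selection is ignored. Variances are in units of V_E.
    """
    decay = 1.0 - 1.0 / (2.0 * n_e)
    v_a = v_a0
    for _ in range(generations):
        v_a = v_a * decay + v_m
    return v_a


n_e = 1000  # hypothetical effective size of a crop population after a bottleneck
for v_m in (0.001, 0.01):  # 0.1% and 1% of V_E per generation
    v_100 = additive_variance(100, v_m, n_e)
    print(f"V_M={v_m:.3f}: V_A after 100 generations ~{v_100:.3f} V_E; "
          f"neutral equilibrium 2*N_e*V_M = {2 * n_e * v_m:.0f} V_E")
```

Starting from V_A = 0, the recursion has the closed form V_A(t) = 2N_eV_M[1 − (1 − 1/(2N_e))^t], so over the first few hundred generations variance accumulates almost linearly at about V_M per generation.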
3. The genomic cost of domestication
The evolutionary history of domestication in grasses is characterized by a genome-wide loss of diversity attributable to rather intense selection on a subset of wild genotypes exhibiting desirable phenotypic changes. This means that the genome-wide effects of random genetic drift have probably been magnified relative to wild populations. Moreover, strong selection on domestication genes is likely to have swept away pre-existing variation in neighbouring regions, within windows spanning 10–100 kb (see review by Purugganan & Fuller, 2009). Theory predicts that selection at one locus can greatly diminish the efficacy of selection at neighbouring loci (Kimura, 1962), especially in poorly recombining regions of the genome. Although recombination rates vary widely, we expect centromeric regions and the chloroplast and mitochondrial genomes to be particularly affected (Gordo & Charlesworth, 2001). One way to document such effects is to look for signs of relaxed selection against slightly deleterious mutations. This is typically done by examining the frequency of nonsynonymous (amino acid-changing), and thus most often deleterious, SNPs. The rationale is that, in populations where natural selection is maximally efficient, new slightly deleterious mutations at nonsynonymous sites will be either weeded out or kept at very low frequency. A genome-wide survey of nonsynonymous vs synonymous divergence between two cultivated rice subspecies (Oryza sativa ssp. indica and O. sativa ssp. japonica) and a wild relative (Oryza brachyantha) suggests an acceleration of protein evolution between rice cultivars (Lu et al., 2006). A total of 15 406 genes were compared between cultivars, and 4640 genes between O. sativa ssp. japonica and O. brachyantha. The authors then binned genes into 2-Mb fragments and found that the ratio of nonsynonymous to synonymous divergence, K_A/K_S, between O. sativa ssp. japonica and O. sativa ssp. indica in each fragment was negatively correlated with the average recombination rate of that fragment (r = −0.192, P = 0.008). A detailed analysis of the site frequency spectrum of SNPs in 111 gene fragments in domesticated rice also found a very high frequency of derived SNPs (Caicedo et al., 2007). The authors explored the fit of various demographic scenarios, either neutral or including some form of selection. Patterns of variation could not be explained by a neutral model featuring a simple bottleneck. Other studies have also documented the accumulation of slightly deleterious mutations in domesticated species, for example in the mitochondrial genome of the domestic dog (Cruz et al., 2008) and in industrial and laboratory yeast strains (Gu et al., 2005). In several grass species, including maize, sorghum and wheat, the growing body of nucleotide polymorphism data could be used to investigate this phenomenon systematically, providing additional insights into the evolutionary history of crops that would complement our knowledge of patterns of diversity loss during domestication.
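The binning analysis can be sketched as follows. This is not the pipeline of Lu et al. (2006): we assume that K_A and K_S are pooled (summed) within each 2-Mb bin before taking the ratio, and the gene records and per-bin recombination rates are hypothetical.

```python
import math
from collections import defaultdict

BIN_SIZE = 2_000_000  # 2-Mb fragments, as in the study described above


def ka_ks_by_bin(genes):
    """Pool divergence per 2-Mb bin.

    genes: iterable of (chrom, position_bp, ka, ks) tuples.
    Returns {(chrom, bin_index): sum(KA) / sum(KS)} for bins with KS > 0.
    """
    ka_sum = defaultdict(float)
    ks_sum = defaultdict(float)
    for chrom, pos, ka, ks in genes:
        key = (chrom, pos // BIN_SIZE)
        ka_sum[key] += ka
        ks_sum[key] += ks
    return {key: ka_sum[key] / ks_sum[key] for key in ka_sum if ks_sum[key] > 0}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (not from Lu et al., 2006).
genes = [
    ("1", 100_000, 0.002, 0.020), ("1", 1_500_000, 0.003, 0.020),
    ("1", 2_500_000, 0.006, 0.030),
    ("1", 4_200_000, 0.009, 0.025), ("1", 5_000_000, 0.006, 0.025),
]
recomb = {("1", 0): 4.0, ("1", 1): 2.0, ("1", 2): 0.5}  # cM/Mb per bin, hypothetical

ratios = ka_ks_by_bin(genes)
bins = sorted(ratios)
r = pearson_r([recomb[b] for b in bins], [ratios[b] for b in bins])
print(f"KA/KS per bin: {ratios}; r = {r:.3f}")
```

Pooling before dividing avoids unstable ratios for genes with very small K_S; averaging per-gene ratios within bins would be an alternative choice.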
VI. Mating systems and the evolutionary dynamics of domestication in grasses
Mating systems are thought to have played a key role in domestication. It is usually assumed that self-fertilization made domestication easier (e.g. Zohary & Hopf, 2000), and it has been argued that mating systems could partly explain differences in domestication rates between the Old and New Worlds (wheats/barley vs maize; see Diamond, 1999). However, these ideas have not really been tested. Grasses present a wide range of breeding systems (Connor, 1981), and, although rather few outcrossing rate estimates are available, their distribution is expected to be strongly bimodal, as found in wind-pollinated (anemophilous) species in general (Vogler & Kalisz, 2001). Grasses are thus ideal for comparing the effects of strongly contrasting mating systems on the domestication process.
1. Mating systems and the tempo and mode of domestication
Theoretical predictions. As discussed above, a key step in the domestication process is the selection, and eventual fixation, of domestication alleles. If alleles enabling domestication are mainly recessive (a point now much debated; see VI.2), we expect domestication rates to depend strongly on mating systems. In outcrossing populations, recessive alleles can be selected only once they reach frequencies high enough to be expressed in homozygotes. In large populations, the probability of fixation of a single recessive mutation is thus extremely low, about √(2s/(πN)), where s is the selection coefficient in favour of homozygotes and N the population size (Kimura, 1962). By contrast, in selfing populations, the probability of fixation is independent of the dominance level and is approximately equal to s (Charlesworth, 1992). Advantageous mutations also go to fixation more quickly in selfing than in outcrossing populations; the time to fixation may even be reduced by an order of magnitude (Damgaard, 2000). Initial domestication steps should thus be easier and faster in selfing than in outcrossing species. However, the picture is less simple if adaptation proceeds from standing variation, that is, from (possibly slightly deleterious) alleles already present in wild populations. In contrast to the case of a single new mutant, adaptation from standing variation is easier for recessive than for dominant mutations, simply because recessive mutations are expected to be found initially at higher frequencies (Orr & Betancourt, 2001).
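The contrast between mating systems can be checked numerically with Kimura's diffusion formula for the fixation probability of a new mutant. The sketch below assumes a randomly mating diploid population with N_e = N and genotype fitnesses 1, 1 + hs and 1 + s, and integrates the diffusion solution with Simpson's rule; for complete selfing it simply uses the value s given in the text. Parameter values are hypothetical.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0


def fixation_prob_outcrossing(s, h, N):
    """Diffusion fixation probability of a single new mutant (initial frequency
    1/(2N)) under random mating, fitnesses 1 : 1 + h s : 1 + s, with N_e = N."""
    def g(x):
        # exp(-integral_0^x 2M/V), with M = s y(1-y)(h + (1-2h)y), V = y(1-y)/(2N)
        return math.exp(-2.0 * N * s * (2.0 * h * x + (1.0 - 2.0 * h) * x * x))
    p0 = 1.0 / (2.0 * N)
    return simpson(g, 0.0, p0, 200) / simpson(g, 0.0, 1.0, 20_000)


s, N = 0.01, 10_000  # hypothetical selection coefficient and population size
u_rec = fixation_prob_outcrossing(s, 0.0, N)   # fully recessive, outcrossing
u_add = fixation_prob_outcrossing(s, 0.5, N)   # semidominant, outcrossing
u_rec_approx = math.sqrt(2 * s / (math.pi * N))
u_self = s  # complete selfing, independent of dominance (value quoted in the text)

print(f"recessive: {u_rec:.2e} (approximation {u_rec_approx:.2e})")
print(f"semidominant: {u_add:.2e}; complete selfing: {u_self:.2e}")
```

With these values the recessive case comes out close to √(2s/(πN)) ≈ 8 × 10⁻⁴, the semidominant case close to s, and the selfing value is roughly 12 times the recessive one.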
Because of disruptive selection between wild and cultivated populations (Fig. 3c), gene flow from the wild is expected to introduce genotypes that are poorly adapted to cultivation (Lenormand, 2002). In particular, once a suite of domestication alleles has been selected from the wild (from new mutants or from deleterious variants segregating at low frequency), further gene flow would recurrently reintroduce maladapted wild alleles, limiting the efficiency of human selection. In this context, selfing also ‘protects’ cultivated populations from (pollen) gene flow from wild populations or from other cultivars (Zohary & Hopf, 2000).
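The protective effect of selfing against wild pollen can be illustrated with a deliberately simple migration–selection recursion (our own illustrative assumption, not a published model): only outcrossed ovules can receive wild pollen, and selection against the wild allele is treated as genic. Under these assumptions the equilibrium frequency of the wild allele is M/s, with M = (1 − S)m/2, so it falls in proportion to 1 − S.

```python
def wild_allele_frequency(m, s, selfing_rate, generations=2000):
    """Frequency of a maladapted wild allele in a cultivated population.

    Illustrative assumptions (not a published model):
    - a fraction m of outcrossed ovules is fertilized by wild pollen, so the
      wild-allele input per generation is M = (1 - selfing_rate) * m / 2;
    - selection against the wild allele is genic (haploid-like), coefficient s.
    """
    M = (1.0 - selfing_rate) * m / 2.0
    q = 0.0
    for _ in range(generations):
        q = q * (1.0 - s) / (1.0 - q * s)  # selection against the wild allele
        q = q * (1.0 - M) + M              # pollen-mediated gene flow from the wild
    return q


for S in (0.0, 0.95):
    q = wild_allele_frequency(m=0.01, s=0.1, selfing_rate=S)
    print(f"selfing rate {S:.2f}: equilibrium wild-allele frequency ~{q:.4f}")
```

With m = 0.01 and s = 0.1 (hypothetical), the equilibrium drops from 0.05 under outcrossing to 0.0025 at a selfing rate of 0.95.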
Empirical evidence. According to theoretical predictions, we expect to find more selfing species among domesticated grasses than among wild species. Indeed, among the major cereals, only maize, rye and pearl millet are outcrossing, while the wild ancestors of Asian and African rice (Oryza rufipogon and Oryza barthii, respectively) have mixed-mating systems. However, cereals are annual, and many annual grasses are self-fertilizing (Barrett et al., 1996). It is therefore unclear whether domesticated grasses are truly exceptional with respect to mating systems or merely reflect the association between annuality and selfing.
We compared the distribution of mating systems in domesticated and wild species using information available in the literature (see Table S3 for the full list). Quantitative estimates of selfing rates reveal a shift towards selfing from wild perennials to wild annuals, as generally documented in seed plants (Barrett et al., 1996), and from wild to domesticated annuals (Fig. 4). A coarser classification of mating systems (outcrossing/mixed-mating/selfing) strengthens this picture, and the difference is statistically significant (P = 0.0077, Fisher's exact test; Fig. 4). This simple analysis supports a bias towards selfing among domesticated species. However, we suggest that the fact that almost exclusively annual species were domesticated also explains the prominence of selfing species among domesticated cereals. Extending this analysis to other groups, such as legumes, should provide sufficient statistical power to confirm or refute our conclusions for grasses.
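For readers wishing to reproduce this kind of test, a minimal Python sketch of an exact test for a 2 × c contingency table (the Fisher–Freeman–Halton extension of Fisher's exact test) is given below, with rows presumably corresponding to wild and domesticated species and columns to outcrossing, mixed-mating and selfing. The counts of Table S3 are not reproduced here; the example tables are generic checks only.

```python
from itertools import product
from math import comb


def fisher_exact_2xc(table):
    """Two-sided exact test for a 2 x c table (Fisher-Freeman-Halton).

    Enumerates every 2 x c table with the observed margins and sums the
    hypergeometric probabilities of tables no more likely than the observed one.
    """
    row1, row2 = table
    cols = [a + b for a, b in zip(row1, row2)]
    n1 = sum(row1)
    total = comb(sum(cols), n1)  # number of ways to fill the first row overall

    def weight(first_row):
        # Unnormalized hypergeometric weight of a table, given its first row.
        w = 1
        for a, c in zip(first_row, cols):
            w *= comb(c, a)
        return w

    w_obs = weight(row1)
    p_num = 0
    for first_row in product(*(range(c + 1) for c in cols)):
        if sum(first_row) == n1:
            w = weight(first_row)
            if w <= w_obs:  # exact integer comparison, no rounding issues
                p_num += w
    return p_num / total


print(fisher_exact_2xc([[3, 1], [1, 3]]))  # classic 2 x 2 check
```

On the classic 2 × 2 table [[3, 1], [1, 3]] the function returns 34/70 ≈ 0.486, the usual two-sided Fisher result. Working with integer weights keeps the "no more likely than observed" comparison exact.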
Another prediction concerns domestication rates: selfing species should have been domesticated earlier and more rapidly than outcrossing ones. This hypothesis is difficult to assess because too few outcrossing grasses have been domesticated, and domestication dynamics are poorly known except for major crops. The first grasses to be domesticated, c. 10 000 BP, were self-fertilizing (wheats and barley). However, maize (outcrosser) and Asian rice (selfer) were both domesticated somewhat later, c. 8000 BP. Within each centre of origin, the order of domestication also seems independent of mating systems. In Mesoamerica, maize was domesticated before common bean (Phaseolus vulgaris; selfer; 4000 BP); in Africa, pearl millet (outcrosser) was domesticated c. 3000 BP, after sorghum (selfer; 4000 BP) but before African rice (selfer; 2000 BP) (dates reviewed in Doebley et al., 2006). Finally, the fact that multiple domestications occurred in most selfing species but not in outcrossing ones supports the idea that domestication should be easier in selfing species.
2. Mating systems and the genetic architecture of domestication traits
Origin and dominance of domestication alleles. Mating systems should also affect patterns of dominance and linkage among traits selected during the domestication process. Most domestication alleles are thought to be strongly deleterious in the wild. In selfing populations, such alleles are maintained at very low frequencies because of efficient purging (Ohta & Cockerham, 1974). Selection should thus proceed from new mutations. In selfing species, mutations of any dominance level could be selected equally well, which seems compatible with the variable dominance levels detected at the gene and QTL levels (see Table 1 and Ross-Ibarra, 2005; Burger et al., 2008).
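The expected frequencies of such alleles in wild populations can be compared with classical mutation–selection balance approximations: about √(u/s) for a fully recessive allele and u/(hs) for a partially recessive one under random mating, versus about u/s under complete selfing, where mutants are quickly exposed as homozygotes. The sketch below assumes these textbook approximations, checks the recessive case against the exact random-mating recursion, and uses hypothetical values of the mutation rate u and of the selection coefficient s against the allele in the wild.

```python
import math


def q_outcrossing(u, s, h):
    """Approximate mutation-selection balance under random mating:
    sqrt(u / s) if the allele is fully recessive (h = 0), else u / (h s)."""
    return math.sqrt(u / s) if h == 0 else u / (h * s)


def q_selfing(u, s):
    """Approximate balance u / s under complete selfing, where mutant alleles
    are rapidly exposed in homozygotes."""
    return u / s


def iterate_random_mating(u, s, h, generations=50_000):
    """Exact recursion: selection (fitnesses 1, 1 - h s, 1 - s), then mutation A -> a."""
    q = 0.0
    for _ in range(generations):
        p = 1.0 - q
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar
        q = q_sel + u * (1.0 - q_sel)
    return q


u, s = 1e-5, 0.1  # hypothetical mutation rate and selection coefficient
print(f"outcrossing, h=0:   {q_outcrossing(u, s, 0.0):.1e} "
      f"(recursion: {iterate_random_mating(u, s, 0.0):.1e})")
print(f"outcrossing, h=0.1: {q_outcrossing(u, s, 0.1):.1e}")
print(f"complete selfing:   {q_selfing(u, s):.1e}")
```

With these values a recessive allele is expected at about 10⁻² in an outcrosser but only about 10⁻⁴ in a selfer, consistent with standing variation being a more plausible source of domestication alleles in outcrossing species.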
In outcrossing populations, selection of domestication alleles should proceed mainly from standing variation (a so-called ‘soft sweep’) or from new dominant mutations, because selection for a new recessive mutation is very unlikely to succeed (see VI.1). It has been argued that adaptation from a single new mutation (a ‘hard sweep’) is difficult to distinguish from adaptation from standing variation (Orr & Betancourt, 2001; Innan & Kim, 2004; Przeworski et al., 2005). However, recent theoretical work suggests that soft sweeps may leave their own specific imprint on neighbouring neutral polymorphism: under recurrent mutation, they are expected to leave a strong signature in patterns of linkage disequilibrium around the selected site (Pennings & Hermisson, 2006).
In outcrossing maize, Innan & Kim (2004) interpreted the weak signature of selection at domestication candidate genes involved in the starch (Whitt et al., 2002) and anthocyanin (Hanson et al., 1996) pathways as evidence of a ‘soft sweep’ from standing variation. More interestingly, at two major maize domestication genes, tb1 (controlling apical dominance) and tga1 (controlling the naked kernel trait), dominant or partially dominant alleles have been selected (Dorweiler et al., 1993; Doebley et al., 1995, 1997). In pearl millet, another outcrossing species, the domesticated plant architecture phenotype also seems to be dominant over the wild type (Poncet et al., 1998).
Clustering of domestication QTLs. Mating systems should also affect patterns of physical linkage among domestication genes. In a simulation study, Le Thierry d’Ennequin et al. (1999) predicted that the number of QTLs involved in domestication should be higher in selfing than in outcrossing species, and that QTLs should be more tightly linked in outcrossing species. In maize, QTL (Briggs et al., 2007) and genome scan (Wright et al., 2005) approaches showed that a small number of genomic regions control several phenotypic traits involved in domestication and subsequent selection. In pearl millet, major QTLs are also clustered (Poncet et al., 1998, 2000, 2002). However, this feature is not specific to outcrossers: QTLs have also been found clustered within a few genomic regions in rice (Cai & Morishima, 2002; Tan et al., 2008a,b), wheat (Peng et al., 2003) and barley (Gyenis et al., 2007), although some cases of apparent QTL clustering may be attributable to pleiotropic effects of individual genes.
Current data in grasses thus yield a mixed picture. With high-throughput sequencing technologies, we hope that it will soon be possible to perform genome scans estimating the number and distribution of domestication genes in many grasses and other important crop species. Comparative population genomics is a promising approach for addressing, in much greater generality, the genetic effects of mating systems on the domestication process.
3. Domestication and the evolution of genetic systems
Finally, the domestication process may also affect the evolution of mating systems themselves, and of genetic systems in general. An increase in selfing rates during domestication has been documented in several species. In both Asian and African rice, the mating system evolved from mixed-mating ancestors (O. rufipogon and O. barthii) to highly selfing domesticated species (O. sativa and Oryza glaberrima, respectively) (Caicedo et al., 2007; Sweeney & McCouch, 2007). Such a transition probably also occurred in finger millet between the wild (Eleusine africana) and cultivated (Eleusine coracana) forms (Ganeshaiah & Umashaanker, 1982).
Conscious selection for selfing is hardly possible, but selfing could evolve under domestication for two reasons. First, selfing could be favoured under directional selection because it increases additive genetic variance and thus the response to selection. This argument parallels that for the evolution of recombination (Barton & Charlesworth, 1998), although no formal model exists as yet. Secondly, gene flow from wild or weedy forms recurrently introduces wild deleterious alleles into the cultivated pool. If selection and local adaptation are strong (as is the case for domesticated vs wild or weedy forms), such gene flow generates strong outbreeding depression, which will select for higher selfing rates (Epinat & Lenormand, in press).
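The first argument rests on the fact that, with purely additive gene action, the genetic variance among individuals in a population with inbreeding coefficient F is (1 + F)V_A. The sketch below assumes additivity, no linkage disequilibrium and the equilibrium F = S/(2 − S) under a selfing rate S; it shows only this variance effect, not a model of selfing-rate evolution, which, as noted above, does not yet exist. Variance components are hypothetical.

```python
def equilibrium_inbreeding(selfing_rate):
    """Equilibrium inbreeding coefficient F = S / (2 - S) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def heritability_with_inbreeding(v_a, v_e, selfing_rate):
    """Genetic variance among individuals (1 + F) * V_A and the resulting
    heritability, assuming purely additive effects, no linkage disequilibrium
    and unchanged allele frequencies."""
    f = equilibrium_inbreeding(selfing_rate)
    v_g = (1.0 + f) * v_a
    return v_g, v_g / (v_g + v_e)


v_a, v_e = 1.0, 1.0  # hypothetical variance components
for S in (0.0, 0.5, 0.95):
    v_g, h2 = heritability_with_inbreeding(v_a, v_e, S)
    # Under the breeder's equation R = h2 * selection differential,
    # a larger h2 means a larger response for the same selection.
    print(f"S={S:.2f}: V_G={v_g:.3f}, h2={h2:.3f}")
```

Under complete selfing (F = 1) the genetic variance doubles, so with V_A = V_E the heritability rises from 1/2 to 2/3.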
However, strong directional selection could also prevent the evolution of excessively high selfing rates. Despite the twofold cost of outcrossing, which should promote the complete fixation of selfing (Fisher, 1941), low outcrossing rates can theoretically be favoured in predominantly selfing populations under strong directional selection (David et al., 1993). This has been observed in an experimental barley population constructed from intercrosses among 30 varieties and then grown according to agricultural practices (Kahler et al., 1975): the outcrossing rate, initially very low (0.57%), reached 0.88% after 11 generations and 1.24% after 20 generations (P < 0.001). More generally, rapid evolutionary change and strong directional selection are expected to select for increased recombination rates during domestication, irrespective of the genetic basis of cultivation alleles (Burt & Bell, 1987; Otto & Barton, 1997). This prediction has been confirmed for some crops, which show higher chiasma frequencies than wild species (Ross-Ibarra, 2004). Most cereals illustrate this evolutionary change under domestication well (Fig. 5): 11 of 13 species exhibit an increase in chiasma frequency relative to their wild progenitor, of up to 40%.
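As a rough gauge of how unlikely 11 increases out of 13 would be if chiasma frequency changed in a random direction, a one-sided sign test can be computed. This sketch ignores phylogenetic non-independence among cereals and the magnitude of the changes, so it is illustrative only and is not part of the analysis behind Fig. 5.

```python
from math import comb


def sign_test_upper(successes, trials):
    """One-sided sign test: P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials


p = sign_test_upper(11, 13)  # 11 of 13 cereals with increased chiasma frequency
print(f"P(at least 11 of 13 increases | no directional change) = {p:.4f}")
```

The probability is (78 + 13 + 1)/8192 = 92/8192 ≈ 0.011.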
VII. Conclusion
The advent of genome-wide studies of nucleotide polymorphism in a series of grasses will yield unprecedented amounts of comparative data with which to revisit some of the questions outlined in this review (see, for instance, the recent publication of the sorghum genome by Paterson et al., 2009). Beyond dissecting the specific history of a given species, a more general understanding of the factors affecting the domestication process (e.g. mating systems) could be achieved through comparative domestication approaches. In particular, the rice genome can now be used as a source of candidate domestication genes for many domesticated grasses with much more complex genomes, such as polyploid wheat or barley. The hypothesis of convergence at the gene level during domestication can now be revisited in much more detail.
Some progress has been made in using current patterns of polymorphism within and between wild and cultivated grasses to unravel the evolutionary history of grasses under domestication. However, making sense of the wealth of empirical comparative and population genomics data will require careful model-based approaches (likelihood or Bayesian) that combine more realistic demographic models with independent information from archaeological data (e.g. Allaby et al., 2008; but see Ross-Ibarra & Gaut, 2008). In this respect, a hierarchical modelling approach would allow the specification of both a historical scenario with a given mating system and a background level of negative selection affecting the whole genome, as well as the incorporation of the local effects of directional selection (through hard and soft sweeps). This will hopefully resolve the current paradox whereby even fairly different evolutionary scenarios can hardly be distinguished from the data (e.g. Caicedo et al., 2007). It will also shed more light on the relative importance of new vs standing variation. For instance, reanalysing large population genomics data sets specifically to distinguish between neutrality, hard sweeps and soft sweeps, in combination with analyses of dominance levels, may lead to a better understanding of the processes of selection during domestication. Although progress has been impressive for simple traits such as qualitative phenotypic differences, the challenge now will be to achieve the same degree of understanding for more quantitative traits (although some progress has been made regarding flowering phenology). The model-based population genomics approaches discussed above should be a starting point for tackling this difficult issue.
Last but not least, a major challenge will be to apply this evolutionary knowledge to target efforts to introgress new genetic variation into the gene pools of elite cultivars, which often lack variation, and to adapt these cultivars to a wide range of agrosystems and challenging new growing conditions.
This is publication ISE-M 2009-041. TB acknowledges financial support from a Steno fellowship provided by the Danish Research Council (FNU). We thank J. David and Eva H. Stukenbrock for helpful discussions and comments on the manuscript.