Plant domestication: a model for studying the selection of linkage


M. Le Thierry d’Ennequin Laboratoire Evolution et Systématique, UPRESA-CNRS 8079, Université PARIS XI, Bâtiment 362, 91405 Orsay cedex, France. Tel: +33 1 69 15 72 22; fax: +33 1 69 15 73 53; e-mail:


The process of domestication leads to acquisition of traits that are often similar between plant species that belong to the same family but have different breeding systems. Hence domestication is a useful model for studying evolutionary responses to selection in plants with contrasting breeding systems. We consider a stochastic model simulating gene flow between a natural population and an initial population containing mutants with domesticated phenotypes at low frequency. We assume that a large number of loci contribute equally to the cultivated phenotype. Our results indicate that the number of loci for which the mutant (‘domestication’) allele is maintained is larger in autogamous plants than in allogamous ones and that domestication can lead to the selection of tightly linked combinations of genes in allogamous plants. This work provides a general model for the selection of gene clusters through a sieve effect and it is discussed in comparison with models proposed to explain the evolution of linkage of genes determining wing patterns in butterflies exhibiting Batesian mimicry.


Domestication is characterized by the acquisition of traits which are adaptations to cultivation by humans and which would be rapidly eliminated in a natural environment. These traits define the domestication syndrome. In cereals, domestication traits often correspond to loss of function ( Heiser, 1988), such as loss of natural dispersal mechanisms (shattering, shedding ability), seed dormancy, or natural means of protection (decrease of involucre bristles, decrease of the upper floret lemma, poorly developed barbs and glumes, soft glumes). However, some traits also result from the gain of new developmental pathways, e.g. apical dominance in maize ( Doebley et al., 1997 ).

Cultivated plants were originally domesticated from wild-type progenitors genetically very similar to the domesticated forms, as shown by DNA and protein polymorphism studies (Koenig & Gepts, 1989; Velasquez & Gepts, 1994). Despite high gene flow between the two forms, particularly in allogamous plant complexes, strong morphological differentiation is maintained. This differentiation has been carefully investigated in a few cases. For example, the dramatic differences between the architecture of the female inflorescence in maize and that of its wild-type relative, teosinte, have been attributed to a limited set of genes or QTLs, involving as few as five genomic regions (Doebley et al., 1990; Szabo & Burr, 1996). In pearl millet, as in many cereals, the spikelet structure, including shedding ability and seed-coating, is a key trait of the domestication syndrome. The analysis of several progenies from wild-type × cultivated crosses (Joly, 1984; Marchais & Tostain, 1985; Poncet et al., 1998) has shown that the main traits involved in spikelet structure exhibit simple Mendelian inheritance (mono- or oligogenic), the domestication allele being recessive. Furthermore, these traits map to a few (at most two) linkage groups (Poncet et al., 1998).

Pernès (1986) proposed that clustering of loci in allogamous plants could prevent introgression of wild-type alleles, allowing the maintenance of the cultivated phenotype. Linkage could evolve under disruptive selection on groups of genes coadapted, respectively, to a natural and a man-made habitat, the recombinants being deleterious in both environments. A similar result of selection against recombinants can be illustrated in the case of heterostyly in Primula ( Mather, 1950): the seven loci controlling the traits associated with heterostyly are linked in a supergene ( Kurian & Richards, 1997).

Two processes to explain the evolution of linkage have been proposed (see Lawrence & Roth, 1996). The first hypothesis is that loci were at first unlinked and then brought close together after the polymorphism arose; several mechanisms could bring genes closer together, including chromosomal rearrangements or selection for decreasing crossing-over rates via genes controlling recombination ( Fisher, 1930). The second hypothesis is that loci were tightly linked from the beginning. We explore this second hypothesis and consider a case of domestication with multiple ways of inactivating wild-type trait expression. We suppose that it is possible for a complex trait such as adaptation to man-made habitats to be coded by previously linked loci. Relative positions of these loci were inherited from the wild-type progenitor. We focus on the combinations of domestication loci that are finally retained by selection from all possible combinations of loci.

The question we assess is: given a high number of potential loci to be selected, what is the impact of the reproductive mode on the selection for linkage of genes finally involved in the phenotypic divergence between cultivated and natural populations? We modelled a sympatric situation of a completely wild-type population under natural selection and a population under cultivation which contains domestication alleles at low frequencies. We considered a sufficiently high number of loci so that clusters occur on the simulated genetic map. We tested the influence of several parameters (migration, drift and reproductive mode) on domestication success, on the number of selected domestication genes and on their localization on the simulated genetic map.


Genetic map

We simulated a map containing both linked and unlinked loci that determine each domesticated trait. We assumed 100 loci coding for four domesticated traits (25 loci each) distributed over a simulated genetic map of 1000 cM (equivalent to the order of magnitude of length observed in several grasses). The position of each locus on the entire genome was randomly assigned using a uniform distribution. The genetic map was then separated into five chromosomes of 200 cM each. Therefore, chromosomes vary by chance in their number of loci. One genetic map was used for all the simulations. The assumptions made to generate this map are not critical to our results, as long as linked combinations of loci coding for the domesticated traits can be selected. This is supported by the fact that employing another randomly drawn genetic map gave similar results (data not shown).
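The map construction described above can be sketched as follows. This is a minimal illustration, not the original program: the function name `draw_genetic_map`, the tuple layout and the random seed are assumptions introduced here, while the numbers (100 loci, four traits, 1000 cM, five chromosomes of 200 cM) come from the text.

```python
import random

GENOME_CM = 1000.0     # total map length (from the text)
CHROM_CM = 200.0       # length of each chromosome (from the text)
N_CHROM = 5
N_TRAITS = 4
LOCI_PER_TRAIT = 25

def draw_genetic_map(seed=0):
    """Return a sorted list of (chromosome, position_cM, trait) tuples.

    Hypothetical helper: positions are drawn uniformly over the whole
    genome and then cut into chromosomes, as described in the text.
    """
    rng = random.Random(seed)
    loci = []
    for trait in range(N_TRAITS):
        for _ in range(LOCI_PER_TRAIT):
            genome_pos = rng.uniform(0.0, GENOME_CM)
            # Guard the upper edge so a draw of exactly 1000 cM stays on the last chromosome.
            chrom = min(int(genome_pos // CHROM_CM), N_CHROM - 1)
            loci.append((chrom, genome_pos - chrom * CHROM_CM, trait))
    loci.sort()
    return loci

genetic_map = draw_genetic_map()
# Chromosomes differ by chance in the number of loci they carry.
loci_per_chromosome = [sum(1 for c, _, _ in genetic_map if c == k) for k in range(N_CHROM)]
```

Drawing a second map only requires another seed, which corresponds to the robustness check mentioned in the text.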

Determination of the domesticated phenotype

A completely domesticated type was assumed to depend on the acquisition of the four domesticated traits. Since such domesticated traits usually correspond to loss-of-function mutations, they are recessive and can be generated by a mutation at any point along a complex pathway involved in the function. Thus, a homozygous mutant for at least one of the 25 genes coding for a particular trait will have the corresponding domesticated phenotype. We assumed complete epistasis for loci involved in a particular trait.
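The genotype-to-phenotype rule can be written compactly. The sketch below assumes a genotype stored as one pair of alleles per locus, with 1 coding for the recessive domestication allele; this representation and the name `domesticated_traits` are illustrative choices, not taken from the original software.

```python
def domesticated_traits(genotype, locus_trait, n_traits=4):
    """Return one boolean per trait: True if the trait is domesticated.

    genotype    : list of (allele_a, allele_b) per locus, 1 = domestication allele
    locus_trait : trait index (0..n_traits-1) of each locus, in the same order
    A trait is domesticated as soon as one of its loci is homozygous
    for the recessive domestication allele.
    """
    has_trait = [False] * n_traits
    for (a, b), trait in zip(genotype, locus_trait):
        if a == 1 and b == 1:
            has_trait[trait] = True
    return has_trait

# Toy example with five loci: only the locus homozygous 1/1 gives a trait.
example = domesticated_traits(
    [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0)],
    [0, 1, 1, 2, 3],
)
```

In this toy example only the second trait (index 1) is domesticated, because heterozygous loci do not express the recessive phenotype.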

Selection and reproduction

We adapted the Metapop software (Fig. 1) as described by Le Corre et al. (1997).

Figure 1.

Simplified scheme of the program ‘METAPOP’ used in this model. Two sympatric populations connected by pollen migration are considered. The probability for each individual to be chosen as a seed parent (mother) is equal to its fitness (w). If an individual is produced by selfing (probability s), both mother and father have the same genotype. If an individual is produced by outcrossing (probability 1 – s), the population of the pollen parent is randomly drawn (probability (1 – m) for the resident population and probability (m) for the migrant population). Parents are then chosen by random draw and gametes are formed by Mendelian segregation by taking into account the occurrence of crossing-over.

We considered two sympatric populations with the same number of individuals (N=100, 200, 1000 or 2000 individuals): a natural population and a population under cultivation, i.e. a population on which human selection for domesticated types was applied. In the population under cultivation, selection acted on the maternal characteristics of plants since human selection corresponds to a choice of the most valued plants as seed parents for the next generation.

In the cultivated population, the fitness (w) of each mother was proportional to the number of domesticated traits it possessed (fitness of 0 if it had none of the four domesticated traits (wild type individual), 0.25 if it had one of the four domesticated traits, 0.5 if it had two and so on, up to 1, which corresponded to the maximum fitness value). We thus considered the first stages of domestication in which individuals with at least one domestication trait were selected in the cultivated population.

In the natural population, an individual that had a cultivated phenotype for one of the four domesticated traits had a fitness value (w) of 0; thus domesticated traits were strongly selected against in this population as they represent loss of some important function. Mutations leading to soft glumes, for instance, would be highly deleterious because they would render the kernels vulnerable to predation ( Culotta, 1991). A wild-type individual had a fitness of 1 in the natural population.
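The two fitness rules can be summarized in a short sketch. The function names are hypothetical, and the natural-population rule is read here as "fitness 0 as soon as at least one domesticated trait is expressed", which is our interpretation of the sentence above.

```python
def fitness_cultivated(n_domesticated_traits, n_traits=4):
    """Fitness under cultivation: proportional to the number of domesticated traits."""
    return n_domesticated_traits / n_traits

def fitness_natural(n_domesticated_traits):
    """Fitness in the natural population: 1 for the wild type, 0 otherwise.

    Assumption: a single domesticated trait is enough to give fitness 0.
    """
    return 1.0 if n_domesticated_traits == 0 else 0.0

# Fitness values for 0..4 domesticated traits in each population.
cultivated_scale = [fitness_cultivated(k) for k in range(5)]   # 0, 0.25, 0.5, 0.75, 1
natural_scale = [fitness_natural(k) for k in range(5)]         # 1, 0, 0, 0, 0
```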

In each population, the seed parent of each individual of the next generation was randomly drawn from a pool of potential mothers in the absence of seed flow. The probability for each individual to be chosen as a seed parent was equal to its fitness (w). We determined by random draw whether or not an individual was produced by selfing (s, being the selfing rate). If it was produced by outcrossing (probability (1 – s)), we chose by random draw the population of the pollen parent (probability m for migrant pollen and (1 – m) for resident pollen, m being the migration rate). The father was then randomly drawn (without selection) from its population.
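The choice of parents for one offspring can be sketched as below. This is an illustration under stated assumptions, not the Metapop code: the function name, its arguments and the string labels for populations are introduced here, and the mother is drawn with probability proportional to fitness using the standard-library `random.choices`.

```python
import random

def choose_parents(fitness_resident, n_migrant, s, m, rng):
    """Return (mother_index, father_population, father_index) for one offspring.

    fitness_resident : fitness w of each individual of the resident population
    n_migrant        : size of the other (pollen donor) population
    s, m             : selfing rate and pollen migration rate
    Note: random.choices raises ValueError if all fitness values are zero.
    """
    n_resident = len(fitness_resident)
    # Seed parent: drawn in proportion to fitness (no seed flow).
    mother = rng.choices(range(n_resident), weights=fitness_resident, k=1)[0]
    if rng.random() < s:
        # Selfing: mother and father are the same individual.
        return mother, "resident", mother
    if rng.random() < m:
        # Outcrossing with migrant pollen: father drawn without selection.
        return mother, "migrant", rng.randrange(n_migrant)
    # Outcrossing with resident pollen: father drawn without selection.
    return mother, "resident", rng.randrange(n_resident)

rng = random.Random(42)
trio = choose_parents([0.0, 0.25, 1.0], n_migrant=3, s=0.98, m=0.05, rng=rng)
```

Selection therefore acts only through the seed parent, which matches the description of human selection as a choice of mothers.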

Once parents were chosen, gametes were formed by Mendelian segregation and by taking into account the occurrence of crossing-over. For each parent we drew randomly which of the two chromosomes (with an equal probability of being chosen of 0.5) gave the allele at the first locus of the gamete. The simulated genetic map gave the probability of a crossing-over event between adjacent loci. This probability was used to choose at random whether the allele at the subsequent locus belonged to the same chromosome as the allele at the previous locus. No interference in crossing-over occurrence was assumed. We iterated this procedure all along the simulated genetic map. The genotype of each individual of the next generation was formed by pairing one gamete of each of its parents.
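Gamete formation along the map can be sketched as follows. The conversion from map distance to recombination fraction is not specified in the text; the sketch assumes Haldane's mapping function, which is consistent with the absence of interference, and uses a recombination fraction of 0.5 between loci on different chromosomes. Function names and the haplotype representation are illustrative.

```python
import math
import random

def haldane(d_cm):
    """Recombination fraction for a map distance in cM (assumed mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def make_gamete(hap_a, hap_b, recomb, rng):
    """Form one gamete from the two haplotypes of a parent.

    hap_a, hap_b : alleles of the two parental chromosomes sets, in map order
    recomb[i]    : recombination fraction between locus i and locus i + 1
                   (0.5 when the two loci lie on different chromosomes)
    """
    strands = (hap_a, hap_b)
    current = rng.randrange(2)            # strand giving the first allele, p = 0.5
    gamete = [strands[current][0]]
    for i, r in enumerate(recomb):
        if rng.random() < r:              # crossing-over: switch strand
            current = 1 - current
        gamete.append(strands[current][i + 1])
    return gamete

rng = random.Random(1)
# Three loci: 5 cM apart, then a chromosome boundary (free recombination).
gamete = make_gamete([1, 1, 1], [0, 0, 0], [haldane(5.0), 0.5], rng)
```

The offspring genotype is then obtained by pairing one such gamete from each parent.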

Genetic exchanges

Crosses between cultivated and wild-type plants were influenced by two parameters: the outcrossing rate and the pollen migration rate which was assumed to be symmetrical between populations. Values chosen, respectively, for these two parameters were: 1, 2, 20 and 100% for the outcrossing rate and 5, 10 and 50% for the migration rate. High values of migration rate were chosen in order to take into account one of the possible scenarios for cereal domestication under which cultivated fields were planted in the vicinity of gathered wild populations. The effective migration rate was the product of both parameters.
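As a worked illustration of the last sentence, the effective migration rate for every combination of the listed parameter values can be tabulated as below. The variable names are ours; the values are those given in the text, expressed as proportions.

```python
from itertools import product

outcrossing_rates = [0.01, 0.02, 0.20, 1.00]   # 1, 2, 20 and 100%
migration_rates = [0.05, 0.10, 0.50]           # 5, 10 and 50%

# Effective migration rate = outcrossing rate x pollen migration rate.
effective_migration = {
    (t, m): t * m for t, m in product(outcrossing_rates, migration_rates)
}

lowest = min(effective_migration.values())
highest = max(effective_migration.values())
```

Over this grid the effective migration rate spans three orders of magnitude, from highly selfing plants with little pollen migration to complete outcrossers receiving half of their pollen from the other population.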

Initial conditions

In this study, we did not consider mutation events and simulated the beginning of selection in cultivated populations in which alleles relevant for the domesticated traits were already present. The initial frequency of cultivated alleles was zero in the natural population and 5% at each locus in the cultivated population. We chose a high initial frequency of cultivated alleles because, when the initial frequency was lower, the intensity of selection for increasing the frequency of cultivated alleles was too weak at the very beginning of the simulations to prevent them from being lost by drift.
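A rough calculation illustrates why the initial frequency matters. The sketch below assumes Hardy-Weinberg proportions and independence among the 25 loci of a trait at the start of the simulation; neither assumption is stated in the text, and the function name is hypothetical.

```python
def trait_frequency(q, loci_per_trait=25):
    """Expected initial frequency of individuals expressing one domesticated trait.

    q is the frequency of the recessive domestication allele at each locus.
    Assumes Hardy-Weinberg proportions and independent loci (our assumptions).
    """
    homozygote = q * q                        # recessive homozygote at one locus
    return 1.0 - (1.0 - homozygote) ** loci_per_trait

at_5_percent = trait_frequency(0.05)
at_1_percent = trait_frequency(0.01)
```

Under these assumptions roughly 6% of individuals express a given trait when q = 5%, against about 0.25% when q = 1%, so far fewer individuals are exposed to selection at lower initial frequencies.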

Simulation outputs

For each parameter set of migration rate, outcrossing rate and population size, 30 simulations were performed. Simulations gave the mean population fitness and the frequency of each allele in each population after pollen migration. Mean population fitness and standard deviations were calculated from the 30 repeats in each population after pollen migration. Simulations were stopped at 10 000 generations, which approximately corresponds to the age of the most ancient archaeological evidence of cereals. At the end of the simulations we evaluated the average number of domestication loci (ANDL) for which cultivated alleles were not eliminated from the cultivated population (domestication genes responsible for the differentiation between both populations).

The intensity of selection in favour of the domestication allele at a given locus was inversely proportional to the frequency of domestication alleles at other loci coding for the same trait; once a locus was selected, cultivated alleles at the other loci coding for the same trait behaved as quasineutral and were lost by drift. Hence the expected number of domestication loci at equilibrium (those ultimately involved in the domestication syndrome) was four (one per trait) in a fully cultivated population.

The mean time (MT) necessary to reach this equilibrium was evaluated. If equilibrium was not reached, MT was set to 10 000 generations. We estimated the strength of linkage among sets of four domestication loci, one coding for each of the four traits. Linkage was estimated through the rate of parental gametes produced by a hypothetical (domesticated × wild-type) F1 hybrid genotype heterozygous for the domestication loci. Considering four loci (labelled and ordered consecutively 1, 2, 3, 4) involved in the domestication syndrome, with recombination rates r1–2, r2–3, r3–4 between adjacent loci, the rate of expected parental gametes, corresponding to the product of the probabilities of nonrecombination between adjacent loci, was: (1 – r1–2) × (1 – r2–3) × (1 – r3–4). For instance, in the case of four independent loci, i.e. r1–2 = r2–3 = r3–4 = 0.5, the rate of expected parental gamete production was (0.5)^3 = 0.125. In the case of more than four domestication loci (ANDL > 4), we evaluated the mean rate of parental gametes produced by all potential hybrids heterozygous for four domestication genes coding for the four traits. In the case of fewer than four domestication loci (2 or 3) we did not calculate the rate of parental gamete production.
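The linkage statistic defined above reduces to a product over adjacent intervals. A minimal sketch, with a function name of our choosing:

```python
from math import prod

def parental_gamete_rate(recomb_rates):
    """Product of the nonrecombination probabilities between adjacent loci.

    recomb_rates : recombination rates between consecutive domestication loci,
                   e.g. [r12, r23, r34] for four ordered loci.
    """
    return prod(1.0 - r for r in recomb_rates)

unlinked = parental_gamete_rate([0.5, 0.5, 0.5])        # four independent loci
clustered = parental_gamete_rate([0.01, 0.02, 0.01])    # hypothetical tight cluster
```

Four independent loci give 0.125, the lower bound of the statistic, whereas values approaching 1 indicate that the four loci are transmitted almost as a single block. The recombination rates used for `clustered` are invented for illustration only.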

For ANDL, MT and the rate of parental gamete production, mean values, standard errors or standard deviations were calculated for each set of 30 simulations. We used the VARCOMP procedure of SAS (SAS, 1990) to estimate the components of variation of the rate of parental gamete production explained by the three fixed effects (population size, migration rate and outcrossing rate), using the TYPE1 method.


Fitness evolution

The dynamics of fitness in the cultivated population was computed over the first 500 generations. Influence of reproductive mode and migration rate is presented in Fig. 2. The increase in fitness over generations was slower for allogamous plants than for autogamous plants, for which a maximal fitness value of 1 was rapidly reached. Increasing effective migration rate decreased the rate of increase in fitness.

Figure 2.

 Influence of outcrossing and migration rates on fitness dynamics. The increase in fitness is more rapid for autogamous plants. Differences observed between allogamous and autogamous plants become higher for high values of migration.

Number of loci ultimately involved in the domestication syndrome

Mean time (MT) and average number of domestication loci (ANDL) were influenced by all factors (population size, outcrossing rate and migration rate). A decrease in outcrossing and migration rates resulted in an increase of MT (Table 1) and ANDL ( Fig. 3), whereas a decrease in population size had the opposite effect. With very high effective migration rate (50%), an equilibrium with only two or three domestication loci was sometimes reached, particularly for small population sizes (100 or 200 individuals). For a population size of 100 and an effective migration rate of 40%, such a situation was reached in 57% of 30 simulations (data not shown).

Table 1.   Impact of migration rate, outcrossing rate and population size on the time necessary to reach equilibrium.

Figure 3.

 Impact of migration rate, outcrossing rate and population size on the number of domestication loci at equilibrium. Standard errors have been indicated with bars.

Loci clustering

Results of the VARCOMP procedure (Table 2) revealed that, in the range of parameter values used in the simulations, variation in outcrossing rate was the most important factor influencing the physical organization of the domestication loci on the chromosome (71.5% of the variance explained), while migration rate explained only 7.4%. The effect of both factors on the percentage of parental gametes produced is shown in Fig. 4. An increase in effective migration rate tended to favour clustering of domestication loci, i.e. an increase in the rate of parental gamete production. Figure 5 illustrates the range of combinations of domestication loci that were selected from all possible combinations for different values of outcrossing rate. We found that combinations generating high rates of parental gamete production were preferentially selected in allogamous but not in autogamous plants. However, standard deviations of the rate of parental gamete production were high, particularly in allogamous plants (Table 3).

Table 2.   Percentage of variance in the rate of parental gamete production explained by each factor (population size, migration rate and outcrossing rate).

Figure 4.

 Impact of outcrossing and migration rates on the rate of parental gamete production. The lower the outcrossing rate and the lower the migration rate, the lower the proportion of parental gametes.

Figure 5.

 Largesse of the simulated genome and its use by the sieve process as a function of outcrossing rate. The largesse of the genome illustrates the fact that a high number of loci (25) can equally contribute to a given domesticated trait. To estimate the potentiality offered by the largesse of the simulated genome, we calculated the rate of parental gamete production by all potential hybrids heterozygous for four loci, each one coding for one of the four traits. The distribution represents the frequency of the proportion of parental gamete production. The sieve operates by selecting particular genetic combinations of loci from all possible combinations. 95% range of variation and mean value of the rate of parental gamete production for each condition of outcrossing rate (1, 0.2, 0.02, 0.01) are indicated on the figure.

Table 3.   Standard deviations of the rate of parental gamete production.


Domestication success

The time necessary to reach the equilibrium was influenced by the initial conditions. Because we chose a high initial frequency of cultivated alleles (5% at each locus), our values probably underestimate the mean time necessary to reach the equilibrium (MT). MT should therefore be interpreted as a relative measure of domestication time across parameter values such as outcrossing rate.

The time necessary to reach the equilibrium was largely influenced by reproductive mode (Table 1). This was mainly due to strong differences in homozygosity depending on the outcrossing rate. At the beginning, all individuals homozygous for at least one of the domesticated traits were selected in the cultivated population. In autogamous plants, the proportion of heterozygotes decreased by half at each generation, a fitness value of 1 was rapidly reached ( Fig. 2) and all individuals acquired a cultivated phenotype. In contrast, in allogamous plants, the maximal fitness value was never reached because effective migration continuously introduced wild-type alleles from the natural population. Therefore, domestication in autogamous plants may be faster than in allogamous ones, as demonstrated in a one-locus model by Hillman & Davies (1990).
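The halving of heterozygosity under selfing can be made explicit with a small calculation. The sketch assumes complete selfing and ignores selection, drift and migration; the starting value uses Hardy-Weinberg proportions at the initial allele frequency of 5%, which is our assumption rather than a statement from the model description.

```python
def heterozygosity_after_selfing(h0, generations):
    """Expected heterozygote frequency after complete selfing.

    Each generation of selfing halves the proportion of heterozygotes.
    Selection, drift and migration are ignored in this sketch.
    """
    return h0 * 0.5 ** generations

q = 0.05
h0 = 2 * q * (1 - q)                              # 0.095 under Hardy-Weinberg
h10 = heterozygosity_after_selfing(h0, 10)        # after ten generations
```

After ten generations fewer than one individual in ten thousand remains heterozygous at a given locus, so recessive domestication alleles are almost always exposed to selection in the homozygous state.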

For small population sizes and high migration rates, only two or three domesticated traits were retained and individual plants did not acquire the complete cultivated phenotype. This suggests that the domestication process that produced today’s cereals could be more efficient in a single population under favourable conditions, namely either large or isolated, at least temporarily, from wild-type relatives. Alternatively, independent domestication events may have occurred in different places, resulting in fixation of only some of the domestication loci. Hybridization between populations carrying fixed domestication loci, coding for different traits, could then have led to the acquisition of the complete domesticated phenotype as proposed by Sharma & Waines (1980).

Number of loci determining the cultivated phenotype

The expected equilibrium was the fixation of the cultivated allele at only one of the 25 loci coding for each domesticated trait in the cultivated population. We reached this situation except for autogamous plants for which the number of loci underlying the domestication syndrome was higher than in allogamous ones (Fig. 3). At equilibrium, autogamous plants had cultivated alleles at several loci coding for the same domesticated trait, whereas allogamous plants had only one locus with a domestication allele for each trait. In the quasi-absence of gene flow (highly selfing plants or very low migration rate), selection in favour of the domestication alleles in the cultivated population was poorly counteracted by migration from the natural population. Which of the genotypes of highest fitness was present at the end of the simulations was mainly determined by drift. In this situation, the time to reach equilibrium was very long (more than 10 000 generations) but a fully domesticated phenotype was obtained very rapidly. In contrast, in the presence of gene flow (allogamous plants, high migration rate) recombination with migrant gametes from the natural population generated variable offspring. Which haplotype remained at the end of the simulation was determined by the action of three forces: drift, selection in favour of domestication alleles and against recombinants, and migration which imported wild-type alleles that had been selected in the natural population. The intensity of selection against wild-type traits in the cultivated population depended in this case on the rate of immigration. In allogamous plants the final number of domestication loci was generally four. However, in some cases we found situations with more than four domestication loci. In large populations of allogamous plants (1000 and 2000 individuals), and for a migration rate of 5%, ANDL was 4.2 and 4.3, respectively (Fig. 3). Studies on pearl millet have indeed shown that the number of loci involved in variation of the abscission layer determining the ‘shedding ability’ varies between one and two, depending on the wild-type × cultivated crosses realized (Joly, 1984; Poncet et al., 1998).

Given that selfing plants acquire domestication alleles at several loci, various types of organization of domestication genes might occur. A study characterizing QTLs of domesticated traits in rice, maize and sorghum (Paterson et al., 1995) found that some of the QTLs shared corresponding positions in all three species. These authors proposed that the number of possibilities underlying the cultivated phenotype is limited among cereals and that different evolutionary pathways have given rise to a convergent domestication of cereals. However, as these common loci explain less than 50% of the variance, these results may also indicate that domestication led to selection of different genes in these species. Furthermore, several studies on the inheritance of key traits distinguishing maize and teosinte have revealed an inconsistent localization of genes and QTLs (Szabo & Burr, 1996). A hypothesis for these discrepancies in QTL correspondence could be that independent domestication events among or even within species could lead to selection of different genetic bases for domestication. The number of resulting genetic configurations was limited in allogamous plants by both the number of selected loci and the selection in favour of closely linked gene clusters. Hence, our results suggest that the homology of domestication genes among or within species, if it exists, could be greater among allogamous species, and that cases of synteny could be more frequent. A perspective for improving our model is to introduce mutation. Notably, the dynamics of the appearance of the mutations in the natural population could limit the number of resulting genetic configurations, and provide a better explanation for the known cases of gene synteny (Devos & Gale, 1997).

Clustering of genes

The resulting genetic configuration of the domestication loci was mainly influenced by the breeding system ( Fig. 4). Although the standard deviations were high (Table 3), suggesting that our data should be interpreted with caution, we observed that tight linkage of domestication loci (including two to four linked loci) was produced in outcrossing plants. This finding is consistent with experimental results on the clustering of domestication genes in pearl millet ( Joly, 1984; Poncet et al., 1998 ) and maize ( Doebley et al., 1990 ), and with results of previous models investigating the impact of linkage among three biallelic loci (coding for three domesticated traits) on the maintenance of the domesticated phenotype. An analytical model ( Slatkin, 1995) showed that tight linkage between ‘plus’ alleles favours a locally stable polymorphic equilibrium, but that even a very low rate of immigration can prevent this equilibrium from being reached. However, this model assumed complete epistatic interactions among loci. In our case, the model is additive regarding all domesticated traits and weak immigration can be overcome by selection.

These results could help to explain why only a few grass species have been successfully domesticated (De Wet, 1979), suggesting that the genome organization of plant species may have contributed to the success of their domestication. The relative importance of a lack of gene clustering in determining the success of domestication remains to be compared with that of other factors such as assortative mating (Robert et al., 1992).

Plant domestication and the ‘sieve’ hypothesis in Batesian mimicry

Our model of selection of linkage can be compared with other cases of gene clustering, such as the evolution of supergenes in Batesian mimicry in butterflies for which successful mimicry requires a complete resemblance with the model. Different elements of the mimicry supergene affect different aspects of the phenotype (e.g. presence/absence of tails, wing patterns). Traits involved in Batesian mimicry have high heritabilities ( Monteiro et al., 1997 ) and must have involved changes at only a small number of loci ( Sheppard, 1962; Turner, 1977; Charlesworth, 1994). Similarly, we consider only four traits that govern the entire domestication syndrome (these traits being supposed to have high heritabilities), so changes at only four loci are required to produce the domesticated phenotype. Predation pressure operates as a selective factor in Batesian mimicry. Recombinants produce inappropriate combinations of the various components of the mimetic patterns and are counter-selected.

One hypothesis proposed to explain the clustering of genes in Batesian mimicry is called the ‘sieve’ hypothesis ( Turner, 1977). For the sieve to operate, there must be a high number of loci all capable of producing the required adaptation (the mimicry). This capacity has been called by Turner (1977) ‘the largesse of the genome’ ( Fig. 5) and implies that similar pattern elements can be produced by different genetic mechanisms ( Mallet & Barton, 1989).

We propose that in allogamous species, the absence of potential clusters of loci coding for the different selected phenotypes could preclude the evolution of the domestication syndrome. Similarly, such a genomic organization could prevent the evolution of multiple mimetic forms in certain species of butterflies.


Our model suggests that physical linkage of loci involved in the domestication syndrome might arise by selection among different genotypes producing the same phenotype through loss of function. In allogamous species, high levels of recombination and generation of new combinations of alleles may lead to a situation where alleles at tightly linked loci coding for different domesticated traits are transmitted preferentially to offspring. In contrast, in autogamous species, domestication gene organization remained selectively neutral. The sieve effect could thus be responsible for the origin of linkage between domestication genes in allogamous species. However, once this situation exists, linkage could be reinforced by a Fisher (1930) effect.


We thank N. Machon for her valuable help in starting this study; F. Hospital and Y. Michalakis for their useful comments; J. A. Shykoff, A. Sarr, R. E. Michod, O. Kaltz and anonymous referees for improving the manuscript. Most of the simulations were done on an SP2 computer at the Centre de Ressources Informatiques of the Université de Paris-Sud (Orsay).


  1. The first two authors contributed equally to this work.