Maize Sh2 gene is constrained by natural selection but escaped domestication


D. Manicacci, Station de Génétique Végétale, UMR INRA-UPS-CNRS-INAPG, Ferme du Moulon, F91 190 Gif sur Yvette, France.
Tel.: 33 1 69 33 23 61; fax: 33 1 69 33 23 40


In Zea mays L., we studied the molecular evolution of Shrunken2 (Sh2), a gene that encodes the large subunits of a major enzyme in endosperm starch biosynthesis, ADP-glucose pyrophosphorylase. We compared 4669 bp of the Sh2 coding region on 50 accessions of maize and teosinte. Very few nucleotide polymorphisms were found when compared with other genes in Z. mays, revealing an effect of purifying selection in the whole species that predates domestication. Additionally, the comparison of Sh2 sequences in all Z. mays subspecies and outgroups Z. diploperennis and Tripsacum dactyloides suggests the occurrence of an ancient selective sweep in the Sh2 3′ region. The amount and nature of nucleotide diversity are similar in both maize and teosinte, confirming previous results that suggested that Sh2 has not been involved in maize domestication. The very low level of nucleotide diversity as well as the highly conserved protein sequence suggest that natural selection retained effective Sh2 allele(s) long before agriculture started, making human selection inefficient on this gene.


The evolution of intraspecific molecular diversity of genes involved in morphological variation is of major importance to understand the history and adaptation of a species. The pattern of molecular diversity of particular genes is the result of complex interactions between different evolutionary forces such as selection, genetic drift and migration among populations. Most data of intraspecific nucleotide variation have been obtained in human or Drosophila species. During the past decade, however, an increasing amount of plant nucleotide diversity has been described, mainly in Arabidopsis and maize (see Ching et al., 2002; Nesbitt & Tanksley, 2002; Yoshida et al., 2004; Henry et al., 2005; Nordborg et al., 2005; Schmid et al., 2005; Wright et al., 2005; Wright & Gaut, 2005 for a review and references therein). Although most of these studies focused on nonrandom candidate genes, only some of them showed clear-cut evidence of selection, for genes involved in resistance to pathogens or herbivores (Stahl et al., 1999; Tian et al., 2002), flower development (Olsen et al., 2002), geographical adaptation (Morrell et al., 2003) or domestication (Wang et al., 1999; Tenaillon et al., 2004). In these studies, molecular signs of selection are deduced from values of the amount of diversity, frequency distribution of polymorphisms or haplotype frequency that are unexpected under neutrality hypothesis. However, all these parameters are also strongly affected by demography, population structure or nonrandom sampling that may affect molecular diversity at the genome level. Unambiguous identification of selective events has been made possible in species where numerous genes have been surveyed and compared, such as Arabidopsis thaliana (Aguade, 2001; Kuittinen et al., 2002; Nordborg et al., 2005; Schmid et al., 2005) and maize (Whitt et al., 2002; Tenaillon et al., 2004; Wright et al., 2005).

In Zea mays, where up to 50 loci have been surveyed, nucleotide polymorphism appears to be (i) very high when compared with other species, including plants, animals and humans (Buckler & Thornsberry, 2002), and (ii) lower in cultivated maize than in its wild relative Z. mays ssp. parviglumis, as a result of both selection and demographic bottleneck during domestication (Tenaillon et al., 2004; Wright et al., 2005). Selection has been shown to affect 4–20% of these genes during domestication, including Tb1, Ts2, D8, Y1 and Bt2 (Wang et al., 1999; Whitt et al., 2002; Tenaillon et al., 2004; Wright et al., 2005). Indeed, domestication is a particularly favourable context to reveal selective events as it occurred recently, i.e. around 10 000–5000 years ago (Iltis, 1983; Doebley, 1990a; Matsuoka et al., 2002), led to rapid and drastic modification of phenotypic traits such as plant architecture or grain weight, and may easily be studied through the comparison of present wild and cultivated forms. The promoter region of Tb1, a gene that affects plant architecture and flower sex, is one of the most striking examples of selection and reduced diversity resulting from maize domestication as almost no variability remains among cultivars whereas teosintes are strongly polymorphic (Wang et al., 1999; Clark et al., 2004).

Among all morphological traits that differ between the wild and domesticated forms in Z. mays, kernel weight shows both a strong variation between wild and cultivated forms and quantitative variability within maize. Endosperm weight and composition are important factors of both wild teosinte fitness, as endosperm resources are directly involved in seedling growth, and maize yield, kernel weight being one of the main criteria for human domestication and selection. Kernel weight mainly depends on the ability of the plant to accumulate starch (80% of mature endosperm weight) in its endosperm. Several studies provide evidence that Shrunken2 (Sh2) affects seed weight through starch content in maize kernel endosperm. Sh2 encodes the large regulatory subunits of the heterotetrameric ADP-glucose pyrophosphorylase (AGPase) present in endosperm, whereas the small subunits are encoded by the Brittle2 (Bt2) gene. AGPase catalyses the first step of starch synthesis in plants, i.e. the production of ADP-glucose that is polymerized into the two starch compounds amylose and amylopectin (Preiss, 1988). Evidence for the important role of this enzyme was first provided by mutants where starch deficiency is associated with the loss of AGPase activity (Tsai & Nelson, 1966) and a shrunken/brittle phenotype. Quantitative variation of AGPase activity has also been reported in normally developed endosperm and correlated with starch content (Stark et al., 1992; Prioul et al., 1994; Giroux et al., 1996; Greene & Hannah, 1998). Several quantitative trait loci (QTL) mapping studies in controlled maize populations revealed that a major genetic factor for grain yield (Stuber et al., 1992), kernel starch content (Goldman et al., 1993) and amylose content (Sene et al., 2000) co-localizes with the Sh2 gene on chromosome 3L, suggesting that Sh2 is a candidate gene for variation in these traits among cultivated maize. Finally, in a population derived from a cross between teosinte and maize, Doebley et al. (1994) showed that a QTL for seed weight also co-localizes with Sh2. Overall, these studies highlight the important role that Sh2 diversity may play in determining phenotypic variation of fitness traits such as kernel weight and composition, both among maize cultivars and between wild and cultivated forms. An analysis of the diversity pattern of several genes involved in the starch biosynthesis pathway in maize inbred lines and teosintes from the parviglumis subspecies suggested that Sh2 nucleotide diversity has been shaped by selective constraints predating domestication (Whitt et al., 2002).

In order to evaluate precisely the role of Sh2 in adaptive processes, we assess the molecular diversity of this gene among both wild and domesticated forms of Z. mays. Our study includes the two annual teosintes most closely related to cultivated forms, Z. m. parviglumis and Z. m. mexicana, as well as cultivated accessions from different levels of human selection such as landraces (i.e. traditional populations cultivated in Central America) and highly selected inbred lines. The particular design of our genotype sample allowed us (i) to evaluate the extent and structure of molecular diversity in Sh2, (ii) to determine the extent to which this gene shows evidence for selective processes, as a result of natural selection in the whole species and/or artificial selection during domestication (comparing teosintes and landraces) or since domestication (comparing landraces and inbred lines), and (iii) to describe the history of the Sh2 gene within the Z. mays species using phylogenetic reconstruction. Based on a large data set of homologous sequences, we were first able to confirm the observation of Whitt et al. (2002) that Sh2 was subjected to selection only prior to domestication. Moreover, as our data were carefully checked for polymerase chain reaction (PCR) artefactual mutation and PCR-mediated recombination, we were able to analyse linkage disequilibrium along the gene and construct a phylogenetic tree of Sh2 alleles, giving better insight into the evolutionary history of Sh2 in Z. mays.

Materials and methods

Isolation and sequencing of alleles

Molecular diversity of Sh2 was studied by sequencing the gene in 33 inbred lines, seven landraces and nine teosintes. Inbred lines were chosen to be (i) representative of modern inbred line diversity belonging to different heterotic groups (15 of them are representative of Sh2 variability revealed by PCR-restriction fragment length polymorphism (RFLP) in a previous study, results not shown), and (ii) unrelated or with a low level of relatedness in order to limit the extent of linkage disequilibrium (LD) between long-distance loci. Our sample includes 20 dent lines (F113, F252, F271, F284, F292, F584, F608, F618, LH146, LH52, LH74, LH82, MBS847, MBS979, A654, B89, Co158, LH181, CM105, MO17), four flint lines (F2, F268, F283, F7), four dent/flint lines (Co255, F1852, F476, F7001), one tropical line (CML239) and one floury line (Coest6), as well as RX01, RX02 and RX03, confidential lines belonging to Rustica Prograin Genetics. The analysis of 20 RFLP markers distributed on each of the 20 chromosome arms confirmed the absence of linkage disequilibrium in this sample (results not shown). The seven cultivated landraces are from Central and South America (ALT: Altiplano-Argentina, CIMMYT accession number 2753; CHA: Chapalote-Mexico, #861; CON: Confite Morocho-Peru, #8381; COR: Coroico-Bolivia, #6270; HAR: Hariñoso de ocho-Mexico, #2250; JAL: Jala-Mexico, #2215; TUX: Tuxpeño-Mexico, #679). The nine teosinte accessions belong to the two subspecies closely related to cultivars, i.e. Z. m. parviglumis [accession numbers 9477 (par1) and 11403 (par2) from Balsas-Mexico; par3: Jalisco (El Rodeo) provided by J. Doebley] and Z. m. mexicana (mex1: Chalco, #12823; mex2: Nobogame, #11393; mex3-mex6: Central Plateau, #8771, 11397, 11394 and 11396). Additionally, two Z. m. huehuetenangensis alleles (from Guatemala accession #9479) were included in some analyses as outgroup. These seven traditional and 11 wild accessions, as well as the outgroup Z. diploperennis, a species of the same genus (two alleles from Mexico accession #9476), were provided by CIMMYT, Mexico. The tetraploid sexually reproducing Tripsacum dactyloides (DNA provided by O. Leblanc; IRD-CIMMYT) was included as an additional outgroup.

From each accession, a single sequence of the Sh2 gene was obtained, i.e. for potentially heterozygous landraces, teosintes and outgroups, we sequenced a single randomly chosen Sh2 allele, except for the Z. m. huehuetenangensis and Z. diploperennis accessions, where we sequenced two alleles from distinct individuals of the same population. DNA was isolated from young leaves as previously described (Causse et al., 1995). PCR primers were designed from the Sh2 sequence of the Black Mexican Sweet (BMS) line (Shaw & Hannah, 1992) in order to amplify two overlapping fragments, referred to as Sh2-I and Sh2-II. These fragments were 2653 and 2455 bp in length, respectively, and altogether cover a total of 4658 bp. This includes exons 1 to 12 and part (30 bp) of exon 13, introns 1 to 12, and 1008 bp of putative promoter located 5′ upstream of exon 1 (referred to as 5′ upstream). Primers were designed as follows: 5′-CTGGGCAGGGAGAGCTAT-3′ (position −1008 in the reference sequence) and 5′-GGATATCAATAAGCCTGTAACAT-3′ (position 1623) for Sh2-I, and 5′-TGCAGCATTCTCAAACACAG-3′ (position 1197) and 5′-CGATGTTGCATTCTCTCAGTAA-3′ (position 3630) for Sh2-II. The three latter primers were chosen in exonic sequences (exons 4, 3 and 13, respectively) to maximize the PCR efficiency on DNA from unrelated materials. PCR amplifications were carried out in a total volume of 100 μL containing 0.3 μM dNTP, 1.9 μM Mg²⁺, 0.5 μM of each primer, 1.75 U of Expand High Fidelity™ Taq polymerase (Boehringer-Mannheim, Indianapolis, IN, USA) in its buffer, and approximately 100 ng of DNA, in the following conditions: 2 min predenaturation at 94°C; nine cycles of 30 s at 94°C, 30 s annealing decreasing 1°C per cycle from 60 to 52°C, and 2 min at 72°C; 25 cycles of 30 s at 94°C, 30 s at 51°C, and elongation at 72°C starting from 2 min for the first cycle and increasing by 20 s per cycle; followed by 7 min elongation at 72°C.
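The cycling conditions above can be laid out step by step. The following Python sketch is given for illustration only and is not part of the original protocol; the function name `touchdown_program` and the tuple layout are hypothetical, and it assumes that the 20 s increment applies to each of the 25 cycles after the first one, so that the last cycle elongates for 10 min.

```python
def touchdown_program():
    """Return the PCR program as a list of (step, temperature_C, seconds)."""
    steps = [("predenaturation", 94, 120)]

    # Nine touchdown cycles: annealing drops 1 degree C per cycle, 60 -> 52.
    for anneal_temp in range(60, 51, -1):
        steps.append(("denature", 94, 30))
        steps.append(("anneal", anneal_temp, 30))
        steps.append(("extend", 72, 120))

    # 25 cycles at 51 degrees C annealing; elongation starts at 2 min and
    # grows by 20 s per cycle (assumed reading of the protocol).
    for cycle in range(25):
        steps.append(("denature", 94, 30))
        steps.append(("anneal", 51, 30))
        steps.append(("extend", 72, 120 + 20 * cycle))

    steps.append(("final elongation", 72, 420))
    return steps


if __name__ == "__main__":
    for name, temp, seconds in touchdown_program():
        print(f"{name:18s} {temp:3d} C {seconds:4d} s")
```

Under this reading, the increasing elongation time compensates for the loss of polymerase activity in late cycles when amplifying fragments of several kilobases.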
PCR products obtained from inbred lines were purified on agarose gel using the QIAex gel extraction kit (Qiagen, Valencia, CA, USA), and directly sequenced. As plants from the landrace populations, as well as teosintes, Z. diploperennis and T. dactyloides, are potentially heterozygous for the Sh2 locus, a single 4.6 kb fragment was amplified using both external primers (positions −1008 and 3630) with the same PCR conditions. These PCR products were then ligated into the pGEM®-T plasmid vector (Promega, Madison, WI, USA) and used to transform subcloning efficiency DH5α™ chemically competent E. coli cells (Gibco BRL; Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. A single clone was randomly chosen for each accession, and DNA was extracted using the QIAprep Spin miniprep kit (Qiagen). All sequencing was performed by a subcontractor (GENOME EXPRESS, Inc., Grenoble, France). The first sequence runs were obtained using the four PCR primers for inbred lines and the M13 universal primers for cloned fragments. The middle part of the fragments was sequenced step by step, using a gene-walking approach, and the following additional sequencing primers were designed in regions conserved among all Sh2 alleles: 5′-ATGTCCTGCACCTAGGGAGC-3′ (position −424), 5′-CCACCAGTATGCCCTCCTCA-3′ (position −58), 5′-GACCCTATTTGAACAAATCTT-3′ (position 852), 5′-GCAGCAACTCTAAGGTCTATTT-3′ (position 961), 5′-TTTGGGAGACTTCCAGTCAA-3′ (position 1503), 5′-CATCGTCCTCGACATGTTT-3′ (position 2365), 5′-GGAAAGCAGATTAGACCATAT-3′ (position 2486), 5′-TGGAAACTGGAAACAAAAACAAC-3′ (position 3098). All sequences were checked manually and edited according to electropherograms. For inbred lines, we observed no difference among overlapping sequence runs within each PCR fragment for all accessions, and sequences of the 406 bp overlap between fragments Sh2-I and Sh2-II were always perfectly identical. All singletons observed in inbred lines were verified by sequencing newly amplified fragments.
Among more than 16 000 bp sequenced twice, no difference was observed. Singletons observed in potentially heterozygous landraces and teosintes were verified from new PCR amplification and cloning of the appropriate fragments. We then screened for the initially sequenced allele among three to eight clones and sequenced one clone for all regions showing singletons. Fifty-two per cent (22/42) of the initial singletons were found to be Taq polymerase errors. Sequences are accessible as GenBank DQ019876–DQ019928.

Finally, sequences from potentially heterozygous materials were tested for PCR-mediated recombination that may generate hybrid molecules between both alleles (Cronn et al., 2002; Shafikhani, 2002). Verifications were performed for all three sequences that diverge from inbred sequences for polymorphisms other than singletons or nanosatellite-type mutations (e.g. variation in the number of T in a polyT), i.e. ALT, par2 and hue1 (see the Results section, Figs 1 and 4). PCR-mediated recombination was also checked for mex4 as this teosinte was known to be highly heterozygous from the singleton verification step. For each of these genotypes, we obtained 10 to 12 clones from the initial PCR and sequenced them for two to three runs along Sh2. For par2 and hue1, all clones were identical indicating that either these plants were homozygous or one allele was highly competitive during PCR. For ALT and mex4, we found two contrasted parental haplotypes and a single recombinant among mex4 clones, the mex4 sequence reported in the present paper bearing a parental haplotype. All together, we thus observed one PCR-mediated recombination over 46 clones. These results indicate (i) that our landrace and teosinte sequences do not include PCR artefacts and (ii) that PCR-mediated recombination, although rare, may occur when amplifying fragments of 4–5 kb under standard conditions.

Figure 1.

 Substitution (SNPs) and insertion/deletion (IDPs) polymorphisms of Sh2 among 52 alleles from different Zea mays. All haplotypes are given in comparison with haplotype 1 (Hap1) that includes the reference sequence from Black Mexican Sweet (BMS) line (GenBank accession number M81603). The positions of polymorphisms refer to this published sequence and are indicated at the top, base number 1 corresponding to the transcription start site. SNPs that change the amino acid sequence are indicated in bold. Bi-allelic IDP longer than one base are coded ‘I’ for insertion and ‘–’ for deletion. Their sequences are as follows: IDP-956: TG; IDP-830: TGAGAAA; IDP-580: TCACCTAT; IDP-491: ATTCAATA; IDP879: AAT; IDP2179: TT; IDP2457: AT; IDP3123: GTTTTTATTTA; IDP3534: AT. Some haplotypes correspond to more than one accession in some genotype categories, i.e. inbred lines, landraces or teosintes, as follows: Hap1 (eight inbreds) : BMS, F252, F268, F271, F608, LH52, MO17 and RX01; Hap5 (nine inbreds): Co255, F2, F283, F584, LH146, LH74, CML239, CM105 and RX02; Hap9 (two inbreds): F1852 and A654; Hap11 (two inbreds): F292 and Coest6; Hap13 (six inbreds): F7001, LH82, MBS979, LH181, MBS847 and B89; Hap15 (two teosintes): mex2 and mex6; Hap18 (three inbreds): F113, F476 and Co158.

Figure 4.

 A parsimony phylogenetic reconstruction (including SNPs and IDPs) of the 54 Zea sequences on Sh2 gene. Bootstrap values > 50% are given next to the corresponding nodes. This reconstruction excludes sites in the 3′-part of the sequenced region (from position 3398) that support an obvious recombination event. Outgroups Tripsacum dactyloides and Sorghum bicolor were not represented because they are very divergent from the Zea genus. Capitals refer to landraces and italics to teosintes. Outgroups are underlined italics.

The Sh2 sequence from Sorghum bicolor used as outgroup was obtained from GenBank (Chen et al., 1998) (GenBank AF010283).

Data analysis

All sequences were aligned, including the reference sequence from the BMS line, either visually or using the Clustal multiple alignment procedure (Higgins et al., 1992) of Sequence Navigator® 1.0 software (Applied Biosystems, Foster City, CA, USA). Hereafter, base substitutions among sequences will be referred to as SNPs, whereas insertion or deletion polymorphisms of one to several bp will be referred to as insertion/deletions (IDPs). Polymorphism data were analysed using DnaSP software (Rozas & Rozas, 1999). Nucleotide diversity was estimated using (i) π, the average number of pairwise differences among sequences per nucleotide site (Tajima, 1983) and (ii) θW, Watterson's estimator of θ = 4Neμ per base pair (with Ne being the effective population size and μ the mutation rate) based on the number of segregating sites (Watterson, 1975). As π varies with allele frequency whereas θW only accounts for the number of polymorphic sites, the two diversity indices give different values if the distribution of allele frequencies is biased when compared with what is expected under the population genetics null hypothesis that includes selective neutrality, constant population size and no population structure. This hypothesis may thus be tested based on the comparison of π and θW through Tajima's DT statistics (Tajima, 1989). The statistics of Fu & Li (1993) produced very similar results and are thus not shown. Hitchhiking accompanying a selective sweep was tested using Fay and Wu's H-test on Z. mays with both Z. diploperennis sequences as outgroup (Fay & Wu, 2000), and performed with 5000 simulations. The significance of pairwise linkage disequilibrium (LD) among sites was tested using Fisher's exact test excluding non-informative sites (singletons), and corrected for multiple analyses using the Bonferroni procedure (Sokal & Rohlf, 1981).
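To make the relationship between π, θW and Tajima's DT concrete, the following Python sketch computes the three quantities from a gap-free alignment. It is a minimal illustration under stated assumptions and not the DnaSP implementation used for the analyses: the function names and the toy alignment are hypothetical, every alignment column is treated as a site, and the variance constants are those of Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one base (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def diversity_per_site(seqs):
    """Return (pi, theta_W), both expressed per nucleotide site."""
    n, length = len(seqs), len(seqs[0])
    a1 = sum(1 / i for i in range(1, n))
    pi = mean_pairwise_differences(seqs) / length
    theta_w = segregating_sites(seqs) / a1 / length
    return pi, theta_w


def tajima_d(seqs):
    """Tajima's D; returns None when no site is polymorphic."""
    n, s = len(seqs), segregating_sites(seqs)
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical alignment
    print(diversity_per_site(toy), tajima_d(toy))
```

A negative value arises when θW exceeds π, i.e. when rare variants are in excess relative to the neutral expectation.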
The decay of LD along Sh2 was evaluated by nonlinear regression (PROC NLIN in SAS software; SAS, Cary, NC, USA) of the squared allele-frequency correlations (r²) between pairs of polymorphisms on their physical distance along Sh2 (Remington et al., 2001). We tested both the drift-recombination model (Sved, 1971) and the model of Hill & Weir (1988) that includes a low level of mutation and adjustment for sample size. The minimum number of recombination events Rm was estimated using the four-gamete test (Hudson & Kaplan, 1985). We also estimated C = 4Nc, with N = effective population size and c = recombination rate between the most distant sites, using the method of Hudson (1987) to calculate Ĉ. As recombination may be strongly positively linked to diversity in plants (Dvorak et al., 1998; Tenaillon et al., 2002), we estimated the c/μ (μ = mutation rate) ratio through Ĉ/θW as θW estimates 4Nμ (Watterson, 1975), allowing comparison of recombination among genes. Zea mays huehuetenangensis strongly differs from all other Zea subspecies for Sh2 sequence, as commonly found for geographical distribution, phenological and morphological traits and molecular markers (Doebley, 1990a; Matsuoka et al., 2002). As a consequence, we included it neither in statistical analyses concerning the teosinte group nor in those concerning the whole Z. mays species. The phylogenetic history of Sh2 alleles was reconstructed from nucleotide diversity based on the parsimony method using PAUP software (Swofford, 2003). For this analysis, IDPs were included and weighted as one polymorphism each regardless of their length, and node confidence was assessed using 2000 bootstrap replicates (phylogenetic reconstruction including SNPs only gave the exact same tree and very similar bootstrap values).
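The logic of the four-gamete test can be sketched as follows. This is an illustrative Python sketch, not the software used for the analyses; the function names are hypothetical, sites are assumed to be bi-allelic, and Rm is obtained here as the largest number of non-overlapping intervals between pairs of sites that show all four gametes.

```python
from itertools import combinations


def four_gamete_intervals(haplotypes):
    """Pairs of sites (i, j), i < j, at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # assumes two alleles per site
            intervals.append((i, j))
    return intervals


def minimum_recombination_events(haplotypes):
    """Rm: maximum number of disjoint intervals that each require a crossover."""
    intervals = sorted(four_gamete_intervals(haplotypes), key=lambda ij: ij[1])
    count, last_end = 0, 0
    for start, end in intervals:
        # Intervals sharing only an endpoint are treated as disjoint.
        if start >= last_end:
            count += 1
            last_end = end
    return count


if __name__ == "__main__":
    sample = ["000", "011", "101", "110"]  # hypothetical 0/1-coded haplotypes
    print(minimum_recombination_events(sample))
```

Because only recombination events that leave all four gametes are detectable, Rm is a lower bound on the true number of events in the history of the sample.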


Results

Nucleotide variation at the Sh2 locus in Zea mays

DNA polymorphisms found among the 50 Z. mays Sh2 alleles (including 33 inbreds, the GenBank BMS sequence used as reference, seven landraces from Z. m. mays, and nine teosintes from Z. m. parviglumis and Z. m. mexicana subspecies) are presented in Fig. 1 and summarized in Table 1. Out of a total sequence length of 4669 bp, 69 (1.48%) substitutions are observed. Most of them (88%) are found in noncoding regions, including the 5′ upstream region, introns, and the 5′ untranslated region (UTR; defined as the transcribed nonspliced region of Sh2, and including exon 1 and part of exon 2). The rate of polymorphism among synonymous sites of the coding region is not significantly different from that of the noncoding region (1.07% and 1.77%, respectively; test statistic = 0.756). The estimates of nucleotide diversity π (Tajima, 1983) and θW (Watterson, 1975) both confirm a higher diversity in noncoding regions, especially in the 5′ UTR, when compared with the coding region, where nonsynonymous sites show a very low diversity (Table 1). Cultivated forms retain 93% of the teosinte diversity, and inbred lines retain 86% of the landrace diversity. In contrast, for the coding region, nucleotide diversity is higher in cultivated forms than in teosintes, and higher in inbred lines than in landraces.

Table 1.   Nucleotide polymorphism at the Sh2 locus for several groups of accessions among 50 Zea mays sequences (teosintes include Z. m. mexicana and Z. m. parviglumis ssp.).

Group             N   Hap  Statistic   Total   Silent   Nonsynonymous   Total    Total
                                               sites    sites           coding   noncoding
Inbred lines      34  10   # SNPs      53      49       4               6        47
                           π (×10⁴)    27.7    33.6     4.7             5.5      35.7
                           θW (×10⁴)   28.1    32.7     10.3            11.9     33.9
                           # IDPs      15      15       0               0        15
Landraces         7   7    # SNPs      36      34       2               3        33
                           π (×10⁴)    27.4    32.4     8.1             8.5      34.2
                           θW (×10⁴)   31.8    37.7     8.6             10.0     39.7
                           # IDPs      11      11       0               0        11
Cultivated maize  41  15   # SNPs      58      54       4               7        51
                           π (×10⁴)    27.5    33.
                           θW (×10⁴)   29.4    34.4     9.9             13.3     35.2
                           # IDPs      17      17       0               0        17
Z. m. mexicana    6   5    # SNPs      25      23       2               2        23
                           π (×10⁴)    21.9    25.7     7.0             5.4      27.8
                           θW (×10⁴)   23.6    27.
                           # IDPs      8       8        0               0        8
Teosintes         9   8    # SNPs      39      36       3               3        36
                           π (×10⁴)
                           θW (×10⁴)
                           # IDPs      13      13       0               0        13
All Z. mays       50  20   # SNPs      69      64       5               8        61
                           π (×10⁴)    27.7    33.5     5.5             5.8      35.7
                           θW (×10⁴)   33.4    39.0     11.8            14.5     40.2
                           # IDPs      18      18       0               0        18

  1. N, sample size; Hap, number of haplotypes based on SNPs and IDPs; π, the average number of pairwise differences among sequences per nucleotide site (×10⁴, Tajima, 1983); θW, Watterson's estimator of θ = 4Neμ per base pair (with Ne being the effective population size and μ the mutation rate) based on the number of segregating sites (×10⁴, Watterson, 1975).

Eighteen IDPs are observed among the 50 accessions along the total sequenced region, among which none is located in the coding region, one in the 5′ UTR transcript region, and all others either in introns or in the 5′ upstream region (Fig. 1). These IDPs vary from 1 to 11 bases in length. Six of them are one-nucleotide repeat microsatellite-type IDPs (SSR). All other, non-SSR, IDPs show only two alleles, i.e. presence or absence of a 1–11 bp sequence. Unlike other maize genes (Tenaillon et al., 2002), longer IDPs were not found in Sh2.

Considering both SNPs and IDPs among the 50 Z. mays sequences, there is a total of 20 different haplotypes. Haplotypic groups based on SNPs are identical to those based on IDPs, with the exception of inbred line F284 that differs from haplotype 11 for a single 1-bp IDP in position 2515. Eight haplotypes (numbered Hap1, Hap5, Hap9, Hap11, Hap13, Hap15, Hap18 and Hap19, Fig. 1) are common to more than one accession, leading to a total of 10 haplotypes among 34 inbred lines, seven haplotypes among seven landraces, and eight haplotypes among nine teosintes from parviglumis and mexicana subspecies. The haplotypic diversity is slightly smaller among cultivated forms (0.871), and especially among inbred lines (0.847), than among teosintes (0.972) for the whole sequence. In terms of divergence among groups, we observe that two haplotypes are common to inbred lines and landraces (Hap5 and Hap18), two haplotypes are common to landraces and teosintes (Hap15 and Hap19), Hap9 is common to inbred lines and teosintes, and none is common to all three groups of accessions (Table 1).
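The haplotypic diversity values quoted above can be approximated from haplotype counts. The sketch below is illustrative only and assumes the usual unbiased estimator h = n/(n − 1) × (1 − Σ pᵢ²); the estimator actually used is not stated in the text. The function name is hypothetical, and the counts are read from the Fig. 1 caption, with F284 pooled with Hap11 (its SNP-based haplotype) and the remaining accessions treated as unique haplotypes, which is an assumption.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype diversity from the number of copies of each haplotype."""
    n = sum(counts)
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq_freq)


if __name__ == "__main__":
    # Assumed SNP-based counts for the 34 inbred sequences:
    # Hap1, Hap5, Hap9, Hap11 (+ F284), Hap13, Hap18, then three unique ones.
    inbreds = [8, 9, 2, 3, 6, 3, 1, 1, 1]
    # Assumed counts for the nine teosintes: Hap15 twice, seven unique ones.
    teosintes = [2, 1, 1, 1, 1, 1, 1, 1]
    print(round(haplotype_diversity(inbreds), 3))
    print(round(haplotype_diversity(teosintes), 3))
```

Under these assumptions the two values come out close to the 0.847 and 0.972 reported for inbred lines and teosintes, respectively.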

Inferred amino acid polymorphisms

Although polymorphisms are more frequently observed in noncoding regions, they also occur within exons and, given their position and/or nature, they may or may not alter the amino acid (aa) sequence of AGPase. The nucleotide sequence data allowed us to infer 402 aa of the 516 constituting the entire SH2 protein in Z. mays. Five aa of the 402 are polymorphic among the 50 Z. mays accessions, leading to five different proteins (Fig. 2). The most frequent aa sequence is shared by 37 accessions, including 27 inbred lines, four landraces, two Z. m. parviglumis and four Z. m. mexicana. Two other proteins are shared by five and six accessions, respectively, including inbred lines, landraces and teosintes in both cases. Among all replacements observed in Z. mays, six correspond to a modification in amino acid charge or polarity (aa at positions 29, 52, 275, 316, 318 and 395), whereas two lead to similarly charged aa (positions 256 and 331).

Figure 2.

 Polymorphisms from inferred amino acid (aa) partial sequences of 53 Zea accessions. Numbers at the top indicate the polymorphic nucleotide and aa positions in reference to the BMS published sequence. *The most frequent aa sequence is found in 37 accessions: BMS, F252, F268, F271, F608, LH52, MO17, F292, Coest6, Co255, F2, F283, F584, LH146, LH74, CML239, CM105, F7001, LH82, MBS979, LH181, MBS847, B89, F284, RX01, RX02, RX03, CHA, CON, COR, TUX, par1, par3, mex1, mex2, mex4 and mex6.

Allelic frequency distribution and neutrality tests

In order to evaluate the role of selective processes on Sh2 diversity, we calculated Tajima's DT statistics (Tajima, 1989) on various subsets of Z. mays sequences and for various gene regions. Table 1 shows that θW is higher than π for the whole sequenced region and in all gene regions, and most specifically in the coding region. The deduced Tajima's DT value for the coding region is negative and close to significance (P < 0.10) when considering maize, teosintes (Z. m. mexicana and Z. m. parviglumis) or both together (Table 2). In teosintes, this effect is clearly caused by nonsynonymous sites, as Tajima's DT is more significant when considering nonsynonymous sites only (P < 0.05). These results globally indicate an excess of low-frequency polymorphisms in the sample. For the noncoding and the whole sequenced regions, as well as for silent sites, Tajima's DT values are negative for all sequence subsets, although not all are significant.

Table 2.   Tajima's DT tests of neutrality on Sh2 SNPs. Noncoding region includes the 5′ upstream region (putative part of the promoter), 5′ UTR and introns. Analyses were performed on cultivated maize (inbred lines + landraces), Zea mays mexicana subspecies, teosintes (excluding huehuetenangensis), and all Zea mays (excluding huehuetenangensis).
 Maize    Z. m. mexicana    Teosintes    Z. mays
  1. *0.02 < P < 0.05; †0.05 < P < 0.10


We tested for genetic hitchhiking by calculating the Fay and Wu's H statistic on Z. mays including the huehuetenangensis subspecies, using Z. diploperennis and T. dactyloides to infer ancestral allele at each SNP. We found a nonsignificant result (H = 1.53, P > 0.50) for the whole Sh2 sequenced region, indicating that most derived SNP variants are rare in Z. mays species. As a recombination has occurred before base 3398 (see below), we also performed the Fay and Wu's test considering only the 3′-end of the sequenced region, and found a very significant H (H = −5.68; P = 0.0002). Indeed, along this 232 bp region, eight SNPs were found among which four show the same striking pattern (positions 3398, 3406, 3463 and 3606 in Fig. 1): the ancestral allelic form is present, among Z. mays, in the two huehuetenangensis accessions and the F618 inbred line only.
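The sign of H can be understood from the way the statistic weights derived-allele frequencies. The Python sketch below is illustrative only: it computes H = π − θH summed over sites from the number of sequences carrying the derived variant at each SNP, following the definition of Fay & Wu (2000), and does not reproduce the coalescent simulations used to obtain P values. The function name and the example counts are hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H from derived-allele counts at each SNP in n sequences."""
    norm = n * (n - 1)
    # pi: expected pairwise differences contributed by each site.
    pi = sum(2 * i * (n - i) for i in derived_counts) / norm
    # theta_H: weights each site by the squared derived-allele count.
    theta_h = sum(2 * i * i for i in derived_counts) / norm
    return pi - theta_h


if __name__ == "__main__":
    print(fay_wu_h([1, 1, 1], 10))  # rare derived variants: H > 0
    print(fay_wu_h([9, 9, 1], 10))  # high-frequency derived variants: H < 0
```

A strongly negative H therefore signals an excess of derived variants at high frequency, the pattern expected after hitchhiking, which is what is observed at the 3′ end where the ancestral alleles persist only in huehuetenangensis and F618.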

Nonrandom associations between polymorphic sites

We analysed statistical associations among Sh2 polymorphic sites in order to (i) estimate the number of recombination events in the study sample and (ii) infer the evolutionary history of this gene. The latter is of particular interest for intraspecific phylogenetic reconstruction as reticulation, caused by recombination, may bias the inferred phylogeny. As the value of linkage disequilibrium (LD) depends on allele frequency, we compared pairs of polymorphisms, including both SNPs and IDPs, using Fisher's exact tests. This was carried out for several sequence subsets, e.g. all Z. mays, cultivated forms or inbred lines only. For inbred lines, Fig. 3 reveals many occurrences of significant LD along Sh2, although Bonferroni's correction makes our analysis extremely conservative (indeed, many pairs of polymorphisms that are in strong or even total LD show nonsignificant P values because of unbalanced allele frequencies). It is striking that the distance between sites in LD varies from very short (Fig. 3, along the diagonal) to more than 4000 bp. Moreover, our data show nonsignificant decay of LD (either D′ or r²) with physical distance, whether we consider the drift-recombination model (Sved, 1971) or the mutation-drift-recombination model (Hill & Weir, 1988; Remington et al., 2001). We obtained very similar results when excluding sites with rare alleles (lower than 5%) instead of excluding singletons only, indicating that this pattern is not due to complete LD between rare mutations.
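The remark that sites in total LD may still yield nonsignificant P values can be illustrated with a small calculation. The Python sketch below is not the software used for the analyses; the function names and the two toy pairs of sites are hypothetical. It computes r² and a two-sided Fisher's exact test for two bi-allelic sites, and applies a Bonferroni correction for an assumed number of tests.

```python
from math import comb


def contingency(site_a, site_b):
    """2x2 haplotype table; alleles are coded relative to the first sequence."""
    ref_a, ref_b = site_a[0], site_b[0]
    table = [[0, 0], [0, 0]]
    for x, y in zip(site_a, site_b):
        table[int(x != ref_a)][int(y != ref_b)] += 1
    return table


def r_squared(table):
    """Squared allele-frequency correlation between two polymorphic sites."""
    n = sum(map(sum, table))
    p_a = sum(table[0]) / n
    p_b = (table[0][0] + table[1][0]) / n
    d = table[0][0] / n - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def fisher_exact_two_sided(table):
    """Sum of hypergeometric probabilities no larger than the observed one."""
    row1, row2 = sum(table[0]), sum(table[1])
    col1 = table[0][0] + table[1][0]
    total = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(table[0][0])
    low, high = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(k) for k in range(low, high + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Both toy pairs are in total LD (r2 = 1) but differ in allele balance.
    for a, b in [("AAAAATTTTT", "CCCCCGGGGG"), ("AAAAAAAATT", "CCCCCCCCGG")]:
        t = contingency(a, b)
        p = fisher_exact_two_sided(t)
        print(r_squared(t), p, bonferroni(p, 10))
```

With ten sequences, the pair with a minor allele present twice remains in total LD but no longer passes a 5% threshold once ten tests are corrected for, whereas the balanced pair does.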

Figure 3.

 Linkage disequilibrium (LD) matrix among Sh2 polymorphisms for 34 Zea mays sequences of inbred lines. The top part of the graph refers to r² values, and the bottom part to P values after Bonferroni correction on Fisher's exact tests. Numbers on top and left of the matrix refer to SNP and IDP positions along the Sh2 gene.

The strong LD found along Sh2 could be the result of the absence or very rare occurrence of recombination in the history of the study sample (50 accessions). However, Hudson's four-gamete test (Hudson & Kaplan, 1985) provides an estimate of Rm = 3 recombination events in this sample for the whole sequenced region. These three recombination events occurred between positions 32 and 321 of the 4669 sequenced base pairs. Hudson's (1987) estimator of recombination gives a c/μ ratio of 2.59 for the 50 Z. mays accessions. Finally, including the Z. m. huehuetenangensis accession in Hudson and Kaplan's test increases the number of recombination events by one, as Z. m. huehuetenangensis differs strongly from all other Z. mays accessions except at four sites at positions 3398–3606 (3′ end of the sequenced region), where it is identical to inbred line F618. This reveals two blocks along the sequenced region, and is the only clear evidence of a recombination event in our data set. As a consequence, sites 3398–3606 were excluded from the phylogenetic reconstruction.
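The logic of the four-gamete test can be sketched as below, assuming biallelic sites coded '0'/'1' and listed in physical order. A pair of sites showing all four gametes implies at least one recombination event between them (under an infinite-sites model), and Rm is taken here as the largest number of such intervals that do not overlap. The function name `four_gamete_rm` and the toy haplotypes are assumptions for illustration only.

```python
from itertools import combinations


def four_gamete_rm(haplotypes):
    """Minimum number of recombination events (Rm) from the four-gamete test.

    haplotypes: equal-length strings of '0'/'1', one per sequence, with
    sites in physical order.
    """
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all four gametes observed for this site pair
            intervals.append((i, j))
    # Greedy count of non-overlapping intervals, scanning by right endpoint.
    rm, last_end = 0, 0
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            rm += 1
            last_end = end
    return rm


# Hypothetical haplotypes (assumption, not the Sh2 data).
print(four_gamete_rm(["00", "00", "11", "11"]))   # no pair shows four gametes
print(four_gamete_rm(["00", "01", "10", "11"]))   # one pair shows four gametes
```

Rm is a lower bound: recombination events that leave no four-gamete signature are not counted, which matters in a region with as few polymorphic sites as Sh2.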

Phylogenetic relationships among Sh2 haplotypes

We constructed a phylogenetic tree of Sh2 alleles in the Zea genus using both T. dactyloides (sequence not shown) and Sorghum bicolor (Chen et al., 1998) as outgroups. Under the hypothesis that Sh2 is involved in the domestication syndrome of maize, we would expect a monophyletic maize group. Figure 4 clearly shows that this is not the case. Indeed, the tree shows no evidence of monophyly for any Z. mays subspecies, as the two Z. m. parviglumis and the four Z. m. mexicana alleles are distributed among branches that also include Z. m. mays alleles. As observed for the Tb1 gene (Wang et al., 1999), selection during maize domestication may have targeted only part of the sequence. To test this hypothesis, we built trees for the 5′ upstream region only, the 5′ UTR, the gene itself (including exons and introns) and the coding region only. In no case did we observe a better resolution of the Z. mays subspecies or a monophyletic Z. m. mays group, and the huehuetenangensis accession consistently appeared more closely related to Z. diploperennis than to the other Z. mays. We should note that, among all polymorphisms, only two substitutions are not consistent with the reconstructed tree, and could be caused by homoplasy (positions 38 and 297 in nontranscribed regions; see Fig. 1). The proposed phylogeny (Fig. 4) includes these two sites. Excluding them leads to a very similar tree with higher bootstrap values.


Sh2 is under selective constraint in Zea mays

The study of nucleotide polymorphism in Sh2 suggests that this gene has experienced selective constraints in both wild and cultivated Z. mays. This was already suggested by Whitt et al. (2002), although they obtained somewhat higher diversity values (0.0050 and 0.0013 for silent and nonsynonymous maize diversity, respectively) than those found in the present study (0.00334 and 0.00055, respectively), probably because we corrected our sequences for PCR artefacts, removing many singletons in sequences obtained from cloned fragments in landraces and teosintes. The very low diversity values observed in our data, either for cultivated accessions or teosintes (mexicana alone, or both mexicana and parviglumis subspecies), indicate that Sh2 diversity is low compared with other genes in Z. mays (White & Doebley, 1999; Tenaillon et al., 2001; Thornsberry et al., 2001; Tiffin & Gaut, 2001; Zhang et al., 2002; Wright et al., 2005), including genes involved in the starch biosynthetic pathway (Whitt et al., 2002) or in protein biosynthesis, such as the Opaque2 gene (O2), which has been sequenced for the same sample as Sh2 in our laboratory (Henry et al., 2005). The Sh2 polymorphism level, in maize as well as teosintes, is similar to values for genes such as Te1, Tb1 and C1 that are known or strongly suspected to be under selection (White & Doebley, 1999; Tenaillon et al., 2001, 2004; Zhang et al., 2002).

This very low nucleotide diversity appears to be the strongest evidence for selective constraints acting on Sh2 in the whole species Z. mays. A consequence of such low diversity is that most of the statistics that could be calculated to reveal selective events have very low power, as they are based on very few SNPs. Nevertheless, other arguments are consistent with the selective constraint hypothesis. First, the level of nucleotide diversity along the Sh2 gene is lowest among nonsynonymous coding sites, and much lower among coding than noncoding sites (Table 1). The low nucleotide diversity in noncoding regions may be a consequence of selection on the coding region and strong linkage disequilibrium all along the Sh2 gene (Fig. 3). As a consequence of the low nucleotide diversity in the coding region, the amino acid sequence of the SH2 protein is highly conserved (Fig. 2). Secondly, negative Tajima's DT values were found for the coding region (0.05 < P < 0.10), and for nonsynonymous sites in teosintes (P < 0.05; Table 2). A negative Tajima's DT value reveals an excess of rare alleles and may be the consequence of evolutionary mechanisms acting either at the gene level, i.e. selection for an advantageous allele accompanied by the elimination of deleterious variants, or at the population level, such as population structure or a recent increase in population size (Tajima, 1993). We cannot rule out a potential demographic effect on Tajima's DT values in cultivated maize, as maize domestication is known to have been accompanied by a strong bottleneck followed by a rapid increase in population size (Eyre-Walker et al., 1998). However, this is not expected to be the case for teosinte accessions, which exhibit a Tajima's DT value similar to that of maize. In allogamous species, where most genes may be considered to evolve independently, the comparison of Tajima's DT values among different genes could help to distinguish between population-level mechanisms that affect the whole genome and selective processes that should affect limited genomic regions around selected loci. The O2 gene, sequenced for the same sample, shows Tajima's DT values far from significance (Henry et al., 2005). In a study of 21 loci sequenced in 25 cultivated maize accessions, the strongly selected Tb1 gene is the only locus showing a significant Tajima's DT value, similar to that observed for Sh2 in the present study (Tenaillon et al., 2001). In a large study of 774 loci, only about 5% show negative DT values as low as that found for Sh2 (Wright et al., 2005).
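The link between a negative Tajima's statistic and an excess of rare alleles can be made concrete with a minimal sketch of the standard formula (Tajima, 1989), which compares the mean number of pairwise differences with the number of segregating sites scaled by a₁ = Σ 1/i. The function names and the allele counts below are hypothetical; the values in Table 2 were not recomputed here.

```python
from math import sqrt


def tajima_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences k."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = seg_sites
    # Difference between the two diversity estimates, divided by its standard deviation.
    return (mean_pairwise_diff - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def mean_pairwise_from_counts(allele_counts, n):
    """Mean pairwise differences from the count of one allele at each biallelic SNP."""
    return sum(2 * i * (n - i) for i in allele_counts) / (n * (n - 1))


# Hypothetical samples of 10 sequences with five SNPs each (assumption, not the Sh2 data).
only_singletons = [1, 1, 1, 1, 1]   # excess of rare alleles
intermediate = [5, 5, 5, 5, 5]      # alleles at intermediate frequency
for counts in (only_singletons, intermediate):
    k = mean_pairwise_from_counts(counts, 10)
    print(tajima_d(10, len(counts), k))
```

The singleton-only sample gives a negative value and the intermediate-frequency sample a positive one, although both have the same number of segregating sites.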

Together, these results indicate that processes acting at the population level, such as structure or demography, are not sufficient to explain the Sh2 pattern of polymorphisms. Our results show that purifying selection is acting on this gene, in accordance with previous work (Whitt et al., 2002), leading to reduced diversity and an excess of rare alleles in both wild and cultivated Z. mays. This selective force may result from the important role of Sh2 in plant starch biosynthesis (Tsai & Nelson, 1966; Preiss, 1988), leading to strong conservation of the amino acid sequence within Z. mays, as well as more broadly in plants (Bhave et al., 1990).

Evolutionary history of Sh2 in Zea mays

As a result of a bottleneck and/or selection during maize domestication, nucleotide diversity is lower in maize than in teosintes for all genes reported in the literature (Hanson et al., 1996; White & Doebley, 1999; Whitt et al., 2002; Zhang et al., 2002; Wright et al., 2005) as well as for neutral markers (Matsuoka et al., 2002; Vigouroux et al., 2005). The present data do not show such a clear pattern for Sh2, as we found that diversity in maize is only slightly lower (93%) than in teosinte for the whole sequenced region, and even higher (150%) for the Sh2 coding region. Moreover, alleles sequenced in wild and cultivated accessions are very similar and interleaved in the tree reconstruction (Fig. 4). Maize and teosintes share three haplotypes (Fig. 1; Hap3, Hap5 and Hap9), and group in highly supported clades in the tree reconstruction (Fig. 4; bootstraps > 90%), indicating that there is no specific Sh2 sequence for either cultivated or wild forms. In contrast to several studies on genetic markers (Matsuoka et al., 2002), Sh2 diversity of maize is not more closely related to Z. m. parviglumis than to Z. m. mexicana. As it is commonly accepted and recently confirmed by SSR marker analysis (Matsuoka et al., 2002) that cultivated maize originated from Z. m. parviglumis through a single domestication event, our observations may result from a high level of post-domestication gene flow between wild and cultivated Z. mays. Direct observations of in situ hybridization between maize and sympatric teosinte have been documented in Mexico, especially for the mexicana subspecies (Wilkes, 1977; Doebley, 1990b), suggesting that although maize derived from parviglumis, a high level of introgression from mexicana should not be neglected. The phylogenetic pattern leads us to conclude that the selective pressures on Sh2 identified in Z. mays are independent of maize domestication, and that large-effect mutations on the AGPase enzyme would probably be deleterious for both maize and teosinte kernel development.

A specific allele pattern is observed at the 3′ end of the sequenced region (from nucleotide 3398 in intron 10 to 3606 in exon 13). The rare alleles at positions 3398, 3406, 3463 and 3606 present in inbred line F618 and Z. m. huehuetenangensis are also found in Z. diploperennis and T. dactyloides. These four SNPs are in total linkage disequilibrium and constitute a rare ancestral haplotype, leading to a locally highly significant Fay and Wu's H-test. The same haplotype is also rare among inbred lines sequenced by Whitt et al. (2002), as it is found only in NC260, B37 and B73, the two latter being strongly related to F618 (A. Charcosset, pers. comm.), suggesting that these four inbreds recently inherited the same Sh2 allele. This haplotype may originate from recent hybridization between a cultivated form and Z. m. huehuetenangensis, which retained an ancestral Sh2 allele, followed by a recombination event within Sh2 a few nucleotides upstream of position 3398. However, Z. m. huehuetenangensis strongly diverges from all other Z. mays both geographically and genetically (Matsuoka et al., 2002), suggesting low intercrossing, either through natural hybridization or introgression programmes. Alternatively, the spread of the derived haplotype may be the result of an ancient selective sweep that drove an advantageous Sh2 allele into most individuals of the mexicana, parviglumis and mays subspecies. An excess of derived variants at high frequency is a unique pattern produced by hitchhiking during a selective sweep with low recombination (Fay & Wu, 2000) that may be distinguished from background selection (Kim & Stephan, 2000) and is generally accompanied by strong linkage disequilibrium (Kim & Nielsen, 2004), as observed along Sh2. We can thus propose the following scenario: the ancestral haplotype, still present in Z. m. huehuetenangensis, T. dactyloides and Z. diploperennis, has been replaced by a more efficient haplotype in the taxon ancestral to the mexicana, parviglumis and mays subspecies. A few copies of the ancestral haplotype would have been conserved and transmitted, through the mexicana–parviglumis divergence and maize domestication, to some current inbred lines.

Evolution of AGPase large subunit in Zea mays

Nucleotide variation leading to amino acid changes in Sh2 appears to be subject to purifying selection. We observed a very low level of protein diversity, with no variation in protein length as no IDP exists in the coding region, and only five amino acid substitutions leading to five different protein sequences among 50 Z. mays accessions from the three subspecies mays, mexicana and parviglumis (Fig. 2). All these polymorphisms correspond to a modification in amino acid charge or polarity (aa in positions 52, 275, 316, 318 and 395), and two of them (positions 52 and 318) are potentially associated with phenotypic variation in AGPase activity, kernel weight or starch content among inbred lines (Manicacci et al., 2001). Physiological studies revealed that several amino acids are involved in protein regulation or activity, or in protein–protein interaction within the heterotetrameric AGPase (Mullerrober et al., 1994; Ballicora et al., 1998; Greene & Hannah, 1998; Greene et al., 1998). Among those sites, the ones we sequenced were all monomorphic. Variation at these sites may be strongly deleterious, leading to strict monomorphism or to very low variation that has not been sampled. Another possibility is that selection acting on nonsequenced Sh2 sites may be detected in the sequenced region because of LD between sites several kb apart. Although some recombination has been detected in the present data set, strong linkage disequilibrium occurs among sites that are separated by up to 4 kb and LD does not decay with physical distance (Fig. 3), in contrast to most other genes studied in this species (Remington et al., 2001; Tenaillon et al., 2001; Ching et al., 2002). Another hypothesis to explain the strong LD along Sh2 is that selection acts on groups of sites that interact through the quaternary structure of the large subunit of AGPase, or on sites involved in interactions between large and small subunits. Unfortunately, very little is known about the quaternary structure of AGPase.

Starch biosynthesis, a complex pathway involving many enzymatic functions, is an important factor in maize seed germination and seedling fitness. We have shown that the Sh2 gene, which encodes a subunit of the key enzyme AGPase in this pathway, is under selective pressure in Z. mays. This result, as well as the strong conservation of the AGPase protein sequence in plants, suggests that natural selection has consistently reduced genetic diversity in the Sh2 gene by favouring efficient SH2 proteins. Since the beginning of maize domestication, humans have brought drastic selective pressure to bear on many traits of domestication or agronomical importance, and have been able to continuously increase kernel weight through increased starch accumulation. Unexpectedly, no consequence of such selection can be detected in molecular data of the Sh2 gene. To explain this paradox, we may assume that, because of the crucial role of Sh2 in fitness, natural selection had retained highly effective alleles long before humans started agriculture, leading to reduced genetic diversity in Sh2 and making human selection inefficient on this gene. The strong response to human selection in maize kernel filling may thus be the consequence of genetic evolution of other genes or sequences involved in regulatory and/or enzymatic functions during starch accumulation.


The authors thank Maud Tenaillon for helpful discussion, and anonymous reviewers for relevant comments. Part of this work has been supported by the Genoplante I S3P1 programme.