Aegilops tauschii single nucleotide polymorphisms shed light on the origins of wheat D-genome genetic diversity and pinpoint the geographic origin of hexaploid wheat


Author for correspondence:

Jan Dvorak

Tel: +1 530 752 6549



  • Hexaploid wheat (Triticum aestivum, genomes AABBDD) originated by hybridization of tetraploid Triticum turgidum (genomes AABB) with Aegilops tauschii (genomes DD). Genetic relationships between A. tauschii and the wheat D genome are of central importance for the understanding of wheat origin and subsequent evolution.
  • Genetic relationships among 477 A. tauschii and wheat accessions were studied with the A. tauschii 10K Infinium single nucleotide polymorphism (SNP) array.
  • Aegilops tauschii consists of two lineages (designated 1 and 2) having little genetic contact. Each lineage consists of two closely related sublineages. A population within lineage 2 in the southwestern and southern Caspian appears to be the main source of the wheat D genome. Lineage 1 contributed as little as 0.8% of the wheat D genome. Triticum aestivum is subdivided into the western and Far Eastern populations. The Far Eastern population conserved the genetic make-up of the nascent T. aestivum more than the western population. In wheat, diversity is high in chromosomes 1D and 2D and it correlates in all wheat D-genome and A. tauschii chromosomes with recombination rates.
  • Gene flow from A. tauschii was an important source of wheat genetic diversity and shaped its distribution along the D-genome chromosomes.


Hexaploid wheat (Triticum aestivum) originated by hybridization of tetraploid Triticum turgidum with diploid Aegilops tauschii (Kihara, 1944; McFadden & Sears, 1946). Aegilops tauschii is a widely distributed (van Slageren, 1994) and genetically diverse species (Lubbers et al., 1991; Dvorak et al., 1998c, 2011). It encompasses four morphological varieties, of which three (var. typica, var. anathera, and var. meyeri) are grouped into A. tauschii ssp. tauschii, whereas the fourth is monotypic (A. tauschii ssp. strangulata) (Eig, 1929; Kihara & Tanaka, 1958).

It has been widely accepted that A. tauschii ssp. strangulata is the source of the wheat D genome (Nishikawa, 1973; Nakai, 1979; Jaaska, 1980; Nishikawa et al., 1980; Lagudah et al., 1991; Lubbers et al., 1991; Dvorak et al., 1998c, 2012). Aegilops tauschii ssp. strangulata is distributed from Transcaucasia (Armenia and Azerbaijan) to eastern Caspian Iran (Kihara et al., 1965; Jaaska, 1980). In the southwestern and southern Caspian Iran, A. tauschii ssp. strangulata overlaps with A. tauschii ssp. tauschii var. meyeri and var. typica. Aegilops tauschii ssp. strangulata in Transcaucasia and southwestern Caspian Iran has been suggested most often as the putative source of the wheat D genome (Tsunewaki, 1966; Nakai, 1979; Jaaska, 1980; Dvorak et al., 1998c) although other proposals have also been made (Nishikawa et al., 1980).

Aegilops tauschii botanical categories agree poorly with genetic relationships (Dvorak et al., 1998c). The most apparent contradiction is encountered with var. meyeri, which is assigned to A. tauschii ssp. tauschii on the basis of morphology but is genetically closely related to A. tauschii ssp. strangulata (Lubbers et al., 1991; Dvorak et al., 1998c). In genetic studies, therefore, the use of categories based on genetic subdivision of A. tauschii is preferable to those based on formal taxonomy. The primary genetic subdivision is a subdivision into two evolutionary lineages (Lubbers et al., 1991; Dvorak et al., 1998c, 2012; Mizuno et al., 2010; Sohail et al., 2012). The lineages have received various names. For the sake of continuity with preceding studies, we will use names employed by Mizuno et al. (2010): lineage 1 and lineage 2 (L1 and L2, respectively). Lineage 1 is broadly related to A. tauschii ssp. tauschii and lineage 2 is broadly related to A. tauschii ssp. strangulata.

The differentiation of A. tauschii into two lineages elicits several questions relevant to the origin of wheat and shaping of its diversity. Recurrent hybridization and introgression between wheat and A. tauschii are known to have played a role in the origin of wheat D-genome diversity, although the magnitude is unknown (Dvorak et al., 1998a,c; Talbert et al., 1998; Caldwell et al., 2004; Akhunov et al., 2010). Did only lineage 2 contribute germplasm to the wheat D genome? If so, why lineage 2 and not lineage 1? Diversity is uneven among and along the wheat D-genome chromosomes (Akhunov et al., 2010). So, does the distribution of diversity along wheat chromosomes have anything to do with its distribution along the A. tauschii chromosomes, and what is the cause of this pattern?

To shed light on these and related questions, we analyze here genetic diversity in the wheat D genome with 7185 single nucleotide polymorphisms (SNPs) randomly selected among 195 631 genic SNPs discovered between A. tauschii accession AL8/78 collected in Armenia (lineage 2) and A. tauschii accession AS75 collected in central China (lineage 1) (You et al., 2011). The SNPs were used to design an A. tauschii 10K Illumina Infinium SNP array, which was employed in the construction of comparative genetic and physical maps of the A. tauschii genome (M-C. Luo et al., unpublished). Here we use the 10K Infinium SNP array for the study of genetic relationships within and between A. tauschii and wheat with the goal of identifying the sources of the wheat D-genome gene pool, its structure, and relative contribution of the A. tauschii lineages.

Materials and Methods


A total of 484 accessions of Aegilops and Triticum were used in this study: 402 accessions of A. tauschii, 75 accessions of hexaploid wheat, and seven accessions of tetraploid wheat (Supporting Information, Table S1). Of the 402 A. tauschii accessions, latitude and longitude were available for 325 accessions (Table S1). DNA was extracted from a leaf segment of a single plant from each accession as described earlier (Dvorak et al., 2006) and diluted to 200 ng μl−1 for Infinium SNP genotyping.

10K Infinium SNP array and SNP genotyping

The construction of the A. tauschii 10K SNP array will be detailed elsewhere (M-C. Luo et al., unpublished). A total of 7185 SNP markers in the array were mapped on the A. tauschii genetic map and on the physical map of the A. tauschii genome built from bacterial artificial chromosome clones (M-C. Luo et al., unpublished). SNPs among investigated plants were assayed according to manufacturer's protocol (Illumina, San Diego, CA, USA) at the University of California, Davis, Genome Center. Normalized Cy3 and Cy5 fluorescence for each DNA sample was graphed with the GenomeStudio program (Illumina), resulting in genotype clustering for each SNP marker.

Genotyping of polyploid organisms is complicated by the presence of two or more copies of a targeted gene in the nucleus (Akhunov et al., 2009). To ascertain that the D-genome marker was being assayed, tetraploid T. turgidum ssp. dicoccoides, T. turgidum ssp. dicoccon, and T. turgidum ssp. durum were genotyped alongside hexaploid wheat accessions as references to determine the likely genotype in the A, B, and D genomes. Data for each marker were manually checked. Data clustering and the reference location in the GenomeStudio graph were used to ensure that a D-genome marker was assayed.

To evaluate bias associated with the A. tauschii 10K SNP array, 11 wheat and four A. tauschii accessions (Table S1) previously genotyped by sequencing (Akhunov et al., 2010) were included in this study and genotyped with the A. tauschii 10K SNP array. Genetic distances among the accessions based on SNPs in haplotype sequences (Dvorak et al., 2012) and genetic distances based on SNPs in the A. tauschii 10K SNP array were used as variables in correlation analysis.

Phylogeny reconstruction

The genetic distances among accessions or groups of accessions were computed with MEGA 4.0. Pairwise genetic distances were calculated using the p-distance model. Variances were estimated by bootstrap technique (1000 replicates, random seed). The neighbor-joining (NJ) trees were constructed with 1000 bootstrap iterations using the nucleotide p-distance model under pairwise deletion in MEGA 4.0 (Tamura et al., 2007).
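The p-distance with pairwise deletion can be illustrated with a minimal sketch. It is not the MEGA implementation; the accession names, the genotype strings, and the use of "N" for a missing call are hypothetical, and each accession is assumed to carry one allele call per SNP site.

```python
def p_distance(seq_a, seq_b, missing="N"):
    """Proportion of differing sites, skipping sites missing in either accession."""
    compared = 0
    different = 0
    for a, b in zip(seq_a, seq_b):
        if a == missing or b == missing:
            continue  # pairwise deletion: drop the site for this pair only
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no sites shared by the two accessions")
    return different / compared


# Hypothetical genotypes at six SNP sites (one allele call per site).
genotypes = {
    "acc1": "AACGTT",
    "acc2": "AACGTA",
    "acc3": "GNCATA",
}

names = sorted(genotypes)
distances = {
    (x, y): p_distance(genotypes[x], genotypes[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
```

Under pairwise deletion, a missing call removes a site only from the comparisons that involve the affected accession, so different pairs may be compared over different numbers of sites.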

Genetic structure analysis

The FST parameter (Hudson, 2002) between A. tauschii lineages or sublineages and hexaploid wheat subspecies was estimated with DnaSP 5.10.01 (Librado & Rozas, 2009). The Bayesian inference program Structure 2.3.3 (Pritchard et al., 2000; Falush et al., 2007) was used to assess lineage structure and genetic admixture with the 7185 SNP markers mapped on the A. tauschii genetic map (M-C. Luo et al., unpublished). A total of 325 A. tauschii accessions with known geographic locations and 75 wheat accessions were employed in the analysis. The linkage ancestry model and the allele frequency correlated model were used. Linkage model analysis performs better than the original admixture model when using linked loci to study admixed populations. It achieves more accurate estimates of the ancestry vector, and can extract more information from the data. Unfortunately, the linkage ancestry model needs much more computing power than the admixture or no admixture models. We attempted to use 100 000 burn-in iterations followed by 100 000 Markov Chain Monte Carlo (MCMC) iterations as recently recommended (Gilbert et al., 2012). Owing to the large sample size and the large number of SNP markers, the analysis would require many months of computer time (AMD Opteron Processor 6212 × 16 CPUs, 32 Gb RAM, 7 Tb disk space). We therefore resorted to the following compromise. A total of 100 burn-in iterations followed by 100 MCMC iterations for K = 1–10 clusters were used to identify the optimal range of K. For each K, five independent runs were produced. The optimal value of K was determined using the delta K method (Evanno et al., 2005). The graph of delta K showed a maximum at K = 2 followed by K = 6. Values of K > 8 resulted in weak population structure and unstable groupings of accessions. Full runs were then performed using the linkage ancestry model option with 15 000 burn-in iterations followed by 10 000 MCMC iterations for K = 2–8, with three repetitions at each K. 
The alpha and likelihood statistics were verified to reach convergence before the 15 000 burn-in iterations. Distruct1.1 and Clumpp 1.1.2 were used to sort the cluster labels automatically and produce graphical displays of structure results (Rosenberg et al., 2002; Jakobsson & Rosenberg, 2007).
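The delta K statistic can be sketched as follows. This sketch computes the second-order difference from the mean log likelihoods across replicate runs, which is a common simplification and an assumption here; the log likelihood values are hypothetical and are not results of this study.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Map each interior K to |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)).

    log_likelihoods maps K to a list of ln P(D|K) values from replicate runs.
    """
    means = {k: mean(v) for k, v in log_likelihoods.items()}
    sds = {k: stdev(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(means):
        # delta K is undefined for the smallest and largest K tested
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical replicate log likelihoods for K = 1 to 4.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-600.0, -601.0, -599.0],
    3: [-580.0, -584.0, -576.0],
    4: [-570.0, -575.0, -565.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Because the statistic needs both neighbours of K, a scan over K = 1–10 yields delta K values only for K = 2–9.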

Chromosomal distribution of haplotypes

For each of the 7185 A. tauschii SNPs it was determined whether a site was polymorphic or monomorphic among 225 A. tauschii accessions of the L1 lineage, 137 A. tauschii accessions of the L2 lineage, and 75 wheat accessions. If a site was monomorphic, it was recorded whether the fixed haplotype corresponded to that present in A. tauschii accession AS75 (L1 lineage) or to that present in A. tauschii accession AL8/78 (L2 lineage). The following conditions were considered as evidence of introgression of a haplotype from the A. tauschii L1 lineage into the wheat D genome: lineage 2 was monomorphic for the L2 haplotype, lineage 1 was monomorphic for the L1 haplotype or was polymorphic for both the L1 and L2 haplotypes, and wheat was monomorphic for the L1 haplotype or polymorphic for the L1 and L2 haplotypes. The total numbers of SNP sites in contiguous blocks containing two or more sites satisfying these conditions were recorded in each chromosome.
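The three conditions can be written as a single predicate. The sketch below assumes that each group has already been scored at a site as "L1" (monomorphic for the AS75 haplotype), "L2" (monomorphic for the AL8/78 haplotype), or "poly" (polymorphic); these labels and the function name are hypothetical.

```python
VALID_STATES = {"L1", "L2", "poly"}


def is_putative_l1_introgression(lineage1, lineage2, wheat):
    """Return True if a SNP site meets the three conditions stated above."""
    for state in (lineage1, lineage2, wheat):
        if state not in VALID_STATES:
            raise ValueError(f"unknown site state: {state!r}")
    return (
        lineage2 == "L2"                   # lineage 2 fixed for the L2 haplotype
        and lineage1 in ("L1", "poly")     # L1 haplotype present in lineage 1
        and wheat in ("L1", "poly")        # L1 haplotype present in wheat
    )


# Hypothetical site states: (lineage 1, lineage 2, wheat).
sites = [
    ("L1", "L2", "L2"),      # wheat fixed for L2: no evidence
    ("L1", "L2", "poly"),    # wheat segregates for the L1 haplotype
    ("poly", "L2", "L1"),    # wheat fixed for the L1 haplotype
    ("L1", "poly", "poly"),  # lineage 2 itself carries L1: excluded
]
flags = [is_putative_l1_introgression(*s) for s in sites]
```

Requiring lineage 2 to be fixed for the L2 haplotype excludes sites at which wheat could have inherited the L1 haplotype directly from its lineage 2 progenitor.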


10K Infinium SNP array

A total of 402 accessions of A. tauschii and 75 accessions of T. aestivum were genotyped with the A. tauschii 10K Infinium SNP array (Tables 1, S1). A minor allele frequency (MAF) of 0.05 was selected as the threshold for declaring a locus polymorphic, to minimize the effects of genotyping errors on the analyses.
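A minimal sketch of the MAF computation and the polymorphism call follows. It assumes one allele call per accession (inbred lines), biallelic sites, and "-" for a missing call; whether a locus with MAF exactly equal to 0.05 counts as polymorphic is not stated, so the inclusive comparison is an assumption.

```python
from collections import Counter


def minor_allele_frequency(calls, missing="-"):
    """Frequency of the less common allele among non-missing calls at one site."""
    counts = Counter(c for c in calls if c != missing)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotype calls at this site")
    if len(counts) > 2:
        raise ValueError("expected a biallelic site")
    if len(counts) == 1:
        return 0.0  # only one allele observed
    return min(counts.values()) / total


def is_polymorphic(maf, threshold=0.05):
    """Declare a locus polymorphic when MAF reaches the threshold (assumed inclusive)."""
    return maf >= threshold


# Hypothetical calls at two sites.
common_variant = ["A"] * 18 + ["G"] * 2          # MAF = 0.10
rare_variant = ["A"] * 39 + ["G"] + ["-"] * 2    # MAF = 0.025, possibly a genotyping error

maf_common = minor_allele_frequency(common_variant)
maf_rare = minor_allele_frequency(rare_variant)
```

With this threshold, a single discordant call among 40 accessions is treated as monomorphic rather than as a rare variant.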

Table 1. Geographic origin of Aegilops tauschii and hexaploid wheat accessions individually listed in Table S1

Country          Code   A. tauschii   Wheat [a]   Total
Armenia          ARM    14            -           14
Azerbaijan       AZE    37            -           37
Russia           RUS    4             -           4
Syria            SYR    8             -           8
Turkmenistan     TKM    8             -           8
Afghanistan      AFG    57            -           57
India            IND    2             -           2
Tajikistan       TJK    13            -           13
Kazakhstan       KAZ    5             -           5
Kyrgyzstan       KGZ    1             -           1
Uzbekistan       UZB    17            -           17
Macedonia        MKD    -             1           1
Austria          AUT    -             1           1
Western Asia     N/A    4             -           4
Wheat cultivar   CV     -             15          13
Total                   402           75          477

[a] Hexaploid wheat includes Triticum aestivum ssp. compactum, ssp. macha, ssp. spelta, ssp. vavilovii, and ssp. aestivum.

To assess the utility of the A. tauschii 10K Infinium SNP array for studying genetic relationships, genetic distances among four A. tauschii accessions and 11 wheat accessions were estimated with the 10K Infinium SNP array and by Sanger sequencing of 121 genes (Akhunov et al., 2010; Dvorak et al., 2012). Correlation coefficients were computed using genetic distances as variables either for all data (r = 0.77, N = 129, P < 0.001) or for wheat accessions only (r = 0.53, N = 78, P = 0.046).

Genetic relationships

The model-based hierarchical structure among 325 A. tauschii accessions with known geographic origin and 75 wheat accessions was investigated using the Structure program. Based on the pattern and consistency of individual accession assignments in the global analysis, we concluded that K = 2–6 captured most of the biologically relevant information (Fig. 1). No additional information was gained at K > 6 (Fig. 1). The A. tauschii accessions were allocated into two lineages, L1 and L2, at all values of K. Both lineages were subdivided into western (W) and eastern (E) sublineages at K = 4–8. Sublineage 1W was located in eastern Turkey, Armenia, Azerbaijan, and western Iran and sublineage 1E was distributed from central Iran to China (Fig. 2). Sublineage 2W was located in Armenia and Azerbaijan, and sublineage 2E was located in Caspian Azerbaijan and Caspian Iran (Fig. 2).

Figure 1.

The structure of the population of 325 Aegilops tauschii accessions with known geographic locations and 75 hexaploid wheat accessions. Geographic areas are listed on top and sublineages or species are listed on the bottom. In a global analysis, all 400 samples were analyzed together (upper panel). They were also analyzed separately based on species and lineages (lower panels; from left to right): 225 accessions of lineage 1, 100 accessions of lineage 2, and 75 wheat accessions. For abbreviations of accession groups see Table 1. CHN-1, Xinjiang, China; CHN-2, Chinese provinces Henan and Shaanxi. Note the differentiation of the accessions from central China (CHN-2) from the rest of sublineage 1E. The Chinese wheat accessions are homogeneous and show the least admixture (bottom right panel) at all values of K.

Figure 2.

The topographic distribution of Aegilops tauschii accessions used in the study and their allocation into sublineages by Structure. Map source:

The Structure analysis was then performed separately with accessions allocated to lineage 1, lineage 2, and wheat using K = 2–6 (Fig. 1). The results were similar to those obtained with the global Structure analysis except that the accessions in central China separated from the rest of sublineage 1E at values of K = 4–6. Both eastern sublineages showed more admixture than the western sublineages.

Wheat clustered with lineage 2 at K = 2 and formed a separate cluster at K = 3–8 (Fig. 1). Admixture was prevalent in wheat. The landraces of Chinese origin and other wheats of the Far Eastern origin appeared homogeneous and separate from landraces that originated in western Asia.

Greater proximity of wheat to A. tauschii lineage 2 than to lineage 1 was confirmed by pairwise FST (Table 2). The smallest pairwise FST values for all wheat subspecies were with sublineage 2E, suggesting that all wheat forms originated from sublineage 2E (Table 3). Bread wheat (T. aestivum ssp. aestivum) and club wheat (T. aestivum ssp. compactum) were the least differentiated from sublineage 2E and T. aestivum ssp. macha was the most differentiated from it. Pairwise FST values also showed that wheat was significantly less differentiated from sublineage 1W than from sublineage 1E (Table 3).
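For orientation, one widely used sequence-based estimator is FST = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean number between populations. The sketch below implements that form; whether DnaSP applied exactly this estimator to the present data is an assumption, and the haplotypes are hypothetical.

```python
from itertools import combinations, product


def n_differences(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean number of pairwise differences among haplotypes of one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes per population")
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def pairwise_fst(pop1, pop2):
    """FST = 1 - Hw / Hb for two populations of haplotypes."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    pairs = list(product(pop1, pop2))
    h_b = sum(n_differences(a, b) for a, b in pairs) / len(pairs)
    if h_b == 0:
        raise ValueError("populations are identical; FST is undefined")
    return 1 - h_w / h_b


# Hypothetical haplotypes at four SNP sites.
pop_a = ["AAAA", "AAAA", "AAAT"]
pop_b = ["GGGG", "GGGG", "GGGC"]
fst_ab = pairwise_fst(pop_a, pop_b)
```

Values close to 1, such as those between wheat and sublineages 1W and 1E in Table 3, arise when nearly all differences are between rather than within the compared groups.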

Table 2. Pairwise FST between Aegilops tauschii lineages 1 and 2 and wheat

Table 3. Pairwise FST between Aegilops tauschii sublineages 1W, 1E, 2W, and 2E and wheat subspecies

                     1W [a]   1E      2W      2E [b]   aestivum   compactum   macha   spelta   vavilovii
ssp. aestivum        0.958    0.962   0.731   0.522
ssp. compactum       0.960    0.963   0.737   0.533    0.100
ssp. macha           0.962    0.965   0.749   0.549    0.338      0.338
ssp. spelta (Iran)   0.962    0.965   0.746   0.554    0.298      0.311       0.549
ssp. vavilovii       0.965    0.968   0.762   0.576    0.452      0.490       0.669   0.659

[a] Mean FST values between wheat using subspecies as variables and the 1W sublineage are significantly smaller than those between wheat subspecies and the 1E sublineage (P < 0.0001, t-test).

[b] Mean FST values between wheat using subspecies as variables and the 2E sublineage are significantly smaller than those between wheat and the 2W sublineage (P < 0.0001, t-test).

To validate the inferences obtained with the pairwise FST, genetic distances were computed between the five groups (Table 4). They completely agreed with the relationships suggested by pairwise FST.

Table 4. Genetic distance among Aegilops tauschii sublineages 1W, 1E, 2W, and 2E and hexaploid wheat [a]

[a] The upper triangle contains bootstrap variance estimates and the lower triangle contains genetic distances.


Similar genetic relationships within A. tauschii and between A. tauschii and wheat were also revealed by NJ trees constructed from genetic distances (Fig. 3). Aegilops tauschii accessions were split into two widely separated branches consisting entirely of accessions allocated to lineage 1 and 2 by Structure. Of the 402 A. tauschii accessions, only two, PI603254 (collected near Ramsar, province Mazandaran, Iran) and KU2157 (collected near Shahabad, province Mazandaran), were intermediate between the L1 and L2 branches. The NJ tree separated the 1W and 1E sublineages and 2W and 2E sublineages in agreement with the FST analysis. Based on genetic distances between individual A. tauschii and wheat accessions, the population of 75 wheat accessions formed a monophyletic branch near branch 2 in the vicinity of sublineage 2E (Fig. 3).

Figure 3.

Neighbor joining tree showing genetic relationships among individual accessions of Aegilops tauschii and wheat. Bootstrap confidence values of critical branches are shown. The allocation of accessions into sublineages 1W and 1E and 2W and 2E and to wheat is based on the Structure analysis. The wheat branch (inset) is rooted by a single A. tauschii accession selected among the accessions listed in Table 5. The branch indicates a monophyletic origin of wheat. The rooted wheat tree in the inset is subdivided into seven branches (W-1 to W-7). The W-1 branch consists of Iranian landraces sympatric with the 2E sublineage. The root subdivided the Asian accessions of wheat into Far Eastern W-2 and W-3 branches and western W-4 to W-7 branches. Numbers in the wheat tree are serial numbers of accessions in Supporting Information, Table S1.

The critical branches of the tree had 100% bootstrap confidence (Fig. 3). To further assess the confidence of the relationships between wheat and the A. tauschii sublineages, genetic distances were computed between 75 wheat and 402 A. tauschii accessions using SNP markers for each of the seven chromosomes separately. Twelve A. tauschii accessions showed the shortest distance to one or more wheat accessions on a single-chromosome basis (Table 5). All 12 belonged to sublineage 2E and all were located in southwestern and southern Caspian Iran (Fig. 2). Only one accession of the 12 was classified as A. tauschii ssp. strangulata by its collector. Six were classified as A. tauschii ssp. tauschii var. typica, and five as A. tauschii ssp. tauschii var. meyeri (Table 5).
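The single-chromosome search for the closest A. tauschii accession can be sketched as follows. The accession names and genotype strings are hypothetical, missing data are ignored for brevity, and crediting every tied accession when two are equally close to a wheat accession is an assumption.

```python
from collections import Counter


def p_distance(a, b):
    """Proportion of differing sites between two equal-length genotype strings."""
    if len(a) != len(b) or not a:
        raise ValueError("genotypes must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_accessions(wheat, tauschii):
    """Count, for one chromosome, how many wheat accessions are closest to each
    A. tauschii accession; ties are credited to every tied accession."""
    tally = Counter()
    for wheat_genotype in wheat.values():
        dists = {name: p_distance(wheat_genotype, g) for name, g in tauschii.items()}
        shortest = min(dists.values())
        for name, d in dists.items():
            if d == shortest:
                tally[name] += 1
    return tally


# Hypothetical genotypes at four SNP sites on a single chromosome.
tauschii = {"T1": "AAGG", "T2": "AACC", "T3": "TTCC"}
wheat = {"W1": "AAGG", "W2": "AAGC", "W3": "AACC"}
tally = closest_accessions(wheat, tauschii)
```

Repeating the tally for each of the seven chromosomes gives counts of the kind listed in the last column of Table 5.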

Table 5. Geographic locations and classification of Aegilops tauschii accessions showing the shortest genetic distances to individual accessions of hexaploid wheat on a single chromosome basis

Chromosome   Accession   Taxon         Longitude   Latitude   No. of wheat accessions
1D           KU2094      typica        52.092      36.587     67
             CIae26      typica        49.462      37.471     8
2D           PI276985    meyeri        53.542      36.694     44
             CIae26      typica        49.462      37.471     36
3D           CIae23      meyeri        50.683      36.900     67
             KU2100      meyeri        50.425      37.055     14
4D           KU2094      strangulata   52.092      36.587     68
             AS63        typica        N/A         N/A        13
5D           KU2103      typica        49.721      37.276     64
             KU2106      typica        49.173      37.541     14
6D           CIae21      typica        52.100      36.583     75
7D           PI603253    meyeri        50.686      36.883     72
             CIae23      meyeri        50.683      36.900     3
             PI603251    typica        50.683      36.900     3
             RL5289      meyeri        N/A         N/A        3
1D-7D        CIae23      meyeri        50.683      36.9       39

To orient the wheat branch, wheat trees were constructed separately using each of the 12 A. tauschii accessions in Table 5 as a root. The 12 wheat trees were similar in topology to the tree depicted in Fig. 3 and all were rooted by the Wheat-1 branch consisting of Iranian bread wheat landraces PI622243 and PI622233, collected at Sari (province Mazandaran), and landrace PI622268, collected at Tonekabon (province Mazandaran).

Rooting revealed two main branches in the wheat tree: the Far Eastern branch consisting of Chinese landraces, Tibetan feral wheat (T. aestivum ssp. tibetanum), Yunnan wheat (T. aestivum ssp. yunnanensis) (branch Wheat-2) and Chinese rice wheat (T. aestivum ssp. petropavlovskyi) (branch Wheat-3); and the west Asian branch Wheat-1 and the Wheat-4 to Wheat-7 branches. The Far Eastern branch was more closely related to the A. tauschii sublineage 2E than was the west Asian wheat branch (Table 6).

Table 6. Genetic distances between the Far Eastern wheat population, the west Asian wheat population, introgressed Iranian wheat accessions (Wheat-1), and Aegilops tauschii lineage 2 [a]

              L2            Wheat-1   Far Eastern   Western
Wheat-1       0.141 b [b]   -         0.001         0.001
Far Eastern   0.142 b       0.022 b   -             0.001
Western [c]   0.144 a       0.026 a   0.016         -

[a] The upper triangle shows bootstrap variance and the lower triangle shows genetic distances.

[b] Distances within a column followed by the same letter are not significantly different at the 5% probability level.

[c] Distances in the row are significantly different from each other at the 0.0001 probability level.

Iranian wheat landraces PI622243, PI622233, and PI622268, which were near the root of the wheat tree, were sympatric with the A. tauschii sublineage 2E in southern Caspian Iran. The landraces were more closely related to A. tauschii lineage 2 than was the western wheat population (Table 6). The three accessions were also more closely related to Far Eastern wheat than to western wheat.

Triticum aestivum ssp. macha clustered in branch Wheat-5 with two Iranian bread wheat landraces collected northeast of Tehran and with T. aestivum ssp. vavilovii. In branch Wheat-7, Turkish landraces from southeastern Turkey clustered with Turkish club wheat. This branch formed a sister branch to an Iranian spelt branch. Three accessions of Iranian spelt clustered with one Turkish landrace collected near the border between Turkey and Armenia, and Iranian bread wheat landraces collected near Hamadan in western Iran, the same area where the three accessions of Iranian spelt were collected. The remaining two Iranian spelt accessions were in branches Wheat-5 and Wheat-6 showing the heterogeneity of Iranian spelt.

Chromosomal distribution of haplotypes and L1 introgression into wheat

Minor allele frequencies at SNPs across the seven chromosomes were determined in 225 accessions of the L1 lineage, 137 accessions of the L2 lineage, and 75 wheat accessions. Polymorphic genes were abundant in the distal chromosome regions, whereas monomorphic genes were abundant in the proximal chromosome regions (Fig. 4). Contiguous blocks of various lengths of either polymorphic or monomorphic SNP sites were present in each A. tauschii and wheat D-genome chromosome. The AS75 haplotype (representing L1) was fixed in an overwhelming majority of monomorphic SNP sites in lineage 1, and the AL8/78 haplotype (representing L2) was fixed in an overwhelming majority of monomorphic SNP sites in lineage 2. Only in 76 (1.1%) SNP sites in the L2 lineage was an AS75 haplotype fixed and only in eight (0.1%) SNP sites in the L1 lineage was an AL8/78 haplotype fixed. Lineage 2 appeared to be more diverse than lineage 1, because its blocks of monomorphic SNP sites were shorter (i.e. more often interrupted by polymorphic sites) than those in lineage 1.

Figure 4.

Distribution of polymorphic (pink) and monomorphic (blue for L1 and green for L2) single nucleotide polymorphism (SNP) haplotypes along the chromosomes of Aegilops tauschii lineages 1 (L1) and 2 (L2) and wheat chromosomes 1D–7D. Putative introgressed haplotypes from lineage 1 into wheat are marked by black rectangles. Centromeres are indicated by red rectangles.

In wheat, a vast majority of blocks of monomorphic SNP sites had an L2 nucleotide, confirming that all seven wheat D-genome chromosomes were contributed by A. tauschii lineage 2. According to criteria described in the 'Materials and Methods' section, 3.4% of the 7185 SNP sites in wheat could have been introgressed from lineage 1. Most of these were single L1 haplotypes inserted in L2 haplotype blocks. Only 0.8% of these sites were in contiguous blocks containing two or more L1 haplotypes (Table 7). Wheat chromosomes 1D and 2D contained the largest portion of putative L1 haplotypes (Table 7).
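The distinction between single L1 haplotypes and contiguous blocks of two or more can be sketched by scanning the SNP sites of one chromosome in map order. The input is assumed to be a list of Boolean flags, one per site, marking sites that satisfy the introgression criteria; the flag values below are hypothetical.

```python
from itertools import groupby


def count_block_sites(flags, min_run=2):
    """Return (sites in isolated runs, sites in runs of at least min_run).

    flags must be ordered by map position along a single chromosome.
    """
    singles = 0
    in_blocks = 0
    for is_l1, run in groupby(flags):
        length = sum(1 for _ in run)
        if not is_l1:
            continue  # run of sites without a putative L1 haplotype
        if length >= min_run:
            in_blocks += length
        else:
            singles += length
    return singles, in_blocks


# Hypothetical flags for eight consecutive SNP sites.
flags = [False, True, False, True, True, True, False, True]
singles, in_blocks = count_block_sites(flags)
```

Dividing each count by the number of SNP sites on the chromosome gives percentages of the kind reported in Table 7.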

Table 7. Numbers and percentages of the wheat D-genome haplotypes putatively contributed by the Aegilops tauschii L1 lineage

Chromosome   No. SNPs [a]   Single L1 haplotypes   %   Two or more contiguous L1 haplotypes   %

SNP, single nucleotide polymorphism.

[a] M-C. Luo et al. (unpublished).


The presence of introgressed chromosome segments from lineage 1 in wheat was also assessed by constructing phylogenetic trees using only SNP markers located on a single chromosome and observing the position of the wheat branch in the resulting tree (Fig. 5). In all trees except for that of chromosome 5D, the wheat branch showed a location similar to that in the global tree, indicating the presence of lineage 1 haplotypes on six wheat D-genome chromosomes.

Figure 5.

Trees constructed using single nucleotide polymorphisms (SNPs) for individual D-genome chromosomes. Note that the wheat branch (brown) has an intermediate location between L1 and L2 branches in all trees except for the 5D-tree, indicating that introgression from lineage 1 is undetectable in that chromosome. Two intermediate Aegilops tauschii accessions apparent in the global tree (Fig. 3) are within the 2E branch in the 3D and 6D trees. Introgression responsible for their intermediate locations is undetectable in chromosomes 3D and 6D. In chromosomes 1D, 4D, and 7D, the branches are at identical locations in the trees, reflecting the identical genotypes in three chromosomes and showing that the two accessions had a common origin, although they were collected at different locations.

Single-chromosome phylogenetic trees (Fig. 5) also provided information about A. tauschii accessions PI603254 and KU2157 located at an intermediate position between lineages 1 and 2 in the global phylogenetic tree. In the 3D and 6D trees, the accessions were within the lineage 2 branch. Haplotype blocks in the two A. tauschii accessions showed similar distribution along chromosomes, suggesting that they had a common origin. The accessions were collected in Mazandaran at Ramsar and Shahabad. Their geographic proximity and genetic relationships indicated that both originated via hybridization of a single 2E sublineage plant with a plant of lineage 1.

Figure 6.

The distribution of minor allele frequencies (MAFs) across the physical maps of the wheat D-genome chromosomes. Each column represents MAF at a single nucleotide polymorphism (SNP) site. The absence of a column indicates a monomorphic site. The distribution of polymorphism along the wheat D-genome chromosomes is similar to that reported on the basis of wheat D-genome haplotype sequencing (Akhunov et al., 2010), including greater overall diversity in chromosomes 1D and 2D than in the remaining five chromosomes.

Diversity distribution along wheat chromosomes

To assess the distribution of diversity across wheat chromosomes, the MAFs were plotted along the physical map of each chromosome. MAFs were the highest in wheat chromosomes 1D and 2D and in the distal regions in all seven wheat D-genome chromosomes (Fig. 6). MAFs in the wheat D-genome chromosomes and in the A. tauschii chromosomes correlated with recombination rates along A. tauschii chromosomes (Table 8).

Table 8. Correlation of minor allele frequency (MAF) averaged across the single nucleotide polymorphism (SNP) loci in N nonoverlapping 30 Mb intervals with recombination rates [a] in those intervals

             Wheat                     Lineage 1                 Lineage 2
Chromosome   r       N     P           r       N     P           r       N     P
1D           0.731   17    0.0009      0.861   17    < 0.0001    0.379   17    0.1333
2D           0.917   22    < 0.0001    0.851   22    < 0.0001    0.963   22    < 0.0001
3D           0.829   21    < 0.0001    0.903   21    < 0.0001    0.815   21    < 0.0001
5D           0.652   19    0.0025      0.85    19    < 0.0001    0.868   19    < 0.0001
6D           0.717   15    0.0026      0.877   15    < 0.0001    0.872   15    < 0.0001
7D           0.841   21    < 0.0001    0.918   21    < 0.0001    0.808   21    < 0.0001
Overall      0.732   132   < 0.0001    0.849   132   < 0.0001    0.749   132   < 0.0001

[a] Recombination rates were estimated by M-C. Luo et al. (unpublished).
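The analysis summarized in Table 8 can be sketched as two steps: averaging MAF over the SNP loci in each nonoverlapping 30 Mb interval, and correlating the interval means with the recombination rates of the same intervals. The positions, MAF values, and recombination rates below are hypothetical, and the Pearson coefficient is an assumption about the type of correlation used.

```python
from math import sqrt


def bin_means(positions_mb, values, bin_size_mb=30.0):
    """Mean of values in each nonoverlapping interval of the physical map."""
    sums = {}
    for pos, val in zip(positions_mb, values):
        idx = int(pos // bin_size_mb)  # interval index along the chromosome
        total, n = sums.get(idx, (0.0, 0))
        sums[idx] = (total + val, n + 1)
    return {idx: total / n for idx, (total, n) in sorted(sums.items())}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical SNP positions (Mb), MAFs, and per-interval recombination rates.
positions = [5.0, 20.0, 35.0, 50.0, 65.0, 80.0]
mafs = [0.0, 0.1, 0.2, 0.2, 0.4, 0.4]
recombination = {0: 0.1, 1: 0.5, 2: 1.2}

maf_by_bin = bin_means(positions, mafs)
shared = sorted(set(maf_by_bin) & set(recombination))
r = pearson_r([maf_by_bin[i] for i in shared], [recombination[i] for i in shared])
```

Here N corresponds to the number of intervals that contain at least one SNP locus and have a recombination estimate.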


Ascertainment bias

The use of an oligonucleotide SNP array in a genetic diversity study always raises a concern as to whether or not the SNPs used for the construction of an SNP array are representative of diversity present in the studied population (Brumfield et al., 2003; Albrechtsen et al., 2010). A 6K Infinium oligonucleotide SNP array based on SNPs from a single heterozygous walnut tree generated biased estimates of diversity magnitude, but the estimates of genetic distances were correlated with the coefficients of parentage among walnut varieties, suggesting that genetic distances were less sensitive to a bias than estimates of diversity in that specific case (You et al., 2012).

To assess the extent to which our data were affected by the choice of SNPs, we compared genetic distances based on the A. tauschii 10K Infinium SNP array with genetic distances generated by haplotype sequencing of the same A. tauschii and wheat accessions. The genetic distances were correlated, although it is possible that the correlations reflected more the global relationships among lineages than the topology within lineages. We therefore compared our phylogenetic tree with trees generated with other marker systems. Great separation of lineages 1 and 2 from each other, the paucity of intermediate genotypes between them, and the relative lengths of branches in our tree agreed with trees constructed with amplified fragment length polymorphism (AFLP) markers (Mizuno et al., 2010), Diversity Arrays Technology (DArT) markers (Sohail et al., 2012), restriction fragment length polymorphism (RFLP) markers (Dvorak et al., 2012), and by haplotype sequencing (Dvorak et al., 2012).

Particularly informative was a comparison of our tree with the AFLP tree reported by Mizuno et al. (2010). As in our tree, branches in lineage 2 were longer than those in lineage 1, indicating that lineage 2 was more diverse than lineage 1. In our tree, sublineages 1E and 2W were more differentiated with respect to each other than sublineages 1W and 2E. On the basis of geographic locations of accessions, we inferred that sublineage 1W corresponded to Mizuno et al. (2010) sublineages 1-3 and 1-5, sublineage 1E corresponded to sublineages 1-1, 1-2, and 1-4, sublineage 2W corresponded to sublineages 2-1 and 2-2, and sublineage 2E corresponded to sublineage 2-3. Sublineages 1-1 plus 1-4 and 2-1 plus 2-2 were the extremes of the Mizuno et al. tree, whereas sublineages 1-3 and 2-3 were internal and faced each other. The same pattern was observed in our tree.

We also compared the patterns of SNP diversity across the wheat D genome and along individual D-genome chromosomes based on the A. tauschii 10K Infinium SNP array and haplotype sequencing (Akhunov et al., 2010). The two patterns were similar.

Agreements between data generated with the 10K Infinium array and those generated by other marker systems suggested that the choice of SNP markers did not generate false phylogenetic relationships, particularly not between A. tauschii and wheat, the primary objective of this study. Although SNPs used for the construction of the 10K Infinium array were a nonrandom sample of SNPs within A. tauschii, they were probably a random sample of SNPs within wheat and between A. tauschii and wheat. The extent to which the choice of SNPs distorted genetic distances is not known.

A. tauschii subdivision

A notable feature of the A. tauschii phylogenetic tree was the paucity of accessions with intermediate locations. We found two such accessions, which were of a common origin, among 402 A. tauschii accessions. A similar paucity of accessions intermediate between lineages 1 and 2 (accounting for c. 1.4% of all accessions studied) was reported in other studies (Table 9). The great differentiation between lineages 1 and 2 is also indicated by fixation of L1 and L2 haplotypes in Fig. 4. These data suggest that the two lineages have been reproductively isolated in nature.

Table 9. Aegilops tauschii accessions with intermediate locations between lineages 1 and 2 in phylogenetic trees

Markers        No. of markers  No. of accessions(a)  No. with intermediate location(a)  References
SSR            17              62                    0                                  Takumi et al. (2008)
Gene sequence  121             9                     0                                  Dvorak et al. (2012)
DArT           4449            81                    3                                  Sohail et al. (2012)
IRAP           8               57                    0                                  Saeidi et al. (2008)
SSR            28              19                    0                                  Tahernezhad et al. (2010)
AFLP           –               48                    0                                  Saeidi et al. (2006)
RFLP           –               20                    0                                  Ward et al. (1998)
AFLP           –               32                    0                                  Amirian et al. (2007)
SSR            18              113                   4                                  Pestsova et al. (2000)
AFLP           –               112                   3                                  Mizuno et al. (2010)
RFLP           –               172                   4                                  Dvorak et al. (2012)
SNP            7185            402                   2                                  This study
Total          –               1127                  16

SSR, simple sequence repeat; DArT, Diversity Arrays Technology; IRAP, inter-retrotransposon amplified polymorphism; AFLP, amplified fragment length polymorphism; RFLP, restriction fragment length polymorphism; SNP, single nucleotide polymorphism.

(a) Some of the A. tauschii accessions were shared among different reports and the dataset could not be reduced to a unique set of accessions. The same was true for accessions with intermediate locations (we arrived at 10 unique accessions with intermediate location among the 16). We therefore used the redundant estimates, 1127 and 16, to compute the percentage (1.4%) of accessions with intermediate locations.
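The 1.4% figure follows directly from the redundant totals in Table 9. The arithmetic can be sketched in a few lines of Python, using only the per-study counts of intermediate accessions and the stated total of 1127 accessions. The variable names are illustrative and are not taken from the original analysis.

```python
# Accessions with intermediate location, per study, as listed in Table 9.
intermediate_per_study = {
    "Takumi et al. (2008), SSR": 0,
    "Dvorak et al. (2012), gene sequence": 0,
    "Sohail et al. (2012), DArT": 3,
    "Saeidi et al. (2008), IRAP": 0,
    "Tahernezhad et al. (2010), SSR": 0,
    "Saeidi et al. (2006), AFLP": 0,
    "Ward et al. (1998), RFLP": 0,
    "Amirian et al. (2007), AFLP": 0,
    "Pestsova et al. (2000), SSR": 4,
    "Mizuno et al. (2010), AFLP": 3,
    "Dvorak et al. (2012), RFLP": 4,
    "This study, SNP": 2,
}

# Redundant total of accessions stated in the Total row of Table 9.
TOTAL_ACCESSIONS = 1127

total_intermediate = sum(intermediate_per_study.values())
percent_intermediate = 100 * total_intermediate / TOTAL_ACCESSIONS

print(f"{total_intermediate} of {TOTAL_ACCESSIONS} accessions "
      f"= {percent_intermediate:.1f}% intermediate")
```

Because both totals are redundant (accessions are shared among reports), the percentage is an approximation rather than a count over unique accessions.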

The length of lineage 2 branches in the A. tauschii phylogenetic tree reported here and by Mizuno et al. (2010) showed that lineage 2 is more diverse than lineage 1. The same conclusion can be drawn from the greater length of blocks of fixed L1 haplotypes in the L1 lineage than those of fixed L2 haplotypes in the L2 lineage. A paradox is that the less diverse lineage 1 is more widely distributed than the more diverse lineage 2.

Sublineages within each lineage appear to be geographically isolated. Within lineage 1, sublineage 1W is located in Turkey, Transcaucasia, and western Iran, whereas sublineage 1E is located from central Iran to China. In China, A. tauschii in Xinjiang is affiliated with Afghan A. tauschii, but A. tauschii in central China is somewhat differentiated from the rest of the 1E sublineage, as reported previously (Mizuno et al., 2010). Sublineages 1W and 1E are predominantly found at high elevations (400–3000 m above sea level (asl)). Within lineage 2, sublineages occupy geographically different areas and different elevations. Sublineage 2W occupies elevations between 400 and 1500 m asl in Transcaucasia (Armenia and Azerbaijan), whereas sublineage 2E occupies elevations ≤ 25 m and is distributed across Azerbaijan and Caspian Iran.

The 2E sublineage is morphologically heterogeneous. It includes both the typical moniliform A. tauschii ssp. strangulata in eastern Caspian Iran and morphologically intermediate forms classified as A. tauschii ssp. tauschii var. meyeri and var. typica in southern and southwestern Caspian.

The poor agreement between morphological and genetic relationships among A. tauschii accessions shown here and in previous studies (Lubbers et al., 1991; Dvorak et al., 1998c; Mizuno et al., 2010; Sohail et al., 2012) can be reconciled by two mutually exclusive hypotheses: (1) the A. tauschii morphological traits and the classification based on them are trustworthy, but the taxa are genetically heterogeneous as a result of gene flow between them; or (2) A. tauschii is genetically clearly subdivided, but the subdivision is not faithfully reflected by morphology and taxonomic classification. Clear genetic separation of lineages 1 and 2 and the paucity of intermediate genotypes lead us to favor the second hypothesis. As will be seen later, accepting this hypothesis is critical for correctly interpreting wheat evolution.

Wheat origin and its subsequent evolution

Previous genetic studies placed the origin of wheat in Transcaucasia and southwestern Caspian Iran (Tsunewaki, 1966; Nakai, 1979; Jaaska, 1980; Dvorak et al., 1998c) or southeastern Caspian Iran (Nishikawa et al., 1980). The consensus has been that A. tauschii ssp. strangulata was the wheat progenitor.

We identified 12 A. tauschii accessions, each closely related to a wheat D-genome chromosome. All 12 accessions belonged to sublineage 2E and were members of a population located in southwestern and southern Caspian Iran. In a startling departure from the belief that the source of the wheat D genome was A. tauschii ssp. strangulata, only one of these 12 accessions had been classified as A. tauschii ssp. strangulata by its collector. Eleven of the 12 accessions were classified as A. tauschii ssp. tauschii var. typica or A. tauschii ssp. tauschii var. meyeri. However, if we accept that morphology does not reflect genetic relationships, as in the second hypothesis, this conflict with the previous conclusions about the progenitor of the wheat D genome becomes irrelevant, as these accessions are genetically members of the 2E sublineage.

This study is not the first one indicating that the wheat D genome is most closely related to accessions morphologically classified as A. tauschii ssp. tauschii. In both RFLP trees reported by Dvorak et al. (1998c), the wheat branch emanates from a branch consisting of accessions collected in southwestern Caspian Iran and classified as A. tauschii ssp. tauschii (labeled as T in the Dvorak et al., 1998c study). The proximity of that branch to the branch consisting of A. tauschii ssp. strangulata in Transcaucasia led Dvorak et al. (1998c) to conservatively place the origin of wheat into a broad area ranging from Armenia to southwestern Caspian Iran.

Of the 7185 SNP sites in the wheat D genome, 3.4% appeared to originate by introgression from the L1 lineage. Most of these were single L1 haplotypes embedded within blocks of L2 haplotypes, which makes them poor candidates for introgression. More reliable evidence for L1 introgression is offered by blocks of contiguous L1 haplotypes inserted within blocks of L2 haplotypes. These blocks accounted for a mere 0.8% of the SNP sites, indicating that c. 99% of the D genome was contributed by A. tauschii lineage 2.
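The distinction drawn above between isolated L1 haplotypes and blocks of contiguous L1 haplotypes can be illustrated with a short Python sketch. It assumes that haplotype calls are available as an ordered list of "L1"/"L2" labels along a chromosome and that a block is any run of at least two contiguous L1 calls. Both the toy data and the `min_block` threshold are assumptions made for illustration and do not reproduce the criteria or data of this study.

```python
from itertools import groupby


def summarize_l1_runs(haplotypes, min_block=2):
    """Count L1 sites that occur as singletons versus in contiguous blocks.

    haplotypes: lineage labels ("L1" or "L2") ordered along a chromosome.
    min_block:  hypothetical minimum run length treated as a block.
    """
    singleton_sites = 0
    block_sites = 0
    for label, run in groupby(haplotypes):
        run_length = len(list(run))
        if label != "L1":
            continue
        if run_length >= min_block:
            block_sites += run_length
        else:
            singleton_sites += run_length
    n_sites = len(haplotypes)
    return {
        "singleton_sites": singleton_sites,
        "block_sites": block_sites,
        "fraction_l1_total": (singleton_sites + block_sites) / n_sites,
        "fraction_l1_in_blocks": block_sites / n_sites,
    }


# Toy chromosome with 20 sites: two isolated L1 calls and one block of three.
toy_calls = (["L2"] * 5 + ["L1"] + ["L2"] * 6
             + ["L1"] * 3 + ["L2"] * 4 + ["L1"])
summary = summarize_l1_runs(toy_calls)
print(summary)
```

In this scheme the total L1 fraction plays the role of the 3.4% figure, and the fraction in blocks plays the role of the more conservative 0.8% figure.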

Given the extensive opportunity for hybridization between wheat and A. tauschii (Kihara et al., 1965), why does hexaploid wheat appear monophyletic and why the preference for sublineage 2E? An answer to this question may reside in the geography of cultivation of tetraploid wheat by early farmers. Aegilops tauschii readily hybridizes with tetraploid wheat, and triploid hybrids often produce so many unreduced gametes that they are fertile (Zhang et al., 2010). By contrast, hybridization of A. tauschii with hexaploid wheat is arduous and hybrids can be obtained only with the aid of embryo rescue. Introgression from A. tauschii into hexaploid wheat should therefore be expected only in the areas where tetraploid wheat was farmed in mixed populations with hexaploid wheat (Dvorak et al., 1998c).

Farming of tetraploid durum wheat is today limited to a few mountainous regions in northern Iran (Matsuoka et al., 2008), but the situation could have been different in the past. If wheat farming took hold predominantly at low elevations in Caspian Iran, and if the distribution of A. tauschii was similar to its present-day distribution, the only possible source of the D genome would have been sublineage 2E, as only sublineage 2E is found at low elevations.

Hexaploid wheat has greater tolerance of frost and other environmental extremes than tetraploid wheat, and cultivation of hexaploid wheat became far more widespread than that of tetraploid wheat (Dubcovsky & Dvorak, 2007). Because farming of tetraploid wheat has been very limited in the Far East, for example in China, introgression from A. tauschii did not take place there while it continued in the west. The absence of introgression in the Far East subdivided Asian hexaploid wheat into two populations, western and Far Eastern (Dvorak et al., 2006; Balfourier et al., 2007). Because of the importance of tetraploid wheat in gene flow from A. tauschii to hexaploid wheat, and because of the paucity of tetraploid wheat in the eastern area of wheat distribution, Far Eastern hexaploid wheat preserves the genetic make-up of the original hexaploid wheat more faithfully than west Asian hexaploid wheat does.

Haplotypes at loci controlling the free threshing habit in wheat revealed that European and most Asian accessions of spelt were derived from hybridization of free-threshing hexaploid wheat with emmer (Dvorak et al., 2012), instead of spelt being ancestral to free-threshing wheat, as originally suggested (McFadden & Sears, 1946). The sole exception was Iranian spelt, which had some of the genetic attributes expected in the ancestral hexaploid wheat (Dvorak et al., 2012). Iranian spelt was discovered in the mountainous region of western Iran (Kuckuck, 1959), not in the putative area of hexaploid wheat origin. The D genome of most accessions of Iranian spelt is closely related to the D genome of sympatric common wheat. These facts and the heterogeneity of Iranian spelt (Kuckuck, 1964) indicate that Iranian spelt has been introgressed with bread wheat, which obscures its position in wheat evolution.

Origins of wheat genetic diversity and its distribution along chromosomes

The wheat D genome appeared anomalous among the three wheat genomes by showing great fluctuation in diversity among chromosomes (Akhunov et al., 2010). This was confirmed here. Additionally, it has been shown here that diversity was unevenly distributed along all wheat D-genome chromosomes and correlated with recombination rates along the chromosomes. With the sole exception of chromosome 1D in A. tauschii lineage 2, similar correlations were also observed along each chromosome in both A. tauschii lineages.

All SNPs used for the construction of the 10K Infinium SNP array were discovered in A. tauschii. Each polymorphism found with the 10K Infinium SNP array in the wheat D genome must therefore have already existed in A. tauschii. Of 7185 SNPs evaluated for polymorphism in wheat, 538 (7.5%) were polymorphic and those wheat polymorphisms must have been introgressed into wheat from A. tauschii. The actual polymorphism resulting from introgression must be higher because a portion of SNPs in A. tauschii, particularly those within sublineage 2E, were not captured by the 10K Infinium array. Previous reports of few shared RFLP and SNP polymorphisms by A. tauschii and wheat (Dvorak et al., 1998b,c; Talbert et al., 1998; Caldwell et al., 2004) captured only the proverbial ‘tip of the iceberg’ of the magnitude of polymorphism introgression from A. tauschii into wheat.

In Drosophila, genetic diversity is known to be uneven along and among chromosomes and to be affected by recombination rates and other factors (Begun & Aquadro, 1992; Begun et al., 2007). In A. tauschii, RFLP was shown to correlate with recombination rates (Dvorak et al., 1998a). The same strong correlation with recombination rate was observed here for SNPs. Since the 538 polymorphisms observed here were introgressed into wheat from A. tauschii, the basic pattern of distribution of polymorphism along wheat chromosomes must have originated in A. tauschii and must have been introgressed into wheat from it.
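The relationship between diversity and recombination rate described above can be made concrete with a small Python sketch that computes a Pearson correlation coefficient across chromosome bins. The statistic, the binning, and the numbers below are assumptions chosen for illustration. This section does not specify how the correlations were calculated, and the toy values are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bins ordered from the centromere towards the telomere.
recombination_rate = [0.1, 0.3, 0.8, 1.5, 2.4]   # toy values, e.g. cM/Mb
snp_diversity = [0.02, 0.03, 0.07, 0.11, 0.18]   # toy per-bin diversity

r = pearson_r(recombination_rate, snp_diversity)
print(f"r = {r:.3f}")
```

A coefficient close to 1 in such a comparison would correspond to the pattern reported here, in which diversity rises with recombination rate along a chromosome.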


We thank Patrick E. McGuire (Department of Plant Sciences, University of California, Davis, USA) for valuable discussions and suggestions for improvements of the manuscript. We also thank Assad Siham (ICARDA, Syria), Jon W. Raupp (Kansas State University, USA), Shuhei Nasuda (Komugi, Japan) and Harold Bockelman (USDA, USA) for plant materials. This work was supported in part by the US National Science Foundation (grant number IOS 0701916), the National Natural Science Foundation of China (grants 31230053 and 31171555), and the National Basic Research Program of China (grants 2011CB100100 and 2009CB118300). J.R.W. thanks the program for New Century Excellent Talents in University, China, for financial support.