Bread wheat derives from a grass ancestor structured in seven protochromosomes followed by a paleotetraploidization to reach a 12 chromosomes intermediate and a neohexaploidization (involving subgenomes A, B and D) event that finally shaped the 21 modern chromosomes. Insights into wheat syntenome in sequencing conserved orthologous set (COS) genes unravelled differences in genomic structure (such as gene conservation and diversity) and genetical landscape (such as recombination pattern) between ancestral as well as recent duplicated blocks. Contrasted evolutionary plasticity is observed where the B subgenome appears more sensitive (i.e. plastic) in contrast to A as dominant (i.e. stable) in response to the neotetraploidization and D subgenome as supra-dominant (i.e. pivotal) in response to the neohexaploidization event. Finally, the wheat syntenome, delivered through a public web interface PlantSyntenyViewer at http://urgi.versailles.inra.fr/synteny-wheat, can be considered as a guide for accelerated dissection of major agronomical traits in wheat.
Recent comparative genomics studies based on monocot genome sequences, including Panicoideae (sorghum, Paterson et al., 2009; maize, Schnable et al., 2009), Ehrhartoideae (rice, IRGSP, 2005), and Pooideae (Brachypodium, IBI, 2010), suggest that grasses derive from n = 7 (alternative scenario with n = 5) to 12 ancestral karyotypes (named AGK for Ancestral Grass Karyotypes). Modern grass genomes were then shaped from this AGK through a shared whole-genome duplication (WGD) followed by ancestral chromosome fusion (CF) events (for review Salse, 2012). Polyploidization has been shown to be followed by genome-wide diploidization (also referenced as partitioning) through differential elimination of duplicated gene redundancy at the whole-genome level (Wang et al., 2005; Freeling et al., 2012; Schnable et al., 2012a,b), leading to dominant (i.e. D, stable genomic compartment associated with low duplicated gene loss and reduced gene diversity) and sensitive (i.e. S, plastic genomic compartment associated with high duplicated gene loss and increased gene diversity) subgenomes in modern grass species (Woodhouse et al., 2010; Abrouk et al., 2012; Schnable et al., 2012a,b).
Bread wheat is a good plant model to study the impact of distinct rounds of WGD on the subgenome dominance phenomenon, as its genome comprises: (i) seven ancestral paleoduplicated blocks corresponding to the shared paleotetraploidization event identified in all known cereal genomes and dating back to 65 million years ago (hereafter, mya); as well as (ii) two recent neopolyploidization events leading to Triticum aestivum, which originated from two hybridizations: between T. urartu (A genome) and an Aegilops speltoides-related species (B genome) 1.5 mya, forming T. turgidum ssp. dicoccoides; and between T. turgidum ssp. durum (genomes A–B) and A. tauschii (D genome) 10 000 years ago (Feldman et al., 1995). Wheat diploidization (between A, B and D subgenomes) has been partially investigated at the whole-genome and gene levels: recent transcriptome analysis in hexaploid wheat suggested that 39–46% of the wheat homoeologous genes may have either been lost or neo-/subfunctionalized within 1.5 my of evolution (Pont et al., 2011). Such structural or functional diploidization brings unbalanced polyploid gene systems into a diploid-like mode of expression (Edger and Pires, 2009).
The recent access to large bread wheat genomic resources offers the opportunity to study in the same analysis not only the structural plasticity of paleoduplicated genes (during the last 50–70 million years of evolution) but also that of neoduplicated genes (during 0.1–1.5 million years of evolution) by comparing the conservation of A, B and D homoeologous gene copies, i.e. wheat homoeoalleles. Wheat genomics resources have been recently published with the release of the wheat genome shotgun sequences in hexaploid (Brenchley et al., 2012) and diploids (D genome ancestor sequence in Jia et al., 2013 and Luo et al., 2013 as well as the A genome progenitor sequence in Ling et al., 2013). However, these studies reported incomplete gene repertoires (i.e. whole-genome short reads assembled into gene models instead of whole-chromosome pseudomolecules), either not ordered or partially ordered based on synteny relationships, mainly with Brachypodium. Independently, genome-wide diversity maps have also recently been made available in hexaploids (Chao et al., 2009; Allen et al., 2011, 2013; Lai et al., 2012; Winfield et al., 2012; Cavanagh et al., 2013), tetraploids (Saintenac et al., 2011; Trebbi et al., 2011; Ren et al., 2013) or diploid progenitors (You et al., 2011; Wang et al., 2013). However, most of these studies relied on transcriptome (i.e. exome) sequencing from different genotypes and therefore depend on both the tissues used and the expression of the three homoeologs in the tested tissues and developmental conditions, potentially biasing homoeoallele diversity estimation because of gene silencing in the tested biological conditions.
In contrast, the evolution of the homoeologous gene space in hexaploid wheat can be investigated through comparative genomics approaches that aim to introduce no (or reduced) bias in gene loss and sequence polymorphism assessment (Thomas et al., 2006; Bekaert et al., 2011). Up to now, very few genes have been physically and genetically mapped in hexaploid bread wheat. The integration of both genomic and genetic resources, described in the previous section, offers the opportunity to provide the most accurate wheat syntenic (also referenced as computed, Pont et al., 2011) gene order (i.e. defined as syntenome) and to test its accuracy, as this will probably represent the wheat reference genome over the long term, until complete pseudomolecules are publicly released for the 21 chromosomes. Such a validated wheat syntenome will also offer the opportunity to perform a comprehensive analysis of the evolutionary plasticity of the wheat gene space during the last 100 million years and to investigate the impact of paleo- and neopolyploidization events on genome rearrangement and gene diversity. Beyond insights into wheat subgenome evolution, the wheat syntenome can also be considered as an applied tool for the dissection of major traits in wheat.
Wheat syntenome defines paleo/neodominant and sensitive blocks
Grasses have been proposed to derive from an n = 7 ancestor that was duplicated to reach an n = 14 intermediate, followed by two chromosomal fusions to reach an n = 12 ancestor (Figure 1a centre of the circle and inner circle A1–A12), founder of all the modern grasses (Salse, 2012). Comparing grass genome sequences, we identified 17 317 conserved orthologous set genes (COS; also referenced as protogenes) located on 12 syntenic blocks (i.e. conserved ancestral regions; CARs) covering the rice, maize, sorghum and Brachypodium genomes (Figure 1a inner cereal circles and Table 1), refining the previously reported conserved gene repertoire in grasses of 9731 protogenes (Murat et al., 2010). The 17 317 COS have been used as a matrix to produce a wheat syntenome where on average 2474 COS have been ordered on each of the seven wheat chromosome groups with respect to the position of their orthologous counterparts, following the ordering priority of rice > Brachypodium > sorghum (cf wheat consensus circle on Figure 1a and Table 1). We made this wheat syntenome available through a public web interface named PlantSyntenyViewer at http://urgi.versailles.inra.fr/synteny-wheat (illustrated in Figure 1b and with raw data available in Table S1). The wheat syntenome offered the opportunity to unravel the evolutionary fate of such COS in wheat by characterizing retained vs. lost duplicates in response to both ancestral shared (~65 mya) and recent specific (<1.5 mya) polyploidization events. To do this, we applied two complementary strategies, bypassing the requirement of a complete reference genome sequence: aligning the public wheat gene repertoire to the previous COS-based wheat syntenome (strategy #1 in Figure 1a, referenced as ‘Wheat→COS’) and COS de novo sequencing in wheat (strategy #2 in Figure 1a, referenced as ‘COS→Wheat’); see Figure S1.
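The ordering priority described above (rice > Brachypodium > sorghum) can be illustrated with a minimal sketch. The function name, the dictionary layout and the coordinates below are hypothetical and only serve the illustration; they are not taken from the syntenome pipeline itself.

```python
# Hypothetical sketch: pick the reference species used to position a COS in
# the wheat syntenome, following the priority rice > Brachypodium > sorghum.
PRIORITY = ("rice", "brachypodium", "sorghum")

def reference_position(ortholog_positions):
    """Return (species, chromosome, position) from the highest-priority
    species in which the COS has an ortholog, or None if there is none."""
    for species in PRIORITY:
        if species in ortholog_positions:
            chrom, pos = ortholog_positions[species]
            return species, chrom, pos
    return None

# Made-up orthologous coordinates for two COS genes (illustration only).
cos_a = {"brachypodium": ("Bd2", 1_250_000), "sorghum": ("Sb03", 4_100_000)}
cos_b = {"rice": ("Os01", 820_000), "sorghum": ("Sb03", 3_900_000)}

print(reference_position(cos_a))  # falls back to Brachypodium (no rice ortholog)
print(reference_position(cos_b))  # rice takes priority

# Average number of ordered COS per wheat chromosome group (17 317 COS, 7 groups).
average_cos_per_group = round(17317 / 7)
print(average_cos_per_group)  # 2474
```

With this rule, a COS lacking a rice ortholog is positioned from its Brachypodium counterpart, and from sorghum only when neither of the two higher-priority species provides one.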
Table 1. Wheat syntenome and derived COS-SNP markers
The first strategy (#1, black arrow Figure 1a) for investigating the retention of ancestral genes in bread wheat consisted in comparing the recently published wheat genome shotgun sequence (Brenchley et al., 2012) to the previous wheat syntenome. When comparing the published wheat genome, consisting of ~95K gene models with 18 483 of them assigned to the three subgenomes (corresponding to 48 120 homoeologous-specific genes distributed as 15 996 on A, 15 244 on B and 16 880 on D), to the 17 317 ordered protogenes in wheat (mapped on chromosome groups 1–7), 6423 (35%) ancestral grass genes appear retained in wheat (Table S1 and Figure 1a). This reduced rate of gene conservation is consistent with the 17 093 genes in A. tauschii with 26.1% associated with orthologs in grasses (Luo et al., 2013) as well as 42% (14 578 genes) of the 34 879 gene models from T. urartu reported as conserved with Brachypodium (Ling et al., 2013). Such a limited level of single-gene orthologous relationships between the wheat gene repertoire and the ancestral retained grass genes in the other species is largely due to wheat-specific gene content amplification through intra- and inter-chromosomal duplications (Figure S2). The distribution of the retained ancestral genes on the wheat subgenomes (4905, 158, 347, 652, 62, 187 and 112 located on A–B–D, A–B, B–D, A–D, B, D and A genomes, respectively) showed an increased gene density at the subtelomeric regions compared to the centromeres (compare with BIN-based gene density distribution in green on Figure 1a). Homoeolog gene density appeared correlated to the recombination rate, as previously reported in the diploid A. tauschii (Jia et al., 2013; Luo et al., 2013) and T. urartu (Ling et al., 2013) genome sequences.
The density of non-collinear genes per bin was correlated with recombination rate, with non-collinear genes located in distal compartments of the chromosomes, suggesting that non-collinear genes may have survived preferentially in distal high-recombination regions compared to proximal low-recombination regions.
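The subgenome distribution of the 6423 retained ancestral genes reported above can be tallied per subgenome with a short sketch. The per-subgenome totals are a derivation made here from the reported category counts, not figures stated in the text, and the variable names are illustrative.

```python
# Retained ancestral genes by subgenome combination, as reported in the text.
retained = {
    frozenset("ABD"): 4905,
    frozenset("AB"): 158,
    frozenset("BD"): 347,
    frozenset("AD"): 652,
    frozenset("B"): 62,
    frozenset("D"): 187,
    frozenset("A"): 112,
}

# The seven categories should add up to the 6423 retained protogenes.
total_retained = sum(retained.values())

# A gene counts for a subgenome whenever that subgenome is in its category.
per_subgenome = {
    sg: sum(n for combo, n in retained.items() if sg in combo)
    for sg in "ABD"
}

print(total_retained)   # 6423
print(per_subgenome)    # {'A': 5827, 'B': 5472, 'D': 6091}
```

Under this simple tally, the B subgenome carries the fewest retained protogenes and the D subgenome the most; whether that difference is significant is a separate statistical question that this sketch does not address.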
The wheat syntenome illustrates that the seven Triticeae chromosome groups originated from a n = 12 ancestral species by five nested chromosome fusions (NCFs), involving chromosomes 1 (A5 + A10), 2 (A4 + A7), 4 (A3 + A11), 5 (A12 + A9 + A3), 7 (A6 + A8), Figure 1(a). The previous chromosome-to-chromosome relationships, established at the BIN-resolution level between wheat and the sequenced grass genomes, allowed us to transfer the ancestral dominant (A1–2–3–4–9–11, associated with higher retention of ancestral genes) and sensitive (A5–6–7–8–10–12, associated with higher loss of ancestral genes) chromosomal blocks reported in grasses (Figure S3) into the 21 bread wheat chromosomes (Figure 1a with D and S blocks). Among the 157 BINs available on the 21 bread wheat chromosomes, 79 and 69 were respectively identified as dominant and sensitive according to the known nature of their orthologous counterparts in the sequenced grass species (Table S2). The Triticeae genomes, that derived from the n = 12 ancestor (centre on the circle on Figure 1a) through five ancestral chromosome fusions, can then be re-organized into paleo-D and paleo-S (i.e. paralogous dominant (D) and Sensitive (S) blocks deriving from the ancestral WGD) chromosomal compartments where w1(S) = A10(S) + A5(S), w2(D) = A7(S) + A4(D), w3(D) = A1(D), w4(D) = A11(D) + A3(D), w5(D) = A9(D) + A12(S), w6(D) = A2(D), w7(S) = A8(S) + A6(S).
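The composition of the seven wheat chromosome groups given above can be written down as a small data structure, which makes the bookkeeping easy to check (12 ancestral chromosomes, five fusions, six dominant and six sensitive blocks). This is only a sketch; the dictionary layout and names are assumptions made for the illustration.

```python
# Wheat chromosome groups as ordered lists of (ancestral chromosome, status),
# with D = dominant and S = sensitive, following the composition given above.
WHEAT_GROUPS = {
    "w1": [("A10", "S"), ("A5", "S")],
    "w2": [("A7", "S"), ("A4", "D")],
    "w3": [("A1", "D")],
    "w4": [("A11", "D"), ("A3", "D")],
    "w5": [("A9", "D"), ("A12", "S")],
    "w6": [("A2", "D")],
    "w7": [("A8", "S"), ("A6", "S")],
}

blocks = [block for group in WHEAT_GROUPS.values() for block in group]
ancestral = {name for name, _ in blocks}
dominant = {name for name, status in blocks if status == "D"}
sensitive = {name for name, status in blocks if status == "S"}

# Each fusion reduces the chromosome number by one: 12 ancestral -> 7 groups.
n_fusions = len(ancestral) - len(WHEAT_GROUPS)
fused_groups = [g for g, members in WHEAT_GROUPS.items() if len(members) > 1]

print(len(ancestral), n_fusions)  # 12 5
print(fused_groups)               # ['w1', 'w2', 'w4', 'w5', 'w7']
print(sorted(dominant, key=lambda a: int(a[1:])))
```

The dominant and sensitive sets recovered this way match the A1–2–3–4–9–11 and A5–6–7–8–10–12 blocks listed in the text.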
Such a high-resolution BIN-based wheat syntenome offered the opportunity to investigate the retention of the ancestral genes in wheat following the paleotetraploidization event (by comparing D and S wheat blocks defined previously) as well as the neohexaploidization event (by comparing the A, B and D subgenomes), as illustrated in Figure 2(a) (top) with D chromosomes in blue and S chromosomes in red. Regarding the number of retained genes in D vs. S blocks, we observed the expected higher retention (P < 5%) of protogenes in paleodominant blocks compared to paleosensitive counterparts (Figure 2b, left). This result confirms the existence of shared D and S blocks in wheat following the paleotetraploidization (~65 mya) as reported in rice, sorghum, Brachypodium and maize (Abrouk et al., 2012). Surprisingly, when comparing the retention of protogenes in the A, B and D subgenomes, we also observed a biased retention (P < 5%, Figure 2b right) of ancestral genes, with the B subgenome showing the property of a so-called sensitive subgenome with higher loss of protogenes. This observation raised the hypothesis that the reported genome partitioning phenomenon following the ancestral shared paleotetraploidization event (biased retention of protogenes between paleo-D and S blocks confirmed in wheat as illustrated in Figure 2b left) may also act between the A, B and D subgenomes deriving from the neohexaploidization, with A and D as dominant (i.e. stable) and B as sensitive (i.e. plastic), Figure 2(b) right. This confirms and largely complements recent evidence of differential gene loss observed between the subgenomes of the wheat chromosome group 7, with gene numbers reported in the order D > A > B (Berkman et al., 2013).
However, the previous conclusions regarding the wheat genome partitioning in response to both paleo- and neopolyploidization events were obtained by exploiting the public wheat genome shotgun sequence (Brenchley et al., 2012), which consists only of a partial gene repertoire. We cannot exclude that missing (non-sequenced) genes/families/homoeologs in such a resource may have impacted our conclusion regarding the observed structural plasticity of paleo- and neoduplicated compartments in modern hexaploid bread wheat. Consequently, in order to validate and complement the previous conclusions regarding the wheat subgenome dominance, we applied a second strategy (#2, black arrow Figure 1a and Figure S1) that consisted in de novo sequencing and mapping of COS genes in wheat deriving from the 17 317 protogenes but not included in the previous 6423 wheat/grass ortholog repertoire (Figure S4). To do so, we applied a COS-finder tool (see Method, Quraishi et al., 2009) to the 10 894 (17 317 total COS excluding the 6423 COS identified from Brenchley et al., 2012) ancestral grass genes not associated with a public wheat gene model sequence. We were then able to develop a large catalog of 6033 primer pairs (primer pair selection criteria in Method section) defining 5234 COS relationships (Table S1). Such COS primers, selected on highly conserved exonic regions, are expected to be useful in sequencing the wheat orthologs (and associated homoeologs), based on the observed transferability of such COS markers among monocot species (i.e. sorghum, maize, oat, rice, Triticale, wheat, Brachypodium and rye) at both the amplification (Figure S5a on agarose gels and Figure S5b on capillary sequencer) and sequence levels (Figure S5c with melting curve profiles supported by sequence-based haplotyping data).
Overall, the characterized COS markers (17 317 COS with 5234 selected COS for sequencing and 6033 associated primer pairs) as well as the COS-finder software are made available at http://urgi.versailles.inra.fr/synteny-wheat (Figure 1b and Table S1).
In order to validate the orthologous, homoeologous and paralogous gene contents and retention in wheat paleohistory, which raised the evolutionary model suggesting subgenome dominance in the order D>A>B, we sequenced the selected 5234 COS ('Genome partitioning shaped wheat genetic landscape' and detailed strategy in Figure S6). COS markers have been sequenced in Chinese Spring using the 454 (Roche) technology, delivering 23 463 reference transcripts (RTs) covering 4582 COS (with an average of three RTs per COS, putatively homoeoalleles, Figure S6). This result established that 88% of the COS (i.e. genes conserved in rice, Brachypodium, maize and sorghum) are conserved in wheat, in contrast to 35% of the 18 483 gene models (from Brenchley et al., 2012) that have been associated with a COS gene in the previous section. This result reinforces the previous conclusion (Figure S2) that wheat has been enriched, in the course of evolution, by species and/or lineage-specific genes. The capture (Agilent SureSelect Target Enrichment) of the 23 463 RTs in eight bread wheat genotypes (Chinese Spring, Renan, Courtot, Alcedo, Brigadier, Genial, Hustler and Nicam) and short read sequencing (Illumina HiSeq 2000) delivered 9969 high quality SNPs covering 2197 COS markers with a density of 1.9 SNP/500 bp (Table S1 and Figure S7). Such de novo COS sequences in wheat allowed us to confirm the biased retention of protogenes defining dominant as well as sensitive blocks for both paleo- and neopolyploidization events (Figure 2c). The number of sequenced COS in wheat observed in dominant and sensitive compartments confirmed our previous conclusions based on the public gene repertoire, where more retained ancestral genes (or orthologs) are observed in the dominant blocks (Figure 2c left). We also confirmed the biased retention of homoeologs between A, B and D subgenomes (Figures 2c right and S8).
Overall, we then propose, based on a pure in silico experiment (alignment-based synteny analysis) complemented by a de novo sequencing strategy (COS characterization), that the ancient subgenome dominance process related to the shared paleotetraploidization in grasses is observed in wheat and has been eroded (P-values from ~3 × 10−2 to ~5 × 10−2) by the neohexaploidization event that may have led to a modern subgenome dominance where B is sensitive and A–D are dominant (P-values from ~1 × 10−3 to ~8 × 10−7).
In order to address not only the bias in gene retention (detailed in the previous section) but also in gene diversity between modern and ancient dominant vs. sensitive blocks, we used the overall depth of read coverage across COS as well as the depth of read coverage at variable sites to detect not only sequence variants (i.e. 9969 SNPs, Figure S7) but also putative structural variants such as presence/absence variations (PAVs) and copy number variations (CNVs), in contrast to non-structural variation (NSV, in which all genotypes are covered for the considered COS). Figure S9 illustrates three distinct COS markers associated with: (i) sequences for the eight sequenced genotypes [COS-5368, observed in 80% (4286 COS) of the cases]; (ii) missing sequences [COS-542, PAV observed in 6% (304 COS) of the cases]; or (iii) extra copy sequences [COS-3070, CNV observed in 14% (644 COS) of the cases]. We then provide a complete view of the gene diversity distribution in wheat based on the precise characterization of both non-structural (SNPs) and structural variants (InDels, CNVs and PAVs). The wheat COS sequences offered the opportunity to test the impact of the reported biased chromosomal plasticity between dominant and sensitive blocks on gene diversity through differential SNP/InDel contents observed in such wheat genomic compartments. Figure S10 illustrates the observed distribution of InDel and SNP polymorphisms in wheat, where the most frequently observed SNPs (61% of the cases) correspond to transitions (A/G or T/C) rather than transversions (A/T, A/C, T/G or G/C). The most frequently observed InDels (52% of the cases) involve a single nucleotide. The distribution of SNPs on D and S blocks as well as A, B and D subgenomes shows that the gene diversity increases in the sensitive blocks (either paleo-S blocks or B genome, Figure 2d, plain box-plots).
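The transition/transversion distinction used above can be made explicit with a minimal sketch: a substitution is a transition when both alleles are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The function names and the example SNP list are hypothetical and are not taken from the study's data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snp(allele1, allele2):
    """Classify a biallelic SNP as 'transition' or 'transversion'."""
    a, b = allele1.upper(), allele2.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid or a == b:
        raise ValueError(f"not a valid biallelic SNP: {allele1}/{allele2}")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def transition_fraction(snps):
    """Fraction of SNPs in the list that are transitions."""
    kinds = [classify_snp(a, b) for a, b in snps]
    return kinds.count("transition") / len(kinds)

# Made-up alleles, for illustration only.
example_snps = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C"), ("T", "C")]
print([classify_snp(a, b) for a, b in example_snps])
print(transition_fraction(example_snps))  # 0.6
```

Applied to a real SNP table, the same fraction is the quantity reported as 61% in the text.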
Wheat chromosome groups derived from five fusions of the 12 ancestral chromosomes, thereby juxtaposing high-gene-density terminal paleoregions near low-gene-density paleocentromeric regions in the post-fusion neo-chromosome (Figure 1a dotted connecting black lines; Murat et al., 2010). Consequently, beyond the biased distribution of sequence variants between dominant and sensitive subgenomes, we observed a higher gene (COS) diversity (i.e. SNP/bp density) in telomeric regions and synteny breakpoints (SBP) for the wheat chromosome groups that derived from the centromeric nested insertion of ancestral chromosomes (Figures 1a black stars and S11).
Until the wheat genome is entirely sequenced and available (as complete pseudomolecules), the discovery and application of genome polymorphism for wheat crop improvement and evolutionary investigation remains a real challenge, for which the provided wheat syntenome constitutes a valuable matrix. Based on the previous in silico syntenome and SNP-based diversity map, we addressed differences in gene content and plasticity between dominant and sensitive fragments for both paleo- and neopolyploidization events. As the previous conclusions derived from pure in silico syntenome and COS-SNP characterization, we validated the delivered computed wheat gene order (17 317 ordered COS) as well as associated sequence variants (9969 SNPs) through COS-SNP mapping (Figure S4). As a validation procedure, we randomly selected a subset of 1135 (22%) among the 5234 COS markers to test the accuracy of the characterized in silico COS-SNPs as well as the computed gene order of such COS. We identified 807 in silico SNPs from 986 (out of 1135, 87%) COS markers that were successfully sequenced in genotypes for which mapping populations are available (see 'Genome partitioning shaped wheat genetic landscape' section). Five hundred and forty high quality scoring (Quality score classes A to D) and 267 low quality scoring (Quality score classes E and F) in silico SNPs (based on SNP calling criteria described in Figure S7) have been tested. Of these, 357 (66%) and 63 (24%), respectively, have been validated as polymorphic between bread wheat cultivars Chinese Spring, Courtot, Arche, Recital and Renan.
Of the 420 validated SNPs (357 high quality + 63 low quality scores), 375 (89%) have been successfully mapped (Illumina approach) on the three available mapping populations (Courtot × Chinese Spring, Arche × Recital, Renan × Recital, see 'Genome partitioning shaped wheat genetic landscape'), illustrated in Figure 1(a) (red bars on the wheat chromosome circles) and available in Table S1. The observed validation rate (420/807 = 52%) of in silico SNPs is consistent with the one (60%) reported by Allen et al. (2011) as well as in Trick et al. (2012) (56–58% of validation accuracy). Overall, based on the previous genotyping data, we provide in the current analysis 9969 high confidence SNPs (deriving from 5234 sequenced COS markers) of which 4613 (9969 × 0.52 × 0.89) are expected to be converted efficiently in any functional assay. The remaining non-polymorphic in silico SNPs may be related to: (i) heterogeneity observed when using different genotyping approaches (for example five SNPs failed to be mapped using Illumina technology but were recovered with KASpar); (ii) in silico intervarietal high quality SNPs that happened to be homoeoSNPs; and (iii) SNPs associated with the same quality score that can either be validated or not through the Illumina genotyping approach (Figure S12).
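The validation figures above can be re-derived with a short sketch that walks through the funnel from tested to validated to mapped SNPs. Variable names are illustrative; the rounding of the two rates to two decimals and the truncation of the final estimate are assumptions chosen to reproduce the 9969 × 0.52 × 0.89 calculation quoted in the text.

```python
# Counts reported in the text.
high_tested, high_validated = 540, 357   # quality score classes A to D
low_tested, low_validated = 267, 63      # quality score classes E and F
mapped = 375
total_in_silico_snps = 9969

tested = high_tested + low_tested            # 807
validated = high_validated + low_validated   # 420

def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

print(pct(high_validated, high_tested))  # 66
print(pct(low_validated, low_tested))    # 24
print(pct(validated, tested))            # 52
print(pct(mapped, validated))            # 89

# Expected number of SNPs convertible into functional assays, using the
# two rates rounded to two decimals as in the text.
validation_rate = round(validated / tested, 2)   # 0.52
mapping_rate = round(mapped / validated, 2)      # 0.89
expected_convertible = int(total_in_silico_snps * validation_rate * mapping_rate)
print(expected_convertible)  # 4613
```

Note that the estimate is sensitive to rounding: the product is about 4613.7, so rounding to the nearest integer instead of truncating would give 4614.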
COS-SNPs have been mapped and tested on Illumina, KASpar, SSCP, HRM and LNA detection platforms (see 'Genome partitioning shaped wheat genetic landscape') to test the accuracy of the provided computed wheat gene order based on the observed COS genetic position, Figure 2(b). Overall, taking into account the mapped COS-SNP information over the 21 bread wheat chromosomes, 89% (335) of mapped COS were retained at the conserved (i.e. expected orthologous position) loci whereas 40 are non-syntenic (Table 1 and Figure S13). Whereas 147 and 184 COS-SNPs have been mapped on the A and B genomes, only 44 have been positioned on the D-genome (Table 1 and illustrated as ‘mapped COS rate’ on Figure 2d, dashed box-plots), supporting the higher plasticity of the B (sensitive) compared to the A (dominant) subgenome, related to the long-term paleo-tetraploidization evolutionary event; and a difference between A/B and D subgenomes due to the low level of genetic diversity in the D-genome associated with the recent origin of hexaploid wheat. Comparing the in silico COS-SNP density (Figure 2d, plain box-plots representing the observed ‘SNP/500COS ratio’ from 9969 in silico COS-SNPs) and the mapped COS-SNP rate (observed from the 375 mapped COS-SNPs), it appeared that the D genome diversity may have been overestimated in silico through the possible mis-association of SNPs either on the A or D genomes, especially for short sequence reads not harbouring clear and multiple homoeoSNPs (Quality score classes E and F, Figure S7). Interestingly, the 11% (40 genes) of COS-SNP markers mapped at non-orthologous positions (that can be considered as transposed genes) are mostly located (60%) on the B genome, reinforcing the structurally plastic nature of such a genomic compartment (Table 1). As expected, due to the considered time frame (~65 mya), bias in gene diversity (expressed as SNPs/bp) cannot be observed for the ancestral paleotetraploidization event (Figure 2d left).
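The mapping figures above can be cross-checked with a few lines of arithmetic. The per-subgenome shares printed at the end are a derivation made here from the reported counts rather than values stated in the text, and the variable names are illustrative.

```python
# Mapped COS-SNPs per subgenome and syntenic status, as reported in the text.
mapped_per_subgenome = {"A": 147, "B": 184, "D": 44}
syntenic, non_syntenic = 335, 40

total_mapped = sum(mapped_per_subgenome.values())

# Both breakdowns should describe the same 375 mapped COS-SNPs.
assert total_mapped == syntenic + non_syntenic == 375

syntenic_pct = round(100 * syntenic / total_mapped)          # 89
non_syntenic_pct = round(100 * non_syntenic / total_mapped)  # 11

# Share of the mapped COS-SNPs carried by each subgenome (derived here).
share_pct = {
    sg: round(100 * n / total_mapped, 1)
    for sg, n in mapped_per_subgenome.items()
}

print(syntenic_pct, non_syntenic_pct)
print(share_pct)
```

The D subgenome accounts for roughly one mapped COS-SNP in nine, which is the imbalance the text attributes to the low genetic diversity of the D genome.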
The same observation is made when investigating the impact of paleo-dominance and paleo-sensitivity on the recombination pattern, where no difference can be observed between paleo-D vs. paleo-S blocks (P > 5%) but a higher cM/marker ratio was observed for the B compared with the A (P = 2.4 × 10−2) and the D (P = 1 × 10−3) subgenomes (Figure 2e). The observed reduced cM/marker ratio of the D subgenome compared with the A and B counterparts has been attributed to the loss of polymorphism during the genetic bottleneck that accompanied the development of modern elite cultivars. Overall, taking into account the provided COS-SNP markers associated with reference public wheat genetic maps, we then produced the most complete composite bread wheat genetic map (consisting of 7520 molecular markers covering 4318.03 cM with a marker density of one marker every 0.78 cM) of the 21 chromosomes, including mapped RFLP (restriction fragment length polymorphism) (1687), AFLP (712), STS (262), SSR (2315), DArT (1246) and COS-SNP (375) gene markers, named Wheat Composite Genetic Map ‘WCGM2013’ (available as Table S3 and detailed in Figure S14).
Genome partitioning into D and S fragments (shown as associated with bias in gene content, diversity and recombination in the current study) following polyploidy has been suggested to be epigenetically driven (Doyle et al., 2008). To test this hypothesis, we investigated C-band (from Hutchinson et al., 1982) differences between paleo- and neodominant and sensitive blocks (Figure 2f). Whereas differences are observed between ancient D and S compartments (P = 2.7 × 10−2, Figure 2f left), the difference is more visible between the neodominant (A–D subgenomes) and neosensitive (B subgenome, P = 1.2 × 10−2 and 2 × 10−3 respectively) compartments (Figure 2f right). The transition nature (A/G or T/C) of the reported 9969 SNPs may also indicate the historic methylation profile of each subgenome, as the transition abundance bias commonly observed for SNPs reflects the high frequency of C to T mutation following methylation (Coulondre et al., 1978). Such a bias observed in polyploid wheat is greater than that observed in its diploid Triticeae relative barley (Duran et al., 2009) and may reflect a higher level of methylation in the polyploid wheat genome as part of the diploidization process leading to D (low methylation) and S (high methylation) compartments. Both observations (C-band differences and SNP transition enrichment) raise the open question of whether epigenetic mechanisms drive the observed paleo- and neodominance in wheat as well as in other grasses. Overall, the clear characterization of ancient and recent D-S compartments, in relation respectively to the paleotetraploidization and the neohexaploidization events, suggests that they may have shaped the modern bread wheat genomic and genetic landscapes with bias in gene number (i.e. higher gene retention in dominant blocks), gene diversity (i.e. higher variant per genes in sensitive blocks), recombination (i.e. lower cM/Marker in sensitive blocks) and heterochromatin structure (i.e. higher C-band and possibly methylation in sensitive blocks).
Bread wheat evolutionary model
Based on the previous observations, we propose an evolutionary scenario that has shaped the modern bread wheat genome during the last 65 my of evolution in response to three polyploidization events (Figure 3a). The n = 7 AGK has been paleotetraploidized, leading to seven pairs of dominant (red) and sensitive (blue) chromosomes. Five ancestral chromosome fusions took place to build the seven Triticeae chromosomes and wheat ancestors (i.e. T. urartu, A. speltoides and A. tauschii), where T3–T4 are entirely sensitive, T1–T7 are entirely dominant and T2–T5–T6 are structured as a mosaic of D and S blocks. The first neotetraploidization event (1.5 mya) led to a subgenome dominance where the A subgenome became dominant and the B subgenome sensitive. The second neohexaploidization event (0.1 mya) led to a supra-dominance where the tetraploid became sensitive (subgenomes A and B) and the D subgenome dominant. Thus, the modern bread wheat genome can be divided into five supra-dominant chromosomes (also referenced as pivotal) deriving from superimposed dominances (i.e. chromosomes 3D, 2D-S/L, 4D, 5D-L, 6D-S/L) in contrast to five supra-sensitive ones (i.e. chromosomes 1B, 2B-C, 5B-S, 6B-C, 7B), and the remaining regions are associated with opposite D and S following the successive duplication events (i.e. with S for short arm, L for long arm and C for centromeric region), see Figure 3(a) (with the stability/plasticity scale at the right).
In order to investigate precisely the supra-dominant and supra-sensitive chromosomal architecture, we focussed on rice chromosomes 1 (dominant) and 5 (sensitive), associated with differences in gene content, orthologous/ancestral gene retention, chromosome length as well as transposable element (mainly TE class II) repertoire (Figure 3b left). The orthologous counterparts in wheat (group 1 as sensitive and group 3 as dominant) display the same bias in number of orthologs (i.e. COS number higher on the dominant group 3) as well as gene diversity (i.e. SNP/COS density higher on the sensitive group 1). In our scenario, among the six investigated chromosomes, 3D is supra-D in contrast to 1B as supra-S. Consequently, chromosome 1B displays the lowest gene retention, the highest gene diversity, the lowest cM/marker ratio, the highest number of transposed genes and the highest number of C-bands.
Wheat genome architecture has been shaped by subgenome partitioning following paleo- and neopolyploidization
Common anchors (COS genes) to compare genomes and reconstruct CARs are necessary to unravel plant genome paleohistory (Salse, 2012). It has been shown that CAR duplication in the grass paleohistory has been followed by a structural partitioning defining post-duplication dominant regions (defined as structurally stable with higher retention of protogenes) in contrast to sensitive paralogous counterparts (defined as structurally plastic with higher loss of protogenes) (Abrouk et al., 2012; Schnable et al., 2012a,b). Based on the chromosome-to-chromosome synteny relationships established between the seven bread wheat chromosome groups and the rice, sorghum, Brachypodium and maize genomes, it was then possible to produce a partial wheat gene-based physical map (i.e. syntenome) including 17 317 COS. However, in the absence of large-scale gene mapping data in wheat, how robust such a computed gene order is remains an open question, and one of importance for investigating the evolution of wheat gene content in comparison with the sequenced grass genomes. Our current study, aiming at sequencing and mapping ancestral COS genes in bread wheat, established that the synteny-based computed gene order is correct for up to 89% of the simulated gene order and that lineage-specific rearrangement in wheat may account for ~11% of the genes ordered in silico.
Such a wheat syntenome, which delivers in silico synteny-based ordered genes conserved in close relatives, is a perfect matrix to characterize genes that have been conserved or lost in bread wheat evolution in response to paleotetraploidization (~65 mya) and neohexaploidization (>1.5 mya) events. Using complementary strategies consisting in aligning the public wheat genome repertoire (Brenchley et al., 2012) as well as de novo COS gene sequencing in wheat, we identified respectively 35% of wheat genes associated with an ortholog in the other grasses and 88% of COS successfully sequenced in wheat, suggesting that lineage-specific gene amplification and shuffling mechanisms have shaped the wheat genome in its recent evolution. More interestingly, more conserved genes are observed in dominant wheat regions than in sensitive ones arising from the ancestral shared paleotetraploidization event, as reported in the other grass species (Abrouk et al., 2012). Moreover, such bias in gene conservation is also observed between the three wheat subgenomes deriving from the neohexaploidization, with more orthologs detected in D > A > B. This subgenome dominance phenomenon may then explain the observed differences in genetic and physical size reported between subgenomes, the B (plastic) being the largest and the D (stable) the smallest (Furuta et al., 1986). Biased retention of ancestral genes has been observed between paleoduplicated (plasticity of S > D) as well as between neoduplicated (plasticity of subgenomes B > A > D) genes, suggesting that ancient as well as recent polyploidization events are followed by a diploidization mechanism consisting in the structural partitioning of the paralogous fragments.
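The retention bias between subgenomes can be made concrete with a small tabulation. The sketch below is illustrative only: the function `retention_by_subgenome`, the input layout (COS gene mapped to the set of subgenomes in which an ortholog was detected) and the toy values are assumptions, not data or code from this study.

```python
from collections import Counter


def retention_by_subgenome(detected, subgenomes=("A", "B", "D")):
    """Return {subgenome: fraction of COS genes with a detected ortholog}."""
    counts = Counter()
    for copies in detected.values():
        # Count each COS gene at most once per subgenome.
        counts.update(set(copies) & set(subgenomes))
    total = len(detected)
    return {s: (counts[s] / total if total else 0.0) for s in subgenomes}


# Hypothetical toy input: COS gene -> subgenomes where an ortholog was found.
toy_detection = {
    "cos1": {"A", "B", "D"},
    "cos2": {"A", "D"},
    "cos3": {"D"},
    "cos4": {"A", "B", "D"},
    "cos5": {"A", "D"},
}
retention = retention_by_subgenome(toy_detection)
# Rank subgenomes from most to least retained (most stable first).
ranking = sorted(retention, key=retention.get, reverse=True)
print(retention, " > ".join(ranking))
```

With these toy values the ranking comes out as D > A > B, the same direction as the bias described above; on real data the same tabulation could be restricted to dominant or sensitive paleoduplicated blocks to compare the two levels of partitioning.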
Beyond the characterization of conserved genes in the wheat subgenomes, which led to the identification of paleo- as well as neodominant and sensitive chromosomal regions, we investigated non-syntenic genes in wheat corresponding: (i) at the genome level to lineage-specific intra- (18%) and inter-chromosomal (82%) duplications; and (ii) at the gene level to CNVs (14%), PAVs (6%) as well as gene transposition (20%). As the COS markers have been selected as single-copy conserved genes, it is unlikely that all cases of non-syntenic genes were due to erroneous mapping of duplicates or homologs. This raises the possibility that a small-scale (i.e. few genes) transposition/movement mechanism, in addition to large-scale ones (such as translocations and inversions) involving DNA segments carrying several genes, operates in the decay of synteny between grass genomes and thereby drives non-conserved genes in wheat (i.e. ~50% initially reported in wheat from Pont et al., 2011; and ~60% in the current analysis). More transposed genes have been observed on the B subgenome (sensitive according to the current analysis), which is consistent with the conclusion drawn for chromosome group 1 (Wicker et al., 2011) based on partial sequence investigations. Gene movement may thus be a particularly active phenomenon in polyploid wheat and may act preferentially on the sensitive compartments.
Revisiting the B genome progenitor enigma based on contrasted plasticity following polyploidization
Several wheat phylogeny studies have tried to identify the progenitor of the B genome of polyploid wheat based on cytology (Zohary and Feldman, 1962), nuclear and mitochondrial DNA sequences (Dvorak et al., 1989; Dvorak and Zhang, 1990; Terachi et al., 1990) as well as chromosome rearrangement studies such as common translocation events (Feldman, 1966a,b; Hutchinson et al., 1982; Gill and Chen, 1987; Naranjo et al., 1987; Naranjo, 1990; Jiang and Gill, 1994; Devos et al., 1995; Maestra and Naranjo, 1999). More recent and representative molecular comparisons, at the whole-genome level using germplasm collections, have shown that the B genome could be related to several A. speltoides lines but not to other species of the Sitopsis section (Salina et al., 2006; Kilian et al., 2007). At the SPA locus level (Salse et al., 2008), a close relationship between A. speltoides and the hexaploid B subgenome has been reported based on both coding and non-coding sequence comparisons, but with a lower conservation compared to the A subgenome and its T. urartu progenitor at the PSR920 region (Dvorak and Akhunov, 2005; Dvorak et al., 2006). The greater genetic diversity of the B genome compared to the A genome of hexaploid wheat was also reported based on SNPs (previous references), SSRs (Roder et al., 1998a,b), RFLPs (Liu and Tsunewaki, 1991), SNPs (Akhunov et al., 2010), as well as in tetraploid wheat (Thuillet et al., 2005; Ren et al., 2013). Such contrasted conservation between the B subgenome and A. speltoides compared with the A genome and its T. urartu progenitor is classically explained by two hypotheses: (i) the progenitor of the B genome is a unique Aegilops species that remains unknown (i.e. monophyletic origin and ancestor closely related to A. speltoides from the Sitopsis section); or (ii) this genome resulted from an introgression of several parental Aegilops species (i.e. polyphyletic origin) from the Sitopsis section that remain to be identified.
Overall, the B genome therefore shows greater divergence than the A and D subgenomes relative to their putative diploid progenitors, with the D subgenome being the most stable in the course of evolution. The contrasted genomic (gene loss and diversity) and genetic (recombination and linkage disequilibrium) plasticity reported for B > A > D (from the most plastic to the most stable subgenome) has been tentatively explained so far through hypotheses relying on differences between the diploid progenitors. However, these hypotheses rely on the assumption of a constant and similar evolutionary rate between subgenomes.
In contrast, we propose an alternative scenario where such particular conservation of the B subgenome with A. speltoides at the sequence level is the consequence of a differential evolutionary plasticity of the B subgenome compared with the A and D subgenomes in response to polyploidization events. We then propose an evolutionary scenario where the modern bread wheat genome has been shaped through a first neotetraploidization event (1.5 mya) leading to a subgenome dominance where the A subgenome was dominant and the B subgenome sensitive. The second neohexaploidization event (0.1 mya) led to a supra-dominance where the tetraploid became sensitive (subgenomes A and B) and the D subgenome dominant (i.e. pivotal). Following our scenario, wheat subgenome architecture has then been polyploidy-driven (i.e. pure post-polyploidization mechanism), where genome doubling is followed by genome partitioning into dominant (stable) and sensitive (plastic) compartments, instead of being entirely explained by pre-existing differences between founder (A, B and D) progenitors (i.e. pure pre-polyploidization mechanism). We propose in the current study that subgenome dominance is active not only in paleopolyploids but also in modern neopolyploids such as bread wheat. In our bread wheat evolutionary model, we propose that the differences in genomic structure and genetical landscape between subgenomes resulted directly from differential modes of evolution between dominant and sensitive blocks, where the B subgenome acts as sensitive in contrast with A as dominant in response to the neotetraploidization event 2.5 mya, and D as dominant and the A–B subgenomes as sensitive in regard to the neohexaploidization event 10 000 years ago.
This scenario is in agreement with the differential plasticity (at the gene content and diversity levels) reported in the current analysis with regard to gene loss and SNP/gene, both higher in the B subgenome than in the two others. Moreover, our wheat subgenome dominance model sheds light on earlier studies of genome rearrangement (Zhang et al., 2013) or gene loss (Ozkan and Feldman, 2001b; Ozkan et al., 2001a) in newly synthesized polyploids as well as the reported biased erosion of genetic diversity during domestication (Cavanagh et al., 2013). The B genome, the sensitive subgenome in regard to the neotetraploidization, is associated with a pronounced genome plasticity related to the number of paralogous loci (reported higher than in the A and D genomes in Akhunov et al., 2003), translocation (Kota and Dvorak, 1988), and other large-scale structural changes (Zhang et al., 2013). Zhang et al. (2013), using a set of 16 independently synthesized allohexaploid wheat lines, karyotyped >1000 individual plants from different selfing generations (S1 to S20) after allohexaploidization. Of the three constituent subgenomes, B (sensitive) showed the highest frequency of chromosome rearrangements, followed by A (dominant), whereas the D (supra-dominant, referenced also as ‘pivotal’ in Zhang et al., 2013) subgenome showed the highest stability. Overall, these data suggest that DNA rearrangements at the chromosome as well as gene levels, leading to the reported diploidization-driven subgenome dominance, occurred immediately or within a few generations following polyploidization.
Both scenarios that aim to explain the structural asymmetry characterized between the wheat subgenomes now need to be considered, i.e. either a pre-polyploidization mechanism, where the cross-pollinating A. speltoides progenitor (B donor) had a more diverse genome than the self-pollinating T. urartu/A. tauschii (A/D donors), or a post-polyploidization mechanism with an accelerated plasticity of the B subgenome in contrast to its homoeologous counterparts.
Syntenome of complex polyploid species can be used as a guide for trait dissection
High-resolution and large-scale comparative genomics studies offer a tremendous set of gene-based markers that can be used directly as a founder resource for genome mapping (physical or genetic) and ultimately trait dissection. In the present work, we deliver a wheat syntenome consisting of 17 317 ordered COS genes in wheat and the associated sequence variations (SNPs and InDels). Overall, the sequence capture of the COS markers associated with NGS methods appears to be a powerful technique for the large-scale discovery of gene-associated SNPs in any plant of interest. Moreover, such sequencing efforts within a large-scale intra-specific background will provide a complete set of putative causal SNPs, PAVs and CNVs as functional diagnostic markers in the near future. Associated with public RFLP, AFLP, STS, SSR and DArT markers, the COS-SNP characterization and mapping effort in wheat allowed us to provide here the most complete genetic map, consisting of 7520 molecular markers. Such resolution, including synteny-based COS markers, allows metaQTL intervals to be refined and immediately benefits from the COS markers to access robust links with sequenced genomes and precise orthologous candidate genes, as we illustrated in Quraishi et al. (2011a,b) and Dibari et al. (2012). Moreover, the wheat syntenome has also been used as a matrix for physical map construction (Lucas et al., 2013) prior to QTL dissection.
The proposed impact of polyploidization-based subgenome partitioning on contrasted gene content and diversity in dominant and sensitive blocks may need to be considered in agronomic trait dissection. First of all, the sensitive chromosomal compartments that appeared most plastic (in terms of polymorphism as well as recombination pattern) may be more accessible to QTL cloning. It would then be interesting to investigate the differential efficiency in trait or gene cloning observed in the D and S compartments. Moreover, the difference in subgenome stability (with higher plasticity observed for B > A > D) may also lead to the hypothesis that homoeologous groups may support different types of agronomic traits. Feldman et al. (2012a) and Feldman and Levy (2012b) reported that genome A was found to control morphological traits, while genome B in the allotetraploid controls reactions to biotic and abiotic factors. This difference in the nature of traits is also consistent with the subgenome dominance hypothesis, where the sensitive compartment (B subgenome in wheat as defined in the current analysis) has been suggested to control adaptive traits in contrast to more stable processes driven by the dominant counterpart (A and D in hexaploid wheat) (Abrouk et al., 2012). Moreover, QTL partitioning following polyploidy has also been suggested recently, with 21% homoeologous fiber quality QTLs characterized in cotton (Rong et al., 2007) and 23% homoeologous fruit quality QTLs in strawberry (Lerceteau-Köhler et al., 2012), suggesting that the majority of QTLs are no longer maintained on the duplicated blocks, putatively as a direct consequence of the diploidization mechanism. To what extent hexaploid wheat adaptation (in particular in response to biotic and abiotic stresses) is partitioned in the genome, between the currently defined dominant and sensitive chromosomal compartments, remains an open question that still needs to be addressed in the future.
The experimental procedures may be found in the online version of this article, detailing: (i) Wheat Syntenome Protocol (COS identification; CAR identification; computational gene order in wheat; D-S blocks identification); (ii) Wheat COS Sequencing Protocol (COS-primer design; COS-SNP sequencing and clustering); (iii) SNP Genotyping Protocol (SSCP, Illumina, size polymorphism, sequencing, KASpar, LNA® Dual-Labeled Fluorogenic Probes); and (iv) Wheat Comprehensive Consensus Genetic Map Protocol (selection of public reference genetic maps; construction of WCGM2013).
Funding and acknowledgement
This work has been supported by grants from INRA (‘Génétique et Amélioration des Plantes’ reference: ‘Appel d'Offre AIP Bioressources’), the Auvergne region (‘Pôle de compétitivité: Céréales Vallée’ reference: programme ‘Semences de Demain’), the Agence Nationale de la Recherche (Programs ANRjc-PaleoCereal, ref: ANR-09-JCJC-0058–01 and programme ANR Blanc-PAGE, reference: ANR-2011-BSV6–00801), and the European 7th Framework Programme (programme ‘TRITICEAE GENOME’, reference: FP7–212019). The authors would like to thank Emmanuelle Lagendijk, Nils Stein and Laura Rossini for the coordination of the COS sequencing and Grain Fiber Content QTL characterization initiatives, as well as Sébastien Faure for providing the wheat lines in the frame of the TRITICEAE GENOME project.