Keywords:

  • wheat genome;
  • chromosome sorting;
  • genome zipper;
  • grass comparative genomics;
  • wheat shotgun chromosome;
  • Triticeae genome

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Results
  5. Discussion
  6. Conclusion
  7. Experimental Procedures
  8. Acknowledgements
  9. References
  10. Supporting Information

Wheat is the third most important crop for human nutrition in the world. The availability of high-resolution genetic and physical maps and ultimately a complete genome sequence holds great promise for breeding improved varieties to cope with increasing food demand under the conditions of changing global climate. However, the large size of the bread wheat (Triticum aestivum) genome (approximately 17 Gb/1C) and the triplication of genic sequence resulting from its hexaploid status have impeded genome sequencing of this important crop species. Here we describe the use of mitotic chromosome flow sorting to separately purify and then shotgun-sequence a pair of telocentric chromosomes that together form chromosome 4A (856 Mb/1C) of wheat. The isolation of this much reduced template and the consequent avoidance of the problem of sequence duplication, in conjunction with synteny-based comparisons with other grass genomes, have facilitated construction of an ordered gene map of chromosome 4A, embracing ≥85% of its total gene content, and have enabled precise localization of the various translocation and inversion breakpoints on chromosome 4A that differentiate it from its progenitor chromosome in the A genome diploid donor. The gene map of chromosome 4A and the emerging sequences of homoeologous wheat chromosome groups 4, 5 and 7 represent unique resources that will allow us to obtain new insights into the evolutionary dynamics between homoeologous chromosomes and syntenic chromosomal regions.


Introduction

Bread wheat (Triticum aestivum) provides approximately 20% of mankind’s dietary energy supply (http://www.fao.org), but, despite its importance as a crop, acquisition of its genome sequence remains a major challenge. The biological features responsible for the slowness of progress towards this goal are its large genome size (1C is approximately 17 Gb), its hexaploid status, and its high content of repetitive DNA (approximately 80%) (Flavell, 1986). Each wheat chromosome is larger than the entire genome of rice (Oryza sativa), and the whole genome is more than one hundred times larger than that of Arabidopsis thaliana. The species arose from two separate hybridization and allopolyploidization events, the first involving a hybrid between the A genome donor Triticum urartu (closely related to the cultivated species Triticum monococcum) (Dvorak et al., 1993) and the B genome donor, thought to be an ancestor of Aegilops speltoides (Sarkar and Stebbins, 1956; Dvorak and Zhang, 1990; Wang et al., 1997; Kilian et al., 2007). This formed the wild tetraploid Triticum dicoccoides, which was the ancestor of the cultivated tetraploid parent of bread wheat Triticum turgidum. The second, much more recent, event involved T. turgidum and the D genome diploid Aegilops tauschii (McFadden and Sears, 1946).

A successful strategy that is frequently adopted to circumvent many of the difficulties created by polyploidy has been to rely on diploid, and in some cases tetraploid, progenitors as surrogates (Feuillet et al., 2003). The availability of the genome sequences of rice and Brachypodium distachyon (Brachypodium) has been of particular value in providing saturation of the genetic map in specific regions of the wheat genome (International Rice Genome Sequencing Project, 2005; International Brachypodium Initiative, 2010). As an alternative, Doležel et al. (2007) proposed that genome sequencing be based on flow-sorted individual chromosomes or chromosome arms, an approach that simplifies genome analysis by reducing the template to a manageable size and, crucially, avoids the complications introduced by the triplication of genic sequence arising from wheat’s hexaploid status (Kubaláková et al., 2002). Next-generation sequencing of chromosomal DNA provides a powerful approach to identify most of the genes and low-copy regions on a chromosome and to produce annotated syntenic builds whereby the majority of genes are placed in an approximate order and orientation (Berkman et al., 2011; Mayer et al., 2009, 2011; Wicker et al., 2011). The so-called GenomeZipper approach (Mayer et al., 2011) relies on comparisons of chromosomal shotgun sequences with reference grass genomes (typically rice, sorghum (Sorghum bicolor) and Brachypodium) to detect syntenic regions in these reference genomes. Genes in the detected regions are selected to generate a genomic build along a marker scaffold that takes into account the sequential order of sequence-tagged genes in the reference genomes as well as the ordering deduced from the marker scaffold.

Although most bread wheat chromosomes have maintained the structure of their ancestral counterparts, chromosome 4A underwent a series of rearrangements. Previous analyses revealed that the chromosome harbors two translocations, from chromosome arms 5AL and 7BS, and that it has undergone a pericentric inversion (Figure 1) (Devos et al., 1995; Miftahudin et al., 2004; Naranjo et al., 1987). The 5A translocation occurred at the diploid level in a common ancestor, as it is present in wheats of all ploidy levels, including diploid wheat progenitors and related species such as T. monococcum. In contrast, the 7BS translocation is detected in tetraploid and hexaploid wheat only, indicating that it occurred at or after the origin of T. dicoccoides (Devos et al., 1995). The large pericentric inversion placed most of the ancestral short arm on the modern long arm (4AL), so that a large proportion of the ancestral 4AL now constitutes the modern 4AS. Interestingly, most studies on bread wheat also report a small region in the most distal part of 4AS that was not affected by this inversion.

Figure 1.  The structure of wheat chromosome 4A. The structure of bread wheat chromosome 4A, as inferred by Devos et al. (1995) and Miftahudin et al. (2004). During its evolution, the chromosome first underwent a pericentric inversion, which resulted in much of the ancient long arm (excluding segment C) becoming the modern short arm. Subsequent translocations from 5AL (segment D) and 7BS (segment E) completed the rearrangement of the chromosome. The five individual segments A–E are color-coded. The additional small structural rearrangements proposed by Miftahudin et al. (2004) are not shown as they could not be confirmed in the present study.

As many as 40 genes of interest have been mapped to this chromosome to date, including some encoding resistance/tolerance to biotic and abiotic stress (Chen et al., 1995, 2005; Effertz et al., 2001; Nga et al., 2009; Paull et al., 1998; Talbert et al., 1996), and various agronomic traits (Araki et al., 1999; Bai et al., 2008; Börner et al., 2002; Keller et al., 1999; McCartney et al., 2005; Sourdille et al., 2002). Detailed information on the chromosome gene order would greatly enhance effective use of the genes in breeding programs and ultimately in their cloning and functional analysis.

Here, we report a high-resolution gene map of this chromosome, based on DNA sequence obtained from flow-sorted chromosome arms. Use has been made of the genetic marker content present in homoeologous portions of the barley genome (Hordeum vulgare) and the reference grass genomes to provide a detailed insight into gene composition and order along the length of the chromosome. This is a powerful approach for production of a high-resolution draft of gene space for the complex genome of bread wheat, including its highly rearranged chromosome 4A. The approach has important implications for the whole-genome analysis of both bread wheat and other large genomes of agriculturally important grasses such as rye (Secale cereale), fescue (Festuca spp.) and ryegrass (Lolium spp.).

Results

Preparation of chromosomal DNA and shotgun sequencing

Two separate DNA bulks were prepared from the mitotically dividing cells of the double di-telosomic 4A stock. The 4AS preparation contained approximately 78 000 flow-sorted telosomes, and the 4AL preparation contained approximately 50 000. The level of purity of these preparations, as estimated by fluorescence in situ hybridization, was 86.9% and 89.0%, respectively. Chromosome 1D comprised approximately 50% of the contaminants in the 4AL preparation, but no single chromosome predominated in the 4AS preparation. The 4AS bulk yielded 29.5 ng DNA, which was amplified in three independent multiple displacement amplification reactions to generate 16 μg DNA; similarly, the 4AL bulk produced 44.6 ng DNA, which was amplified in four reactions to yield 22.7 μg DNA. The individual multiple displacement amplification reactions for each template were combined to reduce the probability of bias introduced by multiple displacement amplification itself. The amplified DNA was used for 454 shotgun sequencing, which produced 2 181 649 4AS reads of mean length 324 bp, representing a total of 707 Mb of sequence (NCBI sequence read archive, http://www.ncbi.nlm.nih.gov/sra, reference SRA038898.1). Given the estimated length of this arm (317 Mbp; Šafář et al., 2010), this is equivalent to a sequencing depth of approximately 2.2-fold. For chromosome arm 4AL, the 2 987 571 reads (mean length 302 bp) yielded 901 Mb of sequence (NCBI sequence read archive, reference SRA034928.1), equivalent to a sequencing depth of approximately 1.7-fold. Sequencing details are summarized in Table 1.
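
The depth figures quoted above follow from dividing the total sequence obtained by the size of the template. The short Python sketch below reproduces that arithmetic. The 4AL arm size is not stated explicitly in the text; it is assumed here to be the difference between the 856 Mb chromosome and the 317 Mb short arm, and the function name is purely illustrative.

```python
# Sequencing depth = total sequenced bases / template size.
# Read totals come from Table 1. The 4AL size is an ASSUMPTION:
# chromosome 4A (856 Mb) minus the 4AS arm (317 Mb).
CHR_4A_MB = 856
ARM_4AS_MB = 317
ARM_4AL_MB = CHR_4A_MB - ARM_4AS_MB  # assumed, not given in the text


def depth(total_bp: int, arm_mb: float) -> float:
    """Return the fold coverage of a template of arm_mb megabases."""
    return total_bp / (arm_mb * 1_000_000)


depth_4as = depth(707_234_947, ARM_4AS_MB)
depth_4al = depth(901_236_013, ARM_4AL_MB)
print(f"4AS: {depth_4as:.1f}-fold, 4AL: {depth_4al:.1f}-fold")
```

Under these assumptions the sketch gives values that round to the 2.2-fold and 1.7-fold depths reported for 4AS and 4AL.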

Table 1.   Shotgun sequences of wheat chromosome arms 4AS and 4AL

Parameter              4AS                                  4AL
                       Raw data       Repeat-masked         Raw data       Repeat-masked
                                      and filtered (a)                     and filtered (a)
Number of sequences    2 181 649      420 739               2 987 571      752 981
Number of base pairs   707 234 947    146 653 961           901 236 013    239 649 872
Minimal length (bp)    18             100                   29             100
Maximal length (bp)    826            826                   982            982
Mean length (bp)       324            349                   302            318
Repeat content (%)     79.5                                 72.8
GC content (%)         44.7           46.4                  41.4           41

(a) The filter applied ensured the retention of sequences longer than 100 bp that contained at least 100 bp of non-repetitive sequence.

Identification of syntenic regions in related grass genomes

The full genome sequences of Brachypodium, rice and sorghum (International Rice Genome Sequencing Project 2005, Paterson et al., 2009; International Brachypodium Initiative 2010) were used to identify regions of synteny in order to take advantage of the GenomeZipper approach (Mayer et al., 2011). The 4AS and 4AL sequences were compared by BLAST analysis against the genomic sequences of Brachypodium, rice and sorghum, as well as against the virtual barley genome (Mayer et al., 2011), to identify syntenic regions (Figures 2 and 3). The 4AL comparison highlighted regions on Brachypodium chromosomes 1 and 4, rice chromosomes 3, 6, 11 and 12, and sorghum chromosomes 1 and 10 (Figure 3). In the same way, 4AS syntenic regions were identified on Brachypodium chromosome 1, rice chromosome 3 and sorghum chromosome 1. Comparison with the barley chromosome reference produced hits on chromosomes 4H, 5H and 7H (Figure 2). The 4AS sequences identified part of chromosome arm 4HL, while the 4AL sequences matched the entire 4HS arm, as well as identifying regions on 5HL and 7HS and a small region of 4HL (Figure 2). The gene content in these regions was collated, and the syntenic boundaries were located with high precision (Tables 4 and 5). The resulting data were then used to generate a GenomeZipper-based alignment and a high-resolution genetic map of chromosome 4A.

Figure 2.  Comparison of the 4A shotgun sequence with that of barley. Repeat-masked 4AS and 4AL shotgun sequence reads were compared against the sequence of virtual barley chromosomes (Mayer et al., 2011). Syntenic regions on chromosomes 4H, 7H and 5H are colored red; non-syntenic regions are colored blue. Centromeres are indicated by black triangles and the arms of the chromosomes are labeled S and L. Connectors/joins indicate corresponding segments and orientation of the individual segments.

Figure 3.  Comparison of the 4A shotgun sequence with that of Brachypodium, rice and sorghum. Repeat-masked 4AS and 4AL shotgun sequence reads were compared with the genome sequences of Brachypodium (Bd), rice (Os) and sorghum (Sb). Syntenic regions are colored red; non-syntenic regions are colored blue. Centromeres are indicated by black triangles and the arms of the chromosomes are labeled S and L. Connectors/joins indicate corresponding regions and the orientation of the individual segments.

Table 4.   Regions in Brachypodium, rice and sorghum sharing synteny with 4AS and 4AL, as deduced from in silico mapping

Segment      Reference genome   Chromosome   Start (Mb)   Stop (Mb)   Number of genes

Wheat chromosome arm 4AS
Segment A    Brachypodium       1            60.2         71.8        905
             Rice               3            2.6          14.2        763
             Sorghum            1            58.8         69.7        785

Wheat chromosome arm 4AL
Segment B    Brachypodium       1            6.5          11.5        240
                                4            7.9          10.3        75
                                4            22.5         31.8        291
             Rice               3            23.3         30.8        233
                                11           2.3          11.9        201
                                11           14.1         30.7        215
             Sorghum            1            7.5          13.9        248
Segment C    Brachypodium       1            71.5         72.6        49
             Rice               3            1.9          2.7         31
             Sorghum            1            70.0         71.1        32
Segment D    Brachypodium       1            0.3          1.8         157
             Rice               3            35.3         37.3        123
                                12           0.1          2.1         67
             Sorghum            1            0.1          1.9         110
Segment E    Brachypodium       1            48.1         50.5        151
             Rice               6            0.1          2.6         124
             Sorghum            10           0.6          2.3         103

Based on detected syntenic segments, the table gives the chromosome, start and stop coordinates on the respective reference genome and the number of genes located in these regions.

Table 5.   Overview of the breakpoints between the five chromosome 4A segments

Segment   Brachypodium      Rice            Sorghum
A         Bradi1g72080.1    Os03g0187500    Sb01g044730.1
          Bradi1g72086.1    Os03g0187400    Sb01g044740.1
          Bradi1g72092.1    Os03g0187300    Sb01g044750.1
          Bradi1g65190.1    Os03g0296700    Sb01g038210.1
          Bradi1g65197.1    Os03g0296600    Sb01g038220.1
          Bradi1g65210.1    Os03g0296400    Sb01g038230.1
B         Bradi4g26690.1    Os11g0150450    Sb01g013770.1
          Bradi4g26670.3    Os11g0151600    Sb01g013780.1
          Bradi4g26640.1    Os11g0152700    Sb01g013830.1
          Bradi1g13777.1    Os03g0652100    Sb01g013490.1
          Bradi1g13850.1    Os03g0648200    Sb01g013540.1
          Bradi1g13870.1    Os03g0645100    Sb01g013650.1
C         Bradi1g75740.1    Os03g0138200    Sb01g047640.1
          Bradi1g75720.1    Os03g0140100    Sb01g047630.1
          Bradi1g75707.1    Os03g0141100    Sb01g047610.1
          Bradi1g75960.1    Os03g0147900    Sb01g047070.1
          Bradi1g75970.1    Os03g0147700    Sb01g047850.1
          Bradi1g76227.1    Os03g0136900    Sb01g047860.1
D         Bradi1g00227.1    Os03g0861800    Sb01g000210.1
          Bradi1g00237.1    Os03g0860900    Sb01g000220.1
          Bradi1g00247.1    Os03g0860700    Sb01g000300.1
          Bradi1g02940.1    Os03g0823800    Sb01g002280.1
          Bradi1g02950.1    Os03g0822100    Sb01g002300.1
          Bradi1g02980.1    Os03g0821633    Sb01g002410.1
E         Bradi1g49450.1    Os06g0122200    Sb10g001470.1
          Bradi1g49460.1    Os06g0125000    Sb10g001520.1
          Bradi1g49470.1    Os06g0125300    Sb10g001530.1
          Bradi1g52060.1    Os06g0103300    Sb10g000300.1
          Bradi1g52090.1    Os06g0102900    Sb10g000270.1
          Bradi1g52110.1    Os06g0102700    Sb10g000260.1

Only the first and last three syntenic genes anchored in each segment are shown.

Gene content of chromosome 4A

In order to estimate the number of genes present on each 4A chromosome arm, TBLASTX comparisons were made with the Brachypodium, rice and sorghum genome sequences, based on a stringency level of at least 75% over at least 30 amino acids (Table 2). This exercise produced between 3278 and 3805 hits for 4AS, and between 3956 and 4523 for 4AL. The numbers of non-redundant matches were 4383 and 5188, respectively, giving a total of 9571 non-redundant gene matches on 4A. Given the estimated size of 4A of 856 Mb and a gene density representative of the complete wheat genome (Qi et al., 2004), this scales up to at least 61 500 genes for the A genome and >180 000 genes for bread wheat. This result contrasts with recent estimates for barley (≥32 000 genes; Mayer et al., 2011) and the B genome of wheat (38 000 genes; Choulet et al., 2010), and with our own estimate of ≥3000 genes on 4A based on a conservative synteny-driven integration approach (Table 3).
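
The stringency filter described above (at least 75% identity over at least 30 amino acids) can be illustrated with a few lines of Python. This is a minimal sketch, not the pipeline used in the study: it assumes the TBLASTX hits are available in the standard 12-column BLAST tabular format, that the length column counts aligned amino acids, and it uses invented read and gene names.

```python
import csv
import io

# Hypothetical TBLASTX hits in BLAST tabular format (12 columns):
# qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
SAMPLE_HITS = (
    "read1\trefGeneA\t82.5\t41\t7\t0\t1\t123\t10\t132\t1e-12\t70.1\n"
    "read2\trefGeneA\t91.0\t35\t3\t0\t4\t108\t40\t144\t1e-10\t65.0\n"
    "read3\trefGeneB\t71.0\t60\t17\t0\t1\t180\t1\t180\t1e-15\t80.2\n"
    "read4\trefGeneC\t88.0\t22\t3\t0\t1\t66\t1\t66\t1e-04\t40.3\n"
)

MIN_IDENTITY = 75.0  # percent identity threshold quoted in the text
MIN_LENGTH = 30      # minimum alignment length quoted in the text


def tagged_genes(handle, min_identity=MIN_IDENTITY, min_length=MIN_LENGTH):
    """Return the non-redundant set of reference genes with a passing hit."""
    genes = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        gene, pident, length = row[1], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            genes.add(gene)
    return genes


genes = tagged_genes(io.StringIO(SAMPLE_HITS))
print(sorted(genes))
```

In this toy input two reads tag the same gene, which is counted once; one hit fails the identity threshold and one fails the length threshold. Collecting gene identifiers in a set is one simple way to obtain non-redundant counts of the kind reported in Table 2.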

Table 2.   Tagged genes in the reference genomes

Chromosome arm   Non-redundant genes (a)                 Non-redundant genes (total)
                 Brachypodium   Rice      Sorghum
4AS              3805           3278      3365           4383
4AL              4523           3956      4069           5188

(a) The number of sequence-tagged genes located on chromosome 4A as deduced from similarity comparisons (sequence identity ≥75% and ≥30 amino acids) with reference genomes.

Table 3.   Wheat chromosome 4A GenomeZipper statistics

Parameter                                                          Chromosome 4A segment                      4AS       4AL       4A
                                                                   A        B        C      D      E
Number of markers                                                  127      107      16     46     64      127       233       360
Number of markers with associated gene from reference genome(s)    92       52       8      27     27      92        114       206
Number of matched barley fl-cDNAs                                  600      393      29     123    114     600       659       1256
Number of non-redundant sequence reads                             10 628   18 415   969    5556   5126    10 628    26 584    37 212
Number of non-redundant ESTs                                       935      700      60     217    204     935       1068      1996
Number of Brachypodium genes                                       905      606      49     157    151     905       963       1865
Number of rice genes                                               763      649      31     190    124     763       994       1754
Number of sorghum genes                                            785      248      32     110    103     785       493       1278
Number of anchored gene loci                                       1182     1110     79     300    262     1182      1751      2933

Overview of non-redundant data points anchored along chromosome 4A. The numbers refer to the chromosomal segments A–E, which form the chromosomal zippers for 4AS and 4AL, as well as for the whole chromosome 4A. The individual columns give non-redundant numbers for each category. Owing to small overlaps in segment and arm assignment, a few cases cannot be assigned unambiguously; the non-redundant totals therefore do not always equal the sum of the individual values.

The structure of chromosome 4A

On the basis of synteny with barley, Brachypodium, rice and sorghum, it was possible to recognize five distinct regions (A–E) on chromosome 4A. There are 120 (5!) ways in which five independent segments can be ordered, but as each segment can be present in one of two possible orientations, the true number of possible arrangements is 3840 (120 × 2^5). To resolve the actual ordering, advantage was taken of published genetic mapping data (Devos et al., 1995; Mayer et al., 2011; Miftahudin et al., 2004). The 4AS sequences identified a syntenic region on 4HL, while the 4AL sequences identified 4HS and a small segment of 4HL (Figures 2 and 3). Chromosome 4A is known to carry a pericentromeric inversion (Devos et al., 1995; Miftahudin et al., 2004) involving a portion of the ancient long arm (4ALanc; segment A) and the complete ancient short arm (4ASanc; segment B); this converted 4ALanc into the modern 4AS, and 4ASanc into the distal part of the modern 4AL. In addition, a small region of 4AL (segment C) appears not to have been involved in the pericentromeric inversion (Figure 2). Consequently, the gene order in segments A and B was reversed with respect to barley, but that in segment C was conserved. The segment D sequences on 4AL show homology with a distal portion of 5HL (Figure 2), consistent with genetic mapping data (Devos et al., 1995; Mayer et al., 2011; Miftahudin et al., 2004). Finally, genetic data indicated that a further translocation must have occurred between a distal segment of chromosome arm 7BS and 4A (Devos et al., 1995; Mayer et al., 2011; Mickelson-Young et al., 1995; Miftahudin et al., 2004). The evolutionary scenario proposed by Devos et al. (1995) and Miftahudin et al. (2004) allowed the orientation of segments D and E to be determined. On the basis of meiotic pairing between the distal segments of 4AS, 4BS and 4DS, Naranjo et al. (1987) suggested retention of a small segment of 4ASanc in the distal part of modern 4AS; although no genetic evidence for this was obtained by Devos et al. (1995), two relevant EST sequences were located within this region by bin mapping (Miftahudin et al., 2004). A BLASTN comparison of these two ESTs against the present set of 4AS and 4AL sequences produced either no hits or hits with restricted sequence similarity and sequence alignment length (data not shown), so it was not possible to confirm the presence of this distal 4AS segment.
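
The count of 3840 possible arrangements (5! orderings multiplied by two orientations for each of the five segments) can be checked by brute-force enumeration. The Python sketch below is only an illustration of the combinatorics; like the figure in the text, it treats an arrangement and its mirror image as distinct.

```python
from itertools import permutations, product
from math import factorial

SEGMENTS = "ABCDE"

# An arrangement is an ordering of the segments plus a +/- orientation
# for each segment in that ordering.
arrangements = [
    tuple(zip(order, orientation))
    for order in permutations(SEGMENTS)
    for orientation in product("+-", repeat=len(SEGMENTS))
]

n = len(SEGMENTS)
assert len(arrangements) == factorial(n) * 2 ** n
print(len(arrangements))  # prints 3840
```

With this many candidates, synteny patterns alone cannot single out one arrangement, which is why published genetic mapping data were needed to resolve the order and orientation.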

A virtual map of chromosome 4A

A map of chromosome 4A was assembled using the GenomeZipper protocol (Mayer et al., 2011) from the sequence data and synteny-based deductions (Figure 3, Table 4 and Table S1). The ordering of segments A–E was determined based on the marker map of barley (Close et al., 2009). The number of markers involved in this process ranged from 16 (segment C) to 127 (segment A) (Table 3). The 4AS arm is associated with 127 markers, and the 4AL arm with 233. Between 79 genes (segment C) and 1182 genes (segment A) were thus assigned to each of the segments, resulting in the placement of approximately 3000 genes over the whole chromosome (Table 3). The five segments varied considerably in the extent of synteny with the other grass genomes, and overall just 29% of the genes were conserved across wheat and all three sequenced genomes (Figure 4). Almost half of the genes in segment A were present in the expected location in all three reference genomes, compared to only approximately one-eighth in segment B. When the criterion for support was reduced to just one of the three heterologous genomes, the frequency of conservation across the whole chromosome with Brachypodium was 22.3%, that with rice was 22.8%, but that with sorghum was only 7.7%. At the level of the individual segments, the frequency of conservation varied by as much as threefold, with no evidence that the segments resulting from the two known translocation events (segments D and E) showed a lower level of conservation. The clearly unequal level of conservation across the various Poaceae lineages underlines the value of using more than one reference genome when attempting synteny-based deduction of gene order.
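
The idea of threading sequence-tagged genes along a marker scaffold can be illustrated with a deliberately simplified Python sketch. It is not the GenomeZipper implementation of Mayer et al. (2011): it uses a single reference genome, invented marker and gene names, and a hypothetical function zip_genes that places each read-tagged reference gene between the two scaffold anchors that flank it in the reference order.

```python
# Hypothetical inputs (all names invented for illustration):
# - marker_scaffold: markers in genetic-map order, each linked to a reference gene
# - reference_order: gene order along one reference genome
# - tagged: reference genes hit by chromosome-arm shotgun reads
marker_scaffold = [("m1", "g2"), ("m2", "g5"), ("m3", "g7")]
reference_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
tagged = {"g2", "g3", "g5", "g6", "g7"}


def zip_genes(scaffold, ref_order, tagged_genes):
    """Interleave read-tagged genes between consecutive scaffold anchors."""
    pos = {gene: i for i, gene in enumerate(ref_order)}
    build = []
    for (marker, anchor), nxt in zip(scaffold, scaffold[1:] + [None]):
        build.append((marker, anchor))
        if nxt is None:
            break
        lo, hi = pos[anchor], pos[nxt[1]]
        step = 1 if hi >= lo else -1  # follow the scaffold even if inverted
        for i in range(lo + step, hi, step):
            if ref_order[i] in tagged_genes:
                build.append((None, ref_order[i]))
    return build


virtual_map = zip_genes(marker_scaffold, reference_order, tagged)
print([gene for _, gene in virtual_map])
```

Because the direction of travel is taken from the scaffold rather than from the reference genome, a scaffold that runs in the opposite direction (as for inverted segments such as A and B) yields the reversed gene order in this toy model.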

Figure 4.  Conservation of synteny between chromosome 4A and rice, Brachypodium and sorghum. The gene content of segments A–E of chromosome 4A was compared with that of the homologous regions in Brachypodium (Bd), rice (Os) and sorghum (Sb). The Venn diagrams show the numbers of genes shared between wheat and the reference genomes.

Translocation and inversion breakpoints

Alignment of the 4A sequences against the virtual barley genome and against the reference genomes of Brachypodium, rice and sorghum allowed the breakpoints associated with the various rearrangements that have determined the structure of chromosome 4A to be localized with almost single-gene resolution (Figures 2 and 3, and Tables 4 and 5). We analyzed the corresponding regions in the reference genomes for syntenic intervals and syntenic borders. Based on gene detection by sequence comparisons of 4AS and 4AL, reads bordering syntenic regions could be accurately identified. The syntenic regions range in size from 370 kb to 28 Mb, and contain between 31 and 905 genes (Table 4). The exact positioning of syntenic borders also allows definition of the bordering genes for the individual segments and their orthologous counterparts in the reference genomes (Table 5). The approach based on flow sorting-facilitated isolation of chromosome arms, high-throughput sequencing and comparative genome analysis is thus capable of reconstructing genomes and identifying evolutionary translocation breakpoints.

Discussion

The structure of chromosome 4A

The large size and polyploidy of the bread wheat genome pose a considerable challenge for its sequencing. Current sequencing technology has the ability to acquire large amounts of sequence economically, but sequence assembly and, most importantly, establishment of the gene order within each of the 21 chromosomes are particularly difficult, mostly because of the presence of homoeologous copies of most genes and the extent of repetitive DNA present. Fortunately, much synteny has been retained among Poaceae species in general, and among the Triticeae species in particular (Bolot et al., 2009; Devos and Gale, 2000; Moore, 1995; Salse and Feuillet, 2007). A small number of major chromosomal rearrangements are known among the Triticeae (Devos et al., 1995), but collinearity has largely been retained across the wheat and barley genomes, despite their divergence approximately 12 Myr ago (Gaut, 2002). Within wheat itself, chromosome 4A suffered the most significant overall rearrangement (Figures 1 and 2) (Devos et al., 1995; Miftahudin et al., 2004), as confirmed by an extensive comparative study of the gene content of wheat and barley (Mayer et al., 2011; Qi et al., 2004). Integration of sequence and mapping data allowed recognition that chromosome 4A comprises five separate segments. Segments A, B and C originated from the pericentromeric inversion, while segments D and E arose from later interchanges with chromosomes 5A and 7B. All five segments were successfully ordered and oriented to allow subsequent GenomeZipper-based gene integration and positioning.

Identification of translocations and inversions

While assignment of 4AS-derived sequences to corresponding syntenic segments in the reference genomes and barley was relatively straightforward, the assignment was much more complex for 4AL (Figure 3). Integration of the resulting patterns and comparison with barley chromosomes 4H, 5H and 7H led to identification of five syntenic segments A–E. The orientation of segments A, B and C was evident from comparisons based on the most parsimonious single pericentromeric inversion event. On the other hand, the positioning and orientation of segments D and E, which resulted from translocations from chromosomes 5A and 7B, respectively, could not be deduced from synteny patterns alone. In conjunction with genetic mapping data and a derived order of segments and their orientation (Devos et al., 1995; Miftahudin et al., 2004), all five segments were ordered and oriented for GenomeZipper-based gene integration and positioning. Thus, by integrating genetic data with our molecular and comparative data, a conclusive order and orientation of segments and accordingly a linear order of genes could be established. This demonstrates the power of combining and integrating genetic data with chromosome next-generation sequencing-derived shotgun sequence data and comparative and bioinformatic analysis.

Gene content of wheat chromosome 4A

A rather stringent comparison between the 4A sequences and the various annotated reference genomes produced an estimated gene content on chromosome 4A of >9500 genes, a number that is rather higher than has been suggested for either barley chromosome 4H (4000) or wheat chromosome 3B (6360) (Mayer et al., 2011; Paux et al., 2008). Other estimates of gene number based on the analysis of individual chromosomes have also diverged from those based on whole-genome analyses (Mayer et al., 2009; Wicker et al., 2011). One explanation is that individual chromosomes/chromosome arms are compared with complete reference genomes, which may result in a higher rate of false-positive gene identifications due to the presence of cross-matching paralogous sequences. In an analysis of shotgun sequences from wheat homoeologous group 1 chromosomes, Wicker et al. (2011) identified a significant number of potential pseudogenes (similar to, or even exceeding the number of functional genes) that shared homology with various known genes but were not present in the syntenic regions of either Brachypodium or rice. This underlines the value of the comparative approach when attempting estimation of the number of genes present on a particular wheat chromosome.

The GenomeZipper method identified 1182 genes on 4AS and 1751 on 4AL, so a total of approximately 3000 genes supported by synteny was placed on the entire chromosome. Important in understanding the context of this estimate are (i) the sequencing depth achieved (2.2-fold for 4AS; 1.7-fold for 4AL), (ii) the expected gene detection rate (85%, based on the method described by Lander and Waterman, 1988), and (iii) that 20–25% of wheat genic sequences fail to detect a close homolog in the reference genomes (Mayer et al., 2011). Based on these considerations, we estimate the gene content of chromosome 4A to be approximately 4300 (2933/0.85 × 100/80). Assuming that the gene density on chromosome 4A is representative of the A genome as a whole, and given that its physical length is 15.6% of the entire genome, the A genome contains approximately 28 000 genes, a number largely in line with estimates for both the B genome (38 000; Choulet et al., 2010) and for barley (32 000; Mayer et al., 2011). However, due to the series of translocations of presumably gene-rich telomeric regions that shaped the modern chromosome 4A, the gene content of chromosome 4A may deviate from that of other less rearranged wheat chromosomes. Thus chromosomal shotgun sequences for other chromosomes will be helpful to refine gene estimates for the individual wheat sub-genomes.
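
The arithmetic behind these estimates can be retraced in a few lines of Python. The sketch below is illustrative only. It uses the standard Poisson expression from the Lander and Waterman (1988) model, 1 - exp(-c), for the expected fraction of a template covered at depth c; how exactly the 85% detection rate was derived is not spelled out in the text, so that value is simply taken as given. The constant names are hypothetical.

```python
import math


def covered_fraction(depth: float) -> float:
    """Expected fraction of bases covered by at least one read at the
    given fold depth (Poisson model of Lander and Waterman, 1988)."""
    return 1.0 - math.exp(-depth)


frac_4as = covered_fraction(2.2)  # depth reported for 4AS
frac_4al = covered_fraction(1.7)  # depth reported for 4AL

ANCHORED_LOCI = 2933       # anchored gene loci on 4A (Table 3)
DETECTION_RATE = 0.85      # detection rate quoted in the text
HOMOLOG_FRACTION = 0.80    # 100/80 correction used in the text
SHARE_OF_A_GENOME = 0.156  # physical length of 4A relative to the A genome

genes_4a = ANCHORED_LOCI / DETECTION_RATE / HOMOLOG_FRACTION
genes_a_genome = genes_4a / SHARE_OF_A_GENOME

print(f"covered fraction: 4AS {frac_4as:.3f}, 4AL {frac_4al:.3f}")
print(f"chromosome 4A: ~{genes_4a:.0f} genes; A genome: ~{genes_a_genome:.0f} genes")
```

The two per-arm coverage fractions bracket the 85% figure, and the scaled values agree with the approximate estimates of 4300 genes for chromosome 4A and 28 000 genes for the A genome given above.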

Limitations in resolution

A high-resolution EST map has been constructed for both Ae. tauschii, the D genome donor species (Luo et al., 2009), and chromosome 3B (Paux et al., 2008). At present, only binned EST markers (Qi et al., 2004) are available for the other wheat chromosomes. Bin maps lack sufficient resolution to be used for syntenic integration and genome zipping, which is why it was necessary here to rely on the barley genetic map. The validity of this approach depends on the retention of a high degree of synteny between barley and wheat; any small-scale rearrangements will not be detected until a dense marker map of the wheat genome has been generated. Nevertheless, it was still possible to identify with high precision the boundaries between the five segments that arose as a result of the evolutionary inversion and translocations. Earlier research based on mapping of cDNA RFLP loci (Devos et al., 1995) and bin mapping (Miftahudin et al., 2004) suggested the presence of at least two other segments on chromosome 4A, but we have not been able to confirm the presence of either of these. The availability of higher-resolution genetic maps (which are certainly attainable given the volume of relevant sequence data now available) will enable confirmation of the veracity of these proposed additional structural rearrangements. A full comparative analysis awaits acquisition of genomic sequence from chromosomes 4B and 4D, and from the translocated portions of 5A and 7B. These data will enable determination of the degree of similarity between homoeologs with respect to gene content and potential loss of genes. Identification of the translocation breakpoints on chromosomes 7B and 5A may also allow recognition of molecular signatures and the molecular environment that marks these translocations.

Conclusion

We have demonstrated here that fractionation of the complex wheat genome into single chromosome arms, coupled with the analysis of shotgun sequences using GenomeZipper, provides a successful strategy for constructing a high-resolution gene-based chromosome map. The acquisition of a complete ordered gene map, and ultimately of the genome sequence itself, requires the development of a reliable physical map, the construction of which is presently being coordinated by the International Wheat Genome Sequencing Consortium. A physical map, together with chromosome survey sequences, offers an ideal means of performing a detailed analysis of chromosomal rearrangements. Further developments in sequencing efficiency should also provide opportunities to improve both chromosome coverage and gene detection rate. This will eventually allow the full genomic gene territories to be discovered, enabling the study of gene structure and of associated non-transcribed elements such as cis elements.

Experimental Procedures

Plant material

The 4A double di-telosomic stock of bread wheat cv. Chinese Spring is a stable line in which chromosome 4A is represented by a pair of telosomes, one of which is the short arm (4AS) and the other the long arm (4AL) (Sears and Sears, 1978). Grain of this stock was kindly provided by Dr Bikram Gill (Department of Plant Pathology, Kansas State University, Manhattan, KS).

Chromosome sorting and DNA amplification

Liquid suspensions of mitotic chromosomes were prepared from seedling root tips as described by Vrána et al. (2000). Telosomes were isolated and sorted using a FACSVantage SE flow cytometer (Becton Dickinson) into 40 μl sterile deionized water. The level of purity of the sorted material was determined using fluorescence in situ hybridization based on a probe containing either the telomeric repeat Afa or [GAA]n, as described by Kubaláková et al. (2003). The flow-sorted chromosomes were treated with proteinase, and DNA was then extracted using a Microcon YM-100 column (Millipore, http://www.millipore.com/), as described by Šimková et al. (2008). Chromosomal DNA was amplified by multiple displacement amplification using an Illustra GenomiPhi V2 DNA amplification kit (GE Healthcare, http://www.gehealthcare.com), and a Roche shotgun library (http://www.roche.com) was then created for each chromosome arm based on 5 μg multiple displacement-amplified DNA.

DNA sequencing and analysis

Sequencing of the 4AS and 4AL libraries was performed at the Lifesequencing S.L. facilities in Valencia (Spain) (http://www.lifesequencing.com/) on a Genome Sequencer FLX instrument (Roche) with Titanium chemistry (454 Life Sciences, Roche). Three full sequencing runs were performed for the 4AL library and two for the 4AS library. Repetitive DNA was masked using Vmatch software (http://www.vmatch.de/) with the MIPS-REDAT POACEAE version 8.6.2 repeat library as the reference (http://mips.helmholtz-muenchen.de/plant/genomes.jsp). The following parameters were applied: 70% identity cut-off, 100 bp minimal length, seed length 14, exdrop 5, e-value 0.001. To estimate the number of genes present, the repeat-filtered sequence reads were compared by TBLASTX against the coding sequences for Brachypodium (ftp://ftpmips.helmholtz-muenchen.de/plants/brachypodium/v1.2), rice (rice RAP-DB genome build 4, http://rapdb.dna.affrc.go.jp) and sorghum (version 1.4, http://genome.jgi-psf.org/Sorbi1/Sorbi1.download.ftp.html).
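The gene-number estimate described above amounts to counting the distinct reference genes that are tagged by at least one repeat-filtered read. The following Python sketch illustrates that counting step only. It assumes that the TBLASTX results are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The function name tagged_genes and the e-value cut-off of 1e-5 are hypothetical; no TBLASTX threshold is stated in the text.

```python
from typing import Iterable, Set


def tagged_genes(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect reference gene IDs hit by at least one read.

    `lines` are rows of 12-column BLAST tabular output.
    `max_evalue` is an illustrative assumption, not a value from the text.
    """
    genes: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comment lines and rows that do not have all 12 columns.
        if line.startswith("#") or len(fields) < 12:
            continue
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            genes.add(subject_id)
    return genes
```

Running the function once per reference genome (Brachypodium, rice, sorghum) and taking the size of each returned set gives one tagged-gene count per reference.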

GenomeZipper analysis

The GenomeZipper workflow described by Mayer et al. (2011) was used, with some adjustments. Comparison and integration of the shotgun sequence into a linear gene order reference were achieved by exploiting synteny with barley, Brachypodium, rice and sorghum. The 4A segments were delineated by a BLASTN comparison of the shotgun sequence data with that of barley artificial chromosomes (Mayer et al., 2011). Only hits showing at least 85% identity and a minimum alignment of 100 bp were considered. BLASTX was used to identify homologs in the reference genomes, applying a criterion of >70% similarity and a minimum length of 30 amino acids. To position and orient genes, a selection of genes present in both the five 4A segments and the relevant syntenic regions of the other grass genome(s) was aligned using a marker-based map of barley chromosomes 4H, 5H and 7H.
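The two hit-filtering criteria given above can be expressed as simple predicates. The Python sketch below is an illustration only and is not the GenomeZipper implementation. It assumes that the hits come from BLAST tabular output in which column 3 holds the percentage value and column 4 the alignment length; for the BLASTX criterion, this means the table would have to be written with percent similarity (positives) in column 3, because the default tabular layout reports percent identity there. The names Hit, parse_tabular, keep_blastn and keep_blastx are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pct: float   # percent identity (BLASTN) or similarity (BLASTX)
    length: int  # alignment length: bp for BLASTN, amino acids for BLASTX


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per non-empty, non-comment tabular row."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def keep_blastn(hit: Hit) -> bool:
    """At least 85% identity over an alignment of at least 100 bp."""
    return hit.pct >= 85.0 and hit.length >= 100


def keep_blastx(hit: Hit) -> bool:
    """More than 70% similarity over at least 30 amino acids."""
    return hit.pct > 70.0 and hit.length >= 30
```

A filtered hit list is then obtained with, for example, [h for h in parse_tabular(rows) if keep_blastn(h)]. Note that the BLASTN identity threshold is inclusive ("at least 85%"), whereas the BLASTX similarity threshold is strict (">70%").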

Acknowledgements

We warmly acknowledge the help provided by Jarmila Číhalíková, Romana Šperková and Zdenka Dubská in chromosome sorting, as well as the helpful comments of two anonymous reviewers. This research was financially supported by the Spanish Ministry of Science and Innovation (grant numbers BIO2009-07443, BIO2011-15237 and AGL2010-17316), the German Ministry of Education and Research GABI Barlex project, the European Commission FP7-212019 Triticeae Genome grant, the Czech Science Foundation (awards 521/08/1629 and P501/10/1740), and the Czech Republic Ministry of Education, Youth and Sports/European Regional Development Fund (Operational Programme Research and Development for Innovations grant number CZ.1.05/2.1.00/01.0007).

References

  • Araki, E., Miura, H. and Sawada, S. (1999) Identification of genetic loci affecting amylose content and agronomic traits on chromosome 4A of wheat. Theor. Appl. Genet. 98, 977–984.
  • Bai, G.H., Chen, C.X. and Cai, S.B. (2008) A major QTL controlling seed dormancy and pre-harvest sprouting resistance on chromosome 4A in a Chinese wheat landrace. Mol. Breeding, 21, 351–358.
  • Berkman, P.J., Skarshewski, A., Lorenc, M.T. et al. (2011) Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol. J. 9, 768–775.
  • Bolot, S., Abrouk, M., Masood-Quraishi, U., Stein, N., Messing, J., Feuillet, C. and Salse, J. (2009) The ‘inner circle’ of the cereal genomes. Curr. Opin. Plant Biol. 12, 119–125.
  • Börner, A., Schumann, E., Furste, A., Coster, H., Leithold, B., Roder, M.S. and Weber, W.E. (2002) Mapping of quantitative trait loci determining agronomic important characters in hexaploid wheat (Triticum aestivum L.). Theor. Appl. Genet. 105, 921–936.
  • Chen, X.M., Line, R.F. and Jones, S.S. (1995) Chromosomal location of genes for resistance to Puccinia striiformis in winter-wheat cultivars Heines-Vii, Clement, Moro, Tyee, Ikes, and Daws. Phytopathology, 85, 1362–1367.
  • Chen, X.M., Luo, Y.H., Xia, X.C., Xia, L.Q., Chen, X., Ren, Z.L., He, Z.H. and Jia, J.Z. (2005) Chromosomal location of powdery mildew resistance gene Pm16 in wheat using SSR marker analysis. Plant Breeding, 124, 225–228.
  • Choulet, F., Wicker, T., Rustenholz, C. et al. (2010) Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell, 22, 1686–1701.
  • Close, T.J., Bhat, P.R., Lonardi, S. et al. (2009) Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics, 10, 582.
  • Devos, K. and Gale, M. (2000) Genome relationships: the grass model in current research. Plant Cell, 12, 637–646.
  • Devos, K.M., Dubcovsky, J., Dvorak, J., Chinoy, C.N. and Gale, M.D. (1995) Structural evolution of wheat chromosomes 4A, 5A, and 7B and its impact on recombination. Theor. Appl. Genet. 91, 282–288.
  • Doležel, J., Kubaláková, M., Paux, E., Bartoš, J. and Feuillet, C. (2007) Chromosome-based genomics in the cereals. Chromosome Res. 15, 51–66.
  • Dvorak, J. and Zhang, H.B. (1990) Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc. Natl Acad. Sci. USA, 87, 9640–9644.
  • Dvorak, J., Terlizzi, P., Zhang, H.B. and Resta, P. (1993) The evolution of polyploid wheats: identification of the A genome donor species. Genome, 36, 21–31.
  • Effertz, R.J., Anderson, J.A. and Francl, L.J. (2001) Restriction fragment length polymorphism mapping of resistance to two races of Pyrenophora tritici-repentis in adult and seedling wheat. Phytopathology, 91, 572–578.
  • Feuillet, C., Travella, S., Stein, N., Albar, L., Nublat, A. and Keller, B. (2003) Map-based isolation of the leaf rust disease resistance gene Lr10 from the hexaploid wheat (Triticum aestivum L.) genome. Proc. Natl Acad. Sci. USA, 100, 15253–15258.
  • Flavell, R.B. (1986) Repetitive DNA and chromosome evolution in plants. Philos. Trans. R. Soc. Lond. B Biol. Sci. 312, 227–242.
  • Gaut, B.S. (2002) Evolutionary dynamics of grass genomes. New Phytol. 154, 15–28.
  • International Brachypodium Initiative (2010) Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature, 463, 763–768.
  • International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
  • Keller, M., Karutz, C., Schmid, J.E., Stamp, P., Winzeler, M., Keller, B. and Messmer, M.M. (1999) Quantitative trait loci for lodging resistance in a segregating wheat x spelt population. Theor. Appl. Genet. 98, 1171–1182.
  • Kilian, B., Ozkan, H., Deusch, O., Effgen, S., Brandolini, A., Kohl, J., Martin, W. and Salamini, F. (2007) Independent wheat B and G genome origins in outcrossing Aegilops progenitor haplotypes. Mol. Biol. Evol. 24, 217–227.
  • Kubaláková, M., Vrána, J., Číhalíková, J., Šimková, H. and Doležel, J. (2002) Flow karyotyping and chromosome sorting in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 104, 1362–1372.
  • Kubaláková, M., Valárik, M., Bartoš, J., Vrána, J., Číhalíková, J., Molnár-Láng, M. and Doležel, J. (2003) Analysis and sorting of rye (Secale cereale L.) chromosomes using flow cytometry. Genome, 46, 893–905.
  • Lander, E.S. and Waterman, M.S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics, 2, 231–239.
  • Luo, M.C., Deal, K.R., Akhunov, E.D. et al. (2009) Genome comparisons reveal a dominant mechanism of chromosome number reduction in grasses and accelerated genome evolution in Triticeae. Proc. Natl Acad. Sci. USA, 106, 15780–15785.
  • Mayer, K.F., Taudien, S., Martis, M. et al. (2009) Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 151, 496–505.
  • Mayer, K.F., Martis, M., Hedley, P.E. et al. (2011) Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell, 23, 1249–1263.
  • McCartney, C.A., Somers, D.J., Humphreys, D.G., Lukow, O., Ames, N., Noll, J., Cloutier, S. and McCallum, B.D. (2005) Mapping quantitative trait loci controlling agronomic traits in the spring wheat cross RL4452 x ‘AC Domain’. Genome, 48, 870–883.
  • McFadden, E. and Sears, E. (1946) The origin of Triticum spelta and its free-threshing hexaploid relatives. J. Hered. 37, 107–116.
  • Mickelson-Young, L., Endo, T.R. and Gill, B.S. (1995) A cytogenetic ladder-map of the wheat homoeologous group-4 chromosomes. Theor. Appl. Genet. 90, 1007–1011.
  • Miftahudin, R.K., Ma, X.F., Mahmood, A.A. et al. (2004) Analysis of expressed sequence tag loci on wheat chromosome group 4. Genetics, 168, 651–663.
  • Moore, G. (1995) Cereal genome evolution – pastoral pursuits with lego genomes. Curr. Opin. Genet. Dev. 5, 717–724.
  • Naranjo, T., Roca, A., Goicoechea, P.G. and Giraldez, R. (1987) Arm homoeology of wheat and rye chromosomes. Genome, 29, 873–882.
  • Nga, N.T.T., Hau, V.T.B. and Tosa, Y. (2009) Identification of genes for resistance to a Digitaria isolate of Magnaporthe grisea in common wheat cultivars. Genome, 52, 801–809.
  • Paterson, A.H., Bowers, J.E., Bruggmann, R. et al. (2009) The Sorghum bicolor genome and the diversification of grasses. Nature, 457, 551–556.
  • Paull, J.G., Chalmers, K.J., Karakousis, A., Kretschmer, J.M., Manning, S. and Langridge, P. (1998) Genetic diversity in Australian wheat varieties and breeding material based on RFLP data. Theor. Appl. Genet. 96, 435–446.
  • Paux, E., Sourdille, P., Salse, J. et al. (2008) A physical map of the 1-gigabase bread wheat chromosome 3B. Science, 322, 101–104.
  • Qi, L.L., Echalier, B., Chao, S. et al. (2004) A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics, 168, 701–712.
  • Šafář, J., Šimková, H., Kubaláková, M., Číhalíková, J., Suchánková, P., Bartoš, J. and Doležel, J. (2010) Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet. Genome Res. 129, 211–223.
  • Salse, J. and Feuillet, C. (2007) Comparative genomics of cereals. In Genomics-Assisted Crop Improvement (Rajeev, K. and Varshney, R.T., eds). New York: Springer, pp. 177–205.
  • Sarkar, P. and Stebbins, G.L. (1956) Morphological evidence concerning the origin of the B genome in wheat. Am. J. Bot. 43, 297–304.
  • Sears, E.R. and Sears, L.M.S. (1978) The telocentric chromosomes of common wheat. In Proceedings of the 5th International Wheat Genetics Symposium (Ramanujam, S. ed.). New Delhi: Indian Soc. Genet Plant Breed, pp. 389–407.
  • Šimková, H., Svensson, J.T., Condamine, P., Hřibová, E., Suchánková, P., Bhat, P.R., Bartoš, J., Šafář, J., Close, T.J. and Doležel, J. (2008) Coupling amplified DNA from flow-sorted chromosomes to high-density SNP mapping in barley. BMC Genomics, 9, 294.
  • Sourdille, P., Cadalen, T., Gay, G., Gill, B. and Bernard, M. (2002) Molecular and physical mapping of genes affecting awning in wheat. Plant Breeding, 121, 320–324.
  • Talbert, L.E., Bruckner, P.L., Smith, L.Y., Sears, R. and Martin, T.J. (1996) Development of PCR markers linked to resistance to wheat streak mosaic virus in wheat. Theor. Appl. Genet. 93, 463–467.
  • Vrána, J., Kubaláková, M., Šimková, H., Číhalíková, J., Lysák, M.A. and Doležel, J. (2000) Flow sorting of mitotic chromosomes in common wheat (Triticum aestivum L.). Genetics, 156, 2033–2041.
  • Wang, G.Z., Miyashita, N.T. and Tsunewaki, K. (1997) Plasmon analyses of Triticum (wheat) and Aegilops: PCR-single-strand conformational polymorphism (PCR-SSCP) analyses of organellar DNAs. Proc. Natl Acad. Sci. USA, 94, 14570–14577.
  • Wicker, T., Mayer, K.F., Gundlach, H. et al. (2011) Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell, 23, 1706–1718.

Supporting Information

Table S1. GenomeZipper analysis of wheat chromosome 4A.

Filename | Format | Size | Description
TPJ_4808_sm_Supportinginformationlegend.doc | | 26K | Supporting info item
TPJ_4808_sm_TableS1.xls | | 4939K | Supporting info item