Keywords:

  • breeding;
  • crops;
  • genetics;
  • genomics;
  • resequencing

Abstract

Contents

Summary
I. Genomics and crop improvement
II. Complexity of plant genomes
III. Evolution of genome sequencing
IV. Future of genome sequencing
V. Application of genomics for crop improvement
VI. Unlocking the potential of genetic diversity through genomic approaches
Acknowledgements
References

Summary

Plant scientists, in particular those working on crop production, face many challenges, such as a projected increase in population, decreasing water and arable land, and changes in weather patterns and their predictability. Advances in genome sequencing and resequencing can and should play a role in meeting these challenges. However, several barriers prevent rapid and effective deployment of these tools to a wide variety of crops. Because crop genomes are complex, de novo sequencing with next-generation sequencing technologies is fraught with difficulties, which in turn create roadblocks to using these genome sequences for crop improvement. The difficulty of collecting phenotypes rapidly and accurately in crop plants hinders the integration of genomics with crop improvement, and advances in informatics are needed to put these tools in the hands of the scientists on the ground.


I. Genomics and crop improvement

Genomics, as a scientific era, is relatively new. Advances in biology and molecular genetics technology predate the advent of genomics (for example, the development of cloning vectors by Paul Berg; Jackson et al., 1972), but the inception of genomics coincided with the genesis of the human genome project (HGP), which was conceptualized and endorsed at a US Department of Energy-sponsored meeting in Santa Fe, NM (USA), in 1986. The draft sequence of the human genome arrived c. 15 yr later (Lander et al., 2001). In many ways, the HGP was similar to the ‘race to the moon’ of the 1960s, in that it spurred technological advances that have had an impact well beyond the human genome. Advances in cloning, robotics, DNA preparation, automation of DNA sequencing, computing and informatics have led to a democratization of genomics, such that producing a genome sequence is now affordable and conceivable for many, if not all, crop genomes.

Some of the questions for crop genomics are: how should a crop genome be sequenced; when is a sequence complete (or complete enough for use); is the community prepared for a genome sequence (how will the community use it, and how will it be maintained and curated?); and how can genomics best be integrated with crop improvement, which in the end is the real goal? Sequencing of plant and crop genomes is becoming routine; however, the results are at various levels of completeness. Arabidopsis thaliana was the first plant genome to be sequenced (Initiative, 2000), even predating the completed human genome, and to date it is probably the most completely sequenced and assembled plant genome. Rice (IRGSP, 2005) followed quickly, along with a litany of other plant/crop genomes (Ming et al., 2008; Paterson et al., 2009; Schnable et al., 2009; Schmutz et al., 2010). So, to some extent, the question has turned from ‘How do I sequence my genome of interest?’ to ‘What do I do with all this sequence?’ How does one make sense of the mountain of sequence data and turn it into something useful for crop improvement? A further consideration is the preparedness or sophistication of the end-user community to implement genome-based tools effectively in their improvement pipelines.

Crop improvement has been described as an ‘art’ (Lewis, 1945; Crow, 2001), and to some extent this is still the case. However, in the face of mounting global challenges, the ‘art’ needs to be empowered such that yield gains are predictable and adaptable to challenging environments. The global population is predicted to increase by as much as 50% by 2050 (United Nations, 2004), so crop production will also have to increase. However, arable land is predicted either to stay static or even to decrease over the same time-frame (Bouwman, 1997), and more marginal land will need to be used for crop production. Exacerbating this problem, water and crop production inputs (fertilizer, in particular) will become limiting, and the vagaries of climate change and shifts in weather patterns will make target environments less predictable. Thus, the challenge to crop scientists, and to geneticists and breeders in particular, is to take advantage of all available tools to tackle these issues. Genomics will be a key part of the crop improvement toolbox.

Breeding is an evolving science. It traces back to the domestication of plants and animals, possibly unintentional, c. 10 000 yr ago; continued through the breeding and hybridization societies; and later, with Mendel and the elucidation of the laws of genetics, merged with quantitative genetics to bring science to bear on breeding. The application of science to breeding was remarkably successful, as evidenced by the development of hybrid seed and the seed business (Crow, 1998) and by the Green Revolution (Evenson & Gollin, 2003). Crop breeding follows a general cycle: evaluation of phenotypes (genetic diversity); selection of superior phenotypes; crossing; and a return to evaluation, restarting the process. Out of this process come superior genotypes that can be tested and developed into varieties. The process is much more complex than this cycle suggests, however, as many types of phenotype have to be evaluated (e.g. disease resistance, stress adaptation, yield and quality). Until the 1980s, this process was carried out almost entirely at the phenotypic level, with little consideration of the genetic processes underlying the various phenotypes. With genomics, it is possible to identify all the genes in a plant and then to begin to understand the genetic properties and networks that contribute to the development of a superior plant; even with these tools, however, breeding a better variety is still a complicated process. Nevertheless, genomic tools and technological advances will continue to increase the rate of gain from breeding and the precision with which superior genotypes are chosen, and will be a major player in producing enough food for a growing world population.
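The evaluation, selection and crossing cycle described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a breeding model from the literature: it assumes 20 unlinked loci with equal additive effects, Gaussian environmental noise, truncation selection of the best 20% of phenotypes and random mating among the selected parents. All function names and parameter values are hypothetical.

```python
import random

# Hypothetical additive model: each individual carries 0, 1 or 2 copies of a
# favourable allele at each of N_LOCI unlinked loci, all with equal effect.
N_LOCI = 20
EFFECTS = [1.0] * N_LOCI


def genetic_value(genotype):
    """True (unobserved) genetic value: sum of additive allele effects."""
    return sum(e * d for e, d in zip(EFFECTS, genotype))


def evaluate(population, noise_sd, rng):
    """Evaluation: phenotype = genetic value + environmental noise."""
    return [genetic_value(g) + rng.gauss(0.0, noise_sd) for g in population]


def select(population, phenotypes, n_keep):
    """Selection: keep the n_keep individuals with the highest phenotypes."""
    order = sorted(range(len(population)), key=lambda i: phenotypes[i], reverse=True)
    return [population[i] for i in order[:n_keep]]


def gamete(genotype, rng):
    """Each parent passes on one allele per locus (free recombination assumed)."""
    return [rng.randint(0, 1) if d == 1 else d // 2 for d in genotype]


def cross(parents, size, rng):
    """Crossing: random pairs of selected parents rebuild a population of `size`."""
    offspring = []
    for _ in range(size):
        p1, p2 = rng.sample(parents, 2)
        offspring.append([a + b for a, b in zip(gamete(p1, rng), gamete(p2, rng))])
    return offspring


def run_cycles(n_cycles, pop_size=100, n_keep=20, noise_sd=2.0, seed=1):
    """Repeat evaluate -> select -> cross; return mean genetic value per cycle."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 2) for _ in range(N_LOCI)] for _ in range(pop_size)]
    means = [sum(map(genetic_value, pop)) / pop_size]
    for _ in range(n_cycles):
        phenotypes = evaluate(pop, noise_sd, rng)
        pop = cross(select(pop, phenotypes, n_keep), pop_size, rng)
        means.append(sum(map(genetic_value, pop)) / pop_size)
    return means


if __name__ == "__main__":
    for cycle, mean in enumerate(run_cycles(10)):
        print(f"cycle {cycle}: mean genetic value {mean:.2f}")
```

Selection here acts only on noisy phenotypes; replacing the phenotype in `select` with a marker-based prediction is one way genomic information could enter this loop.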

II. Complexity of plant genomes

Early targets for genome sequencing, apart from the human genome, were genomes that were relatively small and therefore easier and less expensive to sequence, such as Arabidopsis, a plant with five chromosomes and a haploid genome of c. 120 Mbp (Meinke et al., 1998). In fact, the Arabidopsis genome is much smaller than most plant genomes, often by one to two orders of magnitude. The average angiosperm genome is > 6000 Mbp per haploid genome (Gregory et al., 2007), approximately twice the size of the human genome. Many economically important plant genomes are even larger: wheat, for instance, has c. 15 Gbp per haploid genome, and pine has a genome of at least 26 Gbp (Valkonen et al., 1993). Genome size is one contributor to plant genome complexity; others include polyploidy and repetitive DNA sequences, in particular transposable elements. Together, these attributes of plant genomes increase the cost of sequencing and degrade the quality of the resulting sequence, especially as the field migrates from map-based sequencing (described in Section III) to short-read whole-genome shotgun (WGS) sequencing.
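The size differences quoted above can be put on a common scale with a little arithmetic. The values below are the approximate haploid sizes given in this section (pine as a lower bound); the dictionary and function names are illustrative only.

```python
import math

# Approximate haploid genome sizes (Mbp) as quoted in this section.
GENOME_MBP = {
    "Arabidopsis thaliana": 120,
    "average angiosperm": 6000,
    "wheat": 15000,
    "pine (lower bound)": 26000,
}


def fold_difference(reference, sizes=GENOME_MBP):
    """Return {name: (fold larger than reference, orders of magnitude)}."""
    ref = sizes[reference]
    return {name: (size / ref, math.log10(size / ref)) for name, size in sizes.items()}


if __name__ == "__main__":
    for name, (fold, orders) in fold_difference("Arabidopsis thaliana").items():
        print(f"{name:22s} {fold:7.1f}x  ({orders:.2f} orders of magnitude)")
```

On these figures, the average angiosperm genome is 50-fold and the wheat genome 125-fold (about two orders of magnitude) larger than that of Arabidopsis.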

Two primary factors that contribute to plant genome size and complexity are polyploidy and repetitive DNA sequences (crop exemplars are shown in Fig. 1). Polyploidy is the accumulation of additional sets of chromosomes, either through autopolyploidy (doubling of the same genome) or through allopolyploidy (two diverged genomes combined in the same nucleus). Increased chromosome number and DNA content are immediate consequences of polyploidy. Depending on when the polyploidy event occurred, however, the increase in chromosome number may not be apparent, because ancient polyploidy events are likely to be shared by sister taxa and/or diploidization (reduction of chromosome number via loss and rearrangement) may have occurred. Most, if not all, land plants have undergone polyploidy events at various times in their evolution (reviewed in Soltis et al., 2004). For example, soybean (Glycine max) has undergone at least three polyploidy events, which can now be examined because a high-quality genome sequence is available (Schmutz et al., 2010). The first, and the most difficult to detect, occurred early in plant evolution and is shared by many land plants (Bowers et al., 2003). The second occurred c. 45–55 million yr ago (Mya) and should be shared with legumes that diverged after that event, such as Medicago (Cannon et al., 2006). The most recent, c. 5 Mya, was most likely an allopolyploid event (Gill et al., 2009) coincident with the emergence of the genus Glycine (Innes et al., 2008). Thus, the 1.1 Gbp soybean genome carries relics of at least three polyploidy events, which have left it a mosaic of duplicated segments (Schmutz et al., 2010). Within the genus Glycine, there is an even more recent allopolyploid event in perennial species found in Australia (Doyle et al., 2002). Polyploidy is therefore a recurrent process that molds and shapes plant genomes during evolution.

Figure 1. Diagram of the evolutionary relationships of several major crop species, showing polyploidy events (red octagons), genome sizes relative to rice (size of circles) and percentage of transposable elements (color of circle). Estimated times of divergence and of polyploidy events are based on the literature (Gaut & Doebley, 1997; Huang et al., 2002; Blanc & Wolfe, 2004; Paterson et al., 2004, 2009; Schlueter et al., 2004; Swigonova et al., 2004; IRGSP, 2005; Jaillon et al., 2007; Zhu et al., 2008; Schnable et al., 2009; Wicker et al., 2009; Abrouk et al., 2010; Choulet et al., 2010; Schmutz et al., 2010). Mya, million years ago.

In addition to polyploidy, repetitive DNA sequences, in particular transposable elements (TEs), make up large fractions of most plant genomes and impede efficient genome sequencing. TEs have been reviewed in depth (Bennetzen et al., 2005); here we focus only on their contribution to genome obesity in plants and on their organization within plant genomes, insofar as these affect obtaining accurate genome sequences. In several instances, rapid amplification of a few TE families has increased genome size. The genome of Oryza australiensis, for example, is approximately twice the size of those of its nearest relatives as a result of the amplification of three TE families (Piegu et al., 2006). Maize is the most prominent example of genome obesity resulting from TE amplification (SanMiguel et al., 1996).

The complicating factor of TE amplification on genome sequencing is not primarily the increase in the amount of DNA to sequence, but rather the effect of many copies of the same sequence throughout the genome that make mapping and assembly difficult. If a TE family amplified recently, it can have thousands of copies scattered throughout the genome, all with very high sequence identity. If a genome is sequenced via a shotgun approach, as most genomes are nowadays, then these highly similar TEs will complicate assembly unless there are sufficient mate-pair reads that span the repeats (Fig. 2).
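Why near-identical TE copies defeat short reads can be seen in a toy example. The sketch below assumes error-free reads, exact matching and a synthetic genome (unique random segments separated by 100% identical copies of one hypothetical 30 bp TE), and counts how many read positions can be placed unambiguously. Reads shorter than the TE that fall entirely inside a copy match every copy, whereas reads longer than the TE are anchored by unique flanking sequence, as in Fig. 2(a). All names are illustrative.

```python
import random


def toy_genome(seed=0, n_unique=4, unique_len=50, te_len=30):
    """Unique random segments separated by identical copies of one hypothetical TE."""
    rng = random.Random(seed)

    def random_seq(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    te = random_seq(te_len)  # every inserted copy is 100% identical
    parts = [random_seq(unique_len)]
    for _ in range(n_unique - 1):
        parts += [te, random_seq(unique_len)]
    return "".join(parts)


def placements(read, genome):
    """All start positions where `read` matches `genome` exactly."""
    k = len(read)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == read]


def classify_reads(genome, read_len):
    """Tile error-free reads across the genome; count uniquely vs ambiguously placed ones."""
    unique = ambiguous = 0
    for start in range(len(genome) - read_len + 1):
        if len(placements(genome[start:start + read_len], genome)) == 1:
            unique += 1
        else:
            ambiguous += 1
    return unique, ambiguous


if __name__ == "__main__":
    genome = toy_genome()
    for read_len in (10, 20, 40):  # shorter than vs. longer than the 30 bp TE
        print(read_len, classify_reads(genome, read_len))
```

Real read mappers tolerate mismatches, so TE copies with high (rather than perfect) identity cause the same ambiguity.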

Figure 2. Diagram of whole genome shotgun (WGS) sequencing. The chromosome segment is shown across the bottom in black, with transposable elements (TEs) in orange and short-read sequences above. Short reads are idealized in both coverage and arrangement. (a) Assembly of short reads results in three unordered sequence contigs, as reads from TEs cannot be placed unambiguously. (b) Using mate-pair sequences from a variety of larger DNA fragments (2–4, 6–10 and 20 kbp shown), the TEs can be spanned, resulting in ordered contigs that can then be placed in a chromosomal context. Legend: repetitive DNA sequences (> 98% sequence identity); shotgun sequences.

III. Evolution of genome sequencing

Plant genome sequencing methodology paralleled the sequencing of the human genome. Arabidopsis was the first plant genome completed, in 2000 (Initiative, 2000), concurrently with the draft human genome sequence from the HGP. The Arabidopsis project, like the HGP, was a multi-laboratory, multinational collaboration that produced a clone-by-clone sequenced and finished genome. According to the current TAIR 9 release (http://www.arabidopsis.org), the genome includes 119.1 Mbp of sequence. Typically, these early clone-by-clone projects started with a physical map of cosmid or bacterial artificial chromosome (BAC) clones (Mozo et al., 1999), assembled with the FingerPrinted Contigs software (FPC; Soderlund et al., 1997) into clone contigs, from which a tiling path of clones could be selected to cover the mapped genome space. The selected clones then formed the basis of a large-scale distributed sequencing project, and their sequences were stitched back together into chromosome-scale sequences. The rice genome (IRGSP, 2005) was just such a project; its latest release, MSU6 (http://www.gramene.org), covers 373.2 Mbp of the clonable portion of the genome.
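Choosing a tiling path from a clone contig is, at its core, an interval-cover problem. As a simplified sketch (not the FPC implementation), assume each clone has already been placed on the physical map as a hypothetical (name, start, end) interval in arbitrary map units; a greedy pass then picks, at each step, the overlapping clone that extends coverage furthest, which gives the fewest clones for a contiguous clone contig. Real projects also weigh overlap size and fingerprint quality, which this ignores.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy minimum set of clones covering [region_start, region_end].

    `clones` is a list of (name, start, end) tuples in map coordinates. Each chosen
    clone must start at or before the end of the previous one (overlap or abut it).
    Raises ValueError if the clone contig has a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path, covered, i = [], region_start, 0
    while covered < region_end:
        best = None
        # Among clones starting within the covered region, take the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage after position {covered}")
        path.append(best)
        covered = best[2]
    return path


if __name__ == "__main__":
    # Hypothetical clone contig; coordinates in arbitrary map units.
    contig = [("A", 0, 150), ("B", 100, 260), ("C", 120, 200), ("D", 240, 400), ("E", 380, 500)]
    print([name for name, _, _ in minimal_tiling_path(contig, 0, 500)])  # ['A', 'B', 'D', 'E']
```

Clone C is skipped because it is contained within the span already covered by A and B, which is exactly the redundancy a tiling path is meant to remove before sequencing.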

In the wake of the successful completion of the rice genome sequence, several clone-by-clone crop genome projects were begun; however, few have yet been completed. The principal difficulty in completing these clone-based BAC projects has not been sequencing throughput, which increased dramatically over the course of the HGP, but rather the difficulty of coordinating mapping with sequencing and of matching progress to funding, which has caused these projects to stutter over their lifetimes. In addition, genome centers were adopting strategies developed for sequencing vertebrates, including the WGS sequencing approach proposed by E. Myers (Weber & Myers, 1997) and popularized by Celera.

Whole-genome shotgun sequencing promised to accelerate the acquisition of genome space greatly by eliminating the massive library-making steps required in a clone-by-clone approach and, to a certain extent, the physical mapping steps, at the cost of fidelity in the repetitive portions of the genome and of significant additional computation to reconstruct the genome sequence (Batzoglou et al., 2002). Essentially, the WGS strategy entails making libraries with several different insert sizes from genomic DNA; the inserts are then sequenced from both ends (Fig. 2). A computer algorithm compares these sequences and attempts to reconstruct a single linear piece of DNA sequence, constrained by the estimated distance between the paired end reads (the insert size). This strategy is, by its nature, much faster than BAC clone library-based sequencing; however, it introduces significant additional potential points of failure and makes producing a complete genome sequence more difficult. WGS does have the advantage of allowing one to see all of the genomic sequence from an organism at once, rather than relying on the narrower view afforded by assembled, mapped clone contigs.
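The insert-size constraint mentioned above is what lets mate pairs order contigs and size the gaps between them. A minimal sketch, assuming the mate pairs have already been mapped and oriented so that one read lies in contig A (left) and its partner in contig B (right), and that the library's mean insert size is known: for each pair, the insert length equals the part of A from the first read to A's end, plus the gap, plus the part of B up to the end of the second read, so gap = insert - (len(A) - pos_a) - end_b. Taking the median over several pairs damps insert-size variation. Function and parameter names, including the `min_links` support threshold, are hypothetical.

```python
from statistics import median


def estimate_gap(links, contig_a_len, insert_size, min_links=2):
    """Estimate the gap (bp) between contig A and contig B from spanning mate pairs.

    links: list of (pos_a, end_b) tuples, where pos_a is the start of the read in A
    (measured from A's start) and end_b is the end of its mate in B (measured from
    B's start). A negative result suggests the contigs overlap.
    Returns None when too few pairs support the join.
    """
    if len(links) < min_links:
        return None
    gaps = [insert_size - (contig_a_len - pos_a) - end_b for pos_a, end_b in links]
    return median(gaps)


if __name__ == "__main__":
    # Hypothetical 3 kbp mate-pair library linking a 1 kbp contig A to contig B.
    print(estimate_gap([(800, 1300), (500, 900), (900, 1500)], 1000, 3000))  # 1500
```

Assemblers combine many such links, from several library sizes, to decide both the order of contigs and how much sequence is missing between them.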

Early attempts at WGS sequencing of plants were moderately successful (poplar (Tuskan et al., 2006), Chlamydomonas (Merchant et al., 2007), grapevine (Jaillon et al., 2007), Physcomitrella (Rensing et al., 2008) and papaya (Ming et al., 2008)) and paved the way for the high point of Sanger-based reference plant genomes produced in the previous 3 yr. We have seen the completion of draft WGS genome sequences of two key crop species, Sorghum bicolor (Paterson et al., 2009) and G. max (Schmutz et al., 2010), and the completion and annotation of at least nine other high-quality reference WGS drafts, all produced at the Department of Energy Joint Genome Institute (see http://www.phytozome.net/). For the most part, these WGS assemblies incorporate BAC-end sequence libraries and are assembled into pseudomolecules to facilitate their use as references. However, the largest plant reference sequence produced to date was not a WGS assembly but the clone-by-clone maize genome sequencing project (Schnable et al., 2009), begun in 2005 and completed in 2009, which covers 2 Gbp of the 3 Gbp Zea mays genome.

Even as the vast majority of plant genome reference sequences were being produced in the last few years, plant genome sequencing has shifted again. With the introduction of short-read pyrosequencing (e.g. 454 Life Sciences; Margulies et al., 2005) and sequencing-by-synthesis systems (for a review, see Fuller et al., 2009), the way de novo references for plant genomes are produced is changing. These new systems, particularly the Illumina (previously Solexa) platform, can produce sequence data 100 times more cheaply than the Sanger-based technology used to produce the majority of plant references available today. The new projects usually combine Sanger-based, long-insert fosmid or BAC end sequences with short contigs assembled from pyrosequencing or sequencing-by-synthesis data. Much like the shift from clone-by-clone to WGS sequencing, these approaches with next-generation sequencing (NGS) have resulted in a proliferation of sequencing projects, most of which will produce references that are significantly less accurate and less complete than those of earlier Sanger projects. This is because of the short length of the reads, the lack of adequate read-pairing methods for all of the NGS platforms, and a bias against AT-rich sequences; all of these issues cause significant problems in reconstructing plant genome sequences.

To date, two published plant genomes combine pyrosequencing and Sanger-based sequencing (cucumber (Huang et al., 2009) and apple (Velasco et al., 2010)), and neither approaches the quality and completeness of earlier Sanger-sequenced genomes; with current NGS methodologies, producing such a reference is not possible. Owing to the extremely low cost of sequencing by synthesis (SBS), there is already some movement towards producing plant genome sequences based solely on SBS; so far, the only published example of this approach is a vertebrate, the panda genome (Li et al., 2009). It is important to note that SBS makes it possible to sample the unique space of a genome at low cost. However, it remains to be seen how successfully these short-read de novo WGS strategies will contribute to crop-based scientific goals.

IV. Future of genome sequencing

As NGS capacity rapidly expands and the major producers of Sanger sequence reduce their capacity, we are unlikely to continue to see new Sanger-based plant reference genomes. In the near future, hybrid assemblies based on pyrosequencing and Sanger long-insert pairs will probably continue to be produced. These will capture the majority of the gene space of a genome but will suffer from unresolved repetitive sequences and a tendency to miss a substantial portion of the genome. Such hybrid projects will probably be a focus for only a short time, as the sequencing community moves toward producing de novo genomes based on sequencing by synthesis and, in particular, on the Illumina platform. Although the Illumina platform was principally developed to resequence human genomes, many groups are working to adapt de novo genome strategies to take advantage of its very low cost per unit of data. In the past 2 yr there has been a resurgence in the development of de novo assembly algorithms, not seen since WGS was adopted by the major genome sequencing centers (ABySS (Simpson et al., 2009), Velvet (Zerbino & Birney, 2008), ALLPATHS (Butler et al., 2008), SOAPdenovo (http://soap.genomics.org.cn)). The widespread availability of SBS machines and of software that strings short reads together into larger contigs and scaffolds, combined with the lower cost of data collection, has encouraged plant genomics researchers to attempt genome projects that were once viewed as impossible. It remains to be seen how complete and useful Illumina WGS-sequenced plant genomes will be, as none of these projects has yet reached a conclusion that resembles a typical reference genome sequence.
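Several of the assemblers named above (Velvet and ABySS, for example) are built on de Bruijn graphs. The sketch below shows only the core idea under simplifying assumptions (error-free reads, a single strand, no coverage statistics or graph cleaning): reads are broken into k-mers, each k-mer links its (k-1)-base prefix to its suffix, and maximal non-branching paths are read off as contigs. A repeat longer than k-1 bases creates a branch, so contigs break at it, much as in Fig. 2(a). Function names are illustrative, not those of any assembler.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every distinct k-mer adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths (contigs) of the graph.

    Isolated cycles in which every node has one edge in and one out are ignored.
    """
    indeg = defaultdict(int)
    for node in list(graph):
        for nxt in graph[node]:
            indeg[nxt] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: it is reached from a branch point or path end
        for first in graph.get(start, ()):
            contig, node = start + first[-1], first
            while one_in_one_out(node):
                node = next(iter(graph[node]))
                contig += node[-1]
            contigs.append(contig)
    return sorted(contigs)


if __name__ == "__main__":
    # Hypothetical read from a sequence in which "GCTT" occurs twice.
    print(unitigs(de_bruijn(["AAGCTTCCGCTTGG"], 4)))  # ['AAGCT', 'CTTCCGCT', 'CTTGG', 'GCTT']
```

With no repeats, overlapping reads collapse into a single contig; with the repeated "GCTT", the repeat becomes its own short contig and the flanking sequence is split around it.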

Crop genomes are likely to be particularly challenging for short-read-based sequencing because, in addition to the usual difficulties posed by plant repetitive sequences, many crop varieties have undergone recent polyploidy events and have high polymorphism rates. Crop models with simple genomes are therefore likely to be more amenable to short-read WGS sequencing. Nonetheless, polyploids such as wheat, peanut and coffee will be sequenced using NGS, and several approaches can be used to assemble the short-read contigs/scaffolds into longer-range scaffolds that might represent individual chromosomes. The divergence time between the progenitor genomes of a polyploid will affect WGS: genomes that are not very diverged (e.g. in an autotetraploid or a recent allopolyploid) will confound assembly, because NGS short reads from the subgenomes may not be diverged enough to be assigned unambiguously. This leaves aside the problem of heterozygosity: sequencing a highly heterozygous plant will essentially result in the assembly of haplotypes, so sequencing depth will have to be higher to obtain sufficient coverage of each chromosome. Even then, there will be problems in constructing completely ordered haplotypes/chromosomes, as some regions will be identical (e.g. identical by descent, IBD) and the two haplotypes will collapse into one. However, new technologies on the horizon, or in maturation, may help to solve some of these problems. Physical maps of large-insert clones with restriction-site sequence information may be used to help align and order NGS sequence contigs into longer-range scaffolds (van Oeveren et al., 2011), and optical mapping (essentially an ordered restriction map of a genome; Zhou et al., 2009) may be useful for aligning sequence scaffolds to larger chromosome regions. Perhaps even more promising are new single-molecule sequencing technologies that can produce reads tens of kilobases long (http://www.pacificbiosciences.com). These long reads may be useful for ‘stringing’ together the small sequence scaffolds from short-read WGS.
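Two of the points above lend themselves to back-of-the-envelope calculation. The sketch assumes, hypothetically, that differences between homeologous subgenomes are independent and evenly spread at a per-base rate d, so a read of length L carries at least one diagnostic difference with probability 1 - (1 - d)^L. It also uses the Lander-Waterman expectation that a fraction e^(-c) of bases receives no read at mean depth c, with reads split equally between the haplotypes of a heterozygous region. Real divergence is clustered and coverage is uneven, so these are rough estimates only; function names are illustrative.

```python
import math


def prob_read_assignable(read_len, divergence):
    """Chance a read spans >= 1 site differing between two homeologous subgenomes,
    assuming independent differences at a uniform per-base rate `divergence`."""
    return 1.0 - (1.0 - divergence) ** read_len


def fraction_uncovered(depth):
    """Lander-Waterman expectation: fraction of bases hit by no read at a mean depth."""
    return math.exp(-depth)


def total_depth_needed(max_uncovered, n_haplotypes=2):
    """Total mean depth so that each haplotype, receiving 1/n of the reads,
    leaves at most `max_uncovered` of its bases unsampled."""
    return n_haplotypes * -math.log(max_uncovered)


if __name__ == "__main__":
    for length in (100, 1000, 10000):  # short reads vs. hypothetical long reads
        print(length, round(prob_read_assignable(length, 0.001), 3))
    # Depth for <= 0.1% of each haplotype to be missed, homozygous vs. heterozygous.
    print(round(total_depth_needed(0.001, 1), 1), round(total_depth_needed(0.001, 2), 1))
```

Under these assumptions, longer reads raise the chance of assigning a read to its subgenome, and a heterozygous diploid needs twice the total depth of a homozygous one for the same per-haplotype coverage.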

In the near term for crops, we will see the generation of large single nucleotide polymorphism (SNP) resources and the resequencing of large germplasm collections for crops with reference genomes. Even without a reference genome, it will be possible to carry out phenotypic studies in populations, using direct short-read sequencing to identify collections of segregating alleles that can serve as functional markers or breeding targets. Although we are likely to see fewer new reference plant genomes in the short term, there is hope that, as new technologies are developed, recovering full genome sequences will once again become commonplace. New technologies such as single-molecule sequencers (Pacific Biosciences (Eid et al., 2009)) or nanopore technology (Clarke et al., 2009) may once again provide the long, accurate sequence reads that gave us the current abundance of plant reference genomes, which are necessary for truly detailed genome analysis and for layering on additional functional information such as epigenetic marks and the resequencing of diverse lines.
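Once reads from each resequenced line are aligned to a reference, SNP genotypes are typically called from allele counts at each variable site. The functions below are a deliberately naive, threshold-based sketch for a diploid; the minimum depth and allele-fraction band are hypothetical values, and production callers (bcftools, for example) instead use genotype likelihoods that model sequencing error.

```python
def call_genotype(ref_count, alt_count, min_depth=4, het_band=(0.2, 0.8)):
    """Naive diploid genotype call at one SNP from reference/alternative read counts.

    Returns VCF-style codes: '0/0' (homozygous reference), '0/1' (heterozygous),
    '1/1' (homozygous alternative) or './.' (too few reads to call).
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return "./."
    alt_fraction = alt_count / depth
    if alt_fraction < het_band[0]:
        return "0/0"
    if alt_fraction > het_band[1]:
        return "1/1"
    return "0/1"


def is_polymorphic(calls):
    """A site is a candidate marker if the called lines carry both alleles."""
    alleles = {a for call in calls if call != "./." for a in call.split("/")}
    return alleles == {"0", "1"}


if __name__ == "__main__":
    # Hypothetical (ref, alt) read counts at one site in four resequenced lines.
    counts = [(12, 0), (0, 9), (6, 5), (1, 0)]
    calls = [call_genotype(r, a) for r, a in counts]
    print(calls, is_polymorphic(calls))
```

Sites that segregate across a germplasm panel in this way are the raw material for the SNP resources and functional markers described above.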

V. Application of genomics for crop improvement

The inescapable threat of global climate change, with its associated fluctuations in patterns of drought, heat and flooding, brings new challenges to crop improvement strategies. The burgeoning world population that accompanies this new era for humankind makes finding ways to improve food, feed, fuel and fiber production more urgent than at any time in our history. Although traditional plant breeding has produced impressive gains in world food production and safety, it is unlikely that unassisted breeding will be up to the challenges at hand.

Marker-assisted selection (MAS) offers a way to greatly accelerate selection for specific traits. However, many important crop traits are polygenic, have low heritability and, by their nature, show large genotype × environment (G × E) interactions (Fleury et al., 2010). Consequently, MAS is most successful with relatively simple traits and those inherited in a Mendelian fashion (Bouchez et al., 2002). However, genomic selection (Meuwissen et al., 2001), combined with increasing marker resolution and decreasing marker cost, will lead to improved breeding strategies that use large amounts of genomic information, together with estimated breeding values assigned to markers/haplotypes, to expedite the breeding process and increase the rate of gain. Recent work in maize and rice is likely to lead to widespread application of this approach in plant improvement. In maize, high-resolution mapping of flowering time, a quantitative trait, in a large number of families uncovered many small-effect quantitative trait loci (QTLs) that acted additively to determine flowering time (Buckler et al., 2009). In rice, resequencing and phenotyping of > 500 lines allowed the identification of a large number of QTLs controlling 14 different agronomic traits (Huang et al., 2010). It is easy to see how marker information for a large number of small- and even medium- to large-effect QTLs could expedite the selection process.
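Genomic selection estimates an effect for every marker at once and sums them into a genomic estimated breeding value (GEBV). Meuwissen et al. (2001) compared several estimators; for brevity, the sketch below uses ridge regression, which corresponds to BLUP with a common variance for all marker effects. The simulated population, the 0/1/2 allele-count coding, the effect sizes and the shrinkage parameter `lam` are all hypothetical.

```python
import numpy as np


def fit_marker_effects(X, y, lam=1.0):
    """Ridge estimate of additive marker effects: (Xc'Xc + lam*I)^-1 Xc'(y - mean(y))."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


def gebv(X_candidates, X_train, effects):
    """Genomic estimated breeding value: marker effects summed over centred genotypes."""
    return (X_candidates - X_train.mean(axis=0)) @ effects


def simulate_accuracy(n_train=200, n_test=100, n_markers=50, seed=0):
    """Hypothetical check: correlation between GEBVs and true genetic values."""
    rng = np.random.default_rng(seed)
    true_effects = rng.normal(0.0, 0.1, n_markers)  # many small additive effects
    X = rng.integers(0, 3, size=(n_train + n_test, n_markers)).astype(float)
    genetic = X @ true_effects
    y = genetic + rng.normal(0.0, 0.5, n_train + n_test)  # phenotype = G + E
    X_train, X_test = X[:n_train], X[n_train:]
    b = fit_marker_effects(X_train, y[:n_train])  # only training phenotypes are used
    pred = gebv(X_test, X_train, b)
    return float(np.corrcoef(pred, genetic[n_train:])[0, 1])


if __name__ == "__main__":
    print(f"accuracy of GEBVs on unphenotyped candidates: {simulate_accuracy():.2f}")
```

Because candidates can be ranked on GEBV from genotype alone, selection need not wait for their own phenotypes, which is one way the approach could expedite breeding.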

The genetic mapping of QTLs has been ongoing for many years. Identifying QTLs for a given trait is relatively straightforward as long as the population in which the data are collected carries genetic variation for the trait. A QTL tells us, within a statistical confidence interval, the region of a chromosome in which gene(s) affecting the trait are likely to reside. Ultimately, though, to take full advantage of QTLs, it is necessary to identify the genes responsible for the variation in the trait (Mackay, 2001) and to understand the molecular basis of the QTLs (Hansen et al., 2008). Integrating genetic maps and their associated QTLs with physical maps and whole-genome sequence is a necessary aid to making these connections (a partial example is shown in Fig. 3).
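The simplest form of QTL detection tests each marker separately for association with the trait. A minimal sketch, assuming a recombinant inbred line (RIL) population genotyped at unlinked markers coded 0/1 and a normally distributed trait: the LOD score compares a model with one mean per marker genotype class against a single overall mean, LOD = (n/2) log10(RSS0/RSS1). Interval mapping and composite interval mapping refine this by using marker order and map distances; the simulated data and effect size here are hypothetical.

```python
import numpy as np


def single_marker_lod(genotypes, phenotypes):
    """LOD for a marker-trait association: (n/2) * log10(RSS_null / RSS_marker)."""
    g = np.asarray(genotypes)
    y = np.asarray(phenotypes, dtype=float)
    rss_null = np.sum((y - y.mean()) ** 2)
    rss_marker = sum(np.sum((y[g == c] - y[g == c].mean()) ** 2) for c in np.unique(g))
    return (y.size / 2) * np.log10(rss_null / rss_marker)


def marker_scan(markers, phenotypes):
    """LOD score for every column (marker) of an individuals x markers matrix."""
    return np.array([single_marker_lod(markers[:, j], phenotypes)
                     for j in range(markers.shape[1])])


def simulate_scan(n_lines=150, n_markers=10, qtl=3, effect=1.0, seed=1):
    """Hypothetical RIL population with a single QTL at marker index `qtl`."""
    rng = np.random.default_rng(seed)
    markers = rng.integers(0, 2, size=(n_lines, n_markers))
    trait = effect * markers[:, qtl] + rng.normal(0.0, 1.0, n_lines)
    return marker_scan(markers, trait)


if __name__ == "__main__":
    lods = simulate_scan()
    print("peak marker:", int(np.argmax(lods)), "LOD:", round(float(lods.max()), 1))
```

A peak like this only localizes the QTL to a marker interval; as the text notes, the genome sequence is then needed to find candidate genes within it.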

Figure 3. Integration of genetic, physical and sequence maps to identify genes underlying important agronomic traits. QTL, quantitative trait locus.

Whole-genome transcriptome analyses, by either microarray or various sequencing approaches, are beginning to provide an understanding of the regulation of gene expression (Hansen et al., 2008). Through transcriptomic analyses, the interactions of biochemical pathways and of QTLs are beginning to be uncovered. Indeed, a regulatory network governing flowering time has been constructed in Arabidopsis using whole-genome expression analyses (Keurentjes et al., 2007). Associating gene expression data from a variety of tissues and developmental stages (Severin et al., 2010) with gene models gives us a completely new layer of information from which to begin to understand gene interactions and regulation (Fig. 4).
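Comparing expression among tissues, as in Fig. 4, starts from read counts per gene model normalized for sequencing depth. A minimal sketch, assuming RNA-seq counts per gene in two hypothetical samples: counts are scaled to counts per million (CPM), and a pseudocount avoids taking the log of zero. Gene identifiers and counts are made up, and a real analysis would use biological replicates and a statistical model rather than a single fold change.

```python
import math


def cpm(counts):
    """Counts per million: scale one sample's gene counts by its library size."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log2(CPM_b / CPM_a) per gene, for genes present in both samples."""
    a, b = cpm(counts_a), cpm(counts_b)
    return {g: math.log2((b[g] + pseudocount) / (a[g] + pseudocount)) for g in a if g in b}


if __name__ == "__main__":
    leaf = {"gene1": 100, "gene2": 900, "gene3": 0}  # hypothetical counts
    seed = {"gene1": 400, "gene2": 600, "gene3": 50}
    for gene, lfc in log2_fold_change(leaf, seed).items():
        print(gene, round(lfc, 2))
```

Normalizing by library size first matters because a sample sequenced more deeply would otherwise appear to express every gene more highly.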

Figure 4. Transcript data used to support predicted gene models. Note the differential expression among tissues. daf, days after flowering.

The availability of high-quality whole-genome sequence assemblies for major crops such as soybean (Schmutz et al., 2010) and maize (Schnable et al., 2009) represents a paradigm shift in how we can approach crop improvement. We now have access to all of the many thousands of genes that make up an organism. Not only does this provide a wealth of candidate genes underlying important QTLs, but the sequence itself, as a framework, also provides a means by which a virtually unlimited amount of information may be mined. The cost of resequencing genomes has plunged and promises to decline further. Resequencing old varieties, landraces and even newly released cultivars has the potential to uncover previously unseen allelic diversity and to draw our attention to the regions of the genome on which breeders have unknowingly focused in their traditional breeding efforts. These allelic differences provide a rich and nearly unlimited source of polymorphisms from which to create ‘perfect’ markers or to saturate specific regions of a genome (Hyten et al., 2010).

Genome sequence alone may not tell us much about where within the genome to focus our attention in breeding programs, but genome sequence coupled with transcriptomics may tell us a lot. The fusion of genetics and genomics to better understand gene function and gene interrelations has been termed ‘genetical genomics’ (Jansen & Nap, 2001). Expression QTLs (eQTLs) are detectable when genetic variation within the genome results in changes in transcript abundance (Potokina et al., 2008; Holloway & Li, 2010). When eQTLs are correlated with traditional QTL phenotypic information, the function of the allelic variation uncovered in genome sequencing projects may begin to be discerned (Holloway & Li, 2010).
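In its simplest form, an eQTL scan treats each transcript's abundance as a quantitative trait and tests it against every marker. The sketch below rests on hypothetical assumptions: lines genotyped at markers coded 0/1, expression values already normalized, and a fixed correlation cut-off standing in for the permutation or false-discovery-rate thresholds a real study would use. It reports marker-transcript pairs whose abundance tracks genotype.

```python
import numpy as np


def eqtl_scan(genotypes, expression, r_threshold=0.5):
    """Return (marker, gene, r) for pairs whose expression correlates with genotype.

    genotypes: lines x markers array; expression: lines x genes array.
    """
    hits = []
    for m in range(genotypes.shape[1]):
        for g in range(expression.shape[1]):
            r = np.corrcoef(genotypes[:, m], expression[:, g])[0, 1]
            if abs(r) >= r_threshold:
                hits.append((m, g, float(r)))
    return hits


def simulate(n_lines=100, n_markers=5, n_genes=4, seed=2):
    """Hypothetical data: gene 1's expression depends on the allele at marker 2."""
    rng = np.random.default_rng(seed)
    geno = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)
    expr = rng.normal(0.0, 1.0, size=(n_lines, n_genes))
    expr[:, 1] = 2.0 * geno[:, 2] + rng.normal(0.0, 0.5, n_lines)
    return geno, expr


if __name__ == "__main__":
    print(eqtl_scan(*simulate()))
```

Overlaying such marker-transcript hits on the positions of conventional QTLs is the kind of correlation the text describes for discerning the function of allelic variation.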

Many of the underlying factors that control complex agronomic traits such as iron homeostasis, and that consequently contribute to the QTLs for those traits, may be transcription factors (O’Rourke et al., 2009). In support of this concept, two genes important in the domestication of maize (tb1 and tga1) have been cloned, and both were found to encode transcription factors (Doebley et al., 1997; Wang et al., 2005). The magnitude of the effect of the factors responsible for eQTLs depends on whether they are cis- or trans-acting (Hansen et al., 2008; Holloway & Li, 2010). Cis-acting factors reside in or near the gene whose expression is affected, for example a polymorphism within its promoter or within the gene itself. A trans-acting factor, by contrast, resides elsewhere in the genome, away from the gene whose expression is measured (Hansen et al., 2008). Cis-acting factors generally have stronger effects. A study in wheat using doubled haploid lines and a hybridization-based Affymetrix GeneChip recently uncovered a total of 542 distinct eQTLs contributing to seed development (Jordan et al., 2007); two chromosomes were found to be rich in trans-regulatory factors controlling this trait. Studies of physiological disease in rats using transcriptional profiling identified many cis-acting, monogenic traits that were good candidates to explain previously mapped physiological loci (Hubner et al., 2005); a set of 73 candidate genes for hypertension alone was identified. The same approaches could readily be applied in plants.
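Whether an eQTL is called cis or trans is usually decided by where its peak marker lies relative to the gene whose expression was measured. A minimal sketch, assuming marker and gene positions are on the same reference assembly; the 1 Mbp window and the chromosome names are hypothetical (studies set the window according to marker density and linkage disequilibrium), and a cis call based on position alone cannot distinguish a true cis-acting variant from a tightly linked trans-acting one.

```python
def classify_eqtl(marker_chrom, marker_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Label an eQTL 'cis' if its peak marker lies on the gene's chromosome within
    `window` bp of the gene body, otherwise 'trans'."""
    same_chrom = marker_chrom == gene_chrom
    if same_chrom and gene_start - window <= marker_pos <= gene_end + window:
        return "cis"
    return "trans"


if __name__ == "__main__":
    # Hypothetical gene on chromosome "chr01" at 5.000-5.003 Mbp.
    for chrom, pos in [("chr01", 5_200_000), ("chr01", 9_000_000), ("chr07", 5_000_000)]:
        print(chrom, pos, classify_eqtl(chrom, pos, "chr01", 5_000_000, 5_003_000))
```

Applied to a list of eQTL peaks, this kind of rule yields the cis/trans tallies that studies such as the wheat example above report by chromosome.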

Adapting genome-scale data analysis to crop improvement will require new advances in statistics and modeling (Chenu et al., 2009). Our concepts of comparative genomics need to be extended to include comparative functional analyses and analyses of the so-called ‘interactome’. The role of epigenetics in plant responses to the environment is only beginning to be understood, and our understanding will be greatly advanced by research in crops such as rice and maize (Raghuvanshi et al., 2010).

One limitation in adapting genomic tools to crop improvement is our reliance on a reductionist approach to asking questions. Plant responses to the environment are highly complex and intertwined, and only the simplest steps can be explained by a single-gene approach. To make progress on many of the complex traits with which we work, we will need to understand interactions between stress responses and among biochemical pathways (Fleury et al., 2010), and to adopt a ‘systems biology’ approach. One promising development is high-throughput phenomics for crop plants, in which large numbers of plants are screened under controlled conditions for precise measurements of many phenotypic traits (e.g. http://www.plantphenomics.com/partners/). Coupled with NGS-based genotyping, this may help to accelerate plant improvement.

VI. Unlocking the potential of genetic diversity through genomic approaches

Crop genetic diversity, the variation in genes within a crop species, underlies the ability of crops to adapt to changes in their environments and to respond to natural selection. As a result of domestication and of modern crop production practices that rely on relatively few, genetically similar high-yielding cultivars, crop genetic diversity has declined dramatically (Tanksley & McCouch, 1997; Hyten et al., 2006). Crop populations with little genetic variation are more vulnerable to new diseases, insect pests, and global climate change. Because long-term food security is threatened if crops cannot adapt quickly to rapidly changing conditions (Brown & Funk, 2008; Turner et al., 2009), great effort is being devoted to enhancing the genetic diversity of elite breeding pools using mutants, landraces, and/or wild species closely related to the cultivated crop.

Before the advent of molecular genetics, plant accessions were profiled on the basis of morphology and other phenotypic traits (Gilbert et al., 1999; Hoisington et al., 1999). Pedigree and geographical distribution analyses were also used to measure genetic diversity (Hammer, 2003). The development of modern molecular marker techniques gave renewed impetus to diversity analysis based on genotype rather than phenotype (Tanksley & McCouch, 1997). Various molecular markers have been developed (Gupta et al., 2001). Although these markers are useful for determining phylogenetic relationships and population structure, for QTL mapping, for map-based cloning, and for MAS (Moose & Mumm, 2008), they are not suitable for measuring adaptive genetic diversity, largely because most of them sample neutral variation rather than variation in the genes that govern adaptation. Diversity analysis should therefore, more appropriately, be based on functional genes or whole-genome sequences.
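One common summary of genotype-based diversity at marker or gene loci is Nei's gene diversity (expected heterozygosity), H = 1 − Σ pᵢ², where pᵢ is the frequency of allele i at a locus. The sketch below is a simplified illustration rather than a method taken from the studies cited above. It assumes inbred accessions, so each accession contributes a single allele call per locus. Missing calls are given as None, and the small-sample correction is omitted. The same function can be applied to loci within functional genes or to genome-wide SNPs. The panel shown is hypothetical.

```python
from collections import Counter


def gene_diversity(calls):
    """Nei's gene diversity at one locus: H = 1 - sum(p_i ** 2).

    calls: one allele call per (inbred) accession; None marks missing data.
    """
    alleles = [c for c in calls if c is not None]
    n = len(alleles)
    if n == 0:
        raise ValueError("no non-missing calls at this locus")
    counts = Counter(alleles)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def mean_diversity(loci):
    """Average gene diversity across loci (dict: locus name -> list of calls)."""
    values = [gene_diversity(calls) for calls in loci.values()]
    return sum(values) / len(values)


# Hypothetical panel of four accessions genotyped at three SNPs.
panel = {
    "snp1": ["A", "A", "G", "G"],
    "snp2": ["C", "C", "C", "C"],
    "snp3": ["T", "T", "T", None],
}
print(gene_diversity(panel["snp1"]))     # 0.5
print(round(mean_diversity(panel), 3))   # 0.167
```

A monomorphic locus contributes zero, so a panel that has passed through a bottleneck shows a lower average than its progenitors when both are scored at the same loci.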

Only a few plant and crop species have rich genome-based databases that integrate many levels and types of information, such as QTLs, expression data, mutants, physical maps, genetic markers, and genetic diversity, to name but a few (soybean, http://www.soybase.org; rice, http://www.gramene.org; Arabidopsis, http://www.tair.org). As more crop genomes are sequenced, the need for integrated databases will continue to grow, so that genome sequences can be curated in ways that facilitate crop improvement. At least two roadblocks exist here: first, the lack of continued financial support for database development and maintenance, together with the perceived lack of intellectual contribution by database developers and managers; and second, the decreasing quality of genome sequences, which makes it difficult to organize additional data on top of a genome sequence. The more fragmented the genome assembly, the more difficult it is to create a truly functional database with layered information.
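The link between assembly fragmentation and layered databases can be made concrete with a toy data model. The sketch below is an illustrative assumption, not the schema of any database named above. It stores genes, markers, and QTLs as intervals on chromosome coordinates and answers the typical breeding query of which genes lie under a QTL. A feature whose sequence sits on a scaffold not anchored to any chromosome has no chromosome coordinate (chrom is None), so it silently drops out of every such query. All class, function, feature, and chromosome names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    name: str
    kind: str             # e.g. "gene", "marker", "qtl"
    chrom: Optional[str]  # None when the sequence sits on a scaffold not anchored to any chromosome
    start: int            # chromosome coordinate, or scaffold coordinate if unanchored
    end: int


def overlapping(features: List[Feature], chrom: str, start: int, end: int,
                kind: Optional[str] = None) -> List[Feature]:
    """Features on `chrom` overlapping the closed interval [start, end], optionally of one kind."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
        and (kind is None or f.kind == kind)
    ]


def unplaced(features: List[Feature]) -> List[Feature]:
    """Features that cannot be layered onto chromosome coordinates at all."""
    return [f for f in features if f.chrom is None]


# Hypothetical soybean-style data layered on one chromosome.
qtl = Feature("qtl_1", "qtl", "Gm08", 1_000_000, 3_000_000)
features = [
    qtl,
    Feature("g1", "gene", "Gm08", 1_500_000, 1_504_000),
    Feature("m1", "marker", "Gm08", 2_900_000, 2_900_001),
    Feature("g2", "gene", None, 12_000, 15_000),  # on an unanchored scaffold
]
print([f.name for f in overlapping(features, qtl.chrom, qtl.start, qtl.end, kind="gene")])  # ['g1']
print([f.name for f in unplaced(features)])  # ['g2']
```

Even if g2 physically lies inside the QTL, the query cannot return it until its scaffold is anchored, which is why fragmented assemblies weaken layered databases.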

Resequencing with NGS technologies, where a reference-like genome exists, is one of the most powerful applications of genomics for crop improvement. Resequencing requires a reference genome, whereas de novo assembly does not. However, de novo assembly of plant genomes from short NGS reads is not yet a suitable tool, because most plant genomes are highly complex owing to extensive duplication and abundant repetitive sequences (Varshney et al., 2009). Thus, NGS technologies are most widely applied to resequencing species that have a complete reference genome sequence, primarily to identify SNPs useful as DNA markers (Akhunov et al., 2009; Yan et al., 2010b; You et al., 2011), to examine patterns of selection either in advanced populations or during domestication (Gore et al., 2009; McMullen et al., 2009), or to find functional alleles (Thornsberry et al., 2001; Yan et al., 2010a). Genome-wide SNP genotyping is a powerful tool for association mapping and evolutionary studies (Akhunov et al., 2009). Community-developed SNP panels often have limited utility in broader sets of germplasm, largely because their SNPs were discovered in a small number of lines; genotyping by sequencing should overcome these limitations and provide many more polymorphic markers (Huang et al., 2010). NGS technologies and the large numbers of genome-wide markers they generate, such as RAD (restriction site-associated DNA) markers (Baird et al., 2008), are also being used to construct high-density maps and to analyze genetic diversity (Gupta et al., 2008).
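The core of SNP discovery by resequencing is comparing the bases of reads aligned at each reference position with the reference base. The sketch below is a deliberately simplified caller, not the pipeline used in any cited study. It assumes reads have already been aligned to the reference and that the resequenced line is inbred, so a true variant should be supported by nearly all reads. The depth and allele-fraction thresholds are hypothetical. Real variant callers also weigh base and mapping qualities. In duplicated plant genomes, reads from paralogous or homeologous copies collapsing onto one reference locus can produce apparently heterozygous sites, which this strict threshold declines to call.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_site(ref_base, read_bases, min_depth=8, min_alt_frac=0.9):
    """Call a homozygous SNP at one reference position from aligned read bases.

    Returns the alternate base, or None if no confident call can be made.
    Thresholds are illustrative assumptions, not recommended values.
    """
    bases = [b.upper() for b in read_bases if b.upper() in VALID_BASES]
    if len(bases) < min_depth:
        return None  # too little coverage to call confidently
    alt_counts = Counter(b for b in bases if b != ref_base.upper())
    if not alt_counts:
        return None  # every read agrees with the reference
    alt, count = alt_counts.most_common(1)[0]
    return alt if count / len(bases) >= min_alt_frac else None


def call_snps(reference, pileup, **kwargs):
    """pileup: dict of 0-based reference position -> list of read bases covering it."""
    calls = {}
    for pos, bases in pileup.items():
        alt = call_site(reference[pos], bases, **kwargs)
        if alt is not None:
            calls[pos] = (reference[pos], alt)
    return calls


# Hypothetical reference fragment and pileup.
reference = "ACGTACGT"
pileup = {
    2: ["T"] * 10,          # every read disagrees with the reference G
    5: ["C"] * 9 + ["A"],   # a single discordant read, likely a sequencing error
    6: ["A"] * 3,           # too few reads to call
}
print(call_snps(reference, pileup))  # {2: ('G', 'T')}
```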

The availability of NGS and of whole reference genome sequences for major crops such as rice, soybean, and maize provides unique opportunities to explore DNA-level diversity among members of a crop species and its relationship to phenotypic diversity (Paterson et al., 2010). The ultimate goal of resequencing members of a species is to understand the molecular basis of phenotype–genotype relationships. Resequencing diversity panels of hundreds to thousands of genotypes, selected to sample the spectrum of diversity in a species that has a reference genome sequence, will provide a platform for understanding existing genetic diversity, associating genes with phenotypes, and exploiting natural genetic diversity to help develop superior genotypes. To do this effectively, extensive phenotypic data will need to be collected for the diversity panels and combined with the resequencing data. Collecting phenotypic data is potentially the biggest stumbling block to the effective use of genomics technologies in advanced plant improvement, primarily because many, if not most, phenotypic traits require an experienced eye and a skilled hand to score them effectively and consistently. Consequently, phenomics (the mass collection of phenotypes) has not kept pace with advances in genomics; moreover, fewer and fewer people are being trained in disciplines that can collect relevant phenotypes.
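Combining resequencing and phenotypic data from a diversity panel amounts, at its simplest, to testing each marker for association with a trait. The sketch below regresses phenotype on allele dosage (0, 1, or 2 copies of the alternate allele) and ranks markers by the strength of the correlation. It is a minimal illustration only. It deliberately omits the corrections for population structure and kinship that association studies in diverse germplasm require, as well as multiple-testing thresholds. The marker names and flowering-time values are hypothetical.

```python
from math import sqrt


def association(dosages, phenotypes):
    """Single-marker test: least-squares regression of phenotype on allele dosage.

    Returns (slope, r): the estimated per-allele effect and the correlation.
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic marker or constant phenotype: no information
    return sxy / sxx, sxy / sqrt(sxx * syy)


def scan(markers, phenotypes):
    """Rank markers (dict: name -> dosages) by the absolute correlation with the phenotype."""
    results = {m: association(d, phenotypes) for m, d in markers.items()}
    return sorted(results.items(), key=lambda kv: abs(kv[1][1]), reverse=True)


# Hypothetical inbred panel (dosage 0 or 2) scored for days to flowering.
phenotypes = [60, 61, 59, 70, 71, 69]
markers = {
    "snpA": [0, 0, 0, 2, 2, 2],
    "snpB": [0, 2, 0, 2, 0, 2],
}
for name, (slope, r) in scan(markers, phenotypes):
    print(name, round(slope, 2), round(r, 3))
```

In a real panel, a marker can rank highly simply because it tracks subpopulation membership, which is why mixed models that account for relatedness are normally used instead of this naive scan.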

Perhaps most importantly, a new paradigm is needed to train the next generation of plant scientists. Plant scientists are needed who are able to think in a systems biology manner. Breeders are needed who can apply genomics and develop new phenomics technologies to truly advance the improvement process and take advantage of the potential of genomics. Related to this is a need for computational approaches that integrate genomic and phenotypic data in advanced ways, so that predictions can be made and rational crosses planned on the basis of those predictions. Engineers working with plant scientists are needed to create new platforms that rapidly and accurately collect phenotypes on thousands of plants at a time, for phenotypes ranging from disease response, seed composition/quality, and mineral content to plant vigor and growth processes. Improvements are being made, but rapid and transformational advances are needed if we, as plant scientists, are going to meet the challenges facing the world. As is often the case with our tool-building species, our limitations may stem not from a lack of technology but from a lack of understanding of how best to apply that technology.
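One established way to turn integrated genomic and phenotypic data into predictions is to estimate the effects of all genome-wide markers simultaneously and sum them, as in the prediction of genetic value from dense marker maps described by Meuwissen et al. (2001). The sketch below uses ridge regression (an RR-BLUP-style simplification) and is an illustration under stated assumptions, not a production method. It assumes additive marker effects coded −1/1, a hypothetical shrinkage parameter `lam`, and a crude additive cross prediction in which the expected progeny mean equals the mid-parent prediction. All data and function names are hypothetical.

```python
def ridge_fit(X, y, lam=1.0):
    """Ridge regression of phenotypes y on a marker matrix X (rows = lines, columns = markers).

    Markers are centred and the intercept is left unpenalized; the system
    (Xc'Xc + lam*I) b = Xc'(y - mean(y)) is solved by Gaussian elimination.
    Returns (intercept, marker_effects).
    """
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    # Normal equations A b = r (A is positive definite because lam > 0).
    A = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    r = [sum(Xc[k][i] * yc[k] for k in range(n)) for i in range(p)]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, p):
            f = A[row][col] / A[col][col]
            for j in range(col, p):
                A[row][j] -= f * A[col][j]
            r[row] -= f * r[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (r[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    intercept = ybar - sum(bj * xj for bj, xj in zip(b, xbar))
    return intercept, b


def predict(intercept, b, x):
    """Predicted genetic value of a line from its marker codes."""
    return intercept + sum(bj * xj for bj, xj in zip(b, x))


def cross_mean(intercept, b, parent1, parent2):
    """Expected progeny mean under a purely additive model: the mid-parent prediction."""
    return (predict(intercept, b, parent1) + predict(intercept, b, parent2)) / 2


# Hypothetical training set: four phenotyped lines, two markers coded -1/1.
X_train = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y_train = [12.0, 12.0, 8.0, 8.0]
mu, b = ridge_fit(X_train, y_train, lam=1.0)
print(mu, b)  # 10.0 [1.6, 0.0]

# Rank unphenotyped candidates and predict the mean of a cross between them.
candidates = {"cand1": [1, -1], "cand2": [-1, 1]}
for name, x in candidates.items():
    print(name, predict(mu, b, x))
print(cross_mean(mu, b, candidates["cand1"], candidates["cand2"]))
```

The estimated effect of the first marker (1.6) is smaller than the value implied by the data (2.0) because ridge regression shrinks effects toward zero. This shrinkage is what keeps predictions stable when markers far outnumber phenotyped lines.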

Acknowledgements

Funding from the US National Science Foundation (BIO 0822258) and the Next-Generation BioGreen21 Program (No. PJ0081172011), Rural Development Administration, Republic of Korea, helped in the preparation of this manuscript.

References

  • Abrouk M, Murat F, Pont C, Messing J, Jackson S, Faraut T, Tannier E, Plomion C, Cooke R, Feuillet C et al. 2010. Palaeogenomics of plants: synteny-based modelling of extinct ancestors. Trends in Plant Science 15: 479–487.
  • Akhunov E, Nicolet C, Dvorak J. 2009. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina Goldengate assay. Theoretical and Applied Genetics 119: 507–517.
  • Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3: e3376.
  • Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES. 2002. ARACHNE: a whole-genome shotgun assembler. Genome Research 12: 177–189.
  • Bennetzen JL, Ma J, Devos KM. 2005. Mechanisms of recent genome size variation in flowering plants. Annals of Botany 95: 127–132.
  • Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16: 1667–1678.
  • Bouchez A, Hospital F, Causse M, Gallais A, Charcosset A. 2002. Marker-assisted introgression of favorable alleles at quantitative trait loci between maize elite lines. Genetics 162: 1945–1959.
  • Bouwman AF. 1997. Long-term scenarios of livestock-crop-land use interactions in developing countries. Land and Water Bulletin No. 6. Rome, Italy: Food and Agriculture Organization of the United Nations.
  • Bowers JE, Chapman BA, Rong J, Paterson AH. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438.
  • Brown ME, Funk CC. 2008. Climate. Food security under climate change. Science 319: 580–581.
  • Buckler ES, Holland JB, Bradbury PJ, Acharya CB, Brown PJ, Browne C, Ersoz E, Flint-Garcia S, Garcia A, Glaubitz JC et al. 2009. The genetic architecture of maize flowering time. Science 325: 714–718.
  • Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB. 2008. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research 18: 810–820.
  • Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, Gouzy J, Wang X, Mudge J, Vasdewani J, Schiex T et al. 2006. Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proceedings of the National Academy of Sciences, USA 103: 14959–14964.
  • Chenu K, Chapman SC, Tardieu F, McLean G, Welcker C, Hammer GL. 2009. Simulating the yield impacts of organ-level quantitative trait loci associated with drought response in maize: a “gene-to-phenotype” modeling approach. Genetics 183: 1507–1523.
  • Choulet F, Wicker T, Rustenholz C, Paux E, Salse J, Leroy P, Schlub S, Le Paslier MC, Magdelenat G, Gonthier C et al. 2010. Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22: 1686–1701.
  • Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H. 2009. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology 4: 265–270.
  • Crow JF. 1998. 90 years ago: the beginning of hybrid maize. Genetics 148: 923–928.
  • Crow JF. 2001. Plant breeding giants. Burbank, the artist; Vavilov, the scientist. Genetics 158: 1391–1395.
  • Doebley J, Stec A, Hubbard L. 1997. The evolution of apical dominance in maize. Nature 386: 485–488.
  • Doyle JJ, Doyle JL, Brown AH, Palmer RG. 2002. Genomes, multiple origins, and lineage recombination in the Glycine tomentella (leguminosae) polyploid complex: histone h3-d gene sequences. Evolution 56: 1388–1402.
  • Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133–138.
  • Evenson RE, Gollin D. 2003. Assessing the impact of the green revolution, 1960 to 2000. Science 300: 758–762.
  • Fleury D, Jefferies S, Kuchel H, Langridge P. 2010. Genetic and genomic tools to improve drought tolerance in wheat. Journal of Experimental Botany 61: 3211–3222.
  • Fuller CW, Middendorf LR, Benner SA, Church GM, Harris T, Huang X, Jovanovich SB, Nelson JR, Schloss JA, Schwartz DC et al. 2009. The challenges of sequencing by synthesis. Nature Biotechnology 27: 1013–1023.
  • Gaut BS, Doebley JF. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proceedings of the National Academy of Sciences, USA 94: 6809–6814.
  • Gilbert JE, Lewis RV, Wilkinson MJ, Caligari PDS. 1999. Developing an appropriate strategy to assess genetic variability in plant germplasm collections. Theoretical and Applied Genetics 98: 1125–1131.
  • Gill N, Findley S, Walling JG, Hans C, Ma J, Doyle J, Stacey G, Jackson SA. 2009. Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiology 151: 1167–1174.
  • Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, Hurwitz BL, Peiffer JA, McMullen MD, Grills GS, Ross-Ibarra J et al. 2009. A first-generation haplotype map of maize. Science 326: 1115–1117.
  • Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD. 2007. Eukaryotic genome size databases. Nucleic Acids Research 35: D332–D338.
  • Gupta PK, Roy JK, Prasad M. 2001. Single nucleotide polymorphisms: a new paradigm for molecular marker technology and DNA polymorphism detection with emphasis on their use in plants. Current Science India 80: 524–535.
  • Gupta PK, Rustgi S, Mir RR. 2008. Array-based high-throughput DNA markers for crop improvement. Heredity 101: 5–18.
  • Hammer K. 2003. A paradigm shift in the discipline of plant genetic resources. Genetic Resources and Crop Evolution 50: 3–10.
  • Hansen BG, Halkier BA, Kliebenstein DJ. 2008. Identifying the molecular basis of QTLs: eQTLs add a new dimension. Trends in Plant Science 13: 72–77.
  • Hoisington D, Khairallah M, Reeves T, Ribaut J-M, Skovmand B, Taba S, Warburton M. 1999. Plant genetic resources: what can they contribute toward increased crop productivity? Proceedings of the National Academy of Sciences, USA 96: 5937–5943.
  • Holloway B, Li B. 2010. Expression QTLs: applications for crop improvement. Molecular Breeding 26: 381–391.
  • Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P et al. 2009. The genome of the cucumber, Cucumis sativus L. Nature Genetics 41: 1275–1281.
  • Huang S, Sirikhachornkit A, Su X, Faris J, Gill B, Haselkorn R, Gornicki P. 2002. Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proceedings of the National Academy of Sciences, USA 99: 8133–8138.
  • Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z et al. 2010. Genome-wide association studies of 14 agronomic traits in rice landraces. Nature Genetics 42: 961–967.
  • Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H, Maciver F, Mueller M, Hummel O, Monti J, Zidek V. 2005. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nature Genetics 37: 243–253.
  • Hyten DL, Cannon SB, Song Q, Weeks NT, Fickus EW, Shoemaker R, Specht JE, May GD, Cregan PB. 2010. High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics 11: 38.
  • Hyten DL, Song Q, Zhu Y, Choi I-Y, Nelson RL, Costa JM, Specht JE, Shoemaker RC, Cregan PB. 2006. Impacts of genetic bottlenecks on soybean genome diversity. Proceedings of the National Academy of Sciences, USA 103: 16666–16671.
  • The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
  • Innes RW, Ameline-Torregrosa C, Ashfield T, Cannon E, Cannon SB, Chacko B, Chen NW, Couloux A, Dalwani A, Denny R et al. 2008. Differential accumulation of retroelements and diversification of NB-LRR disease resistance genes in duplicated regions following polyploidy in the ancestor of soybean. Plant Physiology 148: 1740–1759.
  • IRGSP. 2005. The map-based sequence of the rice genome. Nature 436: 793–800.
  • Jackson DA, Symons RH, Berg P. 1972. Biochemical method for inserting new genetic information into DNA of simian virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proceedings of the National Academy of Sciences, USA 69: 2904–2909.
  • Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467.
  • Jansen RC, Nap J-P. 2001. Genetical genomics: the added value from segregation. Trends in Genetics 17: 388–391.
  • Jordan MC, Somers DJ, Banks TW. 2007. Identifying regions of wheat genome controlling seed development by mapping expression quantitative trait loci. Plant Biotechnology Journal 8: 442–453.
  • Keurentjes JJB, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Basten Snoek L, Peeters AJM, Vreugdenhil D, Koornneef M, Jansen RC. 2007. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proceedings of the National Academy of Sciences, USA 104: 1708–1713.
  • Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
  • Lewis D. 1945. The science of plant breeding. Nature 155: 355–356.
  • Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y et al. 2009. The sequence and de novo assembly of the giant panda genome. Nature 463: 311–317.
  • Mackay TFC. 2001. The genetic architecture of quantitative traits. Annual Review of Genetics 35: 303–339.
  • Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.
  • McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, Flint-Garcia S, Thornsberry J, Acharya C, Bottoms C et al. 2009. Genetic properties of the maize nested association mapping population. Science 325: 737–740.
  • Meinke DW, Cherry JM, Dean C, Rounsley SD, Koornneef M. 1998. Arabidopsis thaliana: a model plant for genome analysis. Science 282: 662, 679–682.
  • Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A, Salamov A, Fritz-Laylin LK, Marechal-Drouard L et al. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318: 245–250.
  • Meuwissen THE, Hayes BJ, Goddard ME. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
  • Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL et al. 2008. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya linnaeus). Nature 452: 991–996.
  • Moose SP, Mumm RH. 2008. Molecular plant breeding as the foundation for 21st century crop improvement. Plant Physiology 147: 969–977.
  • Mozo T, Dewar K, Dunn P, Ecker JR, Fischer S, Kloska S, Lehrach H, Marra M, Martienssen R, Meier-Ewert S et al. 1999. A complete BAC-based physical map of the Arabidopsis thaliana genome. Nature Genetics 22: 271–275.
  • van Oeveren J, de Ruiter M, Jesse T, van der Poel H, Tang J, Yalcin F, Janssen A, Volpin H, Stormo KE, Bogden R et al. 2011. Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Research 21: 618–625.
  • O’Rourke J, Nelson R, Grant D, Schmutz J, Grimwood J, Cannon SB, Vance C, Graham M, Shoemaker RC. 2009. Integrating microarray analysis and the soybean genome to understand the soybeans iron deficiency response. BMC Genomics 10: 376–393.
  • Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556.
  • Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences, USA 101: 9903–9908.
  • Paterson AH, Freeling M, Tang H, Wang X. 2010. Insights from the comparison of plant genome sequences. Annual Review of Plant Biology 61: 349–372.
  • Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA et al. 2006. Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Research 16: 1262–1269.
  • Potokina E, Druka A, Luo Z, Wise R, Waugh R, Kearsey M. 2008. Gene expression quantitative trait locus analysis of 16 000 barley genes reveals a complex pattern of genome-wide transcriptional regulation. Plant Journal 53: 90–101.
  • Raghuvanshi S, Kapoor M, Tyagi S, Khurana P, Khurana J, Akhilesh T. 2010. Rice genomics moves ahead. Molecular Breeding 26: 257–273.
  • Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y et al. 2008. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319: 64–69.
  • SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z et al. 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765–768.
  • Schlueter JA, Dixon P, Granger C, Grant D, Clark L, Doyle JJ, Shoemaker RC. 2004. Mining EST databases to resolve evolutionary events in major crop species. Genome 47: 868–876.
  • Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J et al. 2010. Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183.
  • Schnable P, Ware D, Fulton R, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA et al. 2009. The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115.
  • Severin AJ, Woody JL, Bolon YT, Joseph B, Diers BW, Farmer AD, Muehlbauer GJ, Nelson RT, Grant D, Specht JE et al. 2010. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biology 10: 160.
  • Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Research 19: 1117–1123.
  • Soderlund C, Longden I, Mott R. 1997. FPC: a system for building contigs from restriction fingerprinted clones. Computer Applications in the Biosciences 13: 523–535.
  • Soltis DE, Soltis PS, Tate J. 2004. Advances in the study of polyploidy since plant speciation. New Phytologist 161: 173–191.
  • Swigonova Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen JL, Messing J. 2004. Close split of sorghum and maize genome progenitors. Genome Research 14: 1916–1923.
  • Tanksley SD, McCouch SR. 1997. Seed banks and molecular maps: unlocking genetic potential from the wild. Science 277: 1063–1066.
  • Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES IV. 2001. Dwarf8 polymorphisms associate with variation in flowering time. Nature Genetics 28: 286–289.
  • Turner WR, Oppenheimer M, Wilcove DS. 2009. A force to fight global warming. Nature 462: 278–279.
  • Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604.
  • United Nations. 2004. World population to 2300. New York, NY, USA: Dept of Economic and Social Affairs, 49.
  • Valkonen JPT, Nygren M, Ylönen A, Mannonen L. 1993. Nuclear DNA content of Pinus sylvestris (L.) as determined by laser flow cytometry. Genetica 92: 203–207.
  • Varshney RK, Nayak SN, May GD, Jackson SA. 2009. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends in Biotechnology 27: 522–530.
  • Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D et al. 2010. The genome of the domesticated apple (Malus × domestica Borkh.). Nature Genetics 42: 833–839.
  • Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, Faller M, Bomblies K, Lukens L, Doebley JF. 2005. The origin of the naked grains of maize. Nature 436: 714–719.
  • Weber JL, Myers EW. 1997. Human whole-genome shotgun sequencing. Genome Research 7: 401–409.
  • Wicker T, Taudien S, Houben A, Keller B, Graner A, Platzer M, Stein N. 2009. A whole-genome snapshot of 454 sequences exposes the composition of the barley genome and provides evidence for parallel evolution of genome size in wheat and barley. Plant Journal 59: 712–722.
  • Yan J, Kandianis CB, Harjes CE, Bai L, Kim EH, Yang X, Skinner DJ, Fu Z, Mitchell S, Li Q et al. 2010a. Rare genetic variation at Zea mays crtRB1 increases beta-carotene in maize grain. Nature Genetics 42: 322–327.
  • Yan JB, Yang XH, Shah T, Sanchez-Villeda H, Li JS, Warburton M, Zhou Y, Crouch JH, Xu YB. 2010b. High-throughput SNP genotyping with the GoldenGate assay in maize. Molecular Breeding 25: 441–451.
  • You FM, Huo N, Deal KR, Gu YQ, Luo MC, McGuire PE, Dvorak J, Anderson OD. 2011. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics 12: 59.
  • Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821–829.
  • Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, Pape L, Mehan MR, Churas C, Pasternak S et al. 2009. A single molecule scaffold for the maize genome. PLoS Genetics 5: e1000711.
  • Zhu W, Ouyang S, Iovene M, O’Brien K, Vuong H, Jiang J, Buell CR. 2008. Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition. BMC Genomics 9: 286.