Patterns found in the whole genome sequences of model organisms are often discussed in terms of whole genome duplication (WGD). Clusters of similar genes, sometimes in similar linear order, are interpreted as duplicated blocks in the genome, and when these are frequent, WGD seems a parsimonious explanation for their origin (e.g. Blanc et al. 2003). In addition, molecular clocks are used to estimate the timing of gene duplication events, and when many duplicate genes share approximately the same estimated age, WGD could be indicated (e.g. Blanc & Wolfe 2004). This logic points to many model organisms, such as Tetraodon (Jaillon et al. 2004), Arabidopsis (Blanc et al. 2003), Oryza (Blanc & Wolfe 2004) and Vitis (Jaillon et al. 2007), being ancient polyploids.
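The age-based reasoning above can be illustrated with a minimal sketch. It is not the method of any of the cited studies. It assumes each duplicate gene pair already has a molecular clock age estimate in arbitrary units, bins those estimates, and reports bins that stand out from their neighbours. The function names, the bin width and the `min_count` threshold are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def age_histogram(ages, bin_width=0.1):
    """Count duplicate pairs per age bin; returns {bin index: count}."""
    # The small epsilon guards against floating-point values that fall
    # just below a bin boundary.
    counts = Counter(math.floor(age / bin_width + 1e-9) for age in ages)
    return dict(sorted(counts.items()))


def secondary_peaks(hist, min_count=3):
    """Return bins whose count exceeds both neighbouring bins.

    Bin 0 is skipped because very young duplicates are expected to be
    numerous under ongoing small-scale duplication (an assumption of
    this sketch), so an excess there says little about WGD.
    """
    peaks = []
    for b, count in hist.items():
        if b == 0 or count < min_count:
            continue
        if count > hist.get(b - 1, 0) and count > hist.get(b + 1, 0):
            peaks.append(b)
    return peaks


# Invented age estimates for 14 duplicate pairs (illustrative only).
ages = [0.02, 0.04, 0.05, 0.07, 0.15, 0.18, 0.33,
        0.71, 0.72, 0.74, 0.76, 0.78, 0.85, 1.25]
hist = age_histogram(ages)
peaks = secondary_peaks(hist)
```

A peak of this kind is only suggestive: as the next paragraph discusses, a shared age among many duplicates does not by itself exclude competing explanations.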
The hypothesis of ancient WGD may help us in understanding patterns within genomes, but the extent to which these patterns can, in turn, inform our detailed understanding of the role of WGD in evolution may be limited. Statistical evidence for a WGD does not allow us to identify an individual gene — even if in a duplicated block — as a product of WGD (Durand & Hoberman 2006). Inferences of ancient WGD may themselves be open to some uncertainty. Molecular clock analysis alone cannot fully distinguish between WGD and competing hypotheses (Blanc & Wolfe 2004) and analyses detecting WGD using gene clusters typically have a random null model for gene order (Durand & Hoberman 2006), although we know that co-expressed genes tend to be clustered (Hurst et al. 2004). Gu & Huang (2002) showed that small-scale tandem duplications are a more parsimonious explanation for the patterns found in 103 paralog blocks in Arabidopsis thaliana than WGD (although they took this as evidence that the parsimony test was misleading), and Hughes et al. (2003) showed transposable elements (which could mediate segmental duplication) to be more frequent in putative duplicate blocks than would be expected by chance. Analysis of the salmonid transcriptome and transposon activity points to a more complex history than can be explained by the commonly accepted single, ancestral WGD (BF Koop, J deBoer, G Brown and WS Davidson, unpublished).
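The random null model for gene order mentioned above can be sketched as a permutation test. This is a simplified illustration and not the procedure of Durand & Hoberman (2006) or any other cited study. It assumes the genome is represented as a list of gene family labels in linear order, and the statistic, function names and parameters are hypothetical. Note that shuffling treats every gene order as equally likely, which is exactly the assumption that clustering of co-expressed genes calls into question.

```python
import random


def shared_neighbour_pairs(order):
    """Count pairs of gene families that are neighbours at more than one place."""
    seen = {}
    for a, b in zip(order, order[1:]):
        if a == b:
            continue  # ignore tandem copies of the same family
        key = frozenset((a, b))
        seen[key] = seen.get(key, 0) + 1
    return sum(1 for n in seen.values() if n > 1)


def permutation_p_value(order, n_perm=1000, seed=0):
    """Compare the observed statistic with randomly shuffled gene orders."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = shared_neighbour_pairs(order)
    shuffled = list(order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if shared_neighbour_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented gene order: the run A, B, C, D occurs twice (illustrative only).
order = ["A", "B", "C", "D", "x1", "x2",
         "A", "B", "C", "D", "x3", "x4"]
observed, p_value = permutation_p_value(order)
```

A more realistic null would preserve known local structure, for example by shuffling blocks of co-expressed genes rather than individual genes, but that design is beyond this sketch.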
Progress in our understanding of the role of WGD in evolution therefore requires genomic data from polyploids that can be unambiguously identified by comparison of chromosome numbers or genome size in closely related species, or by direct observation of multivalents during meiosis. Currently, such genomic data are largely restricted to crop species: for example, the whole genomes of bread wheat and potato are being sequenced, and extensive resources exist for cotton (e.g. Udall et al. 2007). Several natural plant species have been developed as useful models for studying different aspects of polyploid evolution, but these are ‘nonmodel’ organisms in the sense that few genomic resources are available for them. The best resourced are: Arabidopsis suecica, which has the model organism A. thaliana as one progenitor (Chen et al. 2004); polyploids of Glycine subgenus Glycine, sister group to the cultivated soybean (Doyle et al. 2004), the genome of which is currently being sequenced; Senecio cambrensis, for which anonymous cDNA microarrays exist (Hegarty et al. 2005); and Tragopogon miscellus, for which 2000 expressed sequence tags are available. For other systems such as Mercurialis annua (Pannell et al. 2004) and Spartina anglica (Ainouche et al. 2004), only a handful of genetic markers are available. No natural allopolyploid is currently subject to whole genome sequencing.
This means that in genome-duplication research there is a gap between good evolutionary models (where polyploids are natural, well-characterized, and of known parentage) and good genetic models (for which genomic resources are available). The development of genomic resources for existing natural evolutionary models is therefore the logical way forward for research on WGD (Soltis et al. 2004). At the same time, studying natural systems pays an occasional dividend in the serendipitous discovery of a useful model within an already well-resourced system. Polyploids, especially autopolyploids, often occur as cryptic species (Soltis et al. 2007). In this issue, Sweigart et al. characterize an unexpected allopolyploid in the genus Mimulus, found during a study of postzygotic reproductive isolation between M. guttatus and M. nasutus. This promises to be a useful addition to the suite of natural polyploid models.
As a genetic model, this allopolyploid will soon have extensive resources. The whole genome of the closely related diploid M. guttatus is currently being sequenced. Sequencing of expressed sequence tags is also underway: 200K from an annual alpine ecotype of M. guttatus, 32K from a perennial, coastal ecotype of M. guttatus, and 32K from M. nasutus. Microarrays are being developed, which will allow comparative genomic hybridization of diverse genotypes. Maps, markers and extensive sequence data are already available at http://www.mimulusevolution.org.
As an evolutionary model, the newly characterized polyploid is especially interesting. The genus Mimulus has long provided a tractable model for research programmes in ecological and evolutionary genetics, due to its wide diversity in habitat preference, life history, mating system and floral form (Wu et al. 2008). The newly characterized allopolyploid exhibits features common in polyploids of other genera, such as more than one origin and a shift in sexual system. An estimated 11.5% of speciation events in Mimulus have involved WGD (Beardsley et al. 2004); hence, there is potential for comparative studies of related polyploids.
To take full advantage of this system, the parentage of the allopolyploid (which deserves a species name of its own) should be known with certainty. Sweigart et al. show convincingly that one parent is M. nasutus, but better phylogenetic evidence, from wider taxon sampling and perhaps further genes, is needed before we can be certain that M. guttatus is, as seems likely, the other parent. Data from chloroplast genomes are needed to discern which species is the maternal parent, or whether the allopolyploid has formed reciprocally. It is intriguing that the morphology of the tetraploid resembles M. nasutus more closely than M. guttatus (Fig. 1). Could this be due to loss or silencing of homeologs from the other parent at key floral morphology genes? Uncovering the genetic basis of these traits could shed light on features of polyploid evolution currently under study in other systems (e.g. Tate et al. 2006).
Together, these genetic resources and evolutionary features mean that the Mimulus model system should soon be able to provide valuable insights into the ecological and evolutionary genetics of natural polyploidy. As the body of data from this and other polyploid systems grows, we will be able to answer questions such as: do allopolyploids evolve in a similar fashion across different genera and across multiple origins within genera? Are certain groups of genes preferentially lost or silenced after WGD across genera? To what extent are emerging patterns in recent polyploid genomes tending towards those found in putative palaeo-polyploids? Thus, light will be shed on the patterns of genome evolution in a wide variety of other model organisms, and on the genetic factors leading to polyploid success.