Diploid and polyploid reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine)


Author for correspondence: Jeff J. Doyle Tel: +1 607 255 7972 Fax: +1 607 255 7979 Email: jjd5@cornell.edu


The perennial soybeans (Glycine subgenus Glycine) are the sister group of the annual cultivated soybean (G. max). Among the approximately 20 species are diploids and polyploids, the former confined to Australia and neighboring islands and the latter more widespread. Although most subgenus Glycine species reproduce predominantly by selfing in cleistogamous flowers, phylogenetic evidence exists for reticulate evolution throughout the history of the subgenus. The entire genus is a paleopolyploid, and could possibly be allopolyploid, though there is as yet no evidence for a hybrid origin. Incongruence among the major nuclear genome groups in nuclear and chloroplast gene trees can be explained by several ancient introgressions. Within the B-genome group there is substantial incongruence between chloroplast and nuclear single copy gene trees that is explained better by introgressive hybridization than by stochastic sorting of ancestral lineages. Several allopolyploids originated by hybridization among a subset of genome groups to form a single large interconnected polyploid complex. A number of allopolyploid combinations have arisen recurrently, some bidirectionally. Some recurrent polyploids show evidence of lineage recombination, indicating that their populations comprise a single biological species. Neopolyploidy has involved hybridization among a subset of subgenus Glycine genome groups, and appears to have occurred recently, whereas hybridization at the diploid level has occurred throughout the history of the group.


Glycine is the legume genus that includes G. max, the cultivated soybean, which like its wild progenitor, G. soja, is an annual plant native to north-eastern Asia. These two species comprise subgenus Soja, one of the two subgenera of Glycine (Table 1). The diversity of the genus is concentrated in Australia, where all of the approximately two dozen perennial species of subgenus Glycine occur. Because subgenus Glycine is the secondary germplasm pool for the cultivated soybean and harbors desirable agronomic traits such as drought tolerance and disease resistance, the group has been collected extensively. This has resulted in a germplasm collection of over 2000 accessions, which in turn has provided the foundation for much biosystematic research, ranging from crossing studies to molecular phylogenetic analyses.

Table 1. Taxonomic, nuclear genome, and chloroplast genome (plastome) classifications of diploid species of Glycine

Subgenus | Genome group | Species | Plastome group
Soja | G | G. max | G
Soja | G | G. soja | G
Glycine | A | G. argyrea | A
Glycine | A | G. canescens | A
Glycine | A | G. clandestina | A
Glycine | A | G. latrobeana | A
Glycine | A | G. peratosa | A
Glycine | A | G. rubiginosa | A
Glycine | A (tom D4)² | G. tomentella D4 | A
Glycine | B | G. latifolia | B
Glycine | B | G. microphylla | B
Glycine | B | G. tabacina¹ | B
Glycine | B′ | G. stenophita | B
Glycine | C | G. curvata | C
Glycine | C | G. cyrtoloba | C
Glycine | D (tom D3) | G. tomentella D3 | A
Glycine | E (tom D1) | G. tomentella D1 | A
Glycine | F | G. falcata | A
Glycine | H | G. hirticaulis³ | A
Glycine | H | G. pindanica | A
Glycine | H | G. pullenii | A
Glycine | Hb (tom D5B) | G. tomentella D5B | A
Glycine | Ha (tom D5A) | G. tomentella D5A | A
Glycine | I | G. albicans | A
Glycine | I | G. aphyonota | A
Glycine | I | G. lactovirens | A

All are 2n = 40 except G. tomentella D1 (2n = 38).
¹ G. tabacina also refers to an allopolyploid species (see Fig. 5).
² G. tomentella ‘races’ (D1-D5B) are listed here. These are genetically differentiated taxa (species) that have not yet been accorded formal taxonomic recognition. Not all have been incorporated into the nuclear genome designations: the D5B race is part of the H-genome clade in nuclear gene phylogenies (e.g. Brown et al., 2002) and is thus called ‘Hb’ here; the D5A race is sister to that clade in phylogenies and hence is called ‘Ha’ here.
³ G. hirticaulis contains both diploid and polyploid cytotypes.

Thanks to this history of study, which goes back to the early 1970s, more is now known about evolution in Glycine than in all but the best-studied angiosperm genera. Hybridization is an important phenomenon in plant evolution (Grant, 1981; Arnold, 1997), so it is not surprising that reticulation contributes to the complexity of evolutionary patterns in the genus. The fingerprint of hybridization is incongruence between independent markers, whether molecular or morphological. However, although incongruence is a necessary consequence of reticulate evolution, it is not sufficient evidence for hypothesizing hybridization. Stochastic forces, notably sorting of ancestral polymorphisms (‘lineage sorting’), can produce the same pattern as hybridization and introgression (Neigel & Avise, 1986; Pamilo & Nei, 1988). In subgenus Glycine there is considerable evidence of incongruence among diploid taxa, and in at least some cases this appears more likely to have been caused by reticulate evolution than by drift and fixation (Doyle et al., 1999a).

This might seem surprising, given the biology of the genus. All Glycine species produce both cleistogamous flowers, adapted for selfing, and chasmogamous flowers for outcrossing. The majority of seed produced in natural populations is from self-pollination, in agreement with the observation of abundant autogamous seed production in the greenhouse. Patterns obtained from polymorphic DNA and isozyme markers show that homozygosity is common, but diploid heterozygotes are encountered (Doyle et al., 1999a; Brown et al., 2002). This predominance of autogamy would seem to be a formidable barrier to hybridization, but the widespread occurrence of the dual mating system in subgenus Glycine indicates that this mixed evolutionary strategy is quite stable in the subgenus and has persisted over time. This, in turn, implies that appreciable levels of outcrossing must have occurred throughout the history of the subgenus. In several species the showy chasmogamous flowers are scented, suggesting insect pollination, and heterozygosity has indeed been found in genetic studies. Estimates of outcrossing rate in natural populations vary from 0.11 in montane populations of G. clandestina to 0.50 in coastal populations of G. argyrea (Schoen & Brown, 1991). Heterozygosity in natural populations of G. argyrea was estimated to be H = 0.25 from allozyme data, compared with the panmictic expectation for these loci of h = 0.32 (Brown et al., 1986). Comparable estimates of G. canescens (part of the same species complex as G. clandestina) were H = 0.01 vs h = 0.31 (Brown et al., 1990), but this estimate was for residual heterozygosity in selfed seed of greenhouse-grown plants, and natural levels of heterozygosity are probably closer to 2–5%. Clearly, the mating system does allow opportunities for cross-pollination between species.
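The comparison of observed and panmictic heterozygosity above can be made explicit with Wright's fixation index, F = 1 - H/h. The sketch below applies it to the values quoted in the text. The conversion from F to an outcrossing rate, t = (1 - F)/(1 + F), is a textbook mixed-mating equilibrium relation that is assumed here and is not taken from the studies cited, so its output need not agree with the field estimates of outcrossing reported above. Function names are illustrative.

```python
def fixation_index(h_observed, h_expected):
    """Wright's fixation index F = 1 - H/h (H observed, h panmictic expectation)."""
    if h_expected <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_observed / h_expected


def equilibrium_outcrossing_rate(f):
    """Outcrossing rate t implied by F, ASSUMING mixed-mating equilibrium:
    F = (1 - t) / (1 + t), hence t = (1 - F) / (1 + F)."""
    return (1.0 - f) / (1.0 + f)


# Values quoted in the text (Brown et al., 1986, 1990).
f_argyrea = fixation_index(0.25, 0.32)    # natural populations
f_canescens = fixation_index(0.01, 0.31)  # selfed greenhouse seed, so F is inflated

t_argyrea = equilibrium_outcrossing_rate(f_argyrea)

print(f"G. argyrea:   F = {f_argyrea:.3f}, implied t = {t_argyrea:.2f}")
print(f"G. canescens: F = {f_canescens:.3f}")
```

A value of F near 0 indicates genotype frequencies close to panmixia, whereas a value near 1 indicates almost complete selfing; the G. canescens figure reflects residual heterozygosity in selfed seed rather than a natural population.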

Polyploidy is increasingly recognized as a ubiquitous force in plant evolution, as evidenced by findings of major duplications in the small genome of Arabidopsis thaliana, some of them old enough to have taken place in an ancestor of many angiosperm families (Vision et al., 2000; Blanc et al., 2003). All but the most strict autopolyploids are the products of some form of reticulation, and allopolyploidy in the taxonomic sense of the term involves hybridization among differentiated species. Glycine provides examples of both paleo- and neopolyploidy, with numerous instances of recent allopolyploidy in subgenus Glycine, some of which have led to expansion of the subgenus out of Australia into the Pacific, as far north as Taiwan and the Ryukyu Islands (Doyle et al., 1990b).

Here we summarize evidence for reticulate evolution in Glycine. Topics include the paleopolyploid, perhaps allopolyploid, origin of the entire genus, reticulate evolution of major genome groups in subgenus Glycine, incongruence between chloroplast DNA and nuclear genes within species complexes, and the origin and evolution of the extensive neoallopolyploid complex.

Paleopolyploidy: is Glycine an allopolyploid genus?

‘Diploid’ Glycine species in both subgenera are 2n = 40, with the exception of a single 2n = 38 species in subgenus Glycine. By contrast, most members of the tribe Phaseoleae, to which Glycine belongs, are 2n = 20 or 22 (Goldblatt, 1981). Evidence of an ancient duplication exists in the soybean genome (Zhu et al., 1994; Shoemaker et al., 1996), and recent comparisons of putatively homoeologous sequences suggest a date of around 15 mya for the polyploid event (Schlueter et al., 2003). Zhu et al. (1994) refer to Glycine as ‘a diploidized tetraploid generated from an allotetraploid ancestor’ but the origin of that ancestor is unknown.

Kumar & Hymowitz (1989) suggested that true diploid (2n = 20) Glycine either had not yet been collected, had been misclassified in another genus, or had become extinct. Alternatively, they suggested, diploid Glycine never existed, and the genus was formed by allopolyploidy involving two other genera. Over a decade later, it seems unlikely that a true diploid Glycine remains to be identified, because the intensive collecting since 1989 has revealed several new species, but none with chromosome numbers lower than 2n = 38. Also, much more is now known about phylogenetic relationships in Glycine and allied phaseoloid legumes (Kajita et al., 2001; Lee & Hymowitz, 2001), making it unlikely that misclassified Glycine species exist. Notably, Sinodolichos, once suggested as a possible congener of Glycine (Lackey, 1981), has been shown to be less closely related to Glycine than are other genera, such as Teramnus (Doyle et al., 2003).

The hypothesis of allopolyploidy can be tested by reconstructing phylogenies of nuclear genes that are duplicated in Glycine but are single copy in related genera (Fig. 1). An example of such a gene is the nuclear gene encoding the chloroplast-expressed isozyme of glutamine synthetase (ncpGS). Phylogenetic analyses showed that the two paralogues of Glycine ncpGS were sister to one another (Doyle et al., 2003), and together this clade was sister to ncpGS genes from Teramnus, a genus whose chloroplast genome is sister to that of Glycine (Lee & Hymowitz, 2001). This pattern does not support an allopolyploid origin of Glycine, but neither does it prove that Glycine is autopolyploid. Extinction of the descendants of genome donors would result in the homoeologous genes being sisters to one another, a pattern that mimics that of autopolyploidy (Fig. 1a).

Figure 1.

Hypothetical gene tree topologies for allopolyploidy (a), autopolyploidy (b), and simple gene duplication (c, d). Circled ‘p’ indicates a gene duplication in Glycine caused by polyploidy; boxed ‘d’ represents a simple gene duplication that does not involve the whole genome. In trees that involve polyploidy (a, b), the gene is single copy in allied diploid genera, because the polyploid event only involves Glycine. This is not true when a simple gene duplication predates the divergence of genera (c), in which case paralogous and orthologous genes exist in all three genera. However, a simple gene duplication in Glycine results in a topology indistinguishable from autopolyploidy for any single gene (compare b and d). Extinction of allopolyploid genome donors (Teramnus and Amphicarpaea in ‘a’) would also result in an identical topology.
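The logic of this test can be sketched in a few lines of code. The example below is only an illustration: gene trees are written as nested tuples, the tip labels (Glycine_1, Glycine_2, Teramnus, Amphicarpaea) are hypothetical stand-ins for the gene copies in Fig. 1, and the helper names are not from any phylogenetics package. It checks whether the two Glycine copies are sisters, and shows how pruning extinct genome donors from an allopolyploid topology yields the autopolyploid-like pattern.

```python
def leaves(tree):
    """Return the set of tip labels under a node (tips are strings)."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def are_sisters(tree, group_a, group_b):
    """True if some bifurcating node has group_a and group_b as its two child clades."""
    if isinstance(tree, str):
        return False
    if len(tree) == 2:
        child_sets = {frozenset(leaves(child)) for child in tree}
        if child_sets == {frozenset(group_a), frozenset(group_b)}:
            return True
    return any(are_sisters(child, group_a, group_b) for child in tree)


def prune(tree, extinct):
    """Remove extinct tips and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return None if tree in extinct else tree
    kept = [p for p in (prune(child, extinct) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


# Hypothetical topologies modelled on Fig. 1a (allopolyploidy) and Fig. 1b (autopolyploidy).
allo_tree = (("Glycine_1", "Teramnus"), ("Glycine_2", "Amphicarpaea"))
auto_tree = ((("Glycine_1", "Glycine_2"), "Teramnus"), "Amphicarpaea")

copies = ({"Glycine_1"}, {"Glycine_2"})
print(are_sisters(allo_tree, *copies))   # False: each copy groups with a donor genus
print(are_sisters(auto_tree, *copies))   # True: copies are sisters

# With both donor lineages extinct, the allopolyploid tree mimics autopolyploidy.
donors_extinct = prune(allo_tree, {"Teramnus", "Amphicarpaea"})
print(are_sisters(donors_extinct, *copies))  # True
```

The last call is the point of the caveat in the text: a sister relationship between homoeologues is what autopolyploidy predicts, but it is also what remains of an allopolyploid history once the donor lineages are no longer sampled.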

It is also possible that ncpGS is the product of an independent duplication that had nothing to do with the polyploid event. Duplications and losses of genes –‘birth and death’– are common features of eukaryotic genomes (Lynch & Conery, 2000), and not all involve polyploidy. The divergence date estimated for the two paralogues of ncpGS is around 4.5 mya (Doyle et al., 2003), which is only about a third of the estimate based on synonymous substitution distances calculated from over 250 pairs of duplicated genes (Schlueter et al., 2003). This does not necessarily rule out polyploidy as the cause for the ncpGS duplication, however. Zhang et al. (2002) showed that Arabidopsis genes presumably duplicated by a single polyploid event exhibited a greater than tenfold range of divergence rates. Moreover, as Gaut & Doebley (1997) point out for maize, the complexity of polyploid formation and the subsequent process of diploidization can lead to different genes showing very unequal coalescence times. Phylogenetic analysis of other genes in the ‘polyploid’ class identified from soybean will test whether the pattern found for ncpGS is the common pattern. If so, then it seems likely that the genome donors of Glycine, whether one or more diploid Glycine species or two different genera, are extinct.
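Dates of this kind are typically derived from synonymous distances under a molecular clock, T = Ks / (2r), where Ks is the synonymous distance between the two copies and r is the substitution rate per site per year. The sketch below shows only the arithmetic. The rate and the Ks values are hypothetical numbers chosen so that the outputs reproduce the dates quoted in the text (about 15 mya and 4.5 mya); they are not the values used by Schlueter et al. (2003) or Doyle et al. (2003).

```python
def divergence_time_my(ks, rate_per_site_per_year):
    """Divergence time in millions of years under a strict clock: T = Ks / (2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


# HYPOTHETICAL inputs for illustration only (not from the cited studies).
ASSUMED_RATE = 5e-9       # synonymous substitutions per site per year
KS_GENOME_WIDE = 0.15     # stand-in for the many-gene estimate
KS_NCPGS = 0.045          # stand-in for the ncpGS paralogues

t_genome = divergence_time_my(KS_GENOME_WIDE, ASSUMED_RATE)
t_ncpgs = divergence_time_my(KS_NCPGS, ASSUMED_RATE)

print(f"genome-wide: {t_genome:.1f} My, ncpGS: {t_ncpgs:.1f} My, "
      f"ratio: {t_ncpgs / t_genome:.2f}")
```

Because T scales linearly with Ks for a fixed rate, a gene evolving at roughly one third of the average rate would appear one third as old even if it were duplicated in the same event, which is why rate variation among genes matters for the interpretation above.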

A coarse-scale look at the phylogeny of subgenus Glycine: concordant genome groups and incongruent relationships among them

Much of the earliest biosystematic work on subgenus Glycine involved analysis of fertility in artificial interspecific hybrids (Putievsky & Broué, 1979; Newell & Hymowitz, 1983; Grant et al., 1984). This work resulted in the identification of diploid ‘genome groups’ whose species are interfertile with other species in the same genome group but are separated from species in other genome groups by sterility barriers. The monophyly of the genome groups was strongly supported by the first molecular phylogenetic study of the genus, which used chloroplast DNA maps based on restriction fragment length polymorphisms (cpDNA RFLP; Doyle, Doyle & Brown, 1990a). Subsequent studies using cpDNA (Doyle et al., 1990c), nrDNA ITS (Nickrent & Doyle, 1995; Kollipara et al., 1997), or low copy nuclear genes (Doyle et al., 1996; Doyle et al., 1999a; Brown et al., 2002; Doyle et al., 2002) also supported the naturalness of these groups, and phylogenetic evidence now is used to relate new species to genome group clades even when crossing data are not available (e.g. Doyle et al., 1999a). Hymowitz et al. (1998) recognized nine genome groups, one for subgenus Soja (G-genome) and eight in subgenus Glycine (A–F, H, I). The genome group concept still requires some refinement, primarily because taxonomic nomenclature has not caught up with biosystematic knowledge of the subgenus, particularly with respect to ‘G. tomentella’, which even at the diploid level is clearly polyphyletic (Table 1, Fig. 2; Brown et al., 2002).

Figure 2.

Glycine gene trees. Terminal taxa are genome groups as listed in Table 1, with the exception that G. latrobeana is shown separately from other A-genome species on the chloroplast tree (a; see text for explanation). The A-plastome group is the group derived from the ancestor indicated by a black circle on the chloroplast gene tree (a). The F-genome (G. falcata) is indicated by an asterisk on all of the trees to illustrate differences among these topologies.

Although the genome groups are well-defined, the relationships among them are far from clear. From the very beginning, phylogenetic analyses have produced unexpected results. The first objectively constructed hypothesis of relationships among the genome groups, from cpDNA RFLP maps, surprisingly placed the single F-genome species, G. falcata, as part of a strongly supported ‘A-plastome’ clade (plastome = plastid genome). The A-plastome clade included chloroplast genomes of species classified in the A, D, and E nuclear genome groups (Doyle et al., 1990a) as well as those of members of the H and I genomes and the G. tomentella D5A ‘race’ (Doyle et al., 1990c; Table 1, Fig. 2a). None of these species were thought to have a close relationship with G. falcata. Indeed, G. falcata differs markedly from all other species in the subgenus morphologically and ecologically, as well as in its biochemical profile for trypsin inhibitors (Mies & Hymowitz, 1973; Kollipara et al., 1995), fatty acids (Chavan et al., 1982), leaf flavonoids (Vaughan & Hymowitz, 1984), and phytoalexins (Keen et al., 1986), and in the large size of its nrDNA intergenic spacer (Doyle & Beachy, 1985). G. falcata is so distinctive that it was thought to be sister to the remainder of subgenus Glycine, or perhaps even more distantly related to the other perennial species than is subgenus Soja.

Doyle et al. (1990a) rationalized the cpDNA placement by noting that ‘distinctive’ features are autapomorphies, which do not provide grouping information. The grouping of G. falcata with A-genome species thus was not inconsistent with the many differences between them, any more than was the classic and widely discussed finding that the highly autapomorphic Heterogaura (Onagraceae) is nested within a derived clade of Clarkia (Sytsma & Gottlieb, 1986). Moreover, it was argued, interfertility is an ancestral (plesiomorphic) condition (Rosen, 1979), one that can be lost by apomorphic changes, so the inability of G. falcata to cross with A-genome species could also be explained by autapomorphic changes in G. falcata.

However, subsequent phylogenetic studies of the nrDNA ITS (Nickrent & Doyle, 1995; Kollipara et al., 1997) and histone H3-D (Doyle et al., 1996) did not support the cpDNA placement. Instead, both of these nuclear sequences placed G. falcata as an early diverging lineage in the subgenus, in agreement with intuitive views based on its distinctiveness (Fig. 2b,c). For histone H3-D, analyses of different alignments all placed G. falcata as sister to the remainder of the subgenus, with varying degrees of support (Doyle et al., 1996). Published nrDNA ITS strict consensus trees placed G. falcata, the C-genome, and the remainder of the subgenus in a trichotomy (Kollipara et al., 1997; Singh et al., 1998). nrDNA ITS analyses that include much greater taxon sampling give weak support for a sister relationship between G. falcata and the C-genome (J. T. Rauscher et al., unpublished; Fig. 2c).

Phylogenies are now available from four additional low copy nuclear loci: two paralogues of ncpGS (Doyle et al., 2003) and two paralogues of a nuclear copy of cytochrome oxidase subunit 2 (nu-cox2; J. J. Doyle et al., unpublished). Based on a relatively small sample of taxa, both of the cox2 paralogues place G. falcata as sister to the remainder of the subgenus with strong support, but provide little additional resolution among genome groups. The situation for ncpGS was more complex. One paralogue (ncpGS-1) included a strongly supported relationship between G. falcata and the two A-genome species sampled (Fig. 2d). Taken by itself, this would agree with the cpDNA tree. However, cpDNA data also placed the ‘DEHI clade’ (comprising the D, E, H, and I genome group species plus the G. tomentella D5A ‘race’) as part of the A-plastome group (Fig. 2a), whereas these taxa were not part of the ncpGS-1 clade that included G. falcata and the A-genome species. Instead, in the ncpGS-1 tree the DEHI group was strongly supported as being sister to the C-genome. The second paralogue (ncpGS-2) placed the two C-genome species and G. falcata as sister groups (Fig. 2e), with strong support as opposed to the weak support for the same grouping with nrDNA ITS (Fig. 2c).

Thus there appear to be two taxa whose relationships are in need of resolution: G. falcata and the clade comprising the two C-genome species. The latter clade is either strongly supported as sister to the B-genome (cpDNA; Fig. 2a), sister to G. falcata (ncpGS-2 or, less strongly, nrDNA ITS; Fig. 2e,c), sister to the DEHI clade (ncpGS-1; Fig. 2d), or not closely related to any other genome group (histone H3-D; Fig. 2b). The diversity of relationships involving these two taxa makes it difficult to hypothesize any simple evolutionary scenario to account for the phylogenetic patterns in the various gene trees. There is always the possibility that one or more of the gene trees are incorrect, even as gene trees. The topologies for histone H3-D and nrDNA ITS have been relatively stable as many new taxa have been added. The original RFLP map-based analyses on which Fig. 2(a) is based remain the most comprehensive published studies of cpDNA in Glycine. An analysis of the chloroplast gene rps16 in Phaseoleae subtribe Glycininae included sequences from single Glycine species of the A, B, F, and G genomes and showed the same groupings as the RFLP tree: G. falcata and the A-genome species grouped with 100% bootstrap support; this group was sister to the other subgenus Glycine species sampled (from the B-genome); and this subgenus Glycine clade was sister to G. max (Lee & Hymowitz, 2001). A recent phylogenetic analysis of Glycine cpDNA based on four noncoding regions did not include G. falcata, but did group the B- and C-genomes with strong support (Sakai et al., 2003). Similar results are obtained using other noncoding chloroplast sequences but with larger sample sizes; in addition these analyses group G. falcata as part of an A-plastome clade (J. T. Rauscher et al., unpublished).

If each of the above phylogenies is accurate as a gene tree, and if each has captured some aspect of the evolutionary history of subgenus Glycine, then several events must be postulated to account for the incongruence among their topologies. Doyle et al. (1996) discussed various hypotheses that could account for incongruence between the histone H3-D and cpDNA phylogenies for G. falcata, notably the ‘usual suspects’: lineage sorting or hybridization with or without introgression (Wendel & Doyle, 1998). More complex explanations, or at least more events, are required to explain all of the differences among the gene trees now available (Fig. 2). One hypothetical scenario among many alternatives is presented here as an example of how a relatively small number of events can produce a multiplicity of gene tree topologies (Fig. 3). This scenario assumes that the histone H3-D topology represents the true species phylogeny, and hypothesizes four introgression events to obtain all of the gene trees observed to date. Two of these introgression events are required to produce the cpDNA topology. For five taxa (in this case the A, B, C, DEHI, and F genomes) there are 105 possible rooted topologies. The four introgression events in this hypothesis (Fig. 3) produce 14 different topologies, of which four are observed among gene trees thus far reconstructed. Presumably these topologies are not all equally represented in the genomes of subgenus Glycine species. The number of loci showing each topology would depend on many factors, including selection, rate of recombination, and the number of generations since introgression (Baird, 1995; Rieseberg et al., 2000; Martinsen et al., 2001).
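The figure of 105 follows from the standard count of rooted, strictly bifurcating, labelled trees for n taxa, (2n - 3)!! = 3 × 5 × ... × (2n - 3). A minimal sketch of that count is given below; the function name is illustrative, and the two proportions simply restate the numbers given in the text.

```python
def rooted_topology_count(n_taxa):
    """Number of rooted bifurcating labelled topologies: (2n - 3)!! for n >= 2."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


n_possible = rooted_topology_count(5)   # A, B, C, DEHI and F genomes
n_from_scenario = 14                    # topologies generated by the four events (Fig. 3)
n_observed = 4                          # topologies seen among gene trees so far

print(n_possible)                                   # 105
print(f"{n_from_scenario / n_possible:.1%} of all topologies arise in the scenario")
print(f"{n_observed / n_from_scenario:.1%} of those have been observed")
```

The count grows very quickly with the number of taxa (15 for four taxa, 945 for six), which is why collapsing the D, E, H and I genomes into a single DEHI terminal keeps the problem tractable.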

Figure 3.

One possible evolutionary scenario for reconciling gene trees with species histories in subgenus Glycine, involving introgression of nuclear genes. Genome groups are terminal taxa; ‘a’ is the ancestor of the A, D, E, H, and I genomes. Introgression events are indicated by numbered light arrows in dashed boxes, with the taxon at the tip of the arrowhead being the recurrent parent that retains some alleles received from the nonrecurrent parent. Thus, in event 1, a hybridization event between taxa B and C is followed by recurrent crosses to taxon C. After many generations, most loci in taxon C will have only C alleles (CC), but at some loci taxon C will be polymorphic for alleles that originally entered C from taxon B (CB). One or the other of these alleles ultimately becomes fixed (arrows with black circles). Through a series of four such events, major incongruent features of the topologies of the five nuclear genes can be explained, as can nuclear genes with the topology of the chloroplast genome. The history of the chloroplast genome is shaped by the direction of the cross, and introgression is assured if the recurrent parent is the pollen donor. Note that only relevant topologies are followed to completion (e.g. only one of the two outcomes of fixation in event 4). Also, additional topologies could be generated by events 3 and 4 using the remaining topologies resulting from event 2, and from event 4 on numerous other topologies. Additional events must be postulated to account for other differences in the chloroplast gene tree involving the ADEHI clade (see Fig. 2).

Heiser noted in 1973 (Heiser, 1973; p. 352) that ‘Since most studies of introgression are based on a few morphological characters it could well be that introgression is more extensive than indicated by conspicuous morphological features’. He considered stabilized introgressant species to be more common than cases of dispersed introgression involving widespread gene flow between two distinct species, but suggested (p. 362) that the latter phenomenon possibly ‘only appears rare because introgression brings about very subtle changes (“a trickle of genes”) that are seldom if ever detected by ordinary methods of investigation.’ He speculated that with the modern tools of biochemistry, more cases of introgression might be identified. Molecular tools have indeed confirmed hypothesized diploid hybrid species (Rieseberg et al., 2000) and have also revealed possible cases of ancient hybridization (e.g. Wendel et al., 1995; Cronn et al., 2002). Some well-differentiated Glycine diploid species may have hybrid histories, and could represent cases where either many or few genes show evidence of hybridity.

Incongruence between cpDNA and histone H3-D in the B-genome diploid group: lineage sorting or hybridization?

Detailed studies of molecular sequence variation have been conducted in two species complexes of subgenus Glycine. In the B-genome group, surveys of cpDNA RFLP variation revealed considerable diversity, with 20 haplotypes found among 65 core B-genome accessions (representing three named species and numerous unnamed accessions) and three additional haplotypes in the nine G. stenophita (called ‘G. sp. aff. G. tabacina’) accessions sampled (Doyle et al., 1990d). None of the three named core B-genome taxa were found to be represented by a monophyletic set of haplotypes. Histone H3-D allelic sequence variation was studied in 44 of the same accessions, and 23 alleles were identified, 19 in core B-genome plants (Doyle et al., 1999a). The histone and cpDNA trees were found to be statistically incongruent (Doyle et al., 1999a). The histone allele phylogeny was more concordant with species relationships than was the chloroplast haplotype tree because only in the histone tree did alleles from G. microphylla and G. latifolia form monophyletic groups. By contrast, the morphologically coherent G. microphylla was found to possess two very different groups of chloroplast haplotypes (Groups I and II). Group II haplotypes were found in most other core B-genome accessions, whereas Group I haplotypes were confined almost exclusively to G. microphylla.

The two most common explanations for incongruence at low taxonomic levels are lineage sorting and hybridization-introgression, and it is well-known that these very different processes can produce very similar patterns (Neigel & Avise, 1986). In this case, the Group II haplotype in G. microphylla could either be a retained polymorphism shared with species such as G. latifolia, or it could be the result of hybridization between the two species. Lineage sorting and hybridization hypotheses differ from one another in when, relative to speciation, the event takes place (Fig. 4). In lineage sorting, the polymorphism predates speciation, whereas by definition species must already have diverged from one another before interspecific hybridization can occur. Thus, the lineage sorting interpretation of the B-genome cpDNA haplotype tree is that the divergence between the Group I and Group II haplotypes gives the date of the origin of the polymorphism, whereas the divergence between the Group II haplotype of G. microphylla and the most closely related Group II haplotype in another species is tracking the species tree, and thus gives the date of divergence of G. microphylla from that species. In the hybridization interpretation, the divergence of the Group I and Group II haplotypes is tracking the species tree, and thus the date of speciation is the time of divergence between the two haplotypes. The divergence between the Group II haplotype of G. microphylla and the most closely related Group II haplotype in another species gives the date of hybridization between G. microphylla and that species (Fig. 4).

Figure 4.

Lineage sorting vs hybridization in the B-genome of subgenus Glycine. (a) Species relationships inferred from the histone H3-D phylogeny. (b) Chloroplast DNA gene tree. G. microphylla is polymorphic for two chloroplast types (cp Group I and cp Group II haplotypes), one of which (Group II) is shared with other B-genome species. Times t1 and t2 are indicated by circles at nodes. (c) The meaning of times t1 and t2 when the topology in ‘b’ is explained by lineage sorting vs introgression. The time of divergence between G. microphylla and other core B-genome taxa is estimated from the histone H3-D tree; both other times are estimated from cpDNA analyses. Time estimates are inconsistent with the lineage sorting model.

Thus, by estimating the dates of speciation and of the polymorphism, one can test whether the data favor one hypothesis. In this case, the divergence of G. microphylla was estimated from the histone H3-D data, because G. microphylla (like other taxa) was represented by a monophyletic group of alleles at that locus, so a species tree could be hypothesized. This date, between 0.75 and 1.5 mya, was considerably larger than the divergence estimate for Group I and Group II haplotypes, which was 0.4 mya. Group II haplotypes of G. microphylla and other taxa were too similar for an estimate to be made. Thus, the data were most consistent with hybridization, rather than lineage sorting (Doyle et al., 1999a). Certainly, hybridization is a reasonable hypothesis, given the fact that the inclusion of these species in the B-genome group was initially based on their ability to form fertile artificial hybrids. Lineage sorting also remains a possible explanation. Indeed, the two processes are not mutually exclusive, and recently diverged species such as these may be expected to share polymorphisms and also hybridize (Doyle et al., 1999a).
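The timing argument reduces to a simple inequality: under lineage sorting the polymorphism must be at least as old as the speciation event. The sketch below encodes only that check, using the dates quoted above. It is a deliberate simplification that treats the estimates as exact and ignores their uncertainty and the fact that they come from different loci; the function name is illustrative.

```python
def lineage_sorting_consistent(speciation_range_mya, polymorphism_mya):
    """Lineage sorting requires the polymorphism to predate speciation.

    speciation_range_mya: (youngest, oldest) estimate of the species divergence.
    polymorphism_mya: estimated divergence of the two haplotype groups.
    Returns True if the polymorphism is at least as old as the youngest
    plausible speciation date.
    """
    youngest_speciation = min(speciation_range_mya)
    return polymorphism_mya >= youngest_speciation


# Dates from the text (Doyle et al., 1999a).
speciation = (0.75, 1.5)     # G. microphylla divergence, histone H3-D, in mya
group_i_vs_ii = 0.4          # cpDNA Group I vs Group II divergence, in mya

print(lineage_sorting_consistent(speciation, group_i_vs_ii))  # False
```

A result of False means that the haplotype split appears younger than the species split, which is the observation that led to hybridization being favored; as the text notes, this does not exclude some contribution from lineage sorting.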

The A-genome complex

Doyle et al. (1990a) found very little cpDNA variation among species with A-type chloroplast genomes. That study identified two equally parsimonious trees, differing in the placement of G. latrobeana, a core A-genome species. In one tree, G. latrobeana was sister to G. falcata, and this clade was sister to a clade comprising the remaining core A-genome species (G. argyrea, G. canescens and G. clandestina) along with the D and E genome G. tomentella accessions. These D, E, and core-A taxa all had nearly identical plastid genomes. In the other tree, G. falcata, G. latrobeana, and the remainder of the A-plastome group formed a trichotomy. In a subsequent study, three newly described I-genome (G. albicans and G. lactovirens) and H-genome (G. hirticaulis) species were found to have chloroplast genomes that were nearly identical to the core A-plastome group, emphasizing the uniqueness of the chloroplast genome of G. latrobeana. However, it is only the chloroplast genome that suggests this placement for G. latrobeana. Histone H3-D phylogenies have consistently placed G. latrobeana with the other core A-genome species (Doyle et al., 1996; Brown et al., 2002), as have nrDNA ITS analyses (Kollipara et al., 1997; Singh et al., 1998; J. T. Rauscher et al., unpublished data).

Thus the composition of the A-plastome group is anomalous by virtue of its including G. falcata, as was discussed above. Furthermore, the A-plastome topology is unexpected because of its placement of G. latrobeana relative to other taxa, particularly the DEHI clade, which in histone and ITS studies forms a clade sister to the A-genome group (Fig. 2b,c) rather than being included with its members (Fig. 2a). There currently are no explanations for these differences. The weak support for a sister-group relationship between the chloroplast genomes of G. latrobeana and G. falcata offers few clues because both are involved in different aspects of the mystery. But it is possible that G. latrobeana, and perhaps the DEHI group, were involved in some reticulate events involving A-genome species, events now preserved in the chloroplast genomes of these taxa.

In each case discussed thus far, the chloroplast genome has provided the evidence for possible reticulate evolution. This also appears to be true among taxa of the core A-genome (G. clandestina, G. canescens and allies). There is good agreement between nrDNA ITS and histone H3-D for nearly all accessions, and the groups identified from sequence data show overall good correspondence with isozyme-based groupings within G. canescens (Brown et al., 1990) and with named or informal taxa (J. J. Doyle et al., unpublished). In nuclear gene trees, accessions of G. argyrea form a clade, along with some accessions of G. clandestina. However, a large chloroplast deletion is found in only some accessions of G. argyrea, as well as in some (but not all) accessions of a different clade of G. clandestina, suggesting a pattern of incongruence between chloroplast and nuclear gene trees in the core A-genome that is reminiscent of the reticulation found in the B-genome complex (Doyle et al., 1999a).

Frozen reticulation: the fixed hybrids of the subgenus Glycine allopolyploid complex

All of the evidence for reticulation discussed thus far has been indirect, and open to alternative interpretations, notably the stochastic sorting of ancestral polymorphisms. But incontrovertible evidence of hybridization in Glycine, frozen in time, exists in the form of numerous allopolyploids. The distribution of allopolyploidy in the subgenus complements the inferences of hybridization at the diploid level in revealing the extent of reticulation in the subgenus.

Two polyploid taxa in the subgenus, G. tabacina (2n = 80) and G. tomentella (2n = 78, 80), have been known and studied for many years (Newell & Hymowitz, 1978; Grant et al., 1984; Doyle et al., 1986; Singh et al., 1987; Doyle et al., 1990b; Doyle et al., 1990e), and one of the species discovered in the late 1980s, G. hirticaulis, was also found to include 2n = 80 populations (Tindale & Craven, 1988). Molecular systematic studies, mainly using histone H3-D data, have shown that there is a single extended allopolyploid complex, with the various polyphyletic taxa called ‘G. tabacina’ or ‘G. tomentella’ linked by the G. tomentella D4 diploid genome (Fig. 5; Doyle et al., 2002, 2004). Both of these polyploids comprise several taxa that merit recognition as distinct species by virtue of sharing, at most, one of their two homoeologous genomes. This has been recognized formally in the case of the tabacina polyploid complex, by the separation of G. pescadrensis from G. tabacina. Much remains to be done in the tomentella complex, however. In addition to being polyphyletic at the diploid level (Table 1; Brown et al., 2002), G. tomentella at the polyploid level consists of six ‘races’ (tomentella T1-T6), each with a different combination of diploid genomes (Fig. 5; reviewed in Doyle et al., 2004).

Figure 5.

The Glycine subgenus Glycine allopolyploid complex. Bold type denotes polyploid taxa; roman type denotes diploids. Boxes represent closely related diploid taxa belonging to the same genome group, which is indicated by a circled letter. D1-D5, T1-T6 are ‘races’ of G. tomentella; genome designations (AA, BB, B′B′) are given for G. tabacina and G. pescadrensis and their progenitors. Lines link polyploids with their diploid genome donors; lines with arrows indicate that the diploid taxon contributed both the nuclear and chloroplast genome.

Allopolyploidy, though extensive in the subgenus, is not universal. There is no evidence of any polyploidy involving the F, C, or I genomes (Fig. 6). By contrast, the A, B, and, especially, the D, E, and H genome groups have all participated extensively in hybridization resulting in allopolyploidy (Figs 5, 6). The majority of these crosses have been between genome groups. There is only one clear example of an allopolyploid formed within a genome group: G. tabacina was formed by crosses between members of the core B-genome (e.g. G. microphylla, G. latifolia) and G. stenophita, which is distinctive enough molecularly from these taxa to be considered the B′ genome (Doyle et al., 1999a) but which is similar enough genetically to form partially fertile hybrids with G. microphylla (A. H. D. Brown et al., unpublished). G. hirticaulis may be a taxonomic autopolyploid or perhaps an allopolyploid involving closely related genomes but remains poorly studied. This lack of allopolyploidy among very close relatives is in marked contrast with the evidence for hybridization at the diploid level in the A- and B-genome groups. Hybridization within genome groups thus appears to have resulted in introgression at the diploid level, rather than in allopolyploidy.

Figure 6.

The phylogenetic distribution of genome donors to polyploid taxa in subgenus Glycine. Phylogenetic relationships follow those reconstructed from histone H3-D. Letters represent diploid genome groups, following Fig. 2 except that the B-genome is subdivided into the B′ genome (G. stenophita) and a core group (remaining species), and G. tomentella D5A is labeled ‘Ha’ to show its close relationship with the H-genome in histone H3-D trees. The remaining G. tomentella diploid genomes are designated both by genome group and by race (D1, etc.). In each column, a ‘-’ indicates that no polyploid is known either within a genome group (e.g. F × F) or between genome groups (e.g. C × B). Where hybridization between genomes has resulted in allopolyploids, these are labeled by species name (tab = G. tabacina; pes = G. pescadrensis) or G. tomentella polyploid race (T1-T6). The number of independent origins estimated for each polyploid is shown in parentheses below the polyploid. G. hirticaulis (hir) has not been thoroughly studied and its origin is shown provisionally.

Hybridization resulting in allopolyploidy has occurred most commonly among pairs of reproductively isolated but closely related taxa that comprise the D, E, and H genome groups (Figs 5, 6). There are also three allopolyploid combinations involving hybridization between more divergent genomes, notably the two crosses involving the A-genome G. tomentella D4 diploid. One of these hybridizations, with the D-genome G. tomentella D3 race, produced the tomentella T2 polyploid, whereas a second wide cross with the B′-genome G. stenophita produced G. pescadrensis. The tomentella T5 polyploid also was formed by a phylogenetically wide hybridization, involving a cross between a second A-genome species, G. clandestina, and an E-genome diploid tomentella (Figs 5, 6). Many of these allopolyploids appear to be quite young, as inferred from the presence of identical nuclear gene alleles or chloroplast haplotypes in polyploids and their diploid progenitors (Doyle et al., 2004). For example, polyploid G. tabacina is estimated to have arisen within the last 30 000 yr (Doyle et al., 1999b).

The distribution of allopolyploidy indicates that hybridization has been extensive in the recent history of the subgenus. However, a full sense of the frequency of hybridization is not captured by merely listing the different genome combinations of known allopolyploids. Evidence from allele or haplotype networks has shown that most, if not all, allopolyploids in subgenus Glycine arose more than once (Doyle et al., 2004), by the kind of recurrent polyploid formation that is now appreciated as being the rule, rather than the exception (Soltis & Soltis, 1999; Wendel, 2000). Some of the polyploid combinations have been produced as many as six different times from the same basic genome combination, but involving different diploid genotypes (Fig. 6).

Seven different chloroplast haplotypes were identified in polyploid G. tabacina, six of them identical to haplotypes found among core B-genome diploids (Doyle et al., 1990e). Three different histone H3-D alleles were identified in the same group of polyploid accessions, again a subset of the variation found in core B-genome diploids (Doyle et al., 1999a,b). Although the histone H3-D alleles and chloroplast haplotypes found in G. tabacina polyploid accessions all occur in B-genome diploids, most of the two-locus genotypes do not. The fact that the most common chloroplast haplotype in polyploid G. tabacina is the Group I haplotype found in most G. microphylla accessions suggested that G. microphylla was a progenitor of G. tabacina. But the situation is quite different for histone H3-D, where the common allele found in G. microphylla is not found in G. tabacina at all.
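The difference between sharing variants locus by locus and sharing a two-locus genotype can be illustrated with a minimal sketch. The haplotype and allele labels, the accession records, and the function names below are hypothetical placeholders chosen for illustration; they are not data from the studies cited above.

```python
# Hypothetical (chloroplast haplotype, histone H3-D allele) pairs, one per accession.
# These labels are illustrative placeholders, not published data.
diploids = [("cp_I", "h3d_a"), ("cp_I", "h3d_a"), ("cp_II", "h3d_b"), ("cp_II", "h3d_c")]
polyploids = [("cp_I", "h3d_b"), ("cp_II", "h3d_b"), ("cp_I", "h3d_c")]


def shared_per_locus(polys, dips):
    """For each locus, return the polyploid variants that also occur in diploids."""
    shared = []
    for locus in range(2):
        dip_variants = {genotype[locus] for genotype in dips}
        poly_variants = {genotype[locus] for genotype in polys}
        shared.append(poly_variants & dip_variants)
    return shared


def shared_two_locus(polys, dips):
    """Return the polyploid two-locus genotypes that also occur in diploids."""
    return set(polys) & set(dips)


cp_shared, h3d_shared = shared_per_locus(polyploids, diploids)
joint_shared = shared_two_locus(polyploids, diploids)
print(sorted(cp_shared))     # chloroplast haplotypes shared with diploids
print(sorted(h3d_shared))    # histone alleles shared with diploids
print(sorted(joint_shared))  # two-locus genotypes shared with diploids
```

In this toy data set every variant carried by a polyploid also occurs in some diploid, yet only one of the three polyploid two-locus genotypes is found in a diploid, and the most common diploid histone allele is absent from the polyploids altogether.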

An explanation for this is that diploid and polyploid taxa are dynamic, and that evolution in one or both has occurred subsequent to the formation of the polyploid. Evidence from incongruence between cpDNA and histone H3-D allele relationships suggested hybridization and introgression in the core B-genome group. If so, then it is reasonable that genotypic combinations that existed even 30 000 yr ago might no longer exist, having been replaced by new combinations. The above discussion of the core B-genome group focused on introgression of Group II haplotypes into G. microphylla, which would not account for the situation in G. tabacina polyploids. However, a diploid hybrid between taxa with Group I and Group II haplotypes, with a Group I chloroplast haplotype but histone alleles of a different B-genome species, could have been an ancestor for G. tabacina. The failure to sample diploids with this genotype could be due simply to drift and fixation at the diploid level subsequent to the polyploid event.

An alternative explanation is that one of the several different B-genome progenitors of G. tabacina was a plant with the genotype of a modern G. microphylla, and thus G. tabacina polyploids with both histone H3-D alleles and chloroplast genomes found in extant G. microphylla existed at one time. Australian G. tabacina exists as a single polymorphic biological species, in which crossing has occurred among plants with different origins, and lineage recombination has created numerous different multilocus genotypes (Doyle et al., 1999b). During this process, more chloroplast haplotypes have been retained than have histone H3-D alleles, despite the fact that the effective population size is lower for uniparentally transmitted organellar genomes than for the nuclear genome (Birky, 2001). Crossing among different polyploid genotypes, coupled with drift and fixation, could have led to the situation in which not only are the original genotypic combinations contributed by G. microphylla absent from polyploid G. tabacina, but the G. microphylla histone alleles are missing entirely from the polyploid.

The important lesson for systematists working with polyploids is that alleles are not the same thing as ancestors – another corollary of the fundamental difference between gene trees and species trees. It is possible to identify alleles or haplotypes shared between a polyploid and a given diploid species at a single genetic locus, but it cannot be said with certainty that this diploid species was the donor of that allele or haplotype to the polyploid. Finding multilocus genotypes shared by the polyploid and a diploid would be more compelling evidence of the identity of the diploid genome donor. The G. tomentella part of the polyploid complex provides an example of this for nuclear genes. All individuals belonging to the T4 tomentella polyploid taxon combine D3 and D5B diploid genomes, and this combination appears to have originated more than once, as indicated by different combinations of alleles in polyploid plants (Doyle et al., 2002). Allele networks for the D3 homoeologues of histone H3-D, nrDNA ITS, and nu-cox2 all show similar patterns in which two major groups of T4 accessions occur, one of which consistently groups with the same D3 accession (Doyle et al., 2002; Rauscher, Doyle & Brown, 2004; J. J. Doyle et al., unpublished). In this case it is reasonable to hypothesize that a plant related to this particular diploid accession contributed the D3 genome to the progenitor of this group of T4 accessions. But this inference is possible only because there appears to have been no hybridization between T4 polyploids that combine the same two basic genomes, but originated independently from different diploid genotypes. The original polyploid genotypes have not been scrambled by lineage recombination.

Chloroplast DNA has provided some surprising results in studies of polyploidy in subgenus Glycine, as it did at the diploid level. In both G. tabacina and G. tomentella, a larger number of polyploid origins were inferred from shared cpDNA haplotypes than from shared nuclear gene alleles. In the case of G. tomentella the added surprise was that in several cases haplotypes from both diploid progenitors were observed in different polyploid accessions (J. T. Rauscher et al., unpublished). Chloroplast DNA is maternally inherited in Glycine (Hatfield et al., 1985), meaning that the crosses that produced these polyploids occurred in both directions, with both nuclear genome donors serving as maternal parent. That only some polyploids show evidence of bidirectional origins is one of several contrasts observed among these closely related taxa (Doyle et al., 2004).
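The inference of cross direction from maternally inherited cpDNA can be sketched as follows. This is a minimal illustration that assumes strictly maternal inheritance and an unambiguous assignment of each polyploid haplotype to one diploid progenitor; the taxon names, accession identifiers, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (polyploid taxon, accession id, diploid progenitor whose
# chloroplast haplotype the accession carries). Illustrative placeholders only.
records = [
    ("polyploid_X", "acc1", "progenitor_1"),
    ("polyploid_X", "acc2", "progenitor_2"),
    ("polyploid_X", "acc3", "progenitor_1"),
    ("polyploid_Y", "acc4", "progenitor_3"),
    ("polyploid_Y", "acc5", "progenitor_3"),
]


def maternal_parents(recs):
    """Map each polyploid taxon to the set of inferred maternal progenitors."""
    mothers = defaultdict(set)
    for taxon, _accession, cp_source in recs:
        mothers[taxon].add(cp_source)
    return dict(mothers)


def is_bidirectional(mothers):
    """Score a taxon as bidirectional if more than one progenitor was maternal."""
    return {taxon: len(sources) > 1 for taxon, sources in mothers.items()}


result = is_bidirectional(maternal_parents(records))
print(result)
```

Under these assumptions a taxon whose accessions carry haplotypes from both progenitors is scored as having bidirectional origins, whereas a taxon with haplotypes from only one progenitor is not. The second outcome does not rule out the reciprocal cross; it may simply not have been sampled.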


Reticulation has been an important process in the evolution of Glycine subgenus Glycine, probably throughout its history. Existing data are equivocal about whether the polyploid event responsible for the 2n = 40 genome of the genus was fundamentally auto- or allopolyploid. However, because even most autopolyploids presumably involve hybridization rather than spontaneous doubling of a single individual (Lewis, 1980), it is likely that the event was reticulate even if the result was a cytogenetic or taxonomic autopolyploid.

Incongruence between various gene trees is observed among diploid Glycine taxa, from the major genome groups down to individuals within species. Such incongruence can be accounted for in various ways, but in each case reticulation is a viable explanation. The plethora of gene tree topologies for the genome groups, in particular, is suggestive of a long history of hybridization in the genus, and it is possible that some species are stabilized hybrids. Hybridization between genome groups has continued to the present, as documented by recently formed allopolyploids such as G. pescadrensis, which was formed by a cross between A and B genome taxa. In G. tabacina, histone H3-D alleles from diploids and polyploids are identical, and this is true for many alleles in other polyploids of the subgenus. Divergence between alleles from polyploids and their putative diploid progenitors is observed for some G. tomentella polyploids, but this could be a result of low sampling of the diploids. In any event, there are no cases in subgenus Glycine where alleles or haplotypes from a polyploid have diverged so substantially from those of its diploid progenitors that it is impossible (or even difficult) to identify the genome donors. Thus there do not appear to be any polyploids representing intermediate stages in the evolution of the subgenus.

Estimates for the origin of one of the young polyploids, G. tabacina, around 30 000 yr ago, coincide roughly with a period of environmental and vegetational change in Australia that in part could have been anthropogenic in origin (Hope, 1994; Doyle et al., 2002). The formation and establishment of hybrids, either at the diploid or polyploid level, is facilitated by disturbance and the creation of new, open habitats (Stebbins, 1971; Grant, 1981), so the correspondence of these dates may not be a matter of chance. Perhaps other polyploids were formed previously but were unable to establish themselves in earlier, stable conditions. If some of the diploid species of the subgenus are stabilized hybrids, it would appear that these earlier conditions favored diploid hybridization over allopolyploidy.


We are grateful for the work of other participants in our Glycine research, particularly Carole Harbison, Amy Casselman, Raymond Mak, Simon Joly, and Sue Sherman-Broyles. Work on Glycine has been supported by grants from the US National Science Foundation Systematic Biology Program, most recently NSF DEB-0089483. We thank Bernard Pfeil and three anonymous reviewers for useful comments on the manuscript. We also are grateful to Loren Rieseberg and Jonathan Wendel for encouragement and for organizing the symposium at which this work was presented.