• Whole-genome duplication coupled with hybridization is of prime importance in plant evolution. Here we reinvestigate Müntzing’s classical example of allopolyploid speciation: the first report of experimental synthesis of a naturally occurring allopolyploid species, Galeopsis tetrahit.
• Various molecular markers (cpDNA, NRPA2, amplified fragment length polymorphisms (AFLPs)) and flow cytometry were surveyed in population samples of subgenus Galeopsis, including two allopolyploid species and their potential diploid parents.
• The presence of two divergent copies of single-copy NRPA2 confirms the allopolyploid origins of G. tetrahit and Galeopsis bifida. However, the two allopolyploids do not share the same maternal genome, as originally suggested by Müntzing. The results support independent origins, but not recurrent formation, of the two allotetraploids. Data further indicate frequent gene flow and introgression within ploidy levels, but less so between ploidy levels.
• Our results confirm and elaborate on Müntzing’s classical conclusion about allopolyploid origins of G. tetrahit and G. bifida. We address questions of general interest within polyploidy research, such as recurrent formation, gene flow and introgression within and between ploidy levels.
Polyploidy (whole-genome duplication) has long been recognized as an important process in plant evolution (Stebbins, 1940, 1971). Polyploidy generates genetic, chromosomal and morphological variation and, coupled with hybridization (i.e. allopolyploidy), often leads to speciation (Leitch & Leitch, 2008; Soltis & Soltis, 2009; Soltis et al., 2010). Within a few generations a new species may arise, which is more or less reproductively isolated from the diploid progenitors. As such, polyploid evolution represents the only widely accepted mode of sympatric speciation (Hendry, 2009). The application of molecular tools has made us aware of the universality of polyploid speciation in all angiosperm lineages. New examples of plant groups with evidence of ancient and/or recent polyploidy are continually being published. In this study we address an old example of polyploid speciation, which is actually the first report (Müntzing, 1932) of an experimental synthesis of a naturally occurring allopolyploid species, Galeopsis tetrahit.
Between 1923 and 1945, Arne Müntzing, Professor in genetics at Lund University, Sweden, devoted most of his research to cytological, morphological, and experimental investigations of the genus Galeopsis (Müntzing, 1927, 1929, 1930a,b, 1932). He succeeded, via a two-step process involving a triploid bridge, in synthesizing an allotetraploid plant, which in chromosome number and morphology was very similar to the naturally occurring tetraploid (2n = 32) G. tetrahit (Müntzing, 1930a,b, 1932). From a cross between the diploids (2n = 16) Galeopsis pubescens (mother) and Galeopsis speciosa (father), he obtained an F1 hybrid with irregular meiosis which was almost sterile. Nevertheless, selfing the F1 hybrid produced an F2 generation with some sterile diploids and one pollen-sterile triploid plant. Backcrossing of the triploid plant to a G. pubescens father resulted in a single seed, which gave rise to a tetraploid F3 plant. This plant produced 70% good pollen, had regular meiosis with 16 bivalents, and was cross-compatible with the wild growing G. tetrahit. Müntzing’s work has been regarded as pioneer research in the field of polyploidy and has been referred to in textbooks ever since (Coyne & Orr, 2004; Futuyma, 2009).
Galeopsis tetrahit belongs in a small and morphologically distinct genus of erect annual herbs within Lamiaceae subfamily Lamioideae (Harley et al., 2004). Depending on the taxonomical circumscription, Galeopsis comprises from nine to 12 species, and the distribution of the genus is temperate Eurasian with centre of species richness in Europe (Meusel et al., 1978). Generic monophyly and a phylogenetic position of Galeopsis close to the resurrected genus Betonica was shown in two recent molecular phylogenetic investigations of subfamily Lamioideae that included three (Scheen et al., 2010) and eight (Bendiksby et al., 2011) species of Galeopsis.
Reichenbach (1830–1832) divided Galeopsis into two subgenera, Galeopsis and Ladanum. Galeopsis tetrahit belongs within subgenus Galeopsis, which is readily distinguished from subgenus Ladanum by the presence of rigid hairs and swollen stem nodes. The subgeneric division is also supported by phytochemistry (Tomas-Bárberán et al., 1991), crossing experiments (Müntzing, 1930a), and chloroplast DNA sequence data (Bendiksby et al., 2011). Subgenus Galeopsis comprises four species currently recognized by the world checklist of Lamiaceae and Verbenaceae (Govaerts et al., 2010): Galeopsis bifida (temperate and boreal Europe and Asia), G. pubescens (mainly temperate Central and Eastern Europe), G. speciosa (mainly temperate Europe and Western Asia), and G. tetrahit (mainly temperate and boreal Europe). A previously recognised taxon, Galeopsis sulphurea or G. speciosa ssp. sulphurea, which is currently listed as a synonym of G. speciosa (Govaerts et al., 2010), was included in the present study. Material from this taxon, which is endemic to the south-western Alps, will in the following be referred to as G. sulphurea.
Müntzing (1927) reported for the first time the chromosome base number x = 8 for the genus Galeopsis, the diploid chromosome number 2n = 16 for G. pubescens and G. speciosa, and the tetraploid chromosome number 2n = 32 for G. bifida and G. tetrahit. These chromosome numbers have later been confirmed by Müntzing (1930a, 1932) and others (e.g. Pogan et al., 1982; Wieffering, 1983). Müntzing (1932) hypothesized an allotetraploid origin of G. bifida similar to that of G. tetrahit based on circumstantial evidence (i.e. morphological similarity and cross-compatibility between the two tetraploids).
Interspecific crossing experiments revealed that crosses between the two subgenera and between taxa with different ploidy levels were not successful, and Müntzing (1930a) defined three nonintercrossing ‘natural’ groups: (1) subgenus Ladanum (2n = 16), (2) G. pubescens and G. speciosa (2n = 16), and (3) G. tetrahit and G. bifida (2n = 32). Later, however, he found that the sterility barrier between the last two groups could be overcome by colchicine treatment of the diploids producing autotetraploids that could be hybridized with natural tetraploids in both directions (Müntzing, 1941, 1943).
The aim of the present study is to test Müntzing’s hypotheses about the origin of the tetraploid species, G. tetrahit and G. bifida, using various molecular markers. Reticulate evolutionary patterns involving hybrid origin of polyploid species can be successfully disentangled using DNA sequence data of low-copy nuclear genes, especially when the allopolyploidization is of a recent date and both parental homoeologs are still present in the polyploid genome (Brysting et al., 2007, 2011; Fortune et al., 2008; Mason-Gamer, 2008). Chloroplast DNA sequences, on the other hand, provide information about only one of the parental genomes (the maternal when the plastid is maternally inherited, as is assumed to be the case in most, but not necessarily all, angiosperm groups). Past events of chloroplast capture (via hybridization) may, however, be indicated from incongruent chloroplast vs nuclear phylogenies (Rieseberg et al., 1996; Frajman & Oxelman, 2007). DNA sequence data mirror only a minute fraction of the organism’s genetic diversity, whereas DNA fingerprinting data, such as amplified fragment length polymorphisms (AFLPs), are presumed to reflect the entire genome. We have generated both DNA sequence data (chloroplast and nuclear) and, including a larger population sampling, also AFLPs for the two tetraploid species and their potential diploid progenitors. Flow cytometry analyses have been used to ascertain ploidy levels and variation in nuclear DNA content for selected samples.
We specifically ask the following questions: Did the two Galeopsis tetraploids originate by allopolyploidization? Do the two Galeopsis tetraploids share the same parental genomes, G. pubescens and G. speciosa? Are the two tetraploids the result of a single polyploidization event with subsequent divergence or did they originate in two separate events? Are the known morphological disparities between G. tetrahit and G. bifida due to reciprocal parentage? Does hybridization and gene flow take place within and between ploidy levels in nature?
Materials and Methods
Plant material was sampled from 19 localities all over Europe representing the tetraploids, Galeopsis tetrahit L. and G. bifida Boenn., and their supposed diploid parental species, G. speciosa Mill. and G. pubescens Besser (Table 1). We included two populations of G. sulphurea Jord. and some potential hybrid plants, representing gene flow within (tetrahit × bifida) and between ploidy levels (pubescens × bifida). For AFLP analysis (158 samples), we included individuals from two or more populations of all taxa, except for G. sulphurea of which material from only one population was available. For DNA sequencing, we selected, based on initial AFLP results, multiple samples of all taxa (47 samples). For rooting, we included two samples of subgenus Ladanum (Gilib.) Rchb. For the flow cytometry analyses, we selected 38 samples, including all taxa and some potential hybrid plants (Table 1).
Table 1. List of species and populations with indication of how many and which samples were analysed by amplified fragment length polymorphism (AFLP), flow cytometry (FCM) and sequencing, respectively, with GenBank accession numbers
¹ The number of samples from a population that came out as admixed in the structure analysis is reported in brackets in the AFLP column.
² Individuals indicated as hybrid plants by the sequence data are indicated with an asterisk.
³ Different NRPA2 homoeologs from the same accession are indicated with superscript a and b.
We estimated relative fluorescence intensities by flow cytometry on silica-dried leaves, using the two-step methodology of Doležel et al. (2007). Bellis perennis L., 2C = 3.38 pg (Schönswetter et al., 2007), was used as primary internal standard for the expected diploids, and Pisum sativum L., ‘Ctirad’, 2C = 9.09 pg (Doležel et al., 1998) as secondary internal standard for the expected tetraploids because the peaks of the tetraploids overlapped with those of B. perennis. A detailed description of the method is available in the Supporting Information, Methods S1.
We performed a one-way ANOVA with taxon as classification factor and relative fluorescence intensity as dependent variable to test whether taxa differed in relative fluorescence intensity. To test which species differed from each other, we used Tukey’s multiple comparison test. We used R version 2.11.1 (R Development Core Team, 2010) for the statistical tests.
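The one-way ANOVA described above can be sketched in a few lines. This is only an illustration of the F statistic, not the R code used in the study; the function name `one_way_anova_f` is hypothetical, and the per-sample values are placeholders chosen within the ranges reported in Table 2, not the measured data. Tukey's multiple comparison test is not reproduced here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict mapping taxon -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variation of the individual samples around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder relative fluorescence intensities (NOT the study's raw data).
example = {
    "G. sulphurea": [0.511, 0.516, 0.523],
    "G. pubescens": [0.534, 0.546, 0.556],
    "G. speciosa": [0.619, 0.642, 0.665],
}
f_stat, df_b, df_w = one_way_anova_f(example)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F relative to the F distribution with the given degrees of freedom indicates that at least one taxon mean differs; a post hoc test such as Tukey's is then needed to say which pairs differ.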
DNA extraction
DNA was extracted as described in Bendiksby et al. (2011), but we eluted the DNA twice in separate tubes. DNA aliquots obtained from the first elution step, containing the highest DNA concentration, were used for AFLP analysis, whereas those from the second elution step were used for DNA sequencing.
AFLPs
We used the AFLP procedure of Vos et al. (1995) with several modifications. In the selective amplification, fluorescently labelled EcoRI primers (Applied Biosystems, Carlsbad, CA, USA) with three selective nucleotides and unlabelled MseI primers with three selective nucleotides (6FAM-EcoRI-ACA/MseI-CAC; VIC-EcoRI-AAG/MseI-CTC; NED-EcoRI-AGC/MseI-CTG) were used. Selective PCR products were pooled and purified with Sephadex G-50 Superfine Resin (GE Healthcare, Chalfont St. Giles, United Kingdom), added to 12.8 μl double-distilled H2O and 0.2 μl ET-ROX 550 bp size standard (GE Healthcare, Chalfont St. Giles, United Kingdom), denatured at 95°C for 5 min and cooled on ice. The PCR fragments were separated on a 48 capillary sequencer (MegaBACE 1000; GE Healthcare, Chalfont St. Giles, United Kingdom) according to the manufacturer’s instructions. A detailed description of the AFLP procedure is available as Methods S1.
PCR amplification and DNA sequencing
We produced DNA sequences of four chloroplast DNA (cpDNA) regions (rps16 intron, trnL intron, trnL-trnF spacer, and trnS-trnG spacer), and one low-copy nuclear gene (NRPA2, encoding the second largest subunit of RNA polymerase I). The plastid regions were amplified as described in Bendiksby et al. (2011).
We used the nested PCR procedure and degenerate primers described by Popp & Oxelman (2004) to obtain NRPA2 sequences from two samples of each species. The PCR products were cloned using the TOPO-TA cloning kit (Invitrogen Dynal AS, Oslo, Norway) following the manufacturer’s manual with half of the recommended volumes. We amplified eight to 16 colonies and sequenced four to eight products. From conserved exon regions in the resulting NRPA2 matrix, we developed a nondegenerate primer pair (5′–3′ direction): A2Gal-5F (ATTGATATGCCATTCTCAGGAGTCAC) and A2Gal-1265R (GTATACAGGGCCAATAAATATTTCG). NRPA2 appears to be single-copy in Galeopsis, and PCR products from diploids could be sequenced directly using these nondegenerate NRPA2 primers. About half the length of the parental NRPA2 homoeologs, presumed to be present in the two tetraploid species, was successfully amplified and sequenced from both tetraploids using the following parental-homoeolog specific primers (5′–3′ direction): spe29-F (AGGAATGCGACCTGATCTTATC) in combination with spe693-R (TATGGCGTAGACTGTCCCGCTTTTGGA), and pub29-F (TGGAATGCGACCTGATCTTATA) in combination with pub669-R (GAAGAAGTAATATGATTATGATCCT).
We used the following cycling conditions for the PCR amplifications: 95°C for 10′, 31 (cpDNA) to 34 (NRPA2) cycles of 95°C for 30″, 60°C for 30″, 72°C for 1′, followed by 72°C for 10′ and a final hold at 10°C. The PCR products were purified and sequenced as described in Bendiksby et al. (2011).
GenBank accession numbers for all sequences are listed in Table 1.
AFLP data scoring and analyses
As recommended by Bonin et al. (2004), we processed 17 DNA samples in duplicate to enable the estimation of an error rate. We scored the electropherograms in DAx 8.0 (Data Acquisition & Data Analysis Software; Van Mierlo Software Consultancy, Eindhoven, the Netherlands) with ET-ROX 550 bp as internal size standard. We used the implemented peak-search function in DAx with subsequent manual manipulations to control for uneven peak intensities. Manual correction is necessary because skewed profiles can originate from technical aspects, such as capillary quality and minor differences in PCR product amounts. We set the scoring bins manually and exported a presence/absence matrix of AFLP markers.
We calculated the error rate (Bonin et al., 2004) as the ratio of mismatches (scoring of 0 vs 1) over matches (1 vs 1) in AFLP profiles of the 17 replicated individuals. We performed NeighborNet (NN) analyses using the Dice coefficient (Dice, 1945) in splitstree4 (Huson & Bryant, 2006) on the total dataset (AFLP1) and a reduced dataset excluding admixed samples (AFLP2; see the Results section). Using the same program and similarity measure, Neighbor-Joining (NJ) analysis with 1000 bootstrap replicates was conducted on the AFLP2 dataset.
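As a minimal sketch of the two quantities named above, the snippet below computes the replicate error rate as worded in the text (mismatches, 0 vs 1, over matches, 1 vs 1) and the Dice similarity between two presence/absence profiles. The function names and the toy profiles are hypothetical; splitstree4 works on the corresponding distances internally, and this is not its code.

```python
def replicate_error_rate(replicate_pairs):
    """Pooled error rate over pairs of 0/1 profiles from replicated samples.

    Follows the wording in the text: mismatches (0 vs 1) divided by
    matches (1 vs 1), summed over all replicate pairs.
    """
    mismatches = 0
    matches = 0
    for first, second in replicate_pairs:
        for a, b in zip(first, second):
            if a != b:
                mismatches += 1
            elif a == 1:
                matches += 1
    if matches == 0:
        raise ValueError("no shared bands (1 vs 1) to compare against")
    return mismatches / matches


def dice_similarity(profile_x, profile_y):
    """Dice coefficient: 2 * shared bands / (bands in x + bands in y)."""
    shared = sum(1 for a, b in zip(profile_x, profile_y) if a == 1 and b == 1)
    total = sum(profile_x) + sum(profile_y)
    return 2 * shared / total if total else 0.0


# Toy presence/absence profiles (hypothetical, for illustration only).
original = [1, 1, 0, 1, 0]
replicate = [1, 0, 0, 1, 0]
print(replicate_error_rate([(original, replicate)]))
print(1 - dice_similarity(original, replicate))  # Dice distance
```

The Dice coefficient ignores shared absences, which is the usual reason for preferring it with dominant markers such as AFLPs, where a shared absent band is weak evidence of similarity.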
We carried out population mixture analyses using Bayesian clustering in structure 2.2 with the approach developed for dominant AFLP markers (Pritchard et al., 2000; Falush et al., 2007). The dataset was converted into structure format with the R-script AFLPdat (Ehrich, 2006). We applied the admixture model within the recessive allele model with uncorrelated allele frequencies and did 10 replicate runs for each K from K = 1 to K = 10 on the freely available Bioportal, University of Oslo, Norway (http://www.bioportal.uio.no), using a burn-in of 2 × 10⁶ iterations followed by 5 × 10⁶ additional Markov chain Monte Carlo iterations. Choice of K was based on calculations of the log probability of the data (LogeP(D)) for each run using the R-script Structure-sum (Ehrich, 2009).
Sequence alignment and analyses
We aligned DNA sequences manually using bioedit (Hall, 1999). First, we analysed all regions separately, using parsimony (see below, this paragraph) with 1000 replicates. Then we concatenated congruent (cpDNA; see the Results section) datasets, coded insertions/deletions (indels) as present/absent and added these to the matrices as additional, unordered characters using seqstate (Müller, 2005) following the simple indel coding of Simmons & Ochoterena (2000). We used the Akaike information criterion in mrmodeltest (Nylander, 2004) to estimate the optimal substitution model for each region. We performed parsimony and Bayesian phylogenetic analyses with and without coded indels using tnt 1.1 (Goloboff et al., 2003) and MrBayes 3.1.2 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003), respectively, as described in Bendiksby et al. (2011). We performed parsimony jackknifing (Farris et al., 1996) in tnt with 36% removal probability, 2000 replicates and output in absolute frequencies with a cut-off at 50%. We performed NeighborNet analyses from DNA sequences in splitstree4 using Jukes–Cantor distances.
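For readers unfamiliar with the Jukes–Cantor correction used for the sequence networks, the standard formula is d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites between two aligned sequences. The sketch below is a hypothetical helper illustrating that formula; it skips gaps and ambiguous bases as a simplifying assumption and may not match how splitstree4 treats such sites.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped
    (an assumption made for this sketch).
    """
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences (hypothetical): one difference in eight sites.
print(round(jukes_cantor_distance("ACGTACGT", "ACGTACGA"), 4))
```

The corrected distance is always slightly larger than the raw proportion of differences, because the model allows for multiple substitutions at the same site.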
Nuclear DNA content variation
Flow cytometric analyses yielded relative fluorescence intensities for 38 Galeopsis samples (Table 2). The average coefficients of variation of the G0/G1 peaks of the sample analysed and internal reference standard were 3.18% (range 2.35–4.62) and 2.12% (range 1.27–4.27), respectively. The mean relative fluorescence intensities of the diploid taxa differed significantly from each other (P < 0.05 in Tukey’s multiple comparisons test; see Table S1), except for G. sulphurea and G. pubescens. The largest nuclear DNA content was found in G. speciosa (0.642 ± 0.023), intermediate in G. pubescens (0.546 ± 0.010) and lowest in G. sulphurea (0.516 ± 0.006). The tetraploid taxa had mean nuclear DNA contents that differed significantly from each other, with that of G. tetrahit (0.950 ± 0.027) somewhat higher than that of G. bifida (0.913 ± 0.017). The G. tetrahit × bifida hybrids from Slovenia (PS06-001, 002, 004) had relative fluorescence intensities overlapping with those of G. tetrahit, whereas the hybrids from Altenberg, Austria (MB06-122-125, 128, 129) were overlapping in nuclear DNA content with both G. tetrahit and G. bifida. The two G. pubescens × bifida hybrids from Altenberg, Austria (MB06-111, 112) had mean nuclear DNA content that was larger (but not significantly so) than G. pubescens from the same locality, as well as other localities investigated (Table 2).
Table 2. Mean and standard deviation (SD) of relative fluorescence intensities from the flow cytometry analyses
Taxon: relative fluorescence intensity, mean ± SD (range)
Galeopsis sulphurea: 0.516 ± 0.006 (0.511–0.523)
Galeopsis pubescens: 0.546 ± 0.010 (0.534–0.556)
Galeopsis speciosa: 0.642 ± 0.023 (0.619–0.665)
Galeopsis bifida: 0.913 ± 0.017 (0.888–0.941)
Galeopsis tetrahit: 0.950 ± 0.027 (0.919–0.981)
Galeopsis pubescens × bifida (Altenberg): 0.575 ± 0.007 (0.570–0.580)
Galeopsis tetrahit × bifida (Altenberg): 0.942 ± 0.014 (0.921–0.960)
Galeopsis tetrahit × bifida (Slovenia): 0.969 ± 0.006 (0.965–0.976)
The error rate was estimated at 4%. Markers that were not reproducible in replicates, or that were present in or absent from only one sample (possibly owing to PCR errors), were removed. In the final total dataset (AFLP1), 203 polymorphic markers were scored in 163 samples (including replicates of four samples from Hurum, Norway; MB04-33-36).
In the structure analysis of the AFLP1 dataset, K = 6 was chosen as the most appropriate number of groups. Although no congruent results were obtained apart from K = 2, the log probability of the data (LogeP(D)) reached a clear plateau at K = 6 (Fig. S1). The result from one of the 10 replicates at K = 6 (the one with the highest LogeP(D) value) is shown in Fig. 1. Apart from groups corresponding largely to the four species (G. sulphurea was included with five samples from one population and grouped with G. speciosa), there were two additional groups (groups 5 and 6; Fig. 1) consisting primarily of samples that were characterized by > 10% admixture and identified a priori mostly as pubescens × bifida, tetrahit × bifida or tetrahit. Whereas only a few samples were assigned in low percentage to group 6, several samples were assigned to group 5, but only four of them with > 95% assignment. As the Bayesian clustering resulted in high admixture (> 5%) for several samples, a second dataset was compiled, AFLP2, containing 123 samples that were not admixed or < 5% admixed (excluding four samples with < 5% admixture from group 5). These samples are, in the following text, referred to as pure.
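The selection of pure samples described above can be sketched as a simple filter on per-sample membership coefficients. This is an illustrative reading of the text, not the authors' script: the function name, the sample labels and the membership values are all hypothetical, and admixture is taken here as one minus the largest membership coefficient.

```python
def split_pure_admixed(q_matrix, threshold=0.05, excluded_groups=()):
    """Split samples into (pure, admixed) lists from membership coefficients.

    q_matrix maps sample name -> list of membership coefficients, one per
    cluster. A sample counts as pure when its admixture (1 - largest
    coefficient) is below `threshold` and its main cluster is not listed
    in `excluded_groups` (0-based cluster indices).
    """
    pure, admixed = [], []
    for sample, memberships in q_matrix.items():
        top_group = max(range(len(memberships)), key=memberships.__getitem__)
        admixture = 1.0 - memberships[top_group]
        if admixture < threshold and top_group not in excluded_groups:
            pure.append(sample)
        else:
            admixed.append(sample)
    return pure, admixed


# Hypothetical membership coefficients for three samples and three clusters.
q = {
    "sample_A": [0.97, 0.02, 0.01],
    "sample_B": [0.60, 0.40, 0.00],
    "sample_C": [0.01, 0.01, 0.98],
}
print(split_pure_admixed(q))
# Treat cluster index 2 like the excluded structure group in the text.
print(split_pure_admixed(q, excluded_groups={2}))
```

The `excluded_groups` argument mirrors the decision to keep the four group 5 samples out of the pure dataset even though their individual admixture was below 5%.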
In the NN analyses of AFLP1 (Fig. 2a) most samples clustered according to taxa, but many samples that showed high admixture in the Bayesian clustering fell in between the main groups (Fig. 2a). The NN of AFLP2 (Fig. 2b) showed a clear internal structure with most groups/taxa supported by bootstrapping also in the Neighbor-Joining analysis (not shown). A cluster of G. sulphurea was strongly supported (100%), and northern European samples of G. speciosa clustered with support (86%). No significant intraspecific structure was evident within G. bifida, G. tetrahit or G. pubescens.
The number of markers scored per sample was highly variable (Table 3). Among the pure samples, G. pubescens had, on average, 19.3 markers per sample, G. tetrahit had 25.7, G. speciosa had 32.5, G. sulphurea had 31.2 and G. bifida had 49.8. The admixed samples had, on average, more markers than the pure samples (Table 3). The tetraploid G. tetrahit shared 57% of its markers with diploid G. speciosa, 43% with diploid G. sulphurea, and 49% with diploid G. pubescens (several of these markers were shared between two or all three diploids; not shown). For G. bifida, the corresponding numbers were 46%, 27% and 27%. In the AFLP2 dataset, 72 markers were unique to single taxa, mostly accounted for by G. bifida (26) and G. speciosa (22), with only six unique markers each in G. tetrahit and G. pubescens (Table 4). The highest number of shared markers between two taxa (not found in any other taxa) was found between G. speciosa and G. bifida (14) (Table 5). In the AFLP1 dataset, 19 markers were present only in the 123 pure samples, whereas 47 markers were found only in the 40 admixed samples (not shown).
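The three kinds of marker counts reported above (percentage of a taxon's markers shared with another taxon, markers unique to one taxon, and markers shared by exactly two taxa) reduce to set operations. The sketch below uses hypothetical function names and toy marker sets; it does not apply the 5% frequency filter mentioned in the footnotes to Tables 4 and 5.

```python
def percent_shared(markers, focal, other):
    """Percentage of the focal taxon's markers that also occur in `other`."""
    return 100 * len(markers[focal] & markers[other]) / len(markers[focal])


def unique_markers(markers, taxon):
    """Markers present in `taxon` and absent from every other taxon."""
    elsewhere = set().union(*(m for t, m in markers.items() if t != taxon))
    return markers[taxon] - elsewhere


def exclusively_shared(markers, taxon_a, taxon_b):
    """Markers shared by two taxa and absent from all remaining taxa."""
    elsewhere = set().union(
        *(m for t, m in markers.items() if t not in (taxon_a, taxon_b))
    )
    return (markers[taxon_a] & markers[taxon_b]) - elsewhere


# Toy data: marker identifiers observed in each taxon (hypothetical).
toy = {
    "tetraploid": {1, 2, 3, 4},
    "diploid_1": {3, 4, 5},
    "diploid_2": {4, 6},
}
print(percent_shared(toy, "tetraploid", "diploid_1"))
print(sorted(unique_markers(toy, "tetraploid")))
print(sorted(exclusively_shared(toy, "tetraploid", "diploid_1")))
```

Note that `percent_shared` is asymmetric: the denominator is the focal taxon's marker count, which is why percentages for one tetraploid against several diploids need not sum to 100.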
Table 3. Mean and standard deviation (SD) of amplified fragment length polymorphism (AFLP) markers per individual calculated for groups of admixed samples, as indicated by the Bayesian clustering, and species (pure samples)
Column: No. of samples
Sample groups: > 10% admixed; < 5% admixed (structure group 5)¹; all admixed samples; all pure samples (< 5% admixed)
¹ Four samples < 5% admixed from structure group 5 are included among the admixed samples.
Table 4. Summary of the unique amplified fragment length polymorphism (AFLP) marker analysis of pure samples
Columns: All markers¹ (No.); Unique markers¹ (No.); Unique markers (%)
¹ Markers with < 5% frequency were excluded to correct for hidden admixture and errors.
Table 5. Number of shared unique amplified fragment length polymorphism (AFLP) markers (i.e. markers that are shared by two taxa and absent from all others) among pure samples¹
¹ Markers with < 5% frequency were excluded to correct for hidden admixture and errors.
DNA sequence phylogenies and networks
Lengths (in base pairs) of the aligned DNA regions were: rps16 intron 884; trnL-F region 881; trnS-trnG spacer 595; and NRPA2 1277. All chloroplast regions produced congruent topologies and were concatenated to a single 2360 bp long matrix before final analysis (referred to as cpDNA hereafter). The nuclear-encoded NRPA2 generated a different topology and was analysed separately. The indel matrices that were added to the cpDNA and NRPA2 datasets included 20 and 29 characters, respectively. Parsimony informative characters in the matrices with/without coded indels were: cpDNA 68/51 and NRPA2 79/56. Selected nucleotide substitution models of the aligned DNA regions were: GTR+G for rps16 and trnS-trnG; GTR+G+I for the trnL-F region; and HKY for NRPA2.
Polymorphic character states and/or what might be interpreted as recombinant sequences (mixing G. tetrahit and G. bifida sequence types) were detected in some NRPA2 sequences, and six samples were excluded before the final phylogenetic analyses to avoid branch collapse (G. pubescens AT06-074, PS11737; G. speciosa MB06-098, PS11738; G. bifida AT111177; G. tetrahit × bifida MB06-122). All samples (except the two outgroup taxa) were, however, included in the NN analyses.
The Bayesian 50% majority rule consensus trees from the analyses of the indel coded cpDNA and NRPA2 datasets are presented (Figs 3a, 4a). Phylogenies obtained from the datasets with/without coded indels were congruent but variously resolved and supported. Indel coded datasets provided better resolution and support regardless of method used. Parsimony rescaled consistency (RC) indices were high and homoplasy indices (HI) low in all analyses and identical or highly similar between analyses with/without coded indels (cpDNA: RC = 0.91/0.92 and HI = 0.07/0.06; NRPA2: RC = 0.98/0.93 and HI = 0.01/0.04). Parsimony (not shown) vs Bayesian analysis of the same dataset, cpDNA or NRPA2, provided nearly identical topologies.
The general picture
In both phylogenies (Figs 3a, 4a), samples of the same species formed well-supported groups, although in some cases unresolved with respect to other species. Considering only diploids, G. speciosa and G. sulphurea were more closely related to each other than either was to G. pubescens. The tetraploid G. tetrahit formed a strongly supported group with G. sulphurea in the cpDNA phylogeny (Fig. 3a), whereas the two G. tetrahit NRPA2 homoeologs grouped with G. sulphurea (moderately supported) and G. pubescens, respectively (Fig. 4a). The tetraploid G. bifida was supported as sister to the clade consisting of G. speciosa, G. sulphurea and G. tetrahit in the cpDNA phylogeny, but this relationship was supported only when coded indels were included (Fig. 3a), and G. pubescens was in turn sister to this group. The two G. bifida NRPA2 homoeologs grouped with G. speciosa and G. pubescens, respectively (Fig. 4a). Whereas all samples grouped according to taxon delimitation in the NRPA2 phylogeny (Fig. 4a), two samples of G. pubescens from Altenberg, Austria (MB06-104, 109), fell out in a strongly supported clade together with G. speciosa in the cpDNA phylogeny (Fig. 3a). Apart from this, there were generally low levels of intraspecific variation in both cpDNA and NRPA2, as shown by largely unresolved relationships and little variation in terminal branch lengths among conspecific samples (Figs 3a, 4a). Only in G. speciosa was some phylogenetically informative intraspecific variation detected, with two Austrian samples (AT06-002, PS11377) forming a supported clade in the NRPA2 phylogeny, which was sister to a clade consisting of the remaining G. speciosa samples as well as the G. bifida samples (Fig. 4a). In the cpDNA phylogeny, one of these samples (PS11377) grouped with the two misplaced G. pubescens samples, forming a sister clade to the remaining G. speciosa samples (Fig. 3a).
The NN gave the same overall picture (Figs 3b, 4b), with the major split in the NRPA2 network separating the G. speciosa/G. sulphurea sequences from G. pubescens (including the tetraploid homoeologs clustering together with each of these two groups), and a further split separating G. sulphurea/G. tetrahit from G. speciosa/G. bifida sequences (Fig. 4b). In the cpDNA network (Fig. 3b), G. tetrahit grouped together with G. sulphurea, whereas G. bifida was well separated from all diploid species. The patterns obtained in the phylogenies with regard to intraspecific variation in G. speciosa and the two misplaced G. pubescens samples (in cpDNA) were confirmed in the networks (Figs 3b, 4b).
The presumed hybrids
Several plants collected from localities where different species occur in sympatry were presumed to be hybrids, and some of these were included in our molecular analyses. The G. pubescens × bifida samples (MB06-111-113) formed a divergent group in the AFLP1 analysis (Fig. 2a), whereas in the NRPA2 analyses they came out together with G. pubescens (Fig. 4a,b). Several presumed pubescens × bifida and tetrahit × bifida hybrid plants were shown to be admixed (Fig. 1) and fell in between the main groups in the AFLP1 NN (Fig. 2a). Five tetrahit × bifida samples (MB06-122 from Altenberg, Austria; PS06-002 and PS06-004 from Slovenia; MB04-035 and MB06-149 from Norway) were included in the DNA sequence datasets. In the cpDNA phylogeny, they all came out together with G. bifida (Fig. 3a,b), whereas in the NRPA2 NN (Fig. 4b), the affinities of the homoeologs varied: one homoeolog of the Austrian sample (MB06-122) grouped with G. bifida and the other held an intermediate position between G. tetrahit and G. bifida; the two Norwegian samples (MB04-35, MB06-149) grouped with G. tetrahit; and NRPA2 homoeologs of the two samples from Slovenia (PS06-002, 004) grouped with G. bifida and G. tetrahit, respectively.
The NRPA2 sequences of five samples not presumed to be hybrid plants contained polymorphic nucleotides or seemed to be recombinants: G. pubescens (PS11737, AT06-074), G. speciosa (PS11738, MB06-098) and G. bifida (AT111177). However, all these samples grouped according to taxon identification in the NN (Fig. 4b), except for one (MB06-098) that came out in an intermediate position between the two G. speciosa sequence types.
Müntzing did highly important work on the genus Galeopsis, using cytological and experimental methods to provide information on chromosome numbers, reproductive biology, sterility barriers within and between taxa, segregation patterns, and not least, artificial synthesis of the allotetraploid G. tetrahit. Since the time of Müntzing, powerful molecular methods have become available for unravelling reticulate evolution and testing hypotheses about allopolyploidization. We have revisited the origin of the two tetraploids, G. tetrahit and G. bifida, using population sampling and various molecular markers to ask questions on parentage, recurrent formation and hybridization within and between ploidy levels.
Two allopolyploids – with different origins
Our results confirm Müntzing’s hypothesis that G. tetrahit and G. bifida are of allotetraploid origin. The presence of two divergent copies of the nuclear single-copy NRPA2 gene, which group with orthologous sequences of different diploid progenitors, is compelling evidence for allopolyploidy. Extinction of progenitor taxa, or alternatively extinction of one or the other of the parental homoeologs in an allopolyploid genome, may lead to situations where it is difficult or impossible to reconstruct such reticulate patterns (Doyle & Egan, 2010). However, a relatively young system, as represented by the tetraploids of subgenus Galeopsis, where both parental contributions are still present, is ideal for testing hypotheses and answering questions that are of general interest when it comes to natural polyploid populations.
Based on interspecific crossings and morphological and cytological studies, Müntzing concluded that the allopolyploids G. tetrahit and G. bifida share the same two diploid parental genomes, G. speciosa and G. pubescens. There are several reasons why the same two diploid parental species might give rise to two morphologically and genetically distinct allopolyploid species. Multiple origins of allopolyploids seem to be a common scenario in natural populations (e.g. Soltis & Soltis, 1999; Soltis et al., 2004). Given that genetically divergent genotypes of one or both parents are involved, the result could either be a genetically, and morphologically, variable allopolyploid (e.g. Meimberg et al., 2009; Symonds et al., 2010), or if differences are large and gene flow between individuals resulting from separate origins is limited, the result could be distinct lineages with a varying degree of reproductive isolation (Doyle et al., 2004). Reciprocal formation of an allopolyploid, where one parental species acts as mother in one crossing and as father in another, may further affect not only morphology but also reproductive isolation between two separate formations, as could be the case in allotetraploid Tragopogon miscellus Ownbey (Ownbey & McCollum, 1953; Soltis et al., 2004). Our intention was to find out whether any of these explanations could account for the situation in subgenus Galeopsis.
Our NRPA2 sequence data confirm that the diploid G. pubescens is involved as one of the parental lineages in both tetraploids because, in both cases, one homoeolog groups with G. pubescens (Fig. 4). Galeopsis bifida forms a distinct (although not well-supported) clade whereas G. tetrahit does not. This might indicate that different G. pubescens genotypes were involved in the origin of the allotetraploids. Alternatively, the formation of G. bifida may have happened before that of G. tetrahit, and a longer period of post-polyploidization evolution could explain why G. bifida is genetically more divergent from the parental species. With regard to the second parent, it becomes clear that the two tetraploids not only have separate origins, but that different parental genomes are involved. The second NRPA2 homoeolog of G. bifida groups with G. speciosa, confirming Müntzing’s conclusion that G. speciosa is the second parent. The second G. tetrahit homoeolog, on the other hand, groups with G. sulphurea (Fig. 4). The G. sulphurea plants collected from the south-western Alps at the French–Italian border (AT06-031-035 and AT111657) were characterized by purely yellow flowers, morphologically similar to what was, in 1848, described as a separate species, G. sulphurea, or in 1891 as a subspecies, G. speciosa ssp. sulphurea. This taxon has, however, in later treatments been synonymized with G. speciosa (e.g. Govaerts et al., 2010). In the DNA sequence phylogenies (Figs 3a, 4a), G. sulphurea is a supported sister to G. speciosa. A similar pattern is evident from the AFLP data (Fig. 2). The G. sulphurea plants also proved to have a significantly lower nuclear DNA content than the G. speciosa plants (Table 2). The origin of the G. speciosa material used in Müntzing’s crossings is unclear from his publications (Müntzing, 1930a, 1932). In one of the crossings, he used a Swedish G. speciosa biotype but it is not clear whether this or another unknown biotype was the one actually involved in the artificial tetraploid. Müntzing (1930a) never mentioned G. sulphurea, and it does not appear as if the morphological variation of his G. speciosa material included this taxon.
In all reported experiments, Müntzing (1930a) crossed female G. pubescens with male G. speciosa. He did not try the reciprocal cross, allegedly because of differences in flowering time. Our results suggest that G. sulphurea is the most likely organellar parent of G. tetrahit, as samples of these two taxa group with high support in the cpDNA phylogeny (Fig. 3a). Galeopsis pubescens is distantly placed, as sister to all other taxa of the subgenus, and more likely contributed the paternal genome. For G. bifida, identification of the organellar parent is less obvious from its position in the cpDNA phylogeny (Fig. 3a). We would expect it to group with the parent that contributed the chloroplast. However, G. bifida is poorly supported as sister to a clade consisting of G. speciosa, G. sulphurea and G. tetrahit, and G. pubescens is the sister to all of these (Fig. 3a). Although grouping with G. pubescens in the NRPA2 phylogeny, G. bifida appears genetically distinct (Fig. 4a). In combination with the cpDNA results, this might indicate that the actual contributing genotype has not been sampled. Either way, reciprocal parentage seems not to be the only or main reason for the morphological differentiation between the two tetraploids, but rather different parental genomes (G. speciosa vs G. sulphurea) involved in their origin.
It is, thus, clear that G. tetrahit and G. bifida did not result from a single polyploidization event with subsequent divergence, but from at least two independent polyploidization events. The next question is whether each of the two allotetraploids has resulted from a single or multiple allopolyploidization events. According to Müntzing (1930a), crosses between G. speciosa and G. pubescens proceeded without difficulty, and one might expect that ongoing hybridization between these two predominantly outcrossing (Müntzing, 1930a) species could have resulted in recurrent origin of the allopolyploid species in nature. Despite thorough sampling, with multiple samples from several populations of all species (including areas where they overlap in distribution), we found no clear evidence for recurrent origins of the two allopolyploids, which seems to be unusual among natural plant polyploids.
Hybridization in nature
Natural hybrids both within and between ploidy levels have been reported for subgenus Galeopsis (Briquet, 1893; Hegi, 1927; Benoit & Stace, 1975). However, all Müntzing’s (1930a) crossing experiments between ploidy levels failed, and he was convinced that all reports of hybrids between ploidy levels (e.g. G. pubescens × bifida and G. tetrahit × speciosa) were wrong and most likely could be ascribed to segregation products from crosses between either allotetraploids (G. tetrahit × bifida) or diploids (G. pubescens × speciosa). Crossings between taxa within ploidy levels were compatible and resulted in at least partially fertile and polymorphic progeny. He suggested that spontaneous hybridization in nature with subsequent segregation of hybrid progeny could explain the observed morphological polymorphism of taxa.
Our molecular data support the occurrence of hybridization between the taxa in subgenus Galeopsis. Two G. pubescens samples from Altenberg in Austria (MB06-104, 109), which grew side-by-side with G. speciosa, share cpDNA with G. speciosa (Fig. 3a). The same samples, however, behave as pure G. pubescens in the AFLP analyses (Fig. 2) and the NRPA2 phylogeny (Fig. 4a).
Potential tetrahit × bifida hybrid plants were collected from localities where the two taxa grow together (e.g. Hurum in Norway and Altenberg in Austria). These samples share cpDNA with G. bifida, and have either two G. tetrahit NRPA2 homoeologs (Hurum: MB04-035, 149), or one G. tetrahit and one G. bifida homoeolog (Altenberg: MB06-122). In the AFLP network, these samples attain intermediate positions between G. tetrahit and G. bifida (Fig. 2a).
Two samples from Slovenia (PS06-002, 004) show similar indications of introgression in the sequence data but behave as pure G. tetrahit in the AFLP analyses. Other samples from the same locality attain intermediate positions in the AFLP analyses. Several other samples collected as either G. bifida or G. tetrahit from localities where the two species grow in sympatry also turned out to be admixed in the structure analyses.
Even though reproductive isolation between ploidy levels is usually very strong (Soltis et al., 2010), unidirectional gene flow from diploids to polyploids has been predicted (Stebbins, 1971) and shown to occur, often through triploid bridges (Husband & Sabara, 2003; Kim et al., 2008; Slotte et al., 2008; Chapman & Abbott, 2010). Müntzing concluded, from years of crossing experiments in Galeopsis, that quantitative differences in chromosome number are more important for the origin of incompatibility barriers than genetic differentiation (Müntzing, 1930a, 1969). He found, however, that the ploidy level barrier could be overcome by producing autotetraploids of the diploids, which could then be hybridized without difficulty with the natural tetraploids in both directions (Müntzing, 1941).
In our data there is weak evidence for hybridization involving different ploidy levels. Three samples (MB06-111-113), which, based on morphology, were suspected to be hybrids between G. pubescens (2×) and G. bifida (4×), did not group with the otherwise tight group of G. pubescens samples in the AFLP analyses, indicating hybrid origin (Fig. 2a). The flow cytometry analyses showed that these samples also had higher nuclear DNA content than pure G. pubescens (Table 2). However, the NRPA2 phylogeny showed no evidence of hybridization (Fig. 4). Also, some G. speciosa samples from a mixed population with G. tetrahit from Altenberg, Austria (MB06-134, 136, 139), which we suspected could be hybrids, showed signs of introgression in the AFLP analyses (Fig. 1). Introgression from tetraploids into diploids has otherwise rarely been shown (Thórsson et al., 2001; Ståhlberg, 2009), and a more focused study is necessary to find stronger evidence and the mechanisms involved.
Genomic reorganization and DNA methylation
Other studies have reported additivity of parental AFLP markers in allopolyploids (e.g. Hedrén et al., 2001; Liu et al., 2001). We did not observe clear additive patterns in the Galeopsis allotetraploids or in recent hybrids. Even though sharing large parts of their genome with the diploids, G. tetrahit shared only five unique markers with G. pubescens and none with G. sulphurea, whereas G. bifida shared two unique markers with G. pubescens and 14 with G. speciosa (Table 5). Both tetraploids had unique markers not found in any of the parental species, with an especially high percentage in G. bifida (31%, Table 4). Amplified fragment length polymorphism markers unique to hybrids or allopolyploids are rarely reported and, when so, usually in low frequency and referred to as undetected parental polymorphisms (e.g. van Raamsdonk et al., 2000; Salmon et al., 2005).
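The two kinds of marker counts discussed above can be illustrated with a minimal sketch. It assumes that a "unique marker shared" by two taxa is one present in both and absent from every other taxon, and that a marker "not found in any of the parental species" is one private to a single taxon; the study's actual scoring procedure may differ. The function names and the marker identifiers in the toy profiles are hypothetical and are not the study's data.

```python
from typing import Dict, Set

Profiles = Dict[str, Set[int]]  # taxon name -> set of AFLP marker IDs scored as present


def private_markers(profiles: Profiles, taxon: str) -> Set[int]:
    """Markers present in `taxon` and absent from every other taxon."""
    others: Set[int] = set()
    for name, markers in profiles.items():
        if name != taxon:
            others |= markers
    return profiles[taxon] - others


def exclusively_shared(profiles: Profiles, a: str, b: str) -> Set[int]:
    """Markers present in both `a` and `b` and absent from all remaining taxa."""
    others: Set[int] = set()
    for name, markers in profiles.items():
        if name not in (a, b):
            others |= markers
    return (profiles[a] & profiles[b]) - others


# Hypothetical toy profiles (illustration only, not the published AFLP data).
toy: Profiles = {
    "pubescens": {1, 2, 3, 4},
    "speciosa": {1, 5, 6},
    "bifida": {1, 2, 5, 6, 7, 8},
    "tetrahit": {1, 3, 9},
}

bifida_private = private_markers(toy, "bifida")
bifida_speciosa = exclusively_shared(toy, "bifida", "speciosa")
bifida_pubescens = exclusively_shared(toy, "bifida", "pubescens")
```

Under strict additivity, a tetraploid would carry few or no private markers and would share markers exclusively with each of its two parents; a high proportion of private markers, as reported here for G. bifida, departs from that expectation.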
Another notable aspect of our AFLP result is the variable number of markers per sample that seemed neither correlated to ploidy level nor to genome size (Table 4). For example, more markers were found in diploid G. speciosa (33) than in tetraploid G. tetrahit (26). A positive correlation between number of markers and ploidy level or genome size has been reported from studies on several polyploid groups (e.g. Lihova et al., 2004; Fay et al., 2005). Our divergent results could, at least in part, be caused by preferential elimination of homoeologous loci, which has been documented, for example, in allopolyploid Tragopogon (Koh et al., 2010).
A possible explanation of the AFLP patterns observed in Galeopsis could be methylation sensitivity of the restriction enzymes used (see REBASE and citations therein; Roberts et al., 2010). DNA methylation has been shown to impair or block the restriction activity of EcoRI, leading to fewer fragments from highly methylated genomes. The low number of copies found for example in tetraploid G. tetrahit might, thus, be explained by a highly methylated genome, whereas the G. bifida genome seems to be less methylated. This may also explain the high number of unique markers found in the two allotetraploids which were not present in the parental species.
Hybridization has been shown to cause demethylation in plants (Madlung et al., 2002; Paun et al., 2007). The higher numbers of markers (including many unique ones) in the admixed samples may, thus, be a result of loss or redistribution of methyl cytosine after recent hybridization and introgression. This could also explain the clustering of these samples in the NN (Fig. 2a) as well as the two additional structure groups (5 and 6; Fig. 1) that might result from clustering of hybrids with unique markers instead of recognizing them as admixed.
Our results mostly confirm Müntzing’s conclusions regarding the allopolyploid origin of G. tetrahit and G. bifida, but the application of various molecular markers on a large population sample has also brought us somewhat further. It is clear that the allotetraploids do not share the same maternal genome, with G. speciosa contributing to G. bifida and G. sulphurea contributing to G. tetrahit. Even though G. sulphurea has been considered a synonym of G. speciosa in most recent treatments, the plants collected as such in the present study are clearly genetically distinct from the remaining G. speciosa samples. Therefore, it seems reasonable to re-establish G. sulphurea as a diploid taxon within subgenus Galeopsis, either at specific or subspecific level. Regardless of the taxonomic status of G. sulphurea, it is evident that the two tetraploids have resulted from independent polyploidization events and that the morphological disparities between them result from different parentage. Moreover, both tetraploids appear to have originated only once, as opposed to the recurrent origins usually reported for natural polyploids. Müntzing (1932) suggested that the tetraploids originated in Central and Eastern Europe where the parental species have overlapping distributions. Our data more specifically place the origin of G. tetrahit in the south-western Alps where G. sulphurea is endemic. Our results also confirm Müntzing’s conclusions that frequent hybridization and introgression take place within ploidy levels in subgenus Galeopsis, but some indications of hybridization between ploidy levels are also found.
Cecilie Mathiesen and Lisbeth Thorbek are thanked for assistance in the laboratory, Bozo Frajman, Heidi Solstad, Knut Rydgren, Peter Schönswetter, Reidar Elven, and Tore Berg for collecting plant material, Marte H. Jørgensen for help with analyses and valuable comments on the manuscript, and Victor A. Albert for contributing to the initial planning of the project. The Galeopsis project was supported by grants from the Research Council of Norway (no. 154145), UNIFOR, and the Women’s Equal Rights Committee at UOO.