Intraspecific variation in mitochondrial genome sequence, structure, and gene content in Silene vulgaris, an angiosperm with pervasive cytoplasmic male sterility

Authors


Author for correspondence:

Daniel B. Sloan

Tel: +1 203.737.3116

Email: daniel.sloan@yale.edu

Summary

  • In angiosperms, mitochondrial-encoded genes can cause cytoplasmic male sterility (CMS), resulting in the coexistence of female and hermaphroditic individuals (gynodioecy).
  • We compared four complete mitochondrial genomes from the gynodioecious species Silene vulgaris and found unprecedented amounts of intraspecific diversity for plant mitochondrial DNA (mtDNA).
  • Remarkably, only about half of overall sequence content is shared between any pair of genomes. The four mtDNAs range in size from 361 to 429 kb and differ in gene complement, with rpl5 and rps13 being intact in some genomes but absent or pseudogenized in others. The genomes exhibit essentially no conservation of synteny and are highly repetitive, with evidence of reciprocal recombination occurring even across short repeats (< 250 bp). Some mitochondrial genes exhibit atypically high degrees of nucleotide polymorphism, while others are invariant. The genomes also contain a variable number of small autonomously mapping chromosomes, which have only recently been identified in angiosperm mtDNA. Southern blot analysis of one of these chromosomes indicated a complex in vivo structure consisting of both monomeric circles and multimeric forms.
  • We conclude that S. vulgaris harbors an unusually large degree of variation in mtDNA sequence and structure and discuss the extent to which this variation might be related to CMS.

Introduction

Plant mitochondrial genomes exhibit an intriguing mixture of conservative and dynamic evolutionary patterns (Li et al., 2009; Mower et al., 2012b). On the one hand, their coding regions experience some of the slowest rates of nucleotide substitution ever documented (Wolfe et al., 1987; Drouin et al., 2008). On the other, they exhibit frequent structural rearrangements (Palmer & Herbon, 1988), rapid turnover of intergenic sequences (Allen et al., 2007), and an ongoing process of gene loss and functional transfer to the nucleus (Adams et al., 2002b).

Although there are only a handful of angiosperm species for which the mitochondrial genomes of multiple individuals have been completely sequenced, it is clear that plant mitochondria can exhibit substantial intraspecific variation in genome size and structure, resulting from large sequence duplications and frequent rearrangements (Allen et al., 2007; Fujii et al., 2010; Chang et al., 2011; Darracq et al., 2011; Davila et al., 2011; Bentolila & Stefanov, 2012). By contrast, these whole-genome data have also shown that other properties of plant mtDNA are highly conserved within species, with low degrees of nucleotide diversity and only modest differences in genome ‘complexity’ (i.e. total sequence content after exclusion of duplicated regions). In addition, mitochondrial gene content appears to be highly conserved within species (e.g. Allen et al., 2007).

Studies of the genus Silene (Caryophyllaceae) have revealed a number of atypical patterns of mtDNA evolution that distinguish it from other angiosperm lineages. High degrees of nucleotide polymorphism have been documented in Silene mitochondrial genes (Stadler & Delph, 2002; Sloan et al., 2008; Touzet & Delph, 2009). Silene spp. also exhibit unusually high rates of nucleotide substitutions in mtDNA (Mower et al., 2007; Sloan et al., 2009). In a handful of species, accelerated mitochondrial substitution rates are a genome-wide phenomenon (Sloan et al., 2012a), but in many others, including S. vulgaris, they are restricted to a subset of genes (Sloan et al., 2009). Whole mitochondrial genome sequencing in Silene has also revealed exceptional variation in genome size, structure, and gene content among species (Sloan et al., 2010, 2012a). However, the necessary data have not been available to determine how these traits vary within Silene species on a genome-wide scale.

The evolution of mtDNA in Silene is of particular interest because of the prevalence of gynodioecy within the genus. This breeding system, which is characterized by the coexistence of both hermaphroditic and female (i.e. male-sterile) individuals within a population, has been observed in numerous Silene species and is thought to represent the ancestral state for the genus (Desfeux et al., 1996). In gynodioecious species, gender is often (but not always) determined by a combination of mitochondrial-encoded cytoplasmic male sterility (CMS) genes and nuclear restorers that counteract their effects (Schnable & Wise, 1998; Hanson & Bentolila, 2004). This is consistent with reciprocal crossing studies in Silene that have found evidence for complex cytonuclear interactions (Charlesworth & Laporte, 1998; Taylor et al., 2001; Garraud et al., 2011).

The bladder campion Silene vulgaris represents the most extensively studied example of gynodioecy in this genus. In contrast to species such as Beta vulgaris, there is as yet no evidence of a nonsterilizing cytoplasm in S. vulgaris that produces hermaphroditic individuals regardless of the nuclear background. In other words, it appears that all of the S. vulgaris mitochondrial genotypes examined to date are capable of inducing male sterility. Chimeric open reading frames (ORFs) have been implicated as the causal agents of CMS in a number of angiosperms (Schnable & Wise, 1998). Although the specific mechanisms for CMS or its restoration have not been determined in S. vulgaris, a chimeric mitochondrial ORF has been identified that shows higher transcript abundance in females than in hermaphrodites (Štorchová et al., 2012).

Here, we report the complete sequencing of four mitochondrial genomes from different populations of S. vulgaris. Our results provide the first genome-wide intraspecific comparison of the unusual patterns of mitochondrial DNA (mtDNA) evolution observed in Silene and reveal unprecedented intraspecific variation in plant mitochondrial genome content. Based on these findings, we discuss the nature of mtDNA structure in S. vulgaris and the relationship between CMS and mitochondrial genome evolution.

Materials and Methods

Source material and mtDNA extraction

Silene vulgaris (Moench) Garcke is a short-lived perennial that has a widespread distribution in its native Eurasia and has been introduced and become invasive in other regions, including North America (Taylor & Keller, 2007). For mitochondrial genome sequencing, we selected four S. vulgaris families (Table 1) that were known to harbor divergent atp1 haplotypes (Fig. 1). The mitochondrial genome sequence from one of these families (SD2) was reported previously (Sloan et al., 2012a). For each of the four samples, mtDNA was extracted from multiple plants from a single maternal family.

Figure 1.

Silene vulgaris population sampling. A bootstrapped maximum likelihood phylogenetic tree constructed with RAxML v7.0.4 (Stamatakis, 2006) using partial atp1 sequences from the four families analyzed in this study (in bold) and previously published sequences from 40 additional S. vulgaris populations (Sloan et al., 2008).

Table 1. Silene vulgaris collections used for mitochondrial genome sequencing
Code   Location
KOV    Kovary Meadows (near Prague), Czech Republic
MTV    Mountain View, VA, USA
S9L    Simmonsville, VA, USA
SD2    Stuarts Draft, VA, USA

The mtDNA extraction methods used for SD2 and S9L families were described previously (Sloan et al., 2010, 2012a). Similar methods were used for mtDNA extraction from the KOV and MTV families (Leino et al., 2005). For each of these two families, c. 2 g of flower buds (1–3 mm) were ground in a mortar with 20 ml of grinding buffer (0.3 M mannitol, 50 mM Tris-HCl (pH 8.0), 5 mM EDTA, 0.5% BSA, 1% polyvinylpyrrolidone (PVP) and 20 mM cysteine). The suspension was filtered through two layers of Miracloth and centrifuged twice for 3 min at 8800 g (4°C). The resulting supernatant was centrifuged for 15 min at 12 000 g (4°C). The resulting pellet was resuspended with a fine paintbrush in 5 ml of DNase buffer (0.3 M mannitol, 50 mM Tris-HCl (pH 8.0), 15 mM MgCl2), followed by incubation on ice for 90 min with DNase I (3 mg dissolved in 300 μl). The sample was brought up to a volume of 20 ml with wash buffer (0.3 M mannitol, 50 mM Tris-HCl (pH 8.0), 25 mM EDTA) and centrifuged again for 15 min at 12 000 g (4°C). The pellet was resuspended in 0.3 ml of wash buffer, and 50 μl of proteinase solution (1 mg proteinase K, 5 mM CaCl2, 50 mM Tris-HCl (pH 8.0)) was added followed by incubation for 15 min at 37°C to inactivate the DNase. Isolation of nucleic acids was performed immediately with the mitochondrial suspension using a sorbitol extraction protocol (Štorchová et al., 2000).

Mitochondrial genome sequencing, assembly, and annotation

The four mtDNA samples were used to construct 3 kb paired-end libraries, and each was sequenced on one-quarter of a plate on a Roche 454 GS-FLX platform with Titanium reagents. Library construction and sequencing were performed at the University of Virginia's Genomics Core Facility for the SD2 and S9L families and at the DNA Sequencing Center at Brigham Young University for the KOV and MTV families. Genome assembly and annotation were performed as described previously (Sloan et al., 2012a). The three newly reported mitochondrial genomes (KOV, MTV, and S9L) were assembled solely with 454 data, which is known to have a high error rate in estimating the length of homopolymers. To ensure that these errors did not result in the misidentification of intact protein genes as pseudogenes, we used PCR and Sanger sequencing to check 10 cases of homopolymer-related frameshift mutations identified in our original assemblies. In every case, these were found to be 454 sequencing errors. Annotated genome sequences were deposited in GenBank (Supporting Information, Table S1).

Sequence analysis

Intergenic regions, repetitive sequences, and intragenomic recombinational activity were characterized as described previously (Sloan et al., 2012a). Briefly, intergenic sequences were extracted and searched against multiple BLAST databases to identify sequences with homology to plastid genomes, other plant mitochondrial genomes, and the full GenBank nr/nt database. Each mitochondrial genome was also searched against itself with NCBI-BLASTN to identify repeated sequences. The frequencies of alternative genome conformations resulting from repeat-mediated recombination were quantified by comparing the number of paired-end sequencing reads that mapped consistently to the genome assembly vs the number of conflicting paired-end reads that could be explained by a recombination event between two repeats (Sloan et al., 2012a).
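The read-pair comparison described above reduces to a simple proportion. The following is a minimal sketch in Python, not the pipeline used in the cited work: the function name and the example counts are hypothetical, and it assumes that read pairs have already been classified as either consistent with the assembly or supporting a recombinant conformation.

```python
def recombinant_fraction(consistent_pairs: int, conflicting_pairs: int) -> float:
    """Fraction of informative read pairs that support a recombinant conformation.

    consistent_pairs: pairs spanning the repeat that map as expected on the assembly.
    conflicting_pairs: pairs whose mapping conflict is explained by recombination
                       between the two repeat copies.
    """
    total = consistent_pairs + conflicting_pairs
    if total == 0:
        raise ValueError("no informative read pairs span this repeat")
    return conflicting_pairs / total


# Hypothetical counts for one repeat pair (illustrative only, not from the study).
fraction = recombinant_fraction(consistent_pairs=480, conflicting_pairs=20)
print(f"{fraction:.2%} of informative pairs support the recombinant form")
```

A value near 50% would be expected for large repeats recombining at high frequency, whereas small repeats would typically yield much lower values.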

For all angiosperms with multiple available mitochondrial genomes, shared sequence content between each intraspecific pair of genomes was identified with BLASTN v2.2.24+ (-task blastn -dust no -gapopen 5 -gapextend 2 -reward 2 -penalty -3 -word_size 9; raw score cutoff of 70).
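One way to turn such BLASTN hits into a percentage of shared sequence is to merge overlapping hit intervals on the query genome and divide the covered length by the genome length. The sketch below illustrates this under stated assumptions: hits are assumed to be already parsed into (query start, query end, raw score) tuples with 1-based inclusive coordinates, and the function names are hypothetical. It is not necessarily the procedure used to produce the reported values.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def percent_shared(hits, genome_length, min_score=70):
    """Percentage of the query genome covered by hits with raw score >= min_score.

    hits: iterable of (qstart, qend, raw_score) tuples (assumed input format).
    """
    kept = [(min(s, e), max(s, e)) for s, e, score in hits if score >= min_score]
    covered = sum(end - start + 1 for start, end in merge_intervals(kept))
    return 100.0 * covered / genome_length


# Toy example: three hits pass the score cutoff, one does not.
toy_hits = [(1, 100, 80), (50, 150, 90), (301, 400, 75), (500, 900, 40)]
print(percent_shared(toy_hits, genome_length=1000))  # 25.0
```

Because coverage is measured on the query genome, swapping query and subject can give a different percentage when the two genomes differ in size or repetitiveness.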

To identify chimeric ORFs (i.e. those containing fragments of other protein genes) in the S. vulgaris mitochondrial genomes, we extracted all ORFs longer than 300 bp from intergenic regions and searched them against a BLAST database of mitochondrial protein genes from S. vulgaris. We considered any ORFs containing one or more gene fragments of at least 30 bp to be chimeric. All identified chimeric ORFs were searched for potential transmembrane domains using TMHMM 2.0c (Krogh et al., 2001).
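The ORF extraction and chimera criterion can be sketched as follows. This is an illustration only: the text does not specify how ORFs were delimited, so the sketch assumes ORFs run from an ATG to the first in-frame stop codon (TAA, TAG, or TGA), counts the stop codon in the ORF length, and scans all six reading frames. The function names are hypothetical, and the BLAST search against protein genes is represented only by a list of fragment lengths.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, start, end, orf) for ATG-to-stop ORFs longer than min_len bp.

    Coordinates are 0-based, half-open, and relative to the scanned strand.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start > min_len:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


def is_chimeric(gene_fragment_lengths, min_fragment=30):
    """True if any matched gene fragment is at least min_fragment bp long."""
    return any(length >= min_fragment for length in gene_fragment_lengths)


# Toy sequence: a single 306 bp ORF on the forward strand.
toy = "ATG" + "GCT" * 100 + "TAA"
print([(strand, start, end) for strand, start, end, _ in find_orfs(toy)])
print(is_chimeric([12, 45]))  # one fragment reaches the 30 bp threshold
```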

To identify single nucleotide polymorphisms (SNPs), protein-coding nucleotide sequences from all four genomes and a set of outgroups were aligned with MUSCLE v3.7 (Edgar, 2004). Unalignable 5′ and 3′ ends were trimmed. The numbers of synonymous and nonsynonymous polymorphic sites within S. vulgaris were determined with DnaSP v5 (Librado & Rozas, 2009). Relative rate tests were performed in MEGA v5 (Tamura et al., 2011) for the three most highly polymorphic genes (atp1, atp6, and cox1) as well as for a concatenated set of all other protein genes. These tests were performed on all six possible pairwise combinations of the four S. vulgaris haplotypes, using Silene latifolia as an outgroup. Significance was assessed using a Bonferroni-corrected P-value of 0.0083 (= 0.05/6).
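The multiple-testing correction follows directly from the number of pairwise haplotype comparisons. The sketch below enumerates the six pairs and derives the corrected threshold. It also includes a relative rate statistic in the form of Tajima's (1993) test, which is an assumption here, because the text does not state which relative rate test was run in MEGA. The substitution counts in the example are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt

HAPLOTYPES = ["KOV", "MTV", "S9L", "SD2"]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def tajima_relative_rate(m1, m2):
    """Tajima-style relative rate test (assumed form of the test).

    m1, m2: numbers of sites at which only lineage 1 or only lineage 2
            differs from the outgroup.
    Returns (chi-square statistic, P-value with 1 degree of freedom).
    """
    if m1 + m2 == 0:
        return 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


pairs = list(combinations(HAPLOTYPES, 2))
threshold = bonferroni_threshold(0.05, len(pairs))
print(len(pairs), round(threshold, 4))  # 6 0.0083

# Hypothetical counts of lineage-specific substitutions for one pair.
chi2, p_value = tajima_relative_rate(12, 1)
print(p_value < threshold)
```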

Southern blot hybridizations

We performed multiple Southern blots to investigate genome structure and confirm patterns of repeat-mediated recombination inferred from paired-end sequence data. Blots were performed with total cellular DNA extracted from fresh tissue from individual plants, using either a modified CTAB protocol (Doyle & Doyle, 1987) for the SD2 and S9L families or a sorbitol extraction method (Štorchová et al., 2000) for the KOV and MTV families. Samples containing c. 3 μg of DNA were digested with either EcoRI-HF (New England BioLabs, Frankfurt, Germany), SalI-HF (New England BioLabs), BamHI (MBI Fermentas, Vilnius, Lithuania), or HindIII (MBI Fermentas). Additional samples were treated with 10 units of PlasmidSafe exonuclease (Epicentre, Madison, WI, USA), which digests linear but not circular DNA molecules. Exonuclease reactions were performed in 1 × PlasmidSafe Buffer with 1 mM ATP in a total volume of 20 μl for 1 h at 37°C. DNA samples that had not been digested with any enzyme were also included for comparison in some experiments. Samples were electrophoresed overnight (16–18 h) at 1.5 V cm−1 on a 0.9% agarose gel and transferred to a positively charged nylon membrane (Roche) by capillary blotting. Probes targeting regions of interest were PCR-amplified from either genomic DNA or PCR products cloned with the Promega pGEM-T Easy vector (Table S2). Probes were labeled with digoxigenin (DIG) and hybridized as previously described (Sloan et al., 2010).

Results

Genome size and gene content

Each 454 sequencing run produced between 51 and 128 Mb of paired-end sequence data, providing deep coverage (> 50×) for all four mitochondrial genomes. The assembled genomes range in size from 361 to 429 kb with genome complexities ranging from 313 to 386 kb after excluding all but a single copy of any perfectly duplicated repeat sequences > 100 bp in length (Table 2; Fig. S1).

Table 2. Summary of four sequenced Silene vulgaris mitochondrial genomes
                                          KOV          MTV          S9L          SD2
Genome size (kb)                          361          429          422          427
Genome complexity (kb)                    313          384          379          386
Circular chromosomes (autonomous)         6 (4)        4 (2)        7 (4)        4 (1)
% G + C content                           42.0         41.8         42.0         41.8
Protein genes [a]                         25           24           26           25
tRNA genes [a]                            6            4            4            6
  Native                                  3            3            3            3
  Plastid-derived                         3 [b]        1            1            3 [b]
rRNA genes [a]                            3            3            3            3
Introns                                   19           19           19           19
  cis-spliced                             13           13           13           13
  trans-spliced                           6            6            6            6
Genic content (kb) (% coverage) [a]       53 (14.7)    49 (11.5)    49 (11.6)    48 (11.2)
  Exonic                                  36 (9.9)     32 (7.5)     32 (7.6)     31 (7.2)
  Intronic [c]                            17 (4.7)     17 (4.0)     17 (4.1)     17 (4.0)
Intergenic content (kb) (% coverage)      308 (85.3)   380 (88.5)   373 (88.4)   379 (88.8)
  Plastid-derived                         26 (7.3)     5 (1.1)      3 (0.6)      10 (2.3)
  Conserved with other plant mtDNA [d]    59 (16.3)    77 (18.0)    78 (18.4)    75 (17.5)
  Conserved with GenBank nr/nt [e]        7 (1.9)      6 (1.5)      5 (1.1)      5 (1.1)
  Uncharacterized                         216 (59.8)   292 (67.9)   287 (68.3)   289 (67.9)
Repetitive content (kb) (% coverage)      101 (28.0)   101 (23.5)   99 (23.3)    80 (18.8)
  Large repeats, > 1 kb                   87 (24.0)    80 (18.7)    76 (18.0)    57 (13.3)
  Small repeats, ≤ 1 kb                   15 (4.0)     20 (4.8)     23 (5.3)     23 (5.5)

[a] Duplicate genes are included in length and coverage statistics but excluded from reported counts.
[b] Two of the plastid-derived tRNA genes in both KOV and SD2 are the results of recent transfers and are likely nonfunctional.
[c] Intron lengths only include cis-spliced introns.
[d] Excludes regions of plastid origin.
[e] Excludes regions of plastid origin and regions conserved in other plant mitochondrial genomes.
G + C, guanine + cytosine.

All four genomes share a core set of 24 protein genes that are almost universally conserved among angiosperm mitochondrial genomes, including the maturase matR, the transport membrane protein mttB, four genes involved in cytochrome c biogenesis, and 18 subunits of the oxidative phosphorylation and ATP synthesis complexes I, III, IV, and V (Adams et al., 2002b). By contrast, the genomes are largely devoid of 17 additional protein genes (15 ribosomal proteins and two complex II subunits) that were present in the common ancestor of angiosperms but have been lost independently in several different angiosperm lineages (Adams et al., 2002b; Mower & Bonen, 2009). Of these 17 genes, only rpl5 and rps13 are present and intact in any of the sequenced genomes, and neither is conserved in all four. Both of these ribosomal protein genes are present in the S9L mitochondrial genome. By contrast, rpl5 is entirely absent from the MTV and SD2 genomes, and rps13 appears to be a pseudogene in the KOV and MTV genomes. In the MTV genome, the rps13 coding sequence has been disrupted by a nonsense point mutation that introduces a stop codon at amino acid position 23, whereas the KOV copy has been fragmented into two pieces located on different chromosomes. Attempts to amplify a full-length rps13 copy from KOV cDNA failed, suggesting that the gene has not acquired a novel trans-splicing intron. In addition to rpl5 and rps13, some genomes contained pseudogene fragments of rps3 (S9L) and rps14 (KOV and S9L).

Silene mitochondrial genomes generally contain a reduced set of tRNAs relative to other angiosperms (Sloan et al., 2012a). The four S. vulgaris mitochondrial genomes share a core set of only four tRNAs: trnE-TTC, trnI-CAT, trnY-GTA, and a plastid-derived copy of trnW-CCA. The KOV and SD2 genomes also contain additional intact tRNA genes of plastid origin, but these appear to be the result of very recent DNA transfers and may not be functional within the mitochondria. All four S. vulgaris genomes share the typical complement of three rRNA genes (rrn5, rrn18, and rrn26).

Intergenic content

Intergenic sequences constitute > 85% of each of the four S. vulgaris mitochondrial genomes (Table 2). Comparisons among these genomes found that each shares only about half of its total sequence content with any other genome (Table 3), suggesting a rapid rate of turnover in mitochondrial intergenic sequence within S. vulgaris. This pattern contrasts with the findings from all other angiosperms analyzed to date. All intraspecific genome pairs from five other species (Arabidopsis thaliana, B. vulgaris, Brassica napus, Oryza sativa, and Zea mays) share > 80% of total sequence content, with the majority sharing > 95% (Table S3). Interestingly, after S. vulgaris, the second highest amounts of intraspecific variation were observed in B. vulgaris, another species with a high frequency of CMS in natural populations (Table S3; Darracq et al., 2011).

Table 3. Shared sequence content between pairs of Silene vulgaris mitochondrial genomes
       KOV (%)   MTV (%)   S9L (%)   SD2 (%)   Any (%)   All (%)
KOV    –         54.2      56.4      53.1      74.9      35.7
MTV    44.2      –         59.8      58.2      80.0      29.0
S9L    47.1      61.1      –         54.9      81.6      29.4
SD2    44.3      58.2      54.1      –         77.3      29.1

The values in each cell represent the percentage of the genome in the corresponding row that is present in the genome in the corresponding column. The matrix is asymmetrical because the four genomes differ in size and repetitiveness. The column labeled ‘Any’ reports the percentage of each genome that was found in any of the other three genomes. The ‘All’ column reports the percentage of each genome shared by all four genomes.

The bulk of the intergenic content in S. vulgaris shows little similarity to mtDNA sequences from other angiosperms (Table 2), and detectable homology declines rapidly with increasing phylogenetic distance from S. vulgaris (Fig. 2). Although the origins of most of the mitochondrial intergenic sequence are uncertain, some regions are clearly derived from the nuclear or plastid genomes. The plastid-derived sequence within the mitochondrial genomes provides a good illustration of the intraspecific variation in intergenic sequence content. The total amount of plastid-derived sequence varies 10-fold, ranging from 2.6 kb in S9L to 26.4 kb in KOV. Except for the small region associated with the ancient functional transfer of trnW-CCA to the mitochondrial genome, plastid-derived sequences appear to be of very recent origin based on a high degree of nucleotide identity with the S. vulgaris plastid genome (Sloan et al., 2012b). Furthermore, most plastid-derived regions are unique to a single genome, suggesting that S. vulgaris mtDNA experiences frequent acquisition and rapid turnover of sequence from other genomic compartments.

Figure 2.

Conservation of intergenic sequence in Silene vulgaris mitochondrial DNAs (mtDNAs) compared with all other completely sequenced seed plant mitochondrial genomes.

As typically observed in angiosperm mtDNA, each of the four S. vulgaris mitochondrial genomes contains dozens of ORFs in intergenic regions. Most of these are not conserved among related species and do not exhibit homology to any known protein genes, suggesting that they are nonfunctional. We found only three intergenic ORFs (> 300 bp) that are present in all four S. vulgaris genomes. These include a 318 bp ORF without any identifiable homologs in the GenBank nr database, a 372 bp ORF with similarity to nuclear transposable elements in angiosperms, and an ORF of variable length (351–456 bp) with similarity to ORFs in the mitochondrial genomes of S. latifolia (Sloan et al., 2010) and Beta spp. (Darracq et al., 2011). Although, in this last case, the sequence is conserved in multiple species, the reading frame has been disrupted by frameshift indels, suggesting that it is not functional as a protein-coding sequence. This ORF is located c. 400 bp upstream of the cox3 start codon. Such ‘conserved syntenic’ regions are commonly found flanking functional genes and may play a role in gene regulation and expression (Alverson et al., 2010).

Chimeric ORFs consisting of fragments of other mitochondrial protein genes (particularly ATP synthase genes) are often involved in CMS (Schnable & Wise, 1998; Hanson & Bentolila, 2004). A chimeric mitochondrial ORF containing fragments of atp1 and cox2 was recently reported in S. vulgaris (Štorchová et al., 2012). We performed a genome-wide search for mitochondrial chimeric ORFs (Fig. S2) and found that only one of the four S. vulgaris genomes (MTV) contains the previously reported chimera. The MTV genome also contains an 876 bp ORF with short fragments of cob and atp9 at its 5′ end. The S9L genome contains three chimeric ORFs, including a copy of the 876 bp ORF found in the MTV genome. The MTV and S9L copies of this ORF are identical in length but differ by five SNPs. The other two chimeric ORFs in the S9L genome (363 and 462 bp in length) are not found in any of the other three genomes, and each contains a fragment of an ATP synthase gene (atp1 or atp6). The SD2 genome contains only a single 534 bp chimeric ORF, which is the result of a partial duplication of nad4 at one end of a large (3.2 kb) repeat. The ORF includes the first 446 bp of this gene followed by a sequence of unidentifiable origin. With the exception of the 363 bp ORF in the S9L genome, all of the identified chimeric ORFs are predicted to contain one or more transmembrane domains, a common characteristic of CMS genes in other angiosperms (Hanson & Bentolila, 2004). The KOV genome lacks any chimeric ORFs that match our search criteria.

In addition to these chimeric ORFs, the MTV and SD2 genomes each contain an ORF of c. 3 kb in length with sequence similarity to group B DNA polymerases, and the KOV, MTV, and SD2 genomes each contain an ORF ranging from 336 to 3162 bp in length with sequence similarity to a DNA-directed RNA polymerase. Both of these polymerases exhibit homology to linear mitochondrial plasmids found in plants, fungi, and other eukaryotes (Robison & Wolyn, 2005; Shutt & Gray, 2006; Goremykin et al., 2009; Sloan et al., 2010). In addition, MTV and S9L share identical 309 bp ORFs with sequence similarity to mitovirus-derived RNA polymerases found in other angiosperm mitochondrial genomes (e.g. Alverson et al., 2011).

Genome structure: rearrangements, repeats, and recombination

Despite frequent structural rearrangements in angiosperm mitochondrial genomes, large regions of conserved synteny are typically observed within species (e.g. Allen et al., 2007; Bentolila & Stefanov, 2012). By contrast, the four S. vulgaris mitochondrial genomes are so rearranged that they have almost no conserved synteny beyond the level of individual genes (Figs 3, S3).

Figure 3.

Lack of conserved synteny in Silene vulgaris mitochondrial genomes. Colored blocks represent regions of conserved synteny in the S9L and SD2 genomes, with the height of the colored bars indicating the degree of sequence similarity. Sequences in inverted orientation in S9L (relative to SD2) are shown below the line. Vertical black lines demarcate the boundaries between chromosomes. The figure was generated with MAUVE v2.3.1, using an LCB weight of 750 (Darling et al., 2010). A MAUVE alignment of all four genomes is provided (Supporting Information Fig. S3).

The highly repetitive nature of the S. vulgaris mitochondrial genomes likely contributes to the high frequency of rearrangements in these genomes. Large repeats (> 1 kb) represent between 13.3 and 24.0% of total mtDNA sequence, with each genome containing numerous repeat families, many of which contain more than two copies. Repeats of this size undergo high-frequency intragenomic recombination in plant mitochondria, resulting in the coexistence of multiple alternative genome conformations at relatively equal frequencies (Palmer & Shields, 1984; Sloan et al., 2010; Arrieta-Montiel & Mackenzie, 2011; but see Mower et al., 2012a).

In contrast to the high level of recombinational activity in large repeats, recombination involving repeats of only a few hundred bp or fewer appears to be rare in angiosperm mitochondria (Arrieta-Montiel & Mackenzie, 2011). Furthermore, when recombination has been observed between small repeats, it has typically been asymmetric, resulting in the accumulation of only one of the two possible recombination products (Davila et al., 2011). However, we found evidence that many of the small repeats in S. vulgaris undergo recombination, often in a symmetrical fashion. Conflicts in the mapping positions of paired-end sequence data can be used to quantify the frequency of alternative genome conformations resulting from repeat-mediated recombination (Alverson et al., 2011; Mower et al., 2012a; Sloan et al., 2012a). We found numerous examples of paired-end conflicts that were consistent with recombination between repeats of < 250 bp in length, and in most cases, we identified reads supporting both recombination products (Table S4). A Southern blot analysis of one of these small repeats confirmed the presence of both low-frequency recombination products (Fig. 4). By contrast, analysis of a pair of 1.0 kb repeats found that the alternative products coexist at relatively equal abundance as expected for large repeats experiencing high-frequency recombination (Fig. S4).

Figure 4.

Reciprocal recombination between a pair of small repeats. The left portion of the figure summarizes the experimental design and the predicted restriction fragments associated with recombination across a pair of 216 bp repeats in the Silene vulgaris MTV genome (red boxes). Numbers in parentheses indicate the number of unique read pairs supporting the corresponding conformation. The right portion of the figure shows the results of Southern blot hybridizations, confirming the presence of the reference genome conformation (dark bands) as well as the recombinant products (faint bands) in total cellular DNA extracted from floral bud tissue from three different individuals from the MTV maternal family. The same membrane was probed with sequences corresponding to both regions flanking the repeat to confirm the presence of both reciprocal recombination products. The left and right probes were designed to hybridize to the recombinant 1 and recombinant 2 fragments, respectively. Both probes also hybridize to the reference 1 fragment.

Genome structure: multichromosomal organization

Unlike most angiosperms, some Silene species have multiple autonomous-mapping mitochondrial chromosomes that exhibit little or no evidence of recombination with each other (Sloan et al., 2012a). Similar results have also been found for the mitochondrial genome of cucumber (Alverson et al., 2011). All four S. vulgaris mitochondrial genomes contain multiple chromosomes, though they differ in number and sequence content. In each case, most (> 75%) of the mitochondrial genome can be assembled into a main chromosome, which contains numerous recombinationally active repeats and is analogous to the ‘master circle’ typically reported for plant mitochondrial genome assemblies. The remaining genome content assembles into distinct chromosomes that can be mapped as circles. These can be divided into two categories, autonomous and nonautonomous (Table S1). The nonautonomous chromosomes have repeated sequences that exhibit evidence of substantial amounts of recombination with duplicate copies in the main chromosome. However, we report these as separate chromosomes, because paired-end sequence data indicate that the independent subcircle conformation is numerically dominant over the larger, integrated conformation. By contrast, autonomous chromosomes share only very short repeats with the main chromosome and exhibit evidence of only rare recombination.

All four S. vulgaris mitochondrial genomes share one homologous autonomous chromosome. This is the smallest chromosome in each genome and varies in size from 4.9 to 6.5 kb as a result of large indels. The chromosome does not contain any intact genes, and the only identifiable elements are small, duplicated fragments of the large subunit rRNA (rrn26). This small chromosome is the only autonomous chromosome in the SD2 genome, but the other genomes contain as many as three additional autonomous chromosomes, ranging in size from 8.8 to 27.2 kb. Unlike the smallest chromosome, none of these is homologous with autonomous chromosomes in other genomes, and most of them contain intact genes.

To provide greater insight into the structure of autonomous chromosomes, we used a set of Southern blot experiments targeting the small chromosome present in all four genomes. When DNA was digested with BamHI (a restriction enzyme predicted to cut the small chromosome once), each sample produced a single band consistent with the expected linearized size of the chromosome (Fig. 5).

Figure 5.

Small chromosome structure. Southern blot hybridizations probed with a sequence conserved in the smallest chromosome (which ranges in size from 4.9 to 6.5 kb) of all four Silene vulgaris genomes. Total cellular DNAs were extracted from leaf tissue for each of the four genome families. (a) Comparison of DNA samples digested with either BamHI (predicted to cut small chromosome once) or SalI (not predicted to cut small chromosome) or not digested at all. The size of the smallest chromosome from each genome is indicated in parentheses. (b) Comparison of DNA samples treated with an exonuclease that digests linear but not circular DNA with samples that were not digested at all. The black arrow indicates the band corresponding to the main population of molecules recovered from DNA extractions (see the 'Results' section).

By contrast, when DNA was digested with an enzyme that lacks any recognition sites within this chromosome (SalI) or not digested at all, a much more complicated pattern of hybridization was observed, indicating the coexistence of multiple alternative forms of these chromosomes. Each sample contained a band migrating faster than the linearized fragment observed after BamHI digestion (Fig. 5a). The rate of gel migration for these bands was proportional to the chromosome size from the corresponding genome assembly, suggesting that they represent the predicted supercoiled circular form of the chromosome. Each sample also contained an additional band running at a proportional but slower rate relative to the BamHI fragment (Fig. 5a). This band likely represented either a supercoiled circular dimer or a nicked/relaxed form of the circular monomer. Each hybridization also revealed multiple bands of much higher molecular weight, most likely representing higher order concatamers (Fig. 5). Samples from all four families shared one strong band migrating at a rate that was independent of the monomer chromosome size and corresponded to the main population of molecules recovered from total cellular DNA extractions. Exonuclease treatment resulted in preferential digestion of this band (Fig. 5b), suggesting that it consisted largely of linear molecules. Each sample also contained multiple bands that ran above this level and appeared to shift in proportion to the monomer chromosome size (Fig. 5). These bands showed little or no effect of exonuclease treatment, suggesting that they were circular or otherwise resistant to exonuclease digestion (Fig. 5b).

Nucleotide polymorphism

Silene vulgaris mitochondrial genes vary substantially in their degree of nucleotide polymorphism (Fig. 6). At one extreme, atp1 contains a total of 64 SNPs (4.2% of all sites) across the four genomes. At the other extreme, one-quarter of the protein genes contain no SNPs at all. Overall, of the 144 SNPs found in protein genes, 104 (72.2%) are in one of three genes (atp1, atp6, and cox1) that account for only 15% of the total coding sequence (Fig. 6).
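As a minimal sketch of how per-gene SNP counts like these can be tallied, the following Python snippet counts segregating columns in an alignment of the four haplotypes. The sequences are hypothetical toy data, and the function name `segregating_sites` is an illustrative assumption; this is not the DnaSP procedure used for Fig. 6.

```python
def segregating_sites(alignment):
    """Return 0-based column indices at which the aligned sequences differ.

    `alignment` maps a haplotype name to its aligned sequence.
    Gaps ('-') and ambiguous bases ('N') are ignored when comparing a column.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*alignment.values())
    return [i for i, col in enumerate(columns)
            if len(set(col) - {"-", "N"}) > 1]


# Toy alignment (hypothetical sequences, not real S. vulgaris data).
toy = {
    "KOV": "ATGGCTCGTA",
    "MTV": "ATGGCTCGTA",
    "SD2": "ATGGCACGTA",
    "S9L": "ATGACACGTT",
}
snps = segregating_sites(toy)
fraction = len(snps) / len(toy["KOV"])
print(snps, f"{fraction:.1%}")  # [3, 5, 9] 30.0%

# The share of protein-gene SNPs reported above for atp1, atp6, and cox1:
share_top_three = 104 / 144
print(f"{share_top_three:.1%}")  # 72.2%
```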

Figure 6.

Variation in synonymous and nonsynonymous single nucleotide polymorphisms (SNPs) among Silene vulgaris mitochondrial protein genes; each point represents a different gene. The number of polymorphic (i.e. segregating) sites is expressed per synonymous or nonsynonymous site as estimated by DnaSP.

In these highly polymorphic genes, the accumulation of nucleotide substitutions appears to have been clustered in certain haplotypes, with the S9L genome containing the most highly divergent sequence for all three genes. For each gene, relative rate tests showed that the S9L haplotype has experienced significantly more nucleotide substitutions than at least one of the other haplotypes. However, the increased sequence divergence in S9L does not appear to be a genome-wide phenomenon. In a relative rate analysis of all other protein genes (concatenated), S9L did not differ significantly from any of the other genomes.
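The logic of a relative rate test can be sketched as follows. The passage above does not specify which test was applied, so this example assumes Tajima's three-sequence test: substitutions unique to each of two ingroup sequences are counted relative to an outgroup, and the two counts are compared with a chi-square statistic on one degree of freedom. The sequences and the function name `tajima_relative_rate` are hypothetical.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Tajima-style relative rate test on three aligned sequences.

    Returns (m_a, m_b, chi2, p), where m_a counts sites at which only seq_a
    differs from the outgroup and m_b counts sites at which only seq_b does.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m_a = m_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if "-" in (a, b, o):
            continue  # skip alignment gaps
        if a != o and b == o:
            m_a += 1
        elif b != o and a == o:
            m_b += 1
    total = m_a + m_b
    if total == 0:
        return m_a, m_b, 0.0, 1.0
    chi2 = (m_a - m_b) ** 2 / total
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m_a, m_b, chi2, p


# Hypothetical toy data: one lineage carries 8 unique changes, the other none.
out = "ACGT" * 5
slow = "ACGT" * 5
fast = "CATG" * 2 + "ACGT" * 3
m_fast, m_slow, chi2, p = tajima_relative_rate(fast, slow, out)
print(m_fast, m_slow, chi2, round(p, 4))
```

With these toy inputs the faster lineage has eight unique substitutions against zero, which gives a chi-square value of 8 and rejects rate equality at the 1% level.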

The rate heterogeneity among S. vulgaris atp1 lineages (Fig. 1) has been described previously (Sloan et al., 2008). The asymmetry in atp6 is also particularly striking. The S9L atp6 haplotype differs by 16 SNPs relative to the other three haplotypes, which are all identical to each other. A comparison with multiple outgroups (S. latifolia, Silene conica, B. vulgaris, and Vitis vinifera) suggests that the KOV/MTV/SD2 haplotype is the ancestral state (or very close to it) with most or all of the 16 differences reflecting changes that have occurred in the S9L haplotype (Fig. 7). Interestingly, however, eight of the SNPs are clustered in a single 16 bp region, and all eight are shared with another species, Silene noctiflora, that is otherwise highly divergent from S. vulgaris because of recent and dramatic accelerations in its mitochondrial substitution rate (Mower et al., 2007; Sloan et al., 2012a).

Figure 7.

Nucleotide polymorphism in atp6. All 16 sites that are variable among the four Silene vulgaris genomes are shown. Dots in the alignment indicate sequence identity with the S. vulgaris KOV/MTV/SD2 haplotype. The red text highlights a 16 bp region with eight single nucleotide polymorphisms (SNPs) that are shared between S. vulgaris S9L and Silene noctiflora.

Discussion

Intraspecific mtDNA variation in a gynodioecious angiosperm

The exceptional intraspecific variation in mitochondrial genome structure, content, and nucleotide sequence raises questions about the relationship between this diversity and the prevalence of CMS in S. vulgaris. Collectively, the four sequenced genomes contain a handful of chimeric ORFs. Given that expression of such ORFs is often associated with male sterility (Schnable & Wise, 1998), they represent promising candidates for future determination of CMS mechanisms in this species. Notably, one of the genomes (KOV) does not contain any chimeric ORFs > 300 bp in length, suggesting that the potential for other CMS mechanisms in S. vulgaris should also be considered.

One intriguing possibility is that the observed mitochondrial diversity in S. vulgaris may be a source of cytonuclear incompatibilities or even novel CMS mechanisms. For example, all four S. vulgaris mitochondrial genomes have a different complement of intact protein genes, and the effect of mitochondrial gene loss could vary depending on the nuclear background.

Mitochondrial genes can become redundant in the presence of nuclear-encoded homologs that evolve either by direct mtDNA transfer to the nucleus (Adams et al., 2000) or by duplication and retargeting of a pre-existing nuclear homolog (Adams et al., 2002a). Although functional replacement by a nuclear gene can lead to the eventual loss of the original mitochondrial copy, there may be a transition period in which both homologs are retained and expressed (Sandoval et al., 2004; Choi et al., 2006). The coexistence of mitochondrial and nuclear homologs creates an opportunity for coevolution between the two genomic compartments.

The pseudogenization of rps13 in two of the S. vulgaris mitochondrial genomes appears to be the result of an ongoing functional transfer to the nucleus and so may provide an example of this cytonuclear coevolution. A previous transcriptome analysis in S. vulgaris identified transcripts corresponding to a novel mitochondrial-like rps13, in addition to evidence for transcription and RNA editing of the mitochondrial-encoded copy of this gene (Sloan et al., 2012c; D. B. Sloan, unpublished). The novel copy is highly similar to its mitochondrial-encoded counterpart (93% nucleotide identity) and is presumably located in the nucleus. Its predicted protein product contains an extended N-terminal region, identified as a potential mitochondrial targeting peptide by multiple prediction algorithms (Claros & Vincens, 1996; Emanuelsson et al., 2000; Small et al., 2004; Hoglund et al., 2006). It is possible that a mitochondrial genome lacking an intact copy of rps13 may experience impaired function in a nuclear background in which the recently transferred copy of rps13 is not highly expressed or properly targeted.

In contrast to the apparent mitochondrion-to-nucleus transfer of rps13, transcriptome sequencing did not find evidence of a mitochondrial-like copy of rpl5 in the nucleus despite the fact that this gene is entirely absent from two of the sequenced S. vulgaris mitochondrial genomes. However, transcriptome data from the S. vulgaris KOV family did confirm that, when present in the mitochondrial genome, this gene is transcribed and C-to-U edited at predicted RNA editing sites (K. Müller & H. Štorchová, unpublished). In addition, S. vulgaris transcripts have been identified that correspond to three divergent paralogous copies of the nuclear-encoded cytosolic counterpart of this ribosomal protein, as well as a single nuclear-encoded, plastid-like homolog (Sloan et al., 2012c; D. B. Sloan, unpublished). Therefore, rpl5 may be in the process of being replaced by a duplicated copy of an ancient nuclear homolog, as has been observed in some other cases of plant mitochondrial gene loss (Adams et al., 2002a). Interestingly, the rpl5 gene is also absent from the K-type male-sterile cytoplasm of wheat (Triticum aestivum) but present in its male-fertile counterparts (Liu et al., 2011). This polymorphism reflects the artificial introduction of a foreign cytoplasm (from Aegilops kotschyi) in the CMS line. The identification of natural intraspecific variation in mitochondrial gene content within S. vulgaris makes this species a potentially valuable system for future investigations of cytonuclear coevolution during the process of organelle gene loss.

Our results also extend earlier population genetic studies that have found surprisingly high degrees of nucleotide polymorphism in Silene mitochondrial genes, particularly in gynodioecious species (Stadler & Delph, 2002; Houliston & Olson, 2006; Barr et al., 2007; Sloan et al., 2008; Touzet & Delph, 2009). A full understanding of the patterns of nucleotide polymorphism in these genomes must take into account the complex recombinational mechanisms that are active in plant mtDNA (Stadler & Delph, 2002; McCauley & Ellis, 2008; Hao et al., 2010; Arrieta-Montiel & Mackenzie, 2011). For example, although the exact origin of the 16 bp region in atp6 with eight shared SNPs between S. noctiflora and S. vulgaris S9L is not clear (Fig. 7), the clustering of substitutions in such a small region suggests the involvement of a recombinational mechanism rather than numerous independent point mutations. This small region represents half of the SNPs in the highly polymorphic atp6 gene, supporting the conclusion that gene conversion with (sometimes distantly) related sequences may account for a significant fraction of nucleotide sequence evolution in plant mitochondrial genomes (Hao et al., 2010; Mower et al., 2010). Notably, these genomes contain small, divergent gene fragments that are sometimes found as components of chimeric ORFs and may serve as sources of nucleotide substitutions resulting from rare gene conversion events with full-length gene copies (Štorchová et al., 2012).
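The clustering argument can be illustrated with a simple sliding-window scan that reports the densest group of SNPs in a gene. This is only a sketch: the SNP coordinates below are hypothetical placeholders chosen so that 8 of 16 sites fall within 16 bp, as in atp6, and the function name `densest_window` is an assumption.

```python
def densest_window(positions, width):
    """Return (start, count) for the `width`-bp window holding the most SNPs.

    Windows are anchored at SNP positions and cover [start, start + width - 1].
    """
    pos = sorted(positions)
    best_start, best_count = None, 0
    left = 0
    for right, p in enumerate(pos):
        # Advance the left edge until the window spans fewer than `width` bp.
        while p - pos[left] >= width:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = pos[left], count
    return best_start, best_count


# Hypothetical SNP coordinates (not the real atp6 positions).
snp_positions = [35, 120, 251, 402, 403, 405, 408, 410, 412, 415, 417,
                 530, 611, 640, 702, 715]
start, count = densest_window(snp_positions, width=16)
print(start, count, count / len(snp_positions))  # 402 8 0.5
```

A window holding half of a gene's SNPs within 16 bp is the kind of pattern that is hard to reconcile with independent point mutations spread evenly along the sequence.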

Although the degree of mitochondrial polymorphism in Silene has often been interpreted as a consequence of gynodioecy and selection on CMS elements (Ingvarsson & Taylor, 2002; Stadler & Delph, 2002; Touzet & Delph, 2009), the potential role of this variation as a cause of CMS should also be considered (Darracq et al., 2011). Some species may be gynodioecious because their mitochondrial genomes are so variable, rather than the other way around. This idea is consistent with recent findings in Beta vulgaris of higher rates of mitochondrial sequence and structural evolution in sterilizing relative to nonsterilizing cytoplasms (Darracq et al., 2011). High rates of genome rearrangement are a likely source of novel chimeric ORFs (Schnable & Wise, 1998; Arrieta-Montiel & Mackenzie, 2011) and expression patterns (Elansary et al., 2010), which may be important drivers in the evolution of CMS.

Angiosperm mitochondrial genome structure

The in vivo structure of angiosperm mitochondrial genomes has been a long-term source of interest and uncertainty. Although plant mtDNA is known to undergo frequent intragenomic recombination that could generate a theoretically infinite set of alternative multimeric and subgenomic forms, assemblies are conventionally reported as a single ‘master circle’ conformation, representing the complete sequence content. This approach is supported by the fact that angiosperm mitochondrial genomes typically map as circles and evidence that they can exist in a supercoiled form (Palmer, 1988). However, attempts to observe the in vivo structure of these molecules more directly have indicated that they exist as a mixture of different forms dominated by linear and branched molecules (Bendich, 1996; Backert & Borner, 2000).

The small size of the autonomous-mapping chromosomes identified in S. vulgaris mtDNA makes them particularly amenable to structural analysis. The smallest of these chromosomes is present in all four of the genomes analyzed in this study. While we confirmed the existence of the small circular chromosome structure predicted from genome assembly, we found that most copies exist in a variety of higher-molecular-weight forms, including what appears to be a large linear population. However, because this population comigrates with the dominant size class of genomic fragments from our total cellular DNA preparations, we cannot determine whether the linear fragments exist in vivo or simply represent fragments of larger molecules that were broken during DNA extraction. Regardless, our results clearly demonstrate the coexistence of both monomeric and multimeric chromosomal forms. Further work will be needed to assess whether these alternative structures simply interconvert as a byproduct of mitochondrial recombination activity or whether their existence reflects a role in different functional processes such as DNA replication or gene expression.

In addition to the presence of small autonomous-mapping chromosomes, the highly repetitive and recombinational nature of the S. vulgaris mitochondrial genomes further strains the conventional ‘master circle’ representation of plant mitochondrial genomes. The master circle was originally devised in the context of relatively simple genome structures, many of which contained only a single pair of recombining repeats (e.g. Palmer & Shields, 1984). We found that the S. vulgaris mitochondrial genomes contain numerous repeats, and deep sequencing and Southern blots confirmed active recombination even between pairs of short repeats (Fig. 4; Table S4). This unusually high level of recombinational activity likely contributes to the extensive genome rearrangement observed in this species.

With the exception of the separate autonomous and nonautonomous chromosomes described earlier, we generally followed the master-circle convention and reported the major fraction of the genome as a single, circular chromosome despite the presence of numerous recombining repeats within this sequence and evidence of alternative structures (Fig. S5). It is important to emphasize, however, that these mapping constructs are not necessarily representative of in vivo structure (Bendich, 1993). As recently demonstrated with the deep-sequencing analysis of the Mimulus guttatus mitochondrial genome, the existence of a circular chromosome as the major form of mtDNA may be unlikely even in genomes for which sequenced contigs can be connected into a single circular assembly (Mower et al., 2012a).

Although the broader discipline of genomics is largely accustomed to the concept of a single genome, it is clear that plant mtDNA exists as a population of multiple alternative genome structures. In some cases of extreme recombinational activity in plant mtDNA, circular representations have been abandoned altogether in favor of a stylized network in which sequences are linked by multiple alternative connections (Hecht et al., 2011). As the availability of deep-sequencing data grows, and increasingly complex examples of plant mitochondrial genome organization are identified, an important challenge will be to develop tools capable of analyzing, interpreting, and reporting these data in a network-based context that better reflects their biological reality and captures the full information content of the genome assemblies. Such tools should aid in further elucidating the mechanisms responsible for rapid structural evolution in these fascinatingly complex organelle genomes.
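To make the network idea concrete, the sketch below stores sequence segments as nodes and observed junctions as directed links, then enumerates the circular molecules that can be traced through them. It is a deliberately simplified assumption: the segment names (A, B, R) are hypothetical, strand orientation is ignored, and it does not reproduce the representation used in any published assembly.

```python
from collections import defaultdict


def build_network(links):
    """Build an adjacency list from (source_segment, target_segment) junctions."""
    graph = defaultdict(list)
    for src, dst in links:
        graph[src].append(dst)
    return graph


def circular_paths(graph, start):
    """Enumerate closed walks from `start` that use each junction at most once."""
    results = []

    def walk(node, path, used):
        for nxt in graph.get(node, []):
            link = (node, nxt)
            if link in used:
                continue
            if nxt == start:
                results.append(path + [nxt])  # the circle closes here
            else:
                walk(nxt, path + [nxt], used | {link})

    walk(start, [start], frozenset())
    return results


# Toy genome: single-copy segments A and B separated by two copies of a
# direct repeat R. The 'master circle' is A-R-B-R; recombination across R
# yields the subgenomic circles A-R and B-R.
net = build_network([("A", "R"), ("R", "B"), ("B", "R"), ("R", "A")])
from_a = circular_paths(net, "A")
from_b = circular_paths(net, "B")
for path in from_a + from_b:
    print(" -> ".join(path))
```

In this toy case the same four junctions describe the master circle and both subgenomic circles at once, so a single network carries all three conformations.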

Acknowledgements

We thank Laura Bergner, Kateřina Haškovcová, and Ludmila Busínská for laboratory assistance, Peter Fields for providing the S9L seed collection, Craig Coleman for advice on 454 sequencing, John Chuckalovcak for 454 library construction and sequencing, Brian Sanderson for help with sequence analysis, Jeff Palmer for helpful discussion regarding our Southern blot data, and Andy Alverson for comments on an early version of this manuscript. We are also grateful to Janis Antonovics, who (long ago) suggested to us the hypothesis that intermediate stages of intracellular gene transfer could be a source of CMS. This work was supported by the National Science Foundation (DEB-1050331, MCB-1022128), the Grant Agency of the Czech Republic (GAČR 521/09/0261) and the Ministry of Education, Youth and Sports of the Czech Republic (MŠMT LC06004 and MŠMT Kontakt ME09035). D.R.T. and H.S. supervised this work equally.
