Shifts in the evolutionary rate and intensity of purifying selection between two Brassica genomes revealed by analyses of orthologous transposons and relics of a whole genome triplication

For correspondence (e-mail liusy@oilcrops.cn or maj@purdue.edu).

Summary

Recent sequencing of the Brassica rapa and Brassica oleracea genomes revealed strongly contrasting genomic features between the two genomes, such as the abundance and distribution of transposable elements. However, whether and how these structural differences may have influenced the evolutionary rates of the two genomes since their split from a common ancestor remains unknown. Here, we investigated and compared the rates of nucleotide substitution between the two long terminal repeats (LTRs) of individual orthologous LTR-retrotransposons, the rates of synonymous and non-synonymous substitution among triplicated genes retained in both genomes from a shared whole genome triplication event, and the rates of genetic recombination estimated or deduced from the comparison of physical and genetic distances along chromosomes and from the ratios of solo LTRs to intact elements. Overall, LTR sequences and genic sequences showed more rapid nucleotide substitution in B. rapa than in B. oleracea. Synonymous substitution of triplicated genes retained from the shared whole genome triplication was detected at higher rates in B. rapa than in B. oleracea. Interestingly, non-synonymous substitution was observed at lower rates in the former than in the latter, indicating shifted intensities of purifying selection between the two genomes. In addition to evolutionary asymmetry, orthologous genes differentially regulated and/or disrupted by transposable elements between the two genomes were also characterized. Our analyses suggest that local genomic and epigenomic features, such as recombination rates and chromatin dynamics reshaped by independent proliferation and elimination of transposable elements in the two genomes, are perhaps partially the causes and partially the outcomes of the observed inter-specific asymmetric evolution.

Introduction

Brassica is one of the most economically important genera within the Brassicaceae family (Labana and Gupta, 1993). This genus consists of approximately 100 species including six representative ones–allotetraploid species Brassica carinata (BBCC, 2n = 34), Brassica juncea (AABB, 2n = 36) and Brassica napus (AACC, 2n = 38), and their diploid progenitor species Brassica nigra (BB, 2n = 16), Brassica oleracea (CC, 2n = 18) and Brassica rapa (AA, 2n = 20) (Beilstein et al., 2006). The origins and relationships of these species were well illustrated by the ‘triangle of U’ (Nagaharu, 1935). A large number of vegetable and oilseed crops were developed from these six species and are being grown worldwide, representing an important source of the world's vegetable and vegetable oil production (Labana and Gupta, 1993). Due to their agronomic importance, these Brassica species have attracted tremendous attention from researchers, and have recently been used as a model system to study genome evolution such as genome instability (Song et al., 1995; Ge et al., 2009), genome size variation (Johnston et al., 2005) and evolutionary dynamics and the consequences of layers of whole genome duplication (WGD) events, including the whole genome triplication (WGT) event that occurred in the Brassica lineage about 13–17 million years ago (Ma), after its divergence from Arabidopsis thaliana about 20 Ma (Lysak et al., 2005, 2007; Town et al., 2006; Yang et al., 2006; Mun et al., 2009).

It was suggested that B. rapa and B. oleracea diverged from a common ancestor approximately 3.75 Ma (Inaba and Nishio, 2002), and merged together through hybridization, followed by genome doubling, to form the allotetraploid species B. napus probably <10 000 years ago (Cheung et al., 2009). Comparative sequence analyses of homeologous segments from the A and C genomes of B. napus, B. rapa and B. oleracea revealed nearly perfect microcollinearity between the two genomes and overall expansion of the C genome segments relative to their A genome counterparts (Cheung et al., 2009). The relative expansion seemed to be primarily caused by differential accumulation of transposable elements (TEs), particularly long terminal repeat retrotransposons (LTR-RTs) in the orthologous homeologous regions investigated (Cheung et al., 2009).

Recent sequencing of the B. rapa and B. oleracea genomes has provided a comprehensive set of genomic data and information that is extremely valuable to the research community (Wang et al., 2011a; http://ocri-genomics.org/bolbase). One of the most striking observations was perhaps the substantial size difference between the syntenic regions of these two genomes: about 155 Mb of assembled DNA in the A genome versus about 260 Mb in the C genome (Wang et al., 2011a; http://ocri-genomics.org/bolbase). In these regions, about 26 Mb of TEs in B. rapa versus about 88 Mb of TEs in B. oleracea were identified. These TEs explain part of the size difference between the syntenic regions (http://ocri-genomics.org/bolbase). However, whether and how these structural differences may have influenced the evolutionary rates of the two genomes since their split from a common ancestor is not clear.

Here we analyzed the rates of nucleotide substitution between the two LTRs of individual orthologous LTR-RTs, the rates of synonymous substitution (Ks) and non-synonymous substitution (Ka) among triplicated genes retained in both the A and C genomes from the shared WGT event, the ratios of solo LTRs to intact elements, and the rates of genetic recombination estimated or deduced by comparison of physical and genetic distances along chromosomes. These analyses demonstrated the evolutionary asymmetry of the two genomes and pointed to its potential causes. We also characterized transcriptional alteration of TE junction sequences using RNA sequencing data from B. rapa and B. oleracea and identified a set of orthologous genes differentially disrupted by TEs between the two genomes. Of the 337 genes harboring TE insertions identified in this study, 231 (68.5%) were not annotated or were mis-annotated in previous reports (Wang et al., 2011a; http://ocri-genomics.org/bolbase).

Results

Contrasting genomic features: transposable elements as major players in genome differentiation

To better understand the impact of TEs on the structural variation between B. rapa and B. oleracea, we re-annotated TEs in the assembled B. rapa genome sequences (Wang et al., 2011a) and annotated TEs in the assembled B. oleracea genome sequences (http://ocri-genomics.org/bolbase) using the same methods as described earlier (Du et al., 2010a). The TEs with clear boundaries identified in this study are listed in Table S1 in the Supporting Information. These include a total of 1332 retrotransposons and 3270 DNA transposons in the approximately 280 Mb of assembled sequences of B. rapa, and a total of 5107 retrotransposons and 8275 DNA transposons in the approximately 540 Mb of assembled sequences of B. oleracea (http://ocri-genomics.org/bolbase). These elements were categorized into LTR-RTs (copia, gypsy and unclassified LTR-RTs), non-LTR-RTs (long and short interspersed nuclear elements, LINEs and SINEs, respectively) and nine superfamilies of DNA transposons (Tc1/Mariner, hAT, Mutator, PIF/Harbinger, Pong, CACTA, MITE/Stowaway, MITE/Tourist and Helitron) (Holligan et al., 2006; Wicker et al., 2007) using the methods previously reported. The LTR-RTs and the CACTA superfamily were found to be most abundant in both B. rapa and B. oleracea and were further classified into families following the criteria previously described (Wicker et al., 2003; Holligan et al., 2006). The densities of both retrotransposons and DNA TEs identified in the B. rapa sequences were significantly lower than in the B. oleracea sequences (P < 0.01, Fisher's exact test) (Table S1). By contrast, the gene density in the B. rapa genome sequences (145 genes per Mb of DNA; Wang et al., 2011a) was significantly higher than that in the B. oleracea genome sequences (85 genes per Mb of DNA; http://ocri-genomics.org/bolbase) (P < 0.01, Fisher's exact test).

The proportions of TEs shared by B. rapa and B. oleracea were estimated by sequence comparisons between the two genomes. In the comparisons, we used TE-flanking sequence junction fragments (TE junction sites) from one species as query sequences to search against genomic sequences from the other species to determine whether the junction sites were shared by the two species, a method previously described in several studies (Ma and Bennetzen, 2004; Devos et al., 2005; Tian et al., 2009, 2012). As this method was based on the uniqueness of each of the TE junction sites in the whole genomes, TE junction sites with two or multiple identical or highly identical copies in either of the two Brassica genomes were excluded from this analysis (Figure S1; see 'Experimental Procedures').

Out of the examined 906 retrotransposons and 3084 DNA transposons in the assembled B. rapa sequences, and 4061 retrotransposons and 7810 DNA transposons in the assembled B. oleracea genome sequences, 47 (5.19%) LTR-RTs and 116 (3.76%) DNA TEs present in the assembled B. rapa sequences were found as intact or truncated elements in the assembled B. oleracea sequences. By contrast, 59 (1.45%) LTR-RTs and 145 (1.86%) DNA TEs present in the assembled B. oleracea sequences were found as intact or truncated elements in the assembled B. rapa sequences. These shared LTR-RTs in B. rapa and B. oleracea were dated, on average, to 4.35 and 3.55 Ma, close to the estimated divergence time of the two Brassica genomes (Tables 1 and S1; Inaba and Nishio, 2002).

Table 1. Transposable element (TE) insertions shared by the Brassica rapa (AA) and Brassica oleracea (CC) genomes

| Categories of TEs | B. rapa TEs shared by B. oleracea: detected with assembled sequences of B. oleracea | B. rapa TEs shared by B. oleracea: detected with unassembled NGS reads of B. oleracea | B. oleracea TEs shared by B. rapa: detected with assembled sequences of B. rapa | B. oleracea TEs shared by B. rapa: detected with unassembled NGS reads of B. rapa |
| --- | --- | --- | --- | --- |
| LTR-RTs: | | | | |
| LTR/Copia | 30 | 34 | 41 | 70 |
| LTR/Gypsy | 10 | 15 | 11 | 21 |
| LTR/Unclassified | 7 | 8 | 7 | 11 |
| Subtotal | 47 | 57 | 59 | 102 |
| Proportion^a (%) | 5.19 | 6.29 | 1.45 | 2.51 |
| DNA TEs: | | | | |
| Tc1/Mariner | 8 | 11 | 12 | 12 |
| hAT | 11 | 11 | 12 | 12 |
| Mutator | 5 | 16 | 7 | 24 |
| PIF/Harbinger | 0 | 0 | 0 | 4 |
| Pong | 20 | 31 | 13 | 31 |
| CACTA | 10 | 20 | 9 | 32 |
| MITE/Stowaway | 46 | 80 | 71 | 134 |
| MITE/Tourist | 5 | 16 | 7 | 34 |
| Helitron | 11 | 88 | 14 | 173 |
| Subtotal | 116 | 273 | 145 | 456 |
| Proportion^b (%) | 3.76 | 8.85 | 1.86 | 5.84 |
| Total number | 163 | 330 | 204 | 558 |
| Proportion^c (%) | 4.08 | 8.27 | 1.72 | 4.70 |

LTR-RT, long terminal repeat retrotransposon.
^a Based on the examined 906 LTR-RTs in B. rapa and 4061 LTR-RTs in B. oleracea.
^b Based on the examined 3084 DNA TEs in B. rapa and 7810 DNA TEs in B. oleracea.
^c Based on the examined 3990 TEs in B. rapa and 11871 TEs in B. oleracea.
‘Shared’ means ‘shared at orthologous sites between compared genomes, as predicted by the pipelines described in this study’.

Because approximately 201 and 90 Mb of the B. rapa and B. oleracea genomes, respectively, were not included in the assembled sequences (Wang et al., 2011a; http://ocri-genomics.org/bolbase), some TEs shared by the two genomes may not be detected by a comparison of assembled sequences alone. We thus searched the junction sites of the well-defined TEs present in one genome against the next generation sequencing (NGS) reads generated from the other genome to determine whether the TEs are shared by the two genomes (Table 1, Figure S1; see 'Experimental Procedures'). The junction sites of 57 (6.29%) LTR-RTs and 273 (8.85%) DNA TEs present in the assembled B. rapa sequences were found in the B. oleracea NGS read dataset. By contrast, the junction sites of 102 (2.51%) LTR-RTs and 456 (5.84%) DNA TEs present in the assembled B. oleracea sequences were found in the B. rapa NGS read dataset. We thus estimate that approximately 96.8% of the 4967 LTR-RTs and 93.3% of the 10 894 DNA TEs examined in this study are not shared by these two diploids. These findings were echoed by the observation that, out of 4133 intact LTR-RTs identified in the assembled sequences of the B. rapa and B. oleracea genomes, only 128 (3%) were estimated to be older than 4 million years (Table S1), a time point for the split of the two genomes (Inaba and Nishio, 2002; also see the estimate from this study below). Together, these data indicate that the two Brassica genomes are organizationally differentiated primarily by differential insertions of TEs after their divergence from a common ancestor. However, because the coverage of NGS reads from either of the two Brassica genomes used in the comparison remains low (about 20× genome coverage), the proportions of shared TEs between the two genomes are likely to be underestimates.
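The unshared proportions quoted above follow directly from the counts in Table 1. As a minimal sketch of that arithmetic (the dictionary layout and the function name are illustrative choices, not part of the original analysis pipeline):

```python
# Counts from Table 1: junction sites detected in the NGS reads of the other genome,
# summed over both directions (B. rapa -> B. oleracea and B. oleracea -> B. rapa).
examined = {"LTR-RT": 906 + 4061, "DNA TE": 3084 + 7810}
shared = {"LTR-RT": 57 + 102, "DNA TE": 273 + 456}


def percent_not_shared(n_shared, n_examined):
    """Percentage of examined elements without an orthologous copy in the other genome."""
    return 100.0 * (1 - n_shared / n_examined)


unshared = {
    category: round(percent_not_shared(shared[category], examined[category]), 1)
    for category in examined
}
print(unshared)  # {'LTR-RT': 96.8, 'DNA TE': 93.3}
```

Because low read coverage can only cause shared junctions to be missed, these percentages are best read as upper bounds on the unshared fraction.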

Asymmetric divergence of LTR sequences between the two diploid genomes

We analyzed the average levels of sequence divergence between the two species among different categories of TEs using the shared TE set. As shown in Figure 1(a), the divergence levels vary, ranging from 0.0572 (Helitrons) to 0.1142 (MITEs). The average divergence of these shared TEs is 0.0777 ± 0.0031 (SE), which is close to previously estimated synonymous divergence (0.0967 ± 0.0058 (SE)) of 31 genes between the A and C genomes (Fourmann et al., 2002; Takuno et al., 2007).

Figure 1.

Intra- and inter-specific sequence divergence evaluated based on orthologous transposable elements (TEs) shared by the assembled portions of the Brassica rapa and Brassica oleracea genomes.

(a) Inter-specific sequence divergence of different categories of TEs. All full-length shared elements of each category between these two genomes were used to estimate the average sequence divergence and standard deviation.

(b) Intra-specific sequence divergence between two long terminal repeats (LTRs) of each of the 22 LTR-retrotransposons (RTs) shared by B. rapa and B. oleracea, and inter-specific sequence divergence between the two genomes at these 22 LTR-RT sites.

(c) Boxplot of intra-specific and inter-specific sequence divergence between two LTRs of each of the 22 orthologous intact LTR-RTs shared by B. rapa and B. oleracea.

We were particularly interested in a subset of intact LTR-RTs that are shared by the B. rapa and B. oleracea genomes. Theoretically, the two copies of each of these elements were identical in the two genomes when they split from a common ancestor, and have been evolving independently since then. Thus, the sequence divergence between the two LTRs of individual LTR-RTs shared by the two genomes allowed us to compare the rates of sequence divergence at these LTR-RT insertion sites in the different genomes. As shown in Figure 1(b), the intra-specific divergence levels of the two LTR sequences of individual elements at orthologous sites of the two genomes are positively correlated (r = 0.9018, P < 0.01), and are also positively correlated with the levels of inter-specific divergence of these elements between the two genomes (B. rapa, r = 0.6327, P < 0.01; B. oleracea, r = 0.7758, P < 0.01). Interestingly, of the 22 orthologous loci examined, 20 showed higher levels of sequence divergence between the two LTRs of each element in B. rapa than in B. oleracea (Figure 1(b)), and overall a significantly higher level of sequence divergence was detected in the former genome than in the latter (P < 0.01, Student's paired t-test; Figure 1(c), Table 2).

Table 2. Inter-specific comparison of intra-element long terminal repeat (LTR) sequence divergence and the evolutionary rates of triplicated genes between Brassica rapa and Brassica oleracea

| Genomic feature^a | B. rapa^b | B. oleracea^b | P-value^c |
| --- | --- | --- | --- |
| Nucleotide divergence between two LTRs of individual LTR retrotransposons | 0.1339 ± 0.0566 | 0.1090 ± 0.0515 | 0.0001 |
| Ka: among triplicated genes | 0.0717 ± 0.0374 | 0.0730 ± 0.0376 | 0.0068 |
| Ks: among triplicated genes | 0.3237 ± 0.0865 | 0.3187 ± 0.0906 | 0.0468 |
| ω (= Ka/Ks): among triplicated genes | 0.2381 ± 0.1367 | 0.2494 ± 0.1452 | 0.0002 |
| Ka: compared with Arabidopsis thaliana | 0.0769 ± 0.0411 | 0.0778 ± 0.0413 | 0.0141 |
| Ks: compared with A. thaliana | 0.4442 ± 0.1997 | 0.4374 ± 0.1820 | 0.0170 |
| ω (= Ka/Ks): compared with A. thaliana | 0.1966 ± 0.1278 | 0.2009 ± 0.1297 | 0.0018 |
| Local GR rate (triplicated gene loci, cM/Mb) | 4.8843 ± 2.8670 | 2.1628 ± 1.6840 | <0.0001 |
| Average GR rate (whole genome, cM/Mb)^d | 5.2699 ± 1.9537 | 2.1675 ± 0.6016 | 0.0006 |

^a Ka, Ks, ω and genetic recombination (GR) rates (triplicated genes) were calculated by pairwise comparison among three copies of individual triplicated genes. A total of 243 triplicates retained in the syntenic regions of the two genomes were chosen for this analysis. Ka, Ks and ω (compared with A. thaliana) of the 691 genes (out of 729 triplicated genes) were calculated based on their respective orthologs in A. thaliana.
^b Mean ± SD.
^c Student's paired t-test.
^d Genetic recombination rates on the whole genome level were based on the genetic distance of individual chromosomes of B. rapa and B. oleracea genomes and assembled genome sequences of corresponding chromosomes (Student's t-test).

The LTR-RTs shared by the B. rapa and B. oleracea genomes were dated based on LTR sequence divergence, using a substitution rate of 1.5 × 10⁻⁸ substitutions per site per year (Koch et al., 2000). Since the rates of nucleotide substitution vary within an individual genome and between the two genomes, we used the average degree of nucleotide substitution between the two LTRs of individual LTR-RTs to estimate the average age of these elements, which is 4.05 million years. Because these elements were integrated prior to the divergence of the two species, we estimate that the split of the two species occurred within the past 4.05 million years. This is very close to the divergence time estimated by Inaba and Nishio (2002) (3.75 million years) based on analysis of synonymous substitution of the SLR1 gene between the two species.
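The dating step can be illustrated with the standard relation T = K / (2r), where K is the divergence between the two LTRs of one element and r is the substitution rate; the factor of two reflects that both LTRs accumulate substitutions independently after insertion. The sketch below assumes that the average divergence is the mean of the two species-level values reported in Table 2; the text does not spell out exactly how the average was taken, so this is an illustration rather than a reconstruction of the authors' computation.

```python
SUBSTITUTION_RATE = 1.5e-8  # substitutions per site per year (Koch et al., 2000)


def ltr_insertion_age(divergence, rate=SUBSTITUTION_RATE):
    """Estimated age in years of an LTR-RT, T = K / (2r).

    divergence: nucleotide divergence K between the 5' and 3' LTRs of one element.
    """
    return divergence / (2 * rate)


# Mean intra-element LTR divergence in B. rapa and B. oleracea (Table 2).
# Averaging the two means is an assumption made for this example.
mean_divergence = (0.1339 + 0.1090) / 2
age_ma = ltr_insertion_age(mean_divergence) / 1e6
print(f"{age_ma:.2f} Ma")  # 4.05 Ma
```

Under this assumption the result agrees with the 4.05 million years given in the text.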

Asymmetric divergence of genic sequences between the two diploid genomes

It has been suggested that the Brassica lineage has experienced a WGT event that occurred approximately 13–17 Ma (Yang et al., 1999, 2006). Recent sequencing and comparative analysis of the B. rapa and B. oleracea genomes have identified the triplicated genes retained in the two genomes (Wang et al., 2011a; http://ocri-genomics.org/bolbase). To understand whether the higher rates of sequence substitution detected at the 22 orthologous LTR-RT sites are specific to LTR sequences or genome-wide patterns of sequence divergence in the two genomes, we compared rates of intra-genomic synonymous and non-synonymous substitutions between B. rapa and B. oleracea using a common set of highly confident triplicated genes retained in the syntenic regions of the two genomes. These include 243 triplicates in B. rapa and their corresponding orthologs in B. oleracea (Figure S2 and Table S2). Because the three copies of each of the 243 retained triplicates between B. rapa and B. oleracea were considered to have experienced independent evolution only after the split of these two species, as shown in Figure 2, the levels of sequence divergence among the three copies of individual triplicates in each of the two species allowed us to compare the evolutionary rates of genic sequences between the two genomes.

Figure 2.

Divergence of intra-element long terminal repeat (LTR) sequences and triplicated genes in the context of evolutionary history of Arabidopsis thaliana, Brassica rapa and Brassica oleracea.

Ovals indicate the orthologous triplicates retained in the syntenic regions of B. rapa and B. oleracea genomes. Two rectangles connected by a single line represent the two LTRs of one LTR-retrotransposon. The depths of different colors imply different levels of nucleotide divergence. WGT, whole genome triplication.

Our data revealed a positive correlation of the divergence levels of the 243 triplicates between B. rapa and B. oleracea and a significantly higher level of synonymous substitution (Ks) among the three copies of triplicated genes in B. rapa than in B. oleracea (P < 0.05, Student's paired t-test; Figure 3(b), Table 2). These results, consistent with the observation obtained from the analysis of the 22 orthologous LTR-RTs, suggest a higher rate of point mutation at neutral sites in B. rapa than in B. oleracea. Interestingly, the same sets of triplicated genes showed a significantly lower level of non-synonymous substitution (Ka) in B. rapa than in B. oleracea (P < 0.01, Student's paired t-test; Figure 3(a), Table 2) and a significantly lower ratio of Ka/Ks (ω) in the former than in the latter (P < 0.01, Student's paired t-test; Figure 3(c), Table 2), suggesting that overall the triplicates retained in the B. rapa genome have undergone stronger purifying selection than their orthologs in the syntenic regions of B. oleracea, although exceptions for some of the individual triplicates were observed (Table S2).

Figure 3.

Boxplot comparisons of inter-specific sequence divergences, evolutionary rates and recombination rates of the 243 triplicates retained in the syntenic regions of the Brassica rapa and Brassica oleracea genomes.

(a) Non-synonymous substitution rate, Ka.

(b) Synonymous substitution rate, Ks.

(c) ω (Ka/Ks).

(d) Average rates of genetic recombination at three members of individual triplicates. The labels A and C beneath the panels indicate the B. rapa and B. oleracea genomes. The bottom and top boundaries of the box are the first and third quartiles, and the bold lines within individual boxes are the medians, which are referred to as the second quartiles. The ends of the whiskers (the dotted lines) represent the minimum values and maximum values of the data.

Although we were able to identify and compare the corresponding triplicates between the two genomes as described above, the three sets of triplicated genes retained from the triplication event could not be distinguished from one another. As a result, the differences in evolutionary rates for a single set of triplicated genes were not directly revealed by comparison of the 243 triplicates between the two genomes. In an attempt to shed light on the evolutionary rates of individual genes in the two Brassica genomes, we aligned each of the 729 (i.e. 243 triplicates times three) genes in the two genomes with their putative orthologs in A. thaliana and were able to calculate Ka, Ks and ω for each of 691 (out of 729) genes in B. rapa versus their respective orthologs in Arabidopsis and for each of the 691 orthologous genes in B. oleracea versus their respective orthologs in Arabidopsis. As shown in Tables 2 and S3, overall the 691 genes showed a significantly higher Ks in B. rapa than in B. oleracea (P < 0.05, Student's paired t-test), a significantly lower Ka (P < 0.05, Student's paired t-test) and a significantly lower ratio of Ka/Ks (ω) (P < 0.01, Student's paired t-test) in the former than in the latter. These results were consistent with the observations obtained from the analysis of the 243 corresponding triplicates, suggesting that the differences in evolutionary rates and intensities of purifying selection between the two genomes most likely reflect their independent evolution. We would like to point out that both Ka and Ks are estimates that carry their own uncertainty. As a result, the confidence levels around the mean Ka and Ks values may be somewhat lower than described above.

In an attempt to shed light on the evolutionary forces driving the asymmetric evolution of orthologous LTR-RTs and triplicated genes between the B. rapa and B. oleracea genomes, we estimated the rates of local genetic recombination at the 729 orthologous triplicated gene sites (243 triplicates) in each of the two genomes by comparing genetic and physical (sequence) maps (Wang et al., 2011a,b, 2012; Cheng et al., 2011; http://brassicadb.org/brad; http://www.ocri-genomics.org/bolbase/) following the methods described previously (Tian et al., 2009; Du et al., 2012; see 'Experimental Procedures'). We found that the overall recombination rate at these orthologous loci in B. rapa was significantly higher than that in B. oleracea (P < 0.01, Student's paired t-test; Figure 3(d); Table 2). Nevertheless, because both the assembled B. rapa and B. oleracea genomes contain numerous gaps and unassembled regions, the estimates of local recombination rates for either of the two genomes remain rough.
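The idea of deriving a local recombination rate from a comparison of genetic and physical maps can be sketched as follows. This is a deliberately simplified illustration, not the procedure of Tian et al. (2009) or Du et al. (2012): it takes the two markers flanking a gene and divides the genetic distance between them (cM) by the physical distance (Mb). The function name and the marker coordinates are hypothetical.

```python
def local_recombination_rate(markers, gene_pos_mb):
    """Recombination rate (cM/Mb) in the marker interval containing gene_pos_mb.

    markers: list of (physical position in Mb, genetic position in cM) tuples,
             sorted by physical position. Returns None if no interval contains
             the gene.
    """
    for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:]):
        if mb1 <= gene_pos_mb <= mb2 and mb2 > mb1:
            return (cm2 - cm1) / (mb2 - mb1)
    return None


# Hypothetical markers on one chromosome: (Mb, cM).
markers = [(0.0, 0.0), (1.0, 5.0), (3.0, 9.0)]
rate = local_recombination_rate(markers, 2.0)  # interval 1.0-3.0 Mb: (9 - 5) / (3 - 1)
print(rate)  # 2.0
```

Gaps and misassembled regions distort the physical distances in such a calculation, which is one reason the local estimates are described as rough.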

Different rates of LTR-RT DNA removal through solo LTR formation between the two diploid genomes

A solo LTR is presumably the product of an unequal homologous recombination (UR) event (Devos et al., 2002). Through UR between the two LTRs of a single LTR-RT, one LTR and the internal part of the element are removed, leaving a single LTR (i.e. a solo LTR) at the original site of integration. The formation of solo LTRs has been recognized as a primary mechanism for removal of LTR-RT DNA from the host genome, counteracting genome expansion (Devos et al., 2002; Ma et al., 2004; Tian et al., 2009). Although it is difficult or impossible to determine when a UR event occurred to generate a solo LTR, the ratios of solo LTRs to intact elements belonging to individual families or located in particular genomic regions are often used as parameters to assess the relative pace of DNA removal (Bennetzen et al., 2005; Tian et al., 2009).

Overall, a significantly higher ratio of solo LTRs to intact elements was observed in the assembled genomic regions of B. rapa than in the assembled genomic regions of B. oleracea (P < 0.01, χ² test; Table 3). When only LTR-RTs shared by the two genomes were included, higher ratios of solo LTRs to intact elements were observed in both species, suggesting that more solo LTRs were formed over evolutionary time. The ratio of solo LTRs to intact elements in the B. rapa genome was found to be significantly higher than in the B. oleracea genome (P < 0.01, χ² test; Table 3). On average, intact LTR-RTs in B. rapa were younger than in B. oleracea. If this holds true for both intact elements and solo LTRs in the two species, then the relative rate of formation of solo LTRs in the former species, compared with the latter, would be even higher than reflected by the different ratios of solo LTRs to intact elements in the assembled portions of the two genomes. We should note that current assemblies of both the B. rapa and B. oleracea genomes contain numerous gaps, and thus the ratios of solo LTRs to intact elements in the assembled portions of the genomes may differ from the actual ratios in the entire genomes. Nevertheless, orthologous LTR-RTs shared by syntenic regions of the two genomes also showed a higher ratio of solo LTRs to intact elements in B. rapa (Table 3), suggesting that the ratios detected in the assembled portions of the two genomes may be reflective of those in the entire genomes.

Table 3. Comparison of relative rates of solo long terminal repeat (LTR) formation between the Brassica rapa and Brassica oleracea genomes

| Categories^a | No. of solo LTRs | No. of intact elements | Ratio^b |
| --- | --- | --- | --- |
| LTR-RTs in the assembled B. rapa genome | 346 | 693 | 0.50^c |
| LTR-RTs in the assembled B. oleracea genome | 1174 | 3440 | 0.34 |
| LTR-RTs in the syntenic regions of B. rapa genome | 155 | 344 | 0.45^c |
| LTR-RTs in the syntenic regions of B. oleracea genome | 531 | 1703 | 0.31 |
| LTR-RTs in B. rapa shared by B. oleracea (NGS reads) | 36 | 21 | 1.71 |
| LTR-RTs in B. oleracea shared by B. rapa (NGS reads) | 52 | 50 | 1.04 |
| LTR-RTs in B. rapa shared by B. oleracea (assembly) | 31 | 16 | 1.94 |
| LTR-RTs in B. oleracea shared by B. rapa (assembly) | 31 | 28 | 1.11 |

LTR-RTs, long terminal repeat retrotransposons; NGS, next generation sequencing.
^a LTR-RTs identified in the assembled sequences of one of the two diploid genomes were compared with the assembled and unassembled NGS reads of the other genome to determine whether the insertion site of an element is shared by the two compared genomes.
^b Differences in the ratios of solo LTRs to intact elements between B. rapa and B. oleracea were evaluated by χ² test.
^c P < 0.01.
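The comparison of solo LTR to intact element ratios in the first two rows of Table 3 can be illustrated with a Pearson χ² test on a 2 × 2 contingency table. The sketch below uses only the Python standard library and applies no continuity correction; whether the original analysis used a correction or a different table layout is not stated, so this is an illustration of the test rather than a reproduction of the reported statistics.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout: [[a, b], [c, d]], e.g. rows = genomes,
    columns = (solo LTRs, intact elements). Returns (chi2, p_value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assembled genomes (Table 3): B. rapa 346 solo / 693 intact,
# B. oleracea 1174 solo / 3440 intact.
chi2, p_value = chi_square_2x2(346, 693, 1174, 3440)
print(chi2 > 6.635, p_value < 0.01)  # 6.635 is the 1% critical value for 1 df
```

With these counts the statistic lies well above the 1% critical value, in line with the P < 0.01 reported in the table footnote.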

Gene disruption and transcriptional alteration associated with LTR-RTs in the two Brassica genomes

To understand the potential influence of TE insertions on the functional variation of orthologous genes between the A and C genomes, we analyzed the nature of the sequences harboring the 1332 retrotransposons and 3270 DNA transposons identified in the assembled sequences of B. rapa and the 5107 retrotransposons and 8275 DNA transposons identified in the assembled sequences of B. oleracea. Of these elements, 94 and 48 were found in introns and exons, respectively, of the B. rapa genome, and 124 and 71 were found in introns and exons, respectively, of the B. oleracea genome (Tables 4 and S4). The genes harboring individual TE insertions in each genome were compared with their orthologous sites as well as homologs in A. thaliana. Of the 337 TEs inserted into genic sequences, only 14, all present in introns, were shared by the two diploid genomes. It is generally assumed that TE insertions in introns do not affect the amino acid sequences encoded by the corresponding genes. However, we found at least 48 TE insertions in exonic regions of B. rapa and 71 TE insertions in exonic regions of B. oleracea. These 119 insertions may have disrupted the functions of their host genes (Table 4). Of the 337 genes harboring TEs, 231 (68.5%) were not annotated or were mis-annotated, probably due to the insertion of TEs (Table S4).

Table 4. Contribution of transposable element (TE) insertions to potential functional disruption of orthologous genes between Brassica rapa and Brassica oleracea

| TE insertions into genes^a | Retrotransposons: exon | Retrotransposons: intron | DNA TEs: exon | DNA TEs: intron | Both classes of TEs: exon | Both classes of TEs: intron |
| --- | --- | --- | --- | --- | --- | --- |
| In B. rapa | | | | | | |
| No. of genic TE insertions | 30 | 26 | 18 | 68 | 48 | 94 |
| No. of genic TE insertions shared by B. oleracea | 0 | 0 | 0 | 8 | 0 | 8 |
| No. of orthologous genes without TE insertions in B. oleracea | 23 | 24 | 15 | 63 | 38 | 87 |
| No. of mis-annotated or unannotated genes with TE insertions | 30 | 12 | 18 | 28 | 48 | 40 |
| In B. oleracea | | | | | | |
| No. of genic TE insertions | 33 | 37 | 38 | 87 | 71 | 124 |
| No. of genic TE insertions shared by B. rapa | 0 | 0 | 0 | 6 | 0 | 6 |
| No. of orthologous genes without TE insertions in B. rapa | 22 | 28 | 21 | 70 | 43 | 98 |
| No. of mis-annotated or unannotated genes with TE insertions | 33 | 38 | 26 | 46 | 59 | 84 |
| In B. rapa and B. oleracea | | | | | | |
| No. of genic TE insertions | 63 | 63 | 56 | 153 | 119 | 216 |
| No. of genic TE insertions shared by the two genomes | 0 | 0 | 0 | 12 | 0 | 12 |
| No. of orthologous gene pairs with TE insertions in one copy only | 45 | 52 | 36 | 131 | 81 | 183 |
| No. of mis-annotated or unannotated genes with TE insertions | 63 | 50 | 44 | 74 | 107 | 124 |

The LTR-RTs contain all the required signal motifs for transcription, such as promoter, terminator, primer-binding site and polypurine tract (Kumar and Bennetzen, 1999). Theoretically, they should initially transcribe from the 5′ LTR, and terminate at the 3′ LTR (Figure S3). However, because of the repeated sequences of 5′ and 3′ LTRs, transcripts can read out from the 3′ LTRs of some elements to their flanking regions, including genic sequences to form chimeric readout transcripts, which may lead to alteration of the expression of their flanking genes (Kashkush et al., 2002, 2003). It is also possible that some of the LTR-RT readout transcripts were driven by the transcribed regions harboring these elements (Druker et al., 2004).

In an attempt to understand how and to what extent the expression of orthologous genes in the B. rapa and B. oleracea genomes may have been differentially influenced by adjacent retrotransposons, we analyzed the transcriptional activities of LTR-RT junction sites by searching the 3′ LTR junction sequences against transcriptomic data generated from three tissues (leaves, stems and roots) from each of the two Brassica diploids (see 'Experimental Procedures' and Figure S3). The results are illustrated in Figure 4. Of the 906 and 4061 LTR-RTs examined in the B. rapa and B. oleracea genomes, 37 and 70, respectively, were found to have driven the readout of LTR-RT transcripts into their adjacent sequences (dubbed LTR junction transcripts). A large fraction of the detected LTR junction transcripts were present in only one or two tissues of individual species, and only 1 (2.7%) and 11 (15.7%) LTR junction transcripts were detected in all three tissues of B. rapa and B. oleracea, respectively (Figure 4). These observations suggest that the activity of LTR-RTs as potential regulatory elements in plants is largely tissue-specific.

Figure 4.

Long terminal repeat (LTR) readout transcripts identified in different tissues from Brassica rapa and Brassica oleracea.

(a) Brassica rapa.

(b) Brassica oleracea.

Discussion

Inter-specific evolutionary asymmetry: point mutation versus purifying selection

The most striking observation in this study is perhaps the contrast between the higher rates of nucleotide substitution at neutral sites of DNA sequences in B. rapa than in B. oleracea and the stronger purifying selection on triplicated genes in the B. rapa genome than in the B. oleracea genome. Levels of genome divergence have been widely studied by comparison of orthologous genes between species (Cenci et al., 2013). However, little work has been done on genome-wide analyses of evolutionary rates of individual genomes, perhaps due to the lack of appropriate tools available in the past. Characterization of orthologous LTR-RTs, and particularly a large set of triplicated genes retained in the two Brassica genomes, provides an unprecedented opportunity and unique vehicle to investigate and compare evolutionary rates in each of the two genomes. Because the 22 LTR-RTs were amplified just before the split of the two genomes about 3.75 Ma, the levels of sequence divergence between the two LTRs of individual elements were basically reflective of, or very close to, the rates of nucleotide substitution within the timeframe of independent evolution of the two genomes (Figure 2). By contrast, the nucleotide substitutions detected among the three members of each of the 243 sets of triplicates in either B. rapa or B. oleracea were the outcome of around 13–17 million years' evolution, and thus the proportions of nucleotide substitutions occurring in either genome during their independent evolution in the past 3 or 4 million years would be relatively small (Figure 2). Despite this, significantly higher Ks, lower Ka and lower ω were still detected in B. rapa than in B. oleracea (Table 2). If the nucleotide substitutions that occurred in these orthologous genes after the divergence of these two species could be measured separately, more strongly contrasting asymmetric evolutionary patterns between the two genomes would probably be revealed.

Effects of recombination on inter-specific asymmetry of genomic features

Comparison of the syntenic regions of the B. rapa and B. oleracea genomes demonstrated a considerably higher density of genes and lower proportion of LTR-RTs in B. rapa than in B. oleracea (Wang et al., 2011a; http://ocri-genomics.org/bolbase). These different genomic features appear to be associated with the local rates of genetic recombination estimated based on orthologous triplicated genes and the overall rates of genetic recombination estimated based on the genetic distances and sequence lengths of individual chromosomes between the two genomes (Table 2). Although neither of the two genomes was completely assembled, and thus the rates of recombination could not be more accurately measured, additional observations seem to bolster our estimates of recombination rates. For example, a significantly higher ratio of solo LTRs to intact elements was observed in B. rapa than in B. oleracea (Table 3). Previous studies indicated that the formation of solo LTRs by unequal homologous recombination was affected by meiotic recombination in rice (Tian et al., 2009) and soybean (Du et al., 2010b, 2012). Thus, it is reasonable to speculate that the relatively more rapid removal of LTR-RT sequences by the formation of solo LTRs in B. rapa than in B. oleracea may be associated with a relatively higher rate of genetic recombination in the former species than in the latter. Potential effects of recombination on genomic differentiation and speciation have not been extensively investigated. Nevertheless, local rates of genetic recombination were found to correlate positively with gene densities and negatively with the proportions of LTR-RT DNA along the chromosomes of several plants (Gaut et al., 2007; Tian et al., 2009; Du et al., 2012). Therefore, the asymmetric genomic features observed between B. rapa and B. oleracea may be at least partially caused by different rates of genetic recombination.

Effects of recombination on inter-specific asymmetry of evolutionary rates

Evolutionary asymmetry reflected by the rates of nucleotide substitution of genic sequences harbored in different chromatin environments, such as centromeric or pericentromeric regions versus chromosome arms, within individual organisms has been recently reported in several plant species (Lin et al., 2010; Fan et al., 2011; Du et al., 2012). Because the genomic regions that were compared generally showed extremely distinct genomic features and rates of genetic recombination, there was, therefore, an association between the rates of genetic recombination and asymmetric evolution of genes. For example, 13 genes located in the functional centromeric region of rice chromosome 8 were found to evolve more slowly than 1515 genes dispersed on the short arms of rice chromosome 3 (Fan et al., 2011). However, because two sets of different genes were compared, the driving forces forming such a pattern of intra-genomic asymmetric evolution remain obscure. Nevertheless, a recent study compared the two members of each of 2439 WGD-derived gene pairs, one located in pericentromeric regions (the cold spots for meiotic recombination) and the other in chromosomal arms, and revealed a higher level of Ks in genic sequences in the former chromatin environment than in the latter within the same genome. This study also revealed a positive correlation between local rates of genetic recombination and Ks in soybean, even when pericentromeric regions were excluded (Du et al., 2012), echoing long-term speculation that recombination facilitates the generation of point mutation (Gaut et al., 2007).

Our data revealed an overall association among contrasting genomic features such as the densities of genes and TEs and the rates of nucleotide substitutions between the two Brassica genomes. If the overall recombination rate in B. rapa is higher than in B. oleracea as indicated by the current data, it would be a logical speculation that the inter-specific asymmetric evolution was perhaps the effect of inter-specific variation of recombination rates in the syntenic regions compared in this study. While genetic recombination appears to be able to explain the observed higher rates of point mutations in B. rapa than in B. oleracea, it is less clear why the triplicated genes in the B. rapa genome have undergone a higher intensity of purifying selection than their orthologs in the B. oleracea genome. It was proposed that purifying selection is less effective in genomic regions of low recombination due to the Hill–Robertson effect (Haddrill et al., 2007; Comeron et al., 2008; Betancourt et al., 2009; Charlesworth et al., 2009), which appears to be consistent with our observations. Nevertheless, no effects of recombination on the efficacy of natural selection were observed in Drosophila or primates (Betancourt and Presgraves, 2002; Bullaughey et al., 2008). Additional analyses of sequence divergence in each of the two Brassica species at population levels, profiling of the expression patterns of these triplicated genes and more accurate estimation of genetic recombination may provide a more comprehensive picture regarding the asymmetric evolution of the two genomes and the evolutionary forces shaping the distinct patterns of sequence divergence and speciation of these two species.

Regulation of TE-mediated alteration of gene expression

Genome-wide study of tissue-specific TE-mediated alteration of gene expression has not been described prior to this study, and potential interplays between genes and adjacent TEs in different tissues and/or at different developmental stages of their host plants remain unclear. Massive methylation changes at insertion sites of an LTR-RT family were observed in the first four generations of a newly formed wheat allohexaploid (Kraitshtein et al., 2010). It has been documented that DNA methylation is one of several epigenetic mechanisms that cells use to control gene expression (Phillips, 2008). We thus propose that differential hypomethylation or hypermethylation at LTR-RT insertion sites in different types of plant cells and chromatin dynamics associated with transposons may be responsible for tissue-specific expression of LTR junction sequences.

Experimental Procedures

Identification and classification of TEs

A combination of structure-based and homology-based approaches was employed to identify TEs in the assembled B. rapa and B. oleracea genome sequences, but the procedures and programs used for different classes or superfamilies of TEs varied. The LTR-RTs were characterized by the methods previously described (Du et al., 2010a,b). Intact LTR elements were initially identified by the program ltr_struc (McCarthy and McDonald, 2003), followed by manual inspection. The 5′ LTRs of the intact elements with clearly defined boundaries were used to detect additional intact elements and solo LTRs following the approaches previously described (Ma and Bennetzen, 2004; Ma et al., 2004). Non-LTR retrotransposons (LINEs and SINEs) and DNA transposons (Tc1-Mariner, hAT, Mutator, Pong, PIF-Harbinger, CACTA and MITE) were identified following the protocol provided by Holligan et al. (2006). The conserved reverse transcriptase or transposase protein domains of each category were used as queries to search against the whole genome assemblies of B. rapa and B. oleracea using TBLASTN (Altschul et al., 1997). The upstream and downstream sequences of the sequences matched by TBLASTN searches with a common query were extracted and compared with each other to define their boundaries and structures, such as terminal inverted repeats (TIRs) and target site duplications (TSDs). Helitron elements were identified by the program HelSearch 1.0 (Yang and Bennetzen, 2009) followed by manual inspection. Custom Perl scripts were written to facilitate the data mining and analyses. Detailed manual inspection was conducted to confirm each predicted element and to define its structure and boundaries. The LTR-RTs and CACTA elements were classified into different families based on the criteria proposed by Wicker et al. (2003, 2007), while other elements were classified into superfamilies as previously described (Holligan et al., 2006).

Sequence divergence and LTR-RT insertion time

Homologous sequences were aligned using the program muscle (Edgar, 2004) or ClustalW (Thompson et al., 1994), and sequence divergence was assessed using mega4 (Tamura et al., 2007). The age of each LTR-RT was dated based on the divergence (K) between its 5′ LTR and 3′ LTR sequences, calculated using the Kimura two-parameter method, and an average mutation rate (r) of 1.5 × 10⁻⁸ substitutions per synonymous site per year (Koch et al., 2000), with the formula T = K/(2r).
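The dating step can be illustrated with a minimal Python sketch. It is not the mega4 implementation: it assumes the two LTRs have already been aligned to equal length, skips any column containing a gap or ambiguous base, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two pre-aligned sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites      # proportion of transitions
    q = transversions / sites    # proportion of transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time(k: float, r: float = 1.5e-8) -> float:
    """Insertion time in years, T = K / (2r)."""
    return k / (2 * r)
```

Under the rate used here, an element whose two LTRs have diverged by K = 0.1125 would be dated to 0.1125 / (2 × 1.5 × 10⁻⁸) = 3.75 million years.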

Triplicated genes with all three copies retained in both B. rapa and B. oleracea orthologous regions were used to calculate Ka, Ks and ω (Ka/Ks), using the yn00 module in the paml software (Yang, 2007). The average Ka, Ks and ω among each three copies of triplicated genes in B. rapa were compared with the average values of the orthologous genes in B. oleracea, using Student's paired t-test in the SAS package. In addition, the Ka, Ks and ω of the orthologous genes between B. rapa and A. thaliana, and between B. oleracea and A. thaliana were calculated and compared using Student's paired t-test in the SAS package.
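As an illustration of the paired comparison only (the analysis itself was run in SAS), the following Python sketch computes the paired t statistic from per-triplicate averages. The function name is hypothetical, and converting the statistic to a P-value would additionally require the t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(values_a, values_b):
    """Return (t, degrees of freedom) for paired observations.

    values_a[i] and values_b[i] are assumed to be the average Ks (or Ka, or
    omega) of the same triplicate in the two genomes being compared.
    """
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

A positive t for Ks together with a negative t for Ka and ω, with the first genome listed first, would correspond to the pattern reported in Table 2.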

Identification of shared TEs between different genomes

A semi-automated bioinformatics pipeline for identification of the TEs shared between the two Brassica genomes using the assembled sequences and unassembled NGS reads is illustrated in Figure S1. Transposable elements without clearly defined terminal boundaries and unique junction sites were excluded from this analysis. Two approaches were employed to determine whether an element was shared by the two genomes. The first is comparative analysis of orthologous regions of the assembled A and C genomes, with a focus on TE insertion junction sequences (100-bp TE end and 150-bp flanking sequences), as described earlier (Tian et al., 2009, 2012), to determine whether a TE is present at orthologous regions of the two genomes. In this approach, an element was considered to be shared by the two genomes when the 250-bp junction sequence locating this element in one genome was found in the assembled sequences of the other genome. The second is comparative analysis of TE junction sequences (20-bp TE end and 20-bp flanking sequences) in one genome with the NGS reads from the other genome. In this approach, as illustrated in Figure S1, each junction sequence identified in the assembled A or C genome was used as a query sequence to BLAST-search against the NGS reads from the C or A genome, respectively. An element was considered to be shared by the two genomes when the 40-bp TE junction sequence locating this element in one genome was found in the NGS reads of the other genome with ≥95% sequence identity.
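The read-based criterion can be sketched in Python as follows. This is a simplified stand-in for the BLAST search, not the pipeline itself: it assumes 0-based, half-open TE coordinates, performs only ungapped comparison on the forward strand, and all function names are hypothetical.

```python
def junction_sequence(genome: str, te_start: int, te_end: int, side: str,
                      te_len: int = 20, flank_len: int = 20) -> str:
    """Extract a junction: flank + TE end ("5prime") or TE end + flank ("3prime")."""
    if side == "5prime":
        if te_start < flank_len:
            raise ValueError("not enough upstream sequence")
        return genome[te_start - flank_len:te_start + te_len]
    if side == "3prime":
        if te_end + flank_len > len(genome):
            raise ValueError("not enough downstream sequence")
        return genome[te_end - te_len:te_end + flank_len]
    raise ValueError("side must be '5prime' or '3prime'")


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equally long sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def junction_in_read(junction: str, read: str, min_identity: float = 95.0) -> bool:
    """True if any window of the read matches the junction at >= min_identity."""
    n = len(junction)
    return any(percent_identity(junction, read[i:i + n]) >= min_identity
               for i in range(len(read) - n + 1))
```

For a 40-bp junction, the 95% threshold tolerates at most two mismatches in this ungapped setting.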

Estimation of the rates of genetic recombination

The local rates of genetic recombination were estimated using mareymap (Rezvoy et al., 2007). A total of 430 genetic markers in a linkage map of B. rapa (Wang et al., 2011b) and a total of 617 genetic markers in a linkage map of B. oleracea (Wang et al., 2012) were checked and used for estimation of local genetic recombination rates. The average rates of genetic recombination for the whole genomes were estimated based on the genetic distance and the physical lengths of assembled sequences of individual chromosomes.
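The chromosome-level estimate is a ratio of genetic to physical length, which can be sketched in Python as below. The second function is a deliberately naive interval-by-interval slope and is not the interpolation performed by mareymap; both function names are hypothetical.

```python
def chromosome_rate_cm_per_mb(genetic_length_cm: float, physical_length_bp: int) -> float:
    """Average recombination rate of one chromosome in cM/Mb."""
    if physical_length_bp <= 0:
        raise ValueError("physical length must be positive")
    return genetic_length_cm / (physical_length_bp / 1e6)


def interval_rates_cm_per_mb(markers):
    """Naive local rates between adjacent markers.

    markers: iterable of (physical position in bp, genetic position in cM).
    Markers sharing a physical position are skipped.
    """
    ordered = sorted(markers)
    rates = []
    for (pos1, cm1), (pos2, cm2) in zip(ordered, ordered[1:]):
        if pos2 == pos1:
            continue
        rates.append((cm2 - cm1) / ((pos2 - pos1) / 1e6))
    return rates
```

For example, a hypothetical chromosome spanning 100 cM over 25 Mb of assembled sequence would have an average rate of 4 cM/Mb.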

Identification of genes with TE insertions

The 2-kb upstream sequence and 2-kb downstream sequence flanking a TE insertion site (excluding one of the two TSDs) were combined as a single sequence to search against the annotated A. thaliana CDS database (The Arabidopsis Information Resource, www.arabidopsis.org) to identify genes harboring TE insertions in the assembled B. rapa and B. oleracea genome sequences. Each of these genes in one Brassica genome was aligned with its ortholog in the other Brassica genome, as well as their putative ortholog in A. thaliana. These sequences and alignments were manually inspected to define their introns and exons, and TE insertion sites.
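Construction of the query sequence can be sketched in Python as follows. The sketch assumes that the TE coordinates are 0-based and half-open and exclude both TSD copies, so that the upstream flank ends with one TSD copy and the second copy, immediately downstream of the TE, is the one dropped. The function name is hypothetical.

```python
def empty_site_query(genome: str, te_start: int, te_end: int,
                     tsd_len: int, flank_len: int = 2000) -> str:
    """Join the flanks of a TE insertion into one sequence with a single TSD.

    Layout assumed: ...upstream | TSD | TE | TSD | downstream...
    with te_start/te_end delimiting the TE only.
    """
    upstream = genome[max(0, te_start - flank_len):te_start]  # ends with TSD copy 1
    downstream_start = te_end + tsd_len                       # skip TSD copy 2
    downstream = genome[downstream_start:downstream_start + flank_len]
    return upstream + downstream
```

The joined sequence approximates the locus as it was before the insertion, which is why it can be compared directly with an uninterrupted coding sequence.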

Characterization of LTR-RTs with putative readout activities

The intact LTR-RTs and solo LTRs with clearly defined boundaries (Figure S1) were chosen to characterize putative readout activities of LTR transcripts. As illustrated in Figure S3, TE junction sequences, each containing a 45-bp 3′ LTR end and its 45-bp flanking sequence, were extracted and used as query sequences to BLAST-search against the RNA-Seq reads from three tissues (leaves, stems and roots) of B. rapa and B. oleracea. The presence of RNA-Seq reads matching a TE junction site (with no more than a 1-bp mismatch) and covering no less than 20 bp of the TE's flanking sequence from the TE insertion site was considered as evidence of LTR-RT readout transcription activity.
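The acceptance criterion for a read can be sketched in Python as follows. This is an ungapped, forward-strand simplification of the BLAST-based search; the requirement that the read also overlaps at least one base of the LTR end (min_ltr) is an assumption added here so that the read spans the insertion site, and the function name is hypothetical.

```python
def supports_readout(read: str, junction: str, ltr_len: int = 45,
                     min_flank: int = 20, max_mismatch: int = 1,
                     min_ltr: int = 1) -> bool:
    """True if the read spans the LTR/flank boundary of the junction query.

    junction = 3' LTR end (first ltr_len bases) followed by flanking sequence.
    The read may overhang either end of the junction; only the overlap is scored.
    """
    n, m = len(junction), len(read)
    for offset in range(-(m - 1), n):   # junction position aligned to read[0]
        start = max(offset, 0)
        end = min(offset + m, n)
        ltr_cov = max(0, min(end, ltr_len) - start)
        flank_cov = max(0, end - max(start, ltr_len))
        if ltr_cov < min_ltr or flank_cov < min_flank:
            continue
        mismatches = sum(junction[i] != read[i - offset] for i in range(start, end))
        if mismatches <= max_mismatch:
            return True
    return False
```

In practice, both strands of each read would need to be considered, and low-complexity junctions would call for stricter filtering than this sketch applies.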

Statistical tests

The correlations among investigated parameters were assessed using Pearson's correlation in the SAS software. The differences in sequence divergence levels between individual triplicates or individual genes were estimated by Student's paired t-test. The differences in recombination rates, at the whole-genome level, between B. rapa and B. oleracea were evaluated by Student's t-test, based on the genetic distances of individual chromosomes of the two genomes and the assembled sequence lengths of the corresponding chromosomes. The significance of the difference in the ratios of solo LTRs to intact elements was evaluated by the χ2-test or Fisher's exact test.
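The comparison of solo LTR to intact element ratios amounts to a test on a 2 × 2 table of counts (solo and intact elements in each genome). The Python sketch below shows both tests using only the standard library. It is an illustration rather than the SAS procedure used here: the χ2-test is computed without continuity correction, and the function names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum of tables no more likely than observed."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

Here a and b would be the numbers of solo LTRs and intact elements in one genome, and c and d the corresponding numbers in the other.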

Acknowledgements

We would like to thank Phillip SanMiguel for assistance in TE identification and data analyses, and Doug Yatcilla for help with software installation and testing. This work was supported by the National Key Research Program to SL and the Purdue Agricultural Research Program to JM. The authors have no conflict of interest to declare.

Ancillary