Comparative analysis of genome composition in Triticeae reveals strong variation in transposable element dynamics and nucleotide diversity

Authors


For correspondence (e-mail christopher_middleton@access.uzh.ch; wicker@botinst.uzh.ch).

Summary

A 454 sequencing snapshot was used to investigate the genome composition and the nucleotide diversity of transposable elements (TEs) in several Triticeae taxa, including Triticum aestivum, Hordeum vulgare, Hordeum spontaneum and Secale cereale, together with relatives of the A, B and D genome donors of wheat: Triticum urartu (A), Aegilops speltoides (S) and Aegilops tauschii (D). Additional taxa containing the A genome, Triticum monococcum and its wild relative Triticum boeoticum, were also included. The analysis focused mainly on the genomic composition of TEs, as these make up at least 80% of the overall genome content. Although more than 200 TE families were identified in each species, only 12–15 TE families made up approximately 50% of each genome. The BARE1 element was the largest contributor in all genomes, accounting for more than 10% of each genome. We also found that several TE families differ strongly in abundance between species, indicating that a TE family can thrive in one species while going virtually extinct in another. Additionally, we measured the nucleotide diversity of BARE1 populations within individual genomes. Interestingly, nucleotide diversity in the domesticated barley H. vulgare cv. Barke was twice as high as in its wild progenitor H. spontaneum, suggesting that domesticated barley gained nucleotide diversity from the addition of different genotypes during domestication and breeding. In the rye/wheat lineage, the sequence diversity of BARE1 elements was generally higher than in barley, suggesting that factors such as geographical distribution and mating system might play a role in intragenomic TE diversity.

Introduction

Hexaploid wheat (Triticum aestivum) is a major crop worldwide. It is a member of the Triticeae tribe, which also includes other economically important species such as Hordeum vulgare (barley) and Secale cereale (rye). The barley lineage also includes wild taxa such as Hordeum vulgare ssp. spontaneum. The wheat lineage includes hexaploid T. aestivum as well as its diploid genome donors Triticum urartu, Aegilops speltoides and Aegilops tauschii. In addition, it includes wild einkorn wheat (Triticum monococcum ssp. boeoticum) and its domesticated descendant Triticum monococcum ssp. monococcum, which are closely related to T. urartu. Triticeae species probably originated in the Fertile Crescent, which includes Iran, Iraq, south-east Turkey, Syria, Lebanon, Jordan and Israel (Kihara, 1944; Feldman et al., 1995; Devos et al., 2005; Kilian et al., 2007a; Bordbar et al., 2011). However, the divergence times and phylogenetic relationships, especially between bread wheat and closely related taxa, are not fully understood. Hordeum vulgare and the wheat/rye lineage are estimated to have diverged 10–15 million years ago (Ma), while wheat and rye diverged approximately 5–11 Ma (Chalupska et al., 2008). Triticum urartu, Ae. speltoides and Ae. tauschii are estimated to have diverged from each other between 2 and 6 Ma (Huang et al., 2002; Akhunov et al., 2003; Chalupska et al., 2008).

Triticum aestivum has a total hexaploid genome size of approximately 16–17 Gb (Rees and Walters, 1965; Bennett and Smith, 1976). Wheat is an allohexaploid composed of three genomes, denoted A, B and D. The haploid sizes of these genomes in T. aestivum are very similar to those of other Triticeae members, which are generally 3500–8500 Mb (Eilam et al., 2007; Özkan et al., 2010; Bennett and Leitch, 2011). The complete genomic complement of T. aestivum (AABBDD) was formed by the hybridisation of three diploid ancestors (Akhunov et al., 2003; Edwards and Batley, 2010; Tomita et al., 2010). The first hybridisation event, between T. urartu (AA) and possibly Ae. speltoides (SS), is estimated to have occurred between 0.20 and 1.3 Ma (Mori et al., 1995; Huang et al., 2002; Dvorak and Akhunov, 2005) and gave rise to the tetraploid wild emmer wheat, Triticum turgidum ssp. dicoccoides (AABB) (Dvorak and Zhang, 1990; Dvorak et al., 1993; Akhunov et al., 2005; Kilian et al., 2007b). The genome of domesticated emmer wheat, T. turgidum ssp. dicoccon, was later complemented by the D genome of Ae. tauschii approximately 8000 years ago, forming hexaploid T. aestivum (Kihara, 1944; McFadden and Sears, 1946; Feldman et al., 1995; Devos et al., 2005; Dubcovsky and Dvorak, 2007; Kilian et al., 2007b; Bordbar et al., 2011).

The large genome size of the Triticeae members and the high proportion of repetitive elements, which make up at least 80% of the genome (Bennett and Smith, 1976; Hollister and Gaut, 2009), have made sequencing these genomes extremely challenging. As transposable elements (TEs) form such a large component of the genome, previous studies have generally focused on the contribution of different TE families within single species (Wicker et al., 2007; Rebollo et al., 2010; Tenaillon et al., 2011), while differences in TE families between species are less well understood. Transposable elements fall into two classes according to their mode of transposition: Class I elements copy themselves via an RNA intermediate, and the newly synthesised copy is inserted elsewhere in the genome (a copy-and-paste mechanism), whereas Class II elements transpose by a cut-and-paste mechanism, being excised directly from their position in the genome and reinserted elsewhere (Feschotte et al., 2002; Casacuberta and Santiago, 2003). The amount and copy number of individual elements vary greatly between genomes (Kidwell, 2002). For example, the genome of Arabidopsis thaliana contains approximately 10% TEs, whereas in most grass species TEs make up between 50% (rice) and 80% (wheat) of the genome (Feschotte et al., 2002). Transposable elements play a role in a number of evolutionary processes, including insertion into protein-coding genes, illegitimate recombination and chromosome breakage (Slotkin and Martienssen, 2007). Any one of these can affect the fitness of the host. Several mechanisms, including post-transcriptional silencing and methylation, control the level of transposition. However, these controls can be relaxed during periods of abiotic stress, leading to a proliferation of elements in the genome (Vicient et al., 1999; Todorovska, 2007).

Nucleotide diversity, π, is the average proportion of nucleotide differences per site between pairs of homologous sequences in a sample (Nei and Li, 1979). It is often used to infer past population bottlenecks in studies of domestication genetics: when a population passes through a bottleneck, its allelic diversity is reduced, so π is expected to be small.
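As a minimal sketch of this definition, assuming a gap-free alignment of equal-length sequences, π can be computed as the proportion of differing sites averaged over all distinct pairs of sequences. The function name `nucleotide_diversity` and the toy sequences are illustrative only and are not taken from the study's data or from any particular software package.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites (pi) in a gap-free alignment."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        # Count mismatching positions, then convert to a per-site proportion
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
    return total / len(pairs)


# Invented toy alignment of four 10-bp sequences
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAC", "ACGTACGTAC"]
print(round(nucleotide_diversity(toy), 3))  # prints 0.1
```

Averaging over distinct pairs is equivalent to Nei and Li's frequency-weighted form with the usual n/(n − 1) sample-size correction; real alignments would also need a rule for gaps, which this sketch deliberately leaves out.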

Kilian et al. (2007a) described a rare case of a cereal, einkorn wheat, in which no reduction of nucleotide diversity at all was found during domestication. This absence of a domestication bottleneck contrasts with studies of intensively bred crop species, in which domestication bottlenecks are commonly reported (Buckler et al., 2001; Doebley et al., 2006; Kilian et al., 2006). Kilian et al. (2007a) compared nucleotide variation at 18 loci in 92 domesticated einkorn lines with that in 321 lines from wild populations. Several insights into domestication history emerged from that study. One of the most important was that wild einkorn is not a single homogeneous population; rather, it underwent natural genetic differentiation before domestication, resulting in three distinct wild einkorn races. These races, designated α, β and γ, are genetically distinct both in their haplotypes across the 18 loci studied and in their amplified fragment length polymorphism fingerprints. One of them, wild race β, is genetically much more similar to domesticated einkorn and is therefore the race, or genotype, that was exploited by humans during domestication. Today, race β occurs only in the Karacadag and Kartal-Karadag Mountains in south-east Turkey. A second major surprise of that study was that nucleotide and haplotype diversity in domesticated einkorn was higher than in race β. However, very little is known about nucleotide diversity within transposable element families in a genome.

Mating systems may also influence the number of TEs and the nucleotide diversity of elements within a genome. Besides the outbreeder S. cereale, two predominantly outbreeding species are known in the Triticum–Aegilops group of the Triticeae tribe: Ae. speltoides and Aegilops mutica (Kimber, 1987; Kilian et al., 2007b, 2011). All other taxa, including T. urartu, einkorn wheat and Ae. tauschii, are inbreeders. Mating systems have been found to influence molecular evolution, reducing levels of polymorphism and affecting the effective population size (Haudry et al., 2008). It has been proposed that inbreeding can reduce the effective population size by as much as 50% (Pollak, 1987); by extension, nucleotide diversity would also be expected to depend on mating system, with lower diversity in inbreeders than in outbreeders. Transposable element families can themselves be viewed as populations within a genome, subdivided into subfamilies, in which two copies are hardly ever absolutely identical. However, the nucleotide diversity of TE families within a genome has never been surveyed quantitatively, so it is not known what influence domestication, mating system or geographical isolation has on intragenomic TE diversity.
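The 50% figure and the expected effect on diversity can be made explicit with standard population-genetic expressions. As a sketch, assuming the partial-inbreeding relation associated with Pollak (1987) and the neutral expectation for diploids, with N the population size, F the inbreeding coefficient and μ the per-site mutation rate (symbols introduced here for illustration only):

```latex
N_e = \frac{N}{1+F}, \qquad
\mathbb{E}[\pi] \approx \theta = 4 N_e \mu = \frac{4 N \mu}{1+F}
```

With complete inbreeding (F = 1), N_e falls to N/2, so the expected π is halved relative to a randomly mating population (F = 0) of the same size. Real populations are rarely at mutation–drift equilibrium, so this is an expectation rather than a prediction for any particular taxon.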

Next-generation sequencing provides new opportunities because of the large volume of data it can produce. Many studies have used 454 sample sequencing as a platform (Macas et al., 2007; Swaminathan et al., 2007; Mardis, 2008); Wicker et al. (2009) used this method to analyse the TE composition of the barley genome. Their main finding was that a small number of TE families make up more than 50% of the genome, the vast majority of them class I long terminal repeat (LTR) retrotransposons. It has been noted before that the BARE1 clade in barley and the Angela/Wis clade in wheat each form approximately 10% of the genome (Vicient et al., 1999; Kalendar et al., 2000; Soleimani et al., 2006; Wicker et al., 2009). Wicker et al. (2009) focused on the barley repetitive fraction, but a comparison with a limited dataset from T. aestivum suggested that TE compositions vary between Triticeae taxa; for example, the Gypsy element BAGY2 was more abundant in barley than in wheat (Wicker et al., 2009). However, no broad survey of an entire tribe has been conducted yet.

Here we used 454 sequencing to obtain between 2 and 5% genome coverage for 10 Triticeae taxa, including the A, B and D genome donors of bread wheat. We addressed two questions. (i) Does genome composition, in particular the abundance of TEs, differ between taxa? (ii) Does the nucleotide diversity of particular TEs within the genome differ between taxa, and how might it relate to factors such as domestication, geographical distribution and mating system?

Results

454 sample sequencing and characterisation of genome compositions

454 Titanium 7-kb paired-end sequencing was performed to produce between 2 and 5% genome sequence coverage for each of S. cereale, T. urartu, Ae. speltoides and Ae. tauschii. These datasets comprised approximately 441 000–640 000 reads, with average read lengths of 260–300 bp. In addition, 454 sequences of the Triticum taxa T. monococcum ssp. boeoticum (hereafter T. boeoticum) and T. monococcum ssp. monococcum (hereafter T. monococcum) and of the Hordeum taxa H. vulgare cv. Barke and H. vulgare ssp. spontaneum (hereafter H. spontaneum) accessions FT11 and FT462 were analysed. The samples of these additional taxa consist of approximately 450 000–1 300 000 reads with average read lengths of 295–396 bp. The three taxa T. urartu, T. boeoticum and T. monococcum all contain the A genome and are referred to hereafter as the ‘A genome taxa’. Furthermore, 500 000 publicly available 454 sequences of T. aestivum cv. Chinese Spring, with an average length of 415 bp, were included in the analysis (http://www.cerealsdb.uk.net/CerealsDB/Douments/DOC_CerealsDB/) (Table 1). Reads from each species were classified by BLAST searches against the Triticeae repeat database TREP10 (known TEs) and against organelle and gene sequences, and by BLASTX searches against PTREP11 (TE protein sequences). Transposable elements made up the largest proportion of reads in each grass species, at 64.2–72.3%. The T. urartu sample contained the highest proportion of identified TEs, 72.3% of all reads (Figure 1).
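The classification step can be pictured as a tally of best BLAST hits. The text does not state how reads with several hits were resolved, so the sketch below assumes, as one common choice, that each read is assigned to the subject with its highest bitscore, and that reads without hits count as unclassified. It reads BLAST+ tabular output (`-outfmt 6`, default 12 columns); the subject-ID naming scheme (`SUPERFAMILY_family`), the helper names and the demo rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def best_hits(blast_tab):
    """Return {read_id: subject_id}, keeping the highest-bitscore hit per read.

    Expects BLAST+ tabular output (-outfmt 6): column 1 is the query (read) ID,
    column 2 the subject ID and column 12 the bitscore.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row:
            continue
        read, subject, bitscore = row[0], row[1], float(row[11])
        if read not in best or bitscore > best[read][1]:
            best[read] = (subject, bitscore)
    return {read: subj for read, (subj, _) in best.items()}


def composition(hits, total_reads, category_of):
    """Percentage of all reads per category; reads without a hit are 'unclassified'."""
    counts = Counter(category_of(subj) for subj in hits.values())
    counts["unclassified"] = total_reads - len(hits)
    return {cat: 100.0 * n / total_reads for cat, n in counts.items()}


# Hypothetical BLAST rows: read r1 hits two families, r2 one; 4 reads in total
demo = io.StringIO(
    "r1\tRLG_Fatima\t90.0\t250\t25\t0\t1\t250\t10\t259\t1e-80\t300\n"
    "r1\tRLC_BARE1\t80.0\t200\t40\t0\t1\t200\t5\t204\t1e-40\t150\n"
    "r2\tRLC_BARE1\t95.0\t280\t14\t0\t1\t280\t1\t280\t1e-120\t450\n"
)
hits = best_hits(demo)
comp = composition(hits, total_reads=4, category_of=lambda s: s.split("_", 1)[1])
print(comp)  # {'Fatima': 25.0, 'BARE1': 25.0, 'unclassified': 50.0}
```

In practice the category lookup would come from the repeat database's own annotation rather than from parsing IDs, and E-value or identity thresholds would be applied before the tally.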

Table 1. 454 datasets for each of the 10 taxa

Taxon                       No. of 454 reads   Average size (bp)   Total (Mb)   Coverage (%)
Triticum aestivum (a)       499 999            415                 207          1.2
T. urartu                   546 057            265                 145          2.6
T. monococcum (b)           507 523            393                 199          3.6
T. boeoticum (b)            458 875            540                 248          4.5
Aegilops speltoides         441 540            263                 116          2.0
Ae. tauschii                640 266            267                 171          3.1
Secale cereale              586 127            292                 171          2.9
Hordeum vulgare (b)         1 325 384          296                 392          7.1
H. spontaneum FT11 (b)      659 263            369                 261          4.7
H. spontaneum FT462 (b)     642 312            375                 241          4.4

(a) Sequence supplied by the University of Bristol and the University of Liverpool.
(b) Sequence supplied by Benjamin Kilian.
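The Total (Mb) and Coverage (%) columns follow from simple arithmetic: total sequenced bases are the number of reads times the average read length, and coverage is that total divided by the genome size. The genome size used for each taxon is not listed in the table, so the sketch below assumes 17 Gb for T. aestivum (the upper end of the estimate given in the Introduction), which reproduces the tabulated 1.2%. The function name is illustrative.

```python
def coverage_percent(n_reads, mean_read_len_bp, genome_size_bp):
    """Genome coverage (%) = total sequenced bases / genome size x 100."""
    return 100.0 * n_reads * mean_read_len_bp / genome_size_bp


# T. aestivum row of Table 1, assuming a 17-Gb hexaploid genome
wheat = coverage_percent(499_999, 415, 17e9)
print(f"{wheat:.1f}%")  # prints 1.2%
```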
Figure 1.

Composition of the 454 reads for each of the individual taxon snapshots. Known transposable element families make up the largest proportion of the reads. Other categories include chloroplast and mitochondrial sequences, ribosomal DNAs (rDNAs) and coding sequences (CDS).

A small number of TE families make up a large proportion of Triticeae genomes

The TE reads were analysed further to determine the proportions of the different TE families in each genome. All taxa analysed contained between 226 and 241 different TE families. Many of these families are present at very low copy numbers, and approximately 15 families make up at least 50% of the genome in each species (Figure 2). Approximately 70% of the characterised TE reads belong to the Gypsy superfamily of retrotransposons.
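The statement that roughly 15 families make up half of each genome is a cumulative-share calculation: rank the families by abundance and count how many are needed before their summed share reaches 50%. A minimal sketch follows; the function name and the abundance values are invented for illustration and are not data from this study.

```python
def families_to_reach(abundances, target_percent=50.0):
    """Smallest number of families whose combined share reaches target_percent.

    abundances: {family: percent of all reads}. Families are added largest first.
    Returns None if the target is never reached.
    """
    running = 0.0
    for n, share in enumerate(sorted(abundances.values(), reverse=True), start=1):
        running += share
        if running >= target_percent:
            return n
    return None


# Invented abundances (% of reads) for five families
toy = {"A": 20.0, "B": 15.0, "C": 10.0, "D": 8.0, "E": 2.0}
print(families_to_reach(toy))  # prints 4 (20 + 15 + 10 + 8 = 53%)
```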

Figure 2.

Abundance of the 11 transposable element (TE) families that show the largest differences between taxa.

In all taxa analysed, the BARE1 clade (which includes Angela and Wis) made up the largest proportion of the identified TE reads, accounting for 10.37–14.18% of all reads (Figure 2). Hordeum spontaneum and H. vulgare contained the same proportion of BARE1 reads (12.96%), confirming previous studies in H. vulgare (Vicient et al., 1999; Kalendar et al., 2000; Soleimani et al., 2006; Wicker et al., 2009) which found that BARE1 contributes more than 12% of the genome. Hexaploid T. aestivum contained a lower percentage of BARE1 elements (10.54%) than its diploid genome donors, similar to S. cereale (10.37%).

The abundance of TE families differs strongly between taxa

The Gypsy families Fatima and Sumana are examples of elements whose abundance differs considerably between the barley and wheat taxa (Figure 2). In H. spontaneum and H. vulgare, only 0.06% of reads were classified as Fatima elements. Secale cereale also contained few Fatima elements (0.45% of reads). In contrast, Fatima elements account for 3.06–7.17% of reads in the wheat taxa. Sumana shows a pattern of abundance similar to that of Fatima between the barley and wheat taxa (Figure 2).

The most dramatic difference was observed for BAGY2, which makes up 5.45% of H. vulgare reads and 5.36% of H. spontaneum reads, but only 0.10–0.22% of reads in rye and the wheat taxa. The Gypsy family Haight is also more abundant in H. vulgare and H. spontaneum than in the other Triticeae members, accounting for 1.34 and 1.18% of reads in H. spontaneum and H. vulgare, respectively, while being practically absent from rye and the wheat taxa. No Sumana elements were found in H. vulgare or H. spontaneum, whereas they were present in increasing abundance, from 0.85 to 1.51%, in Ae. speltoides, Ae. tauschii, the three A genome taxa, S. cereale and T. aestivum. Similarly, no Sumaya elements were identified in H. spontaneum, H. vulgare or S. cereale, while they contribute 0.27–1.14% to the genomes of Ae. speltoides, T. aestivum, Ae. tauschii and the three A genome taxa. Aegilops tauschii showed a slightly lower content of both Sumaya and Fatima elements than Ae. speltoides, the A genome taxa and T. aestivum (Figure 2).
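To put "most dramatic" into numbers, the reported percentages can be turned into a fold-difference range. The sketch below uses only the BAGY2 values quoted above (5.45% in H. vulgare against 0.10–0.22% in rye and the wheat taxa); the helper name is illustrative.

```python
def fold_range(high_pct, low_range):
    """Fold difference between one abundance and both ends of a range of abundances."""
    lo, hi = low_range
    return high_pct / hi, high_pct / lo


# BAGY2: 5.45% of H. vulgare reads vs 0.10-0.22% in rye and the wheat taxa
print([round(x, 1) for x in fold_range(5.45, (0.10, 0.22))])  # prints [24.8, 54.5]
```

BAGY2 is therefore roughly 25- to 55-fold more abundant in H. vulgare than in rye and the wheat taxa.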

The Gypsy element Erika also differs in abundance between taxa. It is most abundant in the three A genome taxa, where Erika elements make up 3.09–3.86% of the genome, whereas the genomes of T. aestivum, Ae. tauschii and S. cereale contain approximately half as much (about 1.5%). In Ae. speltoides and the barley taxa, only small amounts of Erika elements were identified (0.03–1.3%).

Differences in TE abundance were not restricted to retroelements. The CACTA element Jorge varied very strongly between taxa, accounting for only 0.03% of reads in H. vulgare and H. spontaneum. In contrast, Jorge elements made up 1.17% of reads in S. cereale and 1.84% in Ae. speltoides. Similar amounts of Jorge elements were found in the A genome taxa and T. aestivum, at 2.4–3.83%. The highest abundance of Jorge elements was observed in Ae. tauschii, at 4.93% (Figure 2).

Nucleotide diversity of BARE1 differs between species

As BARE1 makes up roughly 10–14% of the genome in all taxa studied, it was possible to assess the nucleotide diversity of BARE1 elements within each taxon. Nucleotide diversity describes the degree of polymorphism within a population (Nei and Li, 1979); here we used it to assess the level of polymorphism of the BARE1 family within a genome. The BARE1 clade contains BARE1 from barley and Angela from wheat. Although they are 70–80% identical at the DNA level, BARE1 and Angela can be distinguished by some highly diagnostic characteristics (e.g. the BARE1 LTR starts with TGTT, while the Angela LTR begins with TGAA). Angela elements were completely absent from the barley accessions. Rye contains both Angela and BARE1 elements, while wheat contains only minuscule amounts of BARE1. We constructed a phylogenetic tree from consensus sequences of the first 300 bp of the LTRs of BARE1 and Angela in all taxa (Figure 3(a)). The tree clearly shows the close phylogenetic relationship between BARE1 and Angela, with Angela arising as a subfamily of BARE1.
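The diagnostic start motifs mentioned above lend themselves to a simple lookup. The sketch below classifies an LTR sequence using only the first four bases (TGTT for BARE1, TGAA for Angela); it assumes the input begins exactly at the LTR boundary, and relying on a single motif is a simplification of the "highly diagnostic characteristics" described in the text. The function name is illustrative.

```python
def classify_ltr(ltr_seq):
    """Assign an LTR to BARE1 or Angela from its first four bases.

    Uses only the start motifs named in the text (BARE1: TGTT, Angela: TGAA);
    any other start is reported as 'unknown'. Assumes the sequence begins at
    the first base of the LTR.
    """
    start = ltr_seq[:4].upper()
    if start == "TGTT":
        return "BARE1"
    if start == "TGAA":
        return "Angela"
    return "unknown"


for seq in ["TGTTGGCAAC", "tgaaggcaac", "ACGTACGTAC"]:
    print(seq, classify_ltr(seq))
```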

Figure 3.

Phylogenetic tree of the consensus sequences of the first 300 bp of the LTRs of BARE1 and Angela elements, and nucleotide diversity of two regions of the retrotransposon LTR calculated from multiple sequence alignments. (a) Phylogenetic tree showing that Angela arose as a subfamily of BARE1 in the wheat/rye lineage; the tree was produced using MEGA 5.0 with 1000 bootstrap replicates. (b) Nucleotide diversity of the barley BARE1 element in the domesticated Hordeum vulgare cv. Barke and the two wild genotypes of H. spontaneum. (c) Nucleotide diversity of two regions of the Angela element LTR in the Triticeae species.

Nucleotide diversity was assessed using two regions of the BARE1 LTR, the first 300 bp and the region between 600 and 900 bp, as queries in BLASTN searches against the 454 datasets. The first 300 bp were chosen because this part of the Angela and BARE1 LTRs evolves rapidly and allows BARE1 and Angela to be distinguished from each other; the region between 600 and 900 bp was chosen for a second comparison. Regions of 300 bp were selected because this length roughly coincides with the average 454 read length of each dataset, so that a single read can span the entire LTR region. For each taxon, we isolated approximately 100 sequences that covered the entire query sequence. These sequences were aligned using ClustalW, and nucleotide diversity was calculated from the alignment (Figure 3).
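The step of isolating reads that cover the entire 300-bp query can be sketched as a filter on BLASTN alignment coordinates. As an assumption, "covering the entire query" is taken to mean that the alignment runs from query position 1 to the last query position; hits on the minus strand (subject start greater than subject end, as reported in BLAST tabular output) are reverse-complemented so that all segments share the query's orientation. The function name and toy data are hypothetical. The resulting segments would then be aligned (the study used ClustalW) before π is computed.

```python
def full_length_segments(hits, reads, query_len=300):
    """Yield (read_id, segment) for BLASTN hits spanning the whole query.

    hits: iterable of (read_id, qstart, qend, sstart, send) with 1-based,
    inclusive coordinates, the LTR region as query and 454 reads as subjects.
    reads: {read_id: sequence}.
    """
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    for read_id, qstart, qend, sstart, send in hits:
        if min(qstart, qend) != 1 or max(qstart, qend) != query_len:
            continue  # alignment does not cover the full query region
        lo, hi = sorted((sstart, send))
        segment = reads[read_id][lo - 1:hi]  # convert 1-based inclusive to a slice
        if sstart > send:  # minus-strand hit: reverse-complement to query orientation
            segment = segment.translate(complement)[::-1]
        yield read_id, segment


# Toy example with a 5-bp "query"
reads = {"r1": "GGACGTAGG", "r2": "CCTACGTCC", "r3": "ACG"}
hits = [("r1", 1, 5, 3, 7), ("r2", 1, 5, 7, 3), ("r3", 1, 3, 1, 3)]
print(list(full_length_segments(hits, reads, query_len=5)))
```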

For H. vulgare, H. spontaneum accessions FT11 and FT462, and S. cereale, nucleotide diversity was measured for BARE1. Nucleotide diversity is relatively low in all three Hordeum accessions, with H. vulgare scoring less than 0.05 for both LTR regions (Figure 3(b)). Nucleotide diversity in the two H. spontaneum genotypes was approximately half that of domesticated H. vulgare cv. Barke. In contrast, the nucleotide diversity of BARE1 elements in S. cereale, approximately 0.11, is about twice that of domesticated H. vulgare (Figure 3(b)).

For T. urartu, Ae. speltoides, Ae. tauschii, T. boeoticum, T. monococcum and T. aestivum, we measured the nucleotide diversity of Angela, the wheat homologue of BARE1 (see above). Secale cereale was also included in the Angela analysis because it contains both Angela and BARE1 elements. The same LTR regions were used as for BARE1. In general, Angela elements in the wheat taxa have higher nucleotide diversities than BARE1 elements in the three barley accessions; among these wheat taxa, values range from 0.053 in Ae. tauschii to 0.076 in Ae. speltoides (Figure 3(c)). The exceptions are T. boeoticum and T. monococcum, in which Angela diversity was lower, at 0.037 and 0.012, respectively (Figure 3(c)); these values are similar to those obtained for BARE1 in the three barley accessions. DnaSP (Librado and Rozas, 2009) was used to assess the statistical significance of the nucleotide variation within each taxon with Tajima's test (Tajima, 1989), and all tests were significant at the 95% confidence level.
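For readers unfamiliar with Tajima's test, the statistic compares two estimators of diversity: the mean number of pairwise differences (k) and the number of segregating sites (S) scaled by a1. The sketch below implements the textbook formula from Tajima (1989) for a gap-free alignment; it is not DnaSP code, it only computes D (assessing significance requires comparing D with its null distribution), and the function name is illustrative. The variance term is zero for fewer than four sequences, so the sketch requires n ≥ 4.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D (Tajima, 1989) for a gap-free alignment of n >= 4 sequences.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # S: number of alignment columns with more than one base
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return None
    # k: mean number of pairwise differences (per sequence, not per site)
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Invented toy alignment; negative D indicates fewer pairwise differences than S predicts
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAC", "ACGTACGTAC"]
print(tajimas_d(toy))
```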

Discussion

The objective of this study was to analyse the genome composition of several Triticeae species with respect to TE families. 454 Titanium sequencing was used to generate between 2 and 5% coverage of the respective genomes. This allowed quantitative comparisons of overall genome composition, providing insight into how these genomes formed and evolved rapidly in the relatively short evolutionary time of 10–15 million years since they diverged from a common ancestor (Huang et al., 2002; Akhunov et al., 2003; Chalupska et al., 2008). Approximately 70% of all 454 reads could be classified as belonging to known TE families. These results closely mirror previous work by Wicker et al. (2009), in which 69.14% of characterised H. vulgare 454 reads were TE related. Although coding sequences made up only approximately 1% of the genomic samples, we still sampled approximately 5000 genes per taxon owing to the very large number of reads yielded by the Roche/454 technology. As exploration of gene space was not the focus of this study, a detailed analysis of genic sequences may be presented elsewhere.

Between 24.29 and 30.25% of reads remained unclassified. Most of these probably derive from TEs not represented in the reference databases, as TEs are predicted to make up at least 80% of Triticeae genomes (Charles et al., 2008; Choulet et al., 2010). This idea is reinforced by T. urartu and T. monococcum, in which the highest proportions of reads were classified as TEs (72.26% and 71.20%, respectively). One possible explanation is that the TREP database was originally built with sequences from T. monococcum, so the TE variety of the A genome is particularly well represented in it; better database coverage for a genome translates directly into a larger fraction of classified TE reads.

The TE composition differs strongly between Triticeae species

Only a few studies have examined TE abundance in Triticeae. Charles et al. (2008) and Wicker et al. (2009) reported differential amplification of TE families between the A and B genomes, as well as between barley and wheat, and our broad dataset shows this to be a general pattern across Triticeae species. We found that some TE families have proliferated to strongly varying levels in the different taxa. Although they are found in all Triticeae species analysed, several TE families have undergone either proliferation or a reduction in abundance during species diversification. In some cases, the abundance of TE families reflects phylogeny. For example, Jorge and Fatima are virtually absent from barley. In rye, they contribute approximately 0.5–1% to the genome, while in wheat and its close relatives they make up a considerable portion of the genome (2–7%). This indicates that these two TE families started to proliferate in the period 2–6 Ma (Chalupska et al., 2008) after the common ancestor of rye and wheat diverged from the ancestor of barley. Eventually, Jorge and Fatima became very successful genome colonisers in the lineage leading to wheat. The Gypsy family Sumaya shows an even more recent pattern of proliferation, as it is present only in wheat and its close relatives but absent from rye and barley.

Another Gypsy element, Erika, is more abundant in the wheat taxa containing the A genome (T. urartu, T. boeoticum and T. monococcum), where it accounts for more than 3% of the genome. Approximately half this amount was found in T. aestivum, Ae. tauschii and S. cereale, and Erika was virtually absent from Ae. speltoides, H. spontaneum and H. vulgare. This indicates that Erika proliferated specifically in the A genome lineage. Conversely, some TE families proliferated in the lineage leading to barley while becoming virtually extinct in the wheat/rye lineage: the Gypsy families BAGY2, Haight and Surya contribute approximately 5% to the genomes of H. spontaneum and H. vulgare, whereas in rye and the wheat species the three contribute less than 0.5% to the genome.

Why some TE families expand whereas others decline cannot be answered conclusively with the available data. One possibility is that stochastic processes determine the abundance of individual TE families in genomes. Previous studies showed that TEs are active in waves and that the host genome needs time to adapt to newly active TEs by establishing means to silence them (Wicker et al., 2007; Choulet et al., 2010; Slotkin, 2010). We propose that the size of a TE family depends on the level of activity of the TE and the time needed by the host to establish silencing. In addition, repetitive sequences are rapidly (in evolutionary terms) deleted from the genome through processes such as illegitimate recombination, leading to a constant turnover of intergenic sequences (Devos et al., 2002; Wicker et al., 2007). Thus, only minor variations in these factors could lead to very different TE family sizes between species.
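The proposal that small differences in activity or silencing time can produce very different family sizes can be illustrated with a deliberately simple toy model. As assumptions made only for this sketch, a family starting from one copy grows exponentially at a constant rate until the host silences it, after which copies are lost exponentially through deletion; all parameter values are arbitrary units and are not fitted to any data. The function name is hypothetical.

```python
from math import exp


def family_size(burst_rate, silencing_delay, deletion_rate, elapsed):
    """Toy copy number of a TE family that starts from a single copy.

    Copies grow exponentially at burst_rate until the host silences the family
    after silencing_delay time units; afterwards they decay through deletion at
    deletion_rate until `elapsed` time units have passed.
    """
    active = min(silencing_delay, elapsed)
    return exp(burst_rate * active) * exp(-deletion_rate * (elapsed - active))


# Same activity and deletion rate; the host silences one family 20% sooner
slow = family_size(1.0, 5, 0.1, 10)
fast = family_size(1.0, 4, 0.1, 10)
print(round(slow / fast, 2))
```

Under these assumptions, silencing one time unit earlier leaves the family about three times smaller (the ratio is exp(1.1)), because the shorter burst compounds with the longer period of deletion.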

Another question is where novel TE families come from. We do not believe that any of the TE families studied arose completely de novo in a particular evolutionary lineage. The TE families differ strongly from each other at the DNA level, indicating that they diverged long before the Triticeae species did. Thus, the time needed for a new family to emerge far exceeds the evolutionary time-scale studied here. We can also exclude horizontal transfer of TE families from taxa outside the Triticeae, because almost all TE families are found, at least at very low abundance, in all species examined. We conclude that the vast majority of TE families are present in all Triticeae species, but that most occur at such low copy numbers that they are hardly detectable in samples of our limited size (2–5% of a genome equivalent).
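The link between sample size and detectability can be quantified under a simple assumption: if reads are drawn independently and uniformly from the genome, a family making up a fraction f of the genome is represented by at least one of n reads with probability 1 − (1 − f)^n. The sketch below ignores classification sensitivity, which would lower detection further; the function name is illustrative.

```python
def detection_probability(family_fraction, n_reads):
    """Probability that at least one of n_reads derives from a given TE family.

    Assumes reads are sampled independently and uniformly from the genome and
    that the family makes up `family_fraction` of it (e.g. 1e-6 = 0.0001%).
    """
    return 1.0 - (1.0 - family_fraction) ** n_reads


# A dataset of 500 000 reads, close in size to several datasets in Table 1
for fraction in (1e-6, 1e-5):
    print(fraction, round(detection_probability(fraction, 500_000), 3))
```

With 500 000 reads, a family occupying 0.0001% of the genome would be sampled only about 39% of the time, whereas one occupying 0.001% would almost always be seen, which is consistent with many families appearing at very low abundance or not at all.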

Sequence diversity of BARE1 populations

The large whole-genome samples also allowed an assessment of nucleotide diversity of BARE1/Angela elements, the most abundant TE family within Triticeae genomes. Previous studies showed a close phylogenetic relationship of BARE1 and Angela (SanMiguel et al., 2002; Wicker et al., 2009). Our data now clearly show that Angela arose as a subfamily of BARE1 in the wheat/rye lineage (Figure 3(a)). This can be seen in S. cereale, as rye contains both the BARE1 and Angela elements whereas Hordeum only contains BARE1 and wheat only Angela elements. One can speculate that Angela out-competed BARE1 in the wheat lineage, and this could explain the virtual absence of BARE1 in the wheats. Interestingly, Angela diversity is very similar in the rye and wheat taxa, with the exception of T. boeoticum and T. monococcum (see below). This indicates that the full diversity of Angelas had evolved earlier, possibly in the common ancestor of rye and wheat. In addition T. aestivum has the same Angela diversity as the subgenome donors. This indicates that Angelas have not diverged into many new subfamilies in the genome donors, otherwise T. aestivum would display a greater nucleotide diversity of Angela elements.

Do mating systems, breeding and geographic isolation influence TE diversity?

Interestingly, the sequence diversity of BARE1/Angela elements varies strongly between taxa. For example, BARE1 diversity in S. cereale is more than five times higher than in wild barley H. spontaneum. Overall, our data suggest that inbreeding taxa (e.g. barley or the A-genome species) tend to have lower BARE1/Angela sequence diversity. Previous studies suggested that only very few TE copies are active, leading to selective amplification of specific subfamilies within a given TE family (Slotkin and Martienssen, 2007; Slotkin, 2010). We therefore propose that inbreeding tends to keep TE diversity within a genome low. In contrast, outbreeding can lead to the recombination of genomes in which different subfamilies of TEs are active, thereby increasing intragenomic TE diversity.

Indeed, a very low level of BARE1 diversity was found in the two wild H. spontaneum accessions, which have been reported to have an inbreeding level of approximately 98% (Caldwell et al., 2006). In contrast, BARE1 from the domesticated barley H. vulgare cv. Barke showed a nucleotide diversity approximately twice as high. We therefore speculate that cultivated H. vulgare harbours genetic diversity obtained from several genotypes that were added during domestication, improvement and breeding. Cultivar Barke (a two-rowed German spring barley), which was used in this study, was released in 1996 and derives from a cross between two other intensively bred cultivars, Libelle and Alexis. We propose that this mixing of different sources (and thereby of the BARE1 populations within their genomes) produced a Barke genotype consisting of recombined fragments from multiple genotypes, and thus with a higher nucleotide diversity of BARE1 elements.

A similar tendency was observed in the inbreeding wild einkorn T. boeoticum and its domesticated form T. monococcum, which have a much lower nucleotide diversity of Angela elements than the other Triticeae taxa. Here it is possible that both the geographical isolation of T. boeoticum (Kilian et al., 2007a) and domestication contributed to the low Angela diversity. However, these cases are less clear. For example, the inbreeding A-genome species T. urartu has an Angela diversity similar to that of the outbreeding S. cereale. Further analyses, including other TE families and a wider range of Triticeae taxa, are therefore necessary for a better understanding of the complex interplay between intragenomic TE diversity, domestication, geographical distribution and mating system.

Experimental Procedures

Plant materials and DNA extraction

The following Triticeae accessions were used: Ae. tauschii Coss. subsp. strangulata accession AE429 (from Iran), Ae. speltoides var. ligustica accession SPE0061 (from Turkey; single seed descended from AE 346-5-1), T. urartu accession EP0471 (from Lebanon; single seed descended from ID 388 studied in Kilian et al., 2007a), T. boeoticum accession 1628 (from Turkey; single seed descended from ID716 studied in Kilian et al., 2007a) and T. monococcum accession 2240 (from Turkey; single seed descended from ID 492 in Kilian et al., 2007a). 454 sequences for two H. spontaneum genotypes (FT11 from Israel and FT462 from Turkey, both single seed descended) and two runs of 454 sequences of H. vulgare cv. Barke were included in the analysis. These were all single seed descended through three generations prior to DNA extraction. Seeds of Secale cereale cv. Imperial were also grown for the purpose of the analysis. Three to five grams of leaf material were harvested from each species for DNA extraction. DNA was extracted using a 1.3 × cetyl trimethylammonium bromide (CTAB) and dichloromethane:isoamyl alcohol (24:1) method (http://www.protocol-online.org/cgi-bin/prot/). The DNA was further purified with a Qiagen DNeasy Kit, starting at step 13 and following the manufacturer's guidelines. Samples were then sent for 454 Titanium (454 Life Sciences, http://www.454.com/) 7-kb paired-end sequencing at the Functional Genomics Center Zurich and at the IPK in Gatersleben.

Analysis of the 454 reads

Linux systems were used for the analysis of the datasets. The 454 reads were classified using the BLAST program (http://www.ncbi.nlm.nih.gov/). For the identification of TEs we used the databases totalTREP10 (http://wheat.pw.usda.gov/ITMI/Repeats/) and PTREP11. Local databases were created for chloroplast, mitochondrial, rDNA and tRNA sequences and for Brachypodium distachyon coding sequences. Custom Perl scripts were used to analyse each read set. Each search produced two files: the first contained the reads with BLAST hits (defined as hits with E-values < 10⁻⁶), and the second contained the ‘no hits’ reads, which were used for the subsequent BLAST searches against the other databases.
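The sequential filtering described above can be illustrated with a minimal Python sketch; it is not the original Perl code. It assumes that each search returns BLAST tabular output (the 12-column format produced by BLAST+ with `-outfmt 6`, in which column 11 is the E-value). The `search` hook, the database names and the order in which databases are searched are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hits with E-values below 10^-6 count as matches (threshold from the text).
E_VALUE_CUTOFF = 1e-6


def split_hits(
    read_ids: list[str], tabular_lines: list[str], cutoff: float = E_VALUE_CUTOFF
) -> tuple[dict[str, str], list[str]]:
    """Split reads into {read: best subject} hits and a list of 'no hits' reads.

    Assumes BLAST tabular output: column 1 = query id, column 2 = subject id,
    column 11 = E-value.
    """
    wanted = set(read_ids)
    best: dict[str, tuple[float, str]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query not in wanted or evalue >= cutoff:
            continue  # not a read of interest, or not significant
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, subject)  # keep the lowest E-value per read
    hits = {query: subject for query, (_, subject) in best.items()}
    no_hits = [read for read in read_ids if read not in hits]
    return hits, no_hits


def classify_sequentially(
    read_ids: list[str],
    databases: list[str],
    search: Callable[[str, list[str]], list[str]],
) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Search each database in turn using only the reads not yet classified.

    `search(database, read_ids)` is a hypothetical hook that runs BLAST and
    returns its tabular output lines.
    """
    classified: dict[str, tuple[str, str]] = {}
    remaining = list(read_ids)
    for database in databases:
        if not remaining:
            break
        hits, remaining = split_hits(remaining, search(database, remaining))
        for read, subject in hits.items():
            classified[read] = (database, subject)
    return classified, remaining
```

Passing only the unclassified reads to each successive search mirrors the ‘no hits’ file of the pipeline, so each read is assigned to at most one database: the first one in the search order that it matches.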

Assessing nucleotide diversity in BARE1 elements

Consensus sequences of the first 300 bp of the LTR were generated for all taxa, for the BARE1 element (H. spontaneum, H. vulgare and S. cereale) and for the Angela element (S. cereale, T. urartu, Ae. speltoides, Ae. tauschii, T. boeoticum and T. monococcum). MEGA 5.0 (Tamura et al., 2011) was used to construct a maximum-likelihood tree under the general time-reversible model of DNA substitution, with 1000 bootstrap replicates. The first 300 bp and a region between 600 and 900 bp of the BARE1 LTR consensus sequence were used as queries in BLASTN searches against the 454 reads. To avoid bias due to different sample sizes, we used exactly 400 000 reads from each of the 454 datasets for the BLAST searches. For H. vulgare and H. spontaneum (FT11 and FT462), all 300-bp matches were used for CLUSTALW alignments. Nucleotide diversity was calculated using a custom Perl script. DnaSP (Librado and Rozas, 2009) was used to perform Tajima's test to validate the results. For T. aestivum, T. urartu, Ae. speltoides, Ae. tauschii, T. boeoticum, T. monococcum and S. cereale we used a consensus sequence of Angela (the BARE1 homologue in wheat). Since Angela sequences are more diverse in wheat, we extracted all matches longer than 200 bp from these datasets.
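Nucleotide diversity (π) is the average proportion of differing sites over all pairs of sequences in a sample. The Python sketch below computes π for a set of aligned LTR matches; it illustrates the measure and is not the Perl script used here. Because the treatment of gaps in the original analysis is not described, the sketch assumes pairwise deletion: for each pair of sequences, sites with a gap or an ambiguous base in either sequence are skipped.

```python
from __future__ import annotations

from itertools import combinations

VALID_BASES = frozenset("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float | None:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion); returns None if no comparable sites remain.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    return differences / compared if compared else None


def nucleotide_diversity(alignment: list[str]) -> float:
    """Mean pairwise p-distance (pi) over all pairs of aligned sequences."""
    distances = [
        d
        for seq_a, seq_b in combinations(alignment, 2)
        if (d := p_distance(seq_a, seq_b)) is not None
    ]
    if not distances:
        raise ValueError("need at least two sequences with comparable sites")
    return sum(distances) / len(distances)
```

For example, three aligned 10-bp sequences whose pairs differ at one, one and two sites give π = (0.1 + 0.1 + 0.2)/3 ≈ 0.133. In practice the input would be the aligned 300-bp (BARE1) or longer than 200-bp (Angela) matches taken from the 400 000-read subsamples.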

All of the 454 datasets can be obtained from the authors by request.

Acknowledgements

This research was supported by COST action FA0604 and the Swiss Office for Education and Research (SBF) grant number 37150503. We would like to thank Frank Blattner, Fedor A. Konovalov and Andreas Graner for their valuable comments and suggestions on the manuscript, Susanne König for excellent technical support and 454 sequencing, the members of the Functional Genomics Center Zurich for 454 sequencing, and the University of Liverpool Genome Centre and the University of Bristol for providing the T. aestivum 454 sequences.

Ancillary