The physical map of hexaploid wheat chromosome 3B was screened using centromeric DNA probes. A 1.1-Mb region showing the highest number of positive bacterial artificial chromosome (BAC) clones was fully sequenced and annotated, revealing that 96% of the DNA consisted of transposable elements, mainly long terminal repeat (LTR) retrotransposons (88%). Estimation of the insertion times of the transposable elements revealed that CRW (also called Cereba) and Quinta are the youngest elements at the centromeres of common wheat (Triticum spp.) and its diploid ancestors, with Quinta being younger than CRW in both diploid and hexaploid wheats. Chromatin immunoprecipitation experiments revealed that both the CRW and Quinta families are targeted by the centromere-specific histone H3 variant CENH3. Immunocolocalization of the retroelements with the CENH3 antibody indicated that a higher proportion of Quinta than CRW elements was associated with CENH3, although CRWs were more abundant. Long arrays of satellite repeats were also identified in the wheat centromere regions, but they have lost the ability to bind CENH3. In addition to transposons, two functional genes and one pseudogene were identified. The gene density in the centromere appeared to be three to four times lower than the average gene density of chromosome 3B. Comparisons with related grasses also indicated a loss of microcollinearity in this region. Finally, comparison of centromeric sequences of Aegilops tauschii (DD), Triticum boeoticum (AA) and hexaploid wheat revealed that the centromeres in both the polyploids and diploids are still undergoing dynamic changes, and that the new CRWs and Quintas may have taken over the core role in kinetochore formation.
Centromeres play a key role in the faithful transmission of chromosomes during mitosis and meiosis. Despite the lack of conserved DNA sequences (Henikoff et al., 2001; Henikoff and Malik, 2002; Jiang et al., 2003; Lamb and Birchler, 2003), centromeres from most multicellular eukaryotes share very similar structural features. Satellite DNA and centromeric retrotransposon (CR) families are the most common DNA elements in centromeres of human (Schueler et al., 2001), Arabidopsis (Copenhaver et al., 1999; Kumekawa et al., 2000), Oryza sativa (rice; Cheng et al., 2002; Nagaki et al., 2004; Wu et al., 2004; Zhang et al., 2004), and Zea mays (maize; Jin et al., 2004) and its relative Coix lacryma-jobi (Han et al., 2010). The satellite repeats are typically arranged in a tandem head-to-tail fashion. In plants, they are usually surrounded and interspersed by various repeats, primarily composed of long terminal repeat (LTR) retrotransposons (Presting et al., 1998; Cheng et al., 2002; Nagaki et al., 2004; Wu et al., 2004; Zhang et al., 2004; Nagaki and Murata, 2005). Although centromeric DNA sequences are highly variable, most centromeric proteins are conserved. CENH3, a variant of histone H3 specifically belonging to the nucleosomes of functional centromeres, plays a key role in kinetochore assembly (Henikoff et al., 2001), and has been used as a marker to identify functional areas that correlate with DNA sequences of centromeres (Wang et al., 2009; Han et al., 2010; Sanei et al., 2011). Meiotic recombination is almost completely suppressed at the centromeric and pericentromeric regions (Gore et al., 2009), but rearrangements occur via the insertion and deletion of retrotransposons (Henikoff et al., 2001; Hall et al., 2004; Liu et al., 2008). In contrast to model plant genomes, little is known of the dynamics of centromeric DNA in large and polyploid plant genomes.
Bread wheat (Triticum aestivum L.) is a recent hexaploid (2n = 6x = 42) containing the closely related A, B and D genomes (herein designated as subgenomes; Huang et al., 2002; Dvorak et al., 2004; Gill et al., 2004). Because of its polyploid composition, its high proportion of repetitive sequences (85%) and its size of 17 Gb, only a few large contiguous sequences have been produced and analyzed to date (Choulet et al., 2010), and only a few individual bacterial artificial chromosomes (BACs) related to centromeric regions have been studied. Liu et al. (2008) reported the sequences of two centromere-associated BAC clones from Triticum boeoticum (AA) (Chen et al., 2002). Fiber fluorescence in situ hybridization (FISH) and chromatin immunoprecipitation (ChIP) analyses showed that a Cereba-like element called centromeric retrotransposon in wheat (CRW) represents the main component of the centromeres and is associated with centromere function. Recently, Paux et al. (2008) established the first version of an integrated physical map of the 1-Gb chromosome 3B of cultivar (cv.) Chinese Spring, thereby providing a major resource for structural and functional analyses of wheat centromeric regions.
In this paper, we describe the sequence composition of a major part of a B-genome centromere. A 1.1-Mb contig was identified after screening the minimal tiling path (MTP) of chromosome 3B with centromere-specific repeat probes. Twelve BACs were selected for FISH analysis, and eight of them, located at the end of the contig, were fully sequenced. The functional roles of some retrotransposon families in centromere formation were studied through their interaction with CENH3, the centromere-specific histone H3 variant of hexaploid wheat. In addition, we sequenced and analyzed two Aegilops tauschii (DD)-derived BAC clones and re-annotated two BAC sequences from T. boeoticum (AA) (Liu et al., 2008) that had been isolated by screening with the cereal universal repetitive sequence RCS1 (Dong et al., 1998) and the cereal centromeric sequence CCS1 (Aragón-Alcaide et al., 1996). Together, these data allowed us to compare the composition, structure and dynamics of centromeric regions of diploid and hexaploid wheat genomes.
BAC clones spanning the wheat 3B centromere
Two probes corresponding to the cereal centromeric universal repeat CCS1 (Dong et al., 1998) and a 365-bp fragment of the LTR of the autonomous CRW (Liu et al., 2008) were hybridized to macroarrays spotted with DNA of the 7440 BAC clones comprising the MTP version 1 of the chromosome 3B physical map of Chinese Spring (Paux et al., 2008). Positive signals were observed for 122 BAC clones: 60 were positive with both probes and 62 with one probe only. A physical contig (ctg0355.1) encompassing a region of 1.1 Mb, for which the MTP comprised 12 clones (Figure 1a), exhibited the highest density of positive clones and was chosen for FISH experiments and/or sequencing. All twelve MTP BAC clones were used as probes in FISH experiments, and the signal patterns of eight of them on Chinese Spring chromosomes are shown in Figure 2. These results revealed strong variation in signal intensity and dispersion, showing that the proportion of centromere-specific sequences varies between neighboring BACs and along the contig. In addition to centromere-specific sequences, some of the BACs carried sequences that are repeated and dispersed throughout the genome, leading to non-specific hybridization on all chromosomes. For example, when TaaCsp3BF124N06 was used as a probe, the FISH signal was strongest in the centromere region but was also dispersed along both arms of all B-genome chromosomes (Figure 2d).
Based on the FISH data, eight BAC clones from ctg0355.1 with FISH signals more concentrated at the centromeres were selected for complete sequencing (Figure 1). Each BAC sequence was assembled individually into a few scaffolds (up to five), accounting for a cumulative size of 1140 kb (Table 1). Sequencing revealed that TaaCsp3BF078D03 was included in TaaCsp3BF088N13, and it was therefore discarded from further analysis.
Table 1. Estimated insert sizes using pulsed-field gel electrophoresis (PFGE) and assembly data of the sequenced BAC clones
Columns: PFGE estimated size (kb) | Scaffold sequence size (kb)
The 3B centromere was formed by a massive invasion of two TE families
Sequence annotation revealed that 96% of the 1.14-Mb region consisted of transposable elements (TEs; Table 2): LTR retrotransposons and their truncated derivatives represented 88% of the sequence, whereas CACTA and other DNA transposons accounted for 5.5% (Figure 3; Table 2). This composition differs markedly from that previously reported for 18 Mb of chromosome 3B (Choulet et al., 2010) and reveals an enrichment in centromere-specific elements. In particular, two Gypsy retrotransposon families, CRW [also called Cereba in Hordeum vulgare (barley); Presting et al., 1998] and Quinta, which are absent from non-centromeric regions (Choulet et al., 2010), were highly represented. CRW was the most abundant family, accounting for 18% of the region, whereas Quinta was less represented, accounting for 4%. A gradient of CRW density was observed along the 1.14 Mb, from BAC TaaCsp3BF153P02 to TaaCsp3BF147D05, which consist of 43 and 5% CRWs, respectively (Figures 1 and 3). This decrease in CRW density was correlated with an increased diversity of TE families: whereas TaaCsp3BF153P02 contained only three types of LTR retrotransposons (CRW, Fatima and Daniela), 20 different families were represented in the most distal BAC, TaaCsp3BF147D05. This composition suggests that TaaCsp3BF153P02 is proximal to the 3B centromere, whereas TaaCsp3BF147D05 points towards the pericentromeric region, as previously suggested (Table 3; Liu et al., 2008). These data also confirm that centromeric sequences are not restricted to kinetochore-binding domains but also extend into the pericentromeric regions (Kanizay and Dawe, 2009), which is supported by the sequential detection of CENH3 and CRW/Quinta on chromatin fibers (see below).
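Percentages such as 96% TEs, 18% CRW and 4% Quinta are coverage fractions of the annotated sequence. As a minimal sketch, assuming the annotation is available as (family, start, end) intervals with 1-based inclusive coordinates, such values can be computed by merging overlapping intervals before summing, so that overlapping annotations are not counted twice. The annotation format, function names and toy coordinates below are hypothetical and do not represent the annotation pipeline used in this study.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of distinct bases covered by a set of intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def te_composition(annotations, seq_len):
    """Percent of seq_len covered by each TE family and by all TEs combined.

    annotations: iterable of (family, start, end), 1-based inclusive coordinates
    (hypothetical input format).
    """
    by_family = defaultdict(list)
    for family, start, end in annotations:
        by_family[family].append((start, end))
    percent = {fam: 100.0 * covered_bp(ivs) / seq_len for fam, ivs in by_family.items()}
    # The overall value uses the union of all intervals, not the sum of families.
    every_interval = [iv for ivs in by_family.values() for iv in ivs]
    percent["all_TEs"] = 100.0 * covered_bp(every_interval) / seq_len
    return percent


if __name__ == "__main__":
    # Toy 10-kb region with made-up annotations (not data from this study).
    toy = [("CRW", 1, 4000), ("CRW", 3500, 5000),
           ("Quinta", 6001, 7000), ("Fatima", 7001, 9000)]
    for family, pct in te_composition(toy, 10_000).items():
        print(f"{family}: {pct:.1f}%")
```

Per-family values can sum to more than the all-TE value when elements of different families are nested, which is why the overall figure is computed from the union of intervals.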
Table 2. Transposable element (TE) composition of the 3B ctg0355.1 contig, two BAC re-annotated sequences of Triticum boeoticum, and two BAC sequences of Aegilops tauschii
Columns (each reported as % of sequence): Triticum aestivum ctg0355.1 (3B) | Triticum boeoticum (A) | Aegilops tauschii (D)
1 091 954
1 140 450
Table 3. Proportions (%) of main retrotransposons in the eight sequenced bacterial artificial chromosomes (BACs)
Transposable elements (%)
To compare the composition of the 3B centromeric contig with that of centromeric regions of homoeologous wheat chromosomes, a BAC library of A. tauschii was constructed and screened with the CCS1 and p-365 probes (see Experimental procedures). One BAC clone (5M14) was sequenced and analyzed, revealing the presence of another retroelement, Weg, which represented 32.8% of the sequence (Table 2). Weg was also detected in the 3B centromeric sequences analyzed here and in the previously annotated BAC clone TaaCsp3BF100L17 (Choulet et al., 2010), but it represented only 2% of these sequences (Table 2). FISH analysis indicated that Weg is mainly concentrated in the pericentromeric regions of the D and A subgenomes of Chinese Spring, but its signals are more scattered and fainter than those of either CRW or Quinta (Figure 4). Its putative role in wheat centromere function was later supported by ChIP with the CENH3 antibody. Finally, although not specific to centromeres, the Fatima retrotransposon family was the second most abundant family in the region (17% of the sequence; Table 3) and was found in proportions similar to those in non-centromeric regions (Choulet et al., 2010) (Table 4).
Table 4. Physical mapping of syntenic genes (expressed sequence tags, ESTs) of rice chromosome 1 and wheat chromosome 3B centromeric regions
The 3B centromeric region is enriched in relatively young and specific TEs
A total of 298 retrotransposons and truncated or partially sequenced derivatives were found within the 1.14-Mb region. Among them, 24 had two intact LTRs, which were aligned to estimate insertion dates based on nucleotide substitution rates. Considering all families, the average insertion time was about 1.4 Ma (Table 5), similar to the peak observed for the entire chromosome 3B (Choulet et al., 2010). The youngest element was a Quinta inserted 0.1 Ma, whereas the oldest TEs in the region were an unnamed family (fam78) and a Sumana element, inserted 4.3 and 3.4 Ma, respectively. The three intact CRW copies were estimated to have been inserted 0.6, 0.8 and 1.3 Ma. Although CRWs were apparently inserted more recently than most of the other families in this region, the other centromere-specific retrotransposon family, Quinta, probably transposed even more recently, with three of the four youngest elements found in the 1.14-Mb region showing insertion times of less than 0.4 Ma. This suggests that these Quinta elements transposed after tetraploidization, which is estimated to have occurred about 0.5 Ma (Figure 5; Huang et al., 2002; Dubcovsky and Dvorak, 2007). Thus, these results indicate that the 3B centromeric region has been shaped by massive invasions of mainly two TE families (CRW and Quinta), inserted into a background of non-centromere-specific families such as Fatima.
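Insertion dates of this kind are usually derived from the divergence K between the two LTRs of an element, which are identical at the time of insertion, as T = K / (2r). The sketch below assumes a Kimura two-parameter (K2P) distance and a substitution rate r of 1.3 × 10^-8 substitutions per site per year, a value commonly used for dating grass LTR retrotransposons; neither the distance model nor the rate is stated in this section, so both are assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean

# Assumed neutral substitution rate (substitutions per site per year);
# commonly used for grass LTR dating, NOT taken from this study.
ASSUMED_RATE = 1.3e-8

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTRs.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("LTRs too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_time_myr(ltr5, ltr3, rate=ASSUMED_RATE):
    """Insertion time in million years, T = K / (2r)."""
    return k2p_distance(ltr5, ltr3) / (2 * rate) / 1e6


if __name__ == "__main__":
    # Toy 100-bp LTR pair differing by a single A<->G transition.
    print(f"{insertion_time_myr('A' * 100, 'G' + 'A' * 99):.2f} Myr")
    # Mean of the three 3B CRW dates reported in the text (0.6, 0.8, 1.3 Ma).
    print(f"mean CRW insertion time: {mean([0.6, 0.8, 1.3]):.1f} Ma")
```

The factor of 2 reflects that both LTR copies accumulate substitutions independently after insertion; with a different assumed rate, all dates scale proportionally.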
Table 5. Intact retrotransposons and estimated insertion times
Columns: Transposable element name | Genome A (Triticum boeoticum) | Genome B (Chinese Spring ctg0355a) | Genome D (Aegilops tauschii)
CRW, Quinta and Weg are likely to be involved in the active centromere
To determine whether the identified retroelements are part of the active centromere, we conducted experiments with a CENH3-specific antibody. First, we identified genes in the wheat genome encoding CENH3 and generated antibodies specifically targeting the CENH3s of wheat, barley and Thinopyrum ponticum (Podp.) (tall wheatgrass; Liu & Wang; see Experimental procedures). Western blotting and immunostaining indicated that the antibody specifically binds the centromeres of wheat (Figures S1 and S2). To identify the types of sequence putatively involved in wheat centromere function, we conducted immunostaining (Figure 6a,d,g) followed by sequential FISH (Figure 6b,e,h) on the same interphase cells of wheat cv. Chinese Spring. The results showed co-localization of wheat CENH3 with the centromeric CRW and Quinta TEs and, to a lesser extent, with Weg. Both CRW and Quinta co-localized with TaCENH3-specific signals (Figure 6c,f), whereas Weg was more dispersed in the interphase nucleus (Figure 6i). The CRW signals were usually larger than those of the TaCENH3 antibody, suggesting that not all CRWs are located in the kinetochore. This was further supported by FISH of CRW and Quinta combined with CENH3 antibody detection on extended chromatin fibers (Figure 7): only a portion of the CRWs co-localized with CENH3, and other CRW-rich regions lacked CENH3 (Figure 7b,c,d). Compared with CRW, a higher proportion of Quinta elements co-localized with TaCENH3 (Figure 8). Thus, these results suggest that both CRW and Quinta are likely to be involved in centromere function, with a more specific role for Quinta.
To further confirm these findings, we performed ChIP experiments with anti-TaCENH3. The ChIP-PCR experiments were performed twice, using ribosomal RNA genes (5.8S rDNA) and the Erika element as extra-centromeric controls (Liu et al., 2008). Primers for real-time PCR were designed in two different regions of each of the CRW, Quinta, Weg and Fatima elements, and in one region each of 5.8S rDNA and Erika (Table S3). On average, the relative fold enrichments (RFEs) were 3.13 ± 0.03, 2.76 ± 0.04, 7.61 ± 0.98, 5.54 ± 1.09, 3.71 ± 0.82 and 2.99 ± 0.68 for CRW-5′UTR, CRW-365-2, Quinta-LTR, Quinta-gag, Weg-LTR and Weg-gag, respectively. Each differed significantly from the 5.8S control (1.00 ± 0.08), indicating that CRW, Quinta and Weg were immunoprecipitated at significantly higher levels than the negative control. In contrast, Fatima was not associated with the wheat CENH3 protein, irrespective of the primer pair used (RFE = 1.19 ± 0.22 and 0.91 ± 0.25): both Fatima and Erika were amplified at the same level as 5.8S (Figure 9). Again, Quinta enrichment was much higher than that of CRW, consistent with the immunostaining and chromatin fiber results (Figures 6, 7 and 8).
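The relative fold enrichments above express how strongly each target is recovered by the anti-TaCENH3 antibody relative to a control. This section does not state how the RFEs were calculated. A minimal sketch, assuming the widely used ΔΔCt approach with about 100% PCR efficiency, paired ChIP and input Ct values for each primer pair, and a reference locus (such as 5.8S rDNA) that is set to 1 by construction, is shown below. All Ct values and function names are hypothetical.

```python
from statistics import mean, stdev


def delta_ct(ct_chip, ct_input):
    """Ct difference between the immunoprecipitated sample and its input."""
    return ct_chip - ct_input


def relative_fold_enrichment(target, reference):
    """2**-(ddCt): enrichment of a target relative to a reference locus.

    target and reference are (ct_chip, ct_input) pairs from the same ChIP sample.
    Assumes ~100% PCR efficiency, i.e. one cycle equals a two-fold difference.
    """
    ddct = delta_ct(*target) - delta_ct(*reference)
    return 2.0 ** -ddct


def summarize(values):
    """Mean and sample standard deviation across replicate experiments."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


if __name__ == "__main__":
    # Hypothetical Ct values for two replicate ChIP experiments.
    replicates = [
        {"target": (24.0, 26.0), "reference": (28.0, 27.0)},
        {"target": (24.5, 26.0), "reference": (28.2, 27.1)},
    ]
    rfes = [relative_fold_enrichment(r["target"], r["reference"]) for r in replicates]
    m, sd = summarize(rfes)
    print(f"RFE = {m:.2f} ± {sd:.2f}")
```

Because the reference locus is compared with itself, its RFE is exactly 1 under this scheme; other normalizations (for example, against a mock or no-antibody sample) would give the control a value different from 1.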
In contrast to other grass species, we did not detect dominant satellite tandem repeats in the 1.14-Mb centromeric contig of chromosome 3B. To check for the presence and localization of centromeric satellites in wheat, we designed primers based on a set of centromeric satellite tandem repeats of the genus Thinopyrum, the perennial grasses most closely related to wheat. We amplified a centromeric satellite-like sequence with a ~550-bp repeated motif from cv. Chinese Spring and its diploid ancestors. FISH mapped this sequence to the centromeric regions of common wheat, with fainter signals than those of CRW or Quinta (Figure S4). A search of the 5x Chinese Spring Genome Sequence Database (http://www.cerealsdb.uk.net/CerealsDB/Documents/DOC_search_reads.php) detected several matching contigs; the one with the longest scaffold was contig 310431, 1665 bp in length. Dot-matrix analysis revealed three satellite repeats in this contig (Figure S5), spanning positions 13–514, 564–1052 and 1101–1603 (Figure S6). However, ChIP analysis did not support the hypothesis that this tandem repeat is associated with CENH3 in wheat. In two independent ChIP-PCR experiments, its RFEs were 2.27 ± 0.41 and 1.35 ± 0.09, almost the same as those of the negative control 5.8S (1.90 ± 0.17 and 2.74 ± 0.03) and significantly lower than those of CRW (4.08 ± 0.12 and 5.81 ± 0.29). Finally, Southern hybridization indicated that polyploid wheat inherited this tandem repeat from its diploid ancestors (Figure S7). We therefore propose that satellite tandem repeats were originally present in the core regions of wheat centromeres, but were probably displaced by the insertion of Quinta and CRW TEs and are no longer associated with CENH3.
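In a dot-matrix (self dot plot) comparison, tandem repeats appear as diagonals parallel to the main diagonal, offset by the repeat period. For the three repeats reported in contig 310431 (starting at positions 13, 564 and 1101), the start-to-start offsets are 551 and 537 bp, consistent with the ~550-bp motif. The sketch below is one simple way to recover such a period, assuming exact k-mer matches: it counts self-matching k-mers by positional offset and reports the most populated offset. The tool actually used for the dot-matrix analysis is not named here; the function names, the k-mer size of 12 and the synthetic test sequence are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map each k-mer to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(seq, k=12):
    """Count self-matching k-mer pairs by offset j - i (> 0).

    Each offset corresponds to one diagonal of a self dot plot.
    """
    counts = Counter()
    for positions in kmer_index(seq.upper(), k).values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                counts[positions[b] - positions[a]] += 1
    return counts


def dominant_period(seq, k=12, min_offset=50):
    """Most populated off-diagonal offset, taken as the tandem repeat period."""
    counts = {off: n for off, n in diagonal_counts(seq, k).items() if off >= min_offset}
    if not counts:
        return None
    return max(counts, key=counts.get)


if __name__ == "__main__":
    # Synthetic sequence: a random 500-bp unit repeated three times,
    # separated by two different random 50-bp spacers (period = 550 bp).
    rng = random.Random(0)
    unit = "".join(rng.choice("ACGT") for _ in range(500))
    spacers = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(2)]
    seq = unit + spacers[0] + unit + spacers[1] + unit
    print(dominant_period(seq))
```

Exact k-mer matching suits nearly identical copies; diverged satellite monomers would need inexact (windowed or mismatch-tolerant) comparisons, as in typical dot-plot tools.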
Gene content and synteny of the centromeric regions in related grass genomes
Three genes were detected in the 1.14-Mb centromeric sequence (Figure 1b). Two of them, gene01 (carried by TaaCsp3BF111F24) and gene03 (carried by TaaCsp3BF033D06), are orthologous and syntenic with genes on rice chromosome 1 and Brachypodium chromosome 2: they are similar to Os01g37800 and Bradi2g41180 (putative ras-related protein; 94 and 97% amino acid identity, respectively) and to Os01g36860 and Bradi2g40760 (putative DEAD-box ATP-dependent RNA helicase; 86 and 93% amino acid identity, respectively). An additional, non-syntenic gene, partially sequenced in BAC TaaCsp3BF111F24, shares 96% amino acid identity with Os03g46650 and Bradi1g12820 (function unknown) (Table 4). With one gene every 380 kb on average, the centromeric gene density appears to be three to four times lower than the average estimate of one gene per 104 kb for the entire 3B chromosome (Choulet et al., 2010). Expressed sequence tag (EST) matches were found for all of the putative genes, suggesting that wheat centromeres contain genes that may be functional and expressed. The two genes identified as syntenic with rice and Brachypodium were carried by two non-overlapping BACs (TaaCsp3BF111F24 and TaaCsp3BF033D06; Figure 1), separated by a sequencing gap. The orthologous genes in rice (Os01g36860/37800) and Brachypodium (Bradi2g40760/41180) lie in the same chromosomal regions but are not strictly neighboring genes: they are separated by 52 and 41 protein-coding genes in the two species, respectively. Of these intervening genes, 25 are collinear between the two genomes and were therefore expected to be present at the orthologous locus on wheat chromosome 3B. This observation might suggest either a chimeric assembly of the BAC contig or rearrangements between wheat and the other grass species in this region.
To distinguish between these hypotheses, we searched the full wheat EST collection for sequences similar to the 25 rice/Brachypodium orthologous gene pairs. Matching ESTs were used to design sequence-tagged-site (STS) markers. STS markers were successfully designed for 17 of the targeted genes and were used to PCR-screen the 9216 clones comprising the second version of the minimal tiling path of the 3B physical map (Rustenholz et al., 2011) (see Experimental procedures; Table S1). BAC contigs were identified for 10 of the 17 markers. These 10 EST-derived markers were assigned to 10 different BAC contigs, some of which were located in non-pericentromeric regions (Table S2). These results suggest that the original contig ctg0355.1 is not chimeric and indicate a disruption of collinearity. Interestingly, the orthologous region on rice chromosome 1 does not correspond precisely to its centromere: it is located on the long arm, ~4 Mb (10% of rice chromosome 1) distal to the centromere, suggesting a shift in centromere position since the divergence of wheat and rice. This confirms previous hypotheses based on comparisons of the locations of mapped wheat ESTs with the sequence of rice chromosome 1 (Qi et al., 2004).
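The gene-density comparison given earlier in this section is simple arithmetic: three genes in about 1140 kb of sequence give one gene per 380 kb, against an average of one gene per 104 kb for chromosome 3B (Choulet et al., 2010). The short calculation below reproduces it; the function and constant names are illustrative.

```python
def kb_per_gene(region_kb, n_genes):
    """Average spacing between genes (kb per gene) in a region."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


CONTIG_KB = 1140          # cumulative size of the sequenced 3B centromeric contig
GENES_IN_CONTIG = 3       # the two syntenic genes plus the partial non-syntenic gene
CHR3B_KB_PER_GENE = 104   # chromosome-wide average (Choulet et al., 2010)

centromere_spacing = kb_per_gene(CONTIG_KB, GENES_IN_CONTIG)
fold_lower = centromere_spacing / CHR3B_KB_PER_GENE

print(f"{centromere_spacing:.0f} kb per gene; density {fold_lower:.2f}x lower than the 3B average")
```

The ratio of about 3.65 is what the text rounds to "three to four times lower".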
Comparisons of sequence composition among the centromeres of the A, B and D genomes
Two BACs (DQ904440 and EF624064) from T. boeoticum (wheat A genome-related diploid), previously sequenced and analyzed by Liu et al. (2008), were re-annotated following the same procedure as for contig ctg0355.1. In addition, two BAC clones identified by screening a BAC library of A. tauschii (wheat D genome-related diploid) with the same centromeric probes were completely sequenced. The A and D centromeric BAC sequences were enriched in LTR retrotransposons, which accounted for 95 and 89% of the BAC sequences, respectively (Table 2). Again, CRW was the most abundant element, accounting for 24 and 45% of the A and D genome-derived BACs, respectively. Quinta was also a major family in the A and D centromeres, accounting for 17 and 7% of the A and D BACs, respectively. Based on LTR sequence substitutions, the five intact CRWs and four intact Quintas annotated on the two T. boeoticum BACs had average insertion dates of 0.6 and 0.4 Ma, respectively, whereas the three intact CRWs and four intact Quintas annotated on chromosome 3B had average insertion dates of 0.9 and 0.7 Ma, respectively (Figure 5; Table 5). This indicates that Quinta is younger than CRW in both genomes. Only two intact elements were found in the Ae. tauschii BAC sequence: a CRW inserted 1.6 Ma and a recently inserted Quinta (0.1 Ma). This number is too small to compare TE dynamics with those of the A and B genomes, but the results indicate that in both T. boeoticum and Ae. tauschii, Quinta is the youngest contributor to shaping the centromeres. With four of the five CRWs inserted less than 0.5 Ma, the CRWs identified in the T. boeoticum centromere were younger than those in chromosome 3B of hexaploid wheat, i.e. they were probably inserted in the T. boeoticum lineage after the tetraploidization event.
This shows that polyploidization is not the only factor affecting the dynamics of centromere-specific retrotransposons; rather, retrotransposons have been highly active at the centromeres independently of polyploidization events (Figure 5; Table 2).
Cereal centromere retrotransposons and their phylogenetic relationships
A phylogenetic tree was constructed by aligning the amino acid sequences of the integrase core domains of known centromeric retrotransposons of rice (CRRs), maize (CRMs) and barley (Cerebas) with those of the wheat centromeric elements Weg and autonomous CRW (Figure 10). The CR element of Brassica rapa, CRB1 (CRB-1-AC166739), was used as an outgroup. The amino acid sequence alignments confirmed that autonomous CRWs and Cerebas belong to a closely related family of TEs that includes the rice CRRs and maize CRM segments. Integrase sequence divergence among these elements was less than 10%, with a maximum of 8.6% between CRM2 and Cereba2, whereas the Weg1 and Weg2 sequences were more divergent (12%) from the other CR elements (Figure 10). Homology searches in the Triticeae Repeat Sequence Database (http://wheat.pw.usda.gov/ITMI/Repeats) revealed that Quinta, a non-autonomous CR element, is relatively specific to wheat and its diploid relatives. Southern hybridization also revealed that Quinta is much less abundant than autonomous CRW in the diploid ancestors of wheat (Figure S3B), and very low Quinta levels were detected in T. boeoticum (Liu et al., 2008). Interestingly, CRM1 is separated from CRM2 in the phylogenetic tree; CRM1 is known to have been replaced by CRM2 as the major centromeric transposable element binding to CENH3 on maize chromosomes 2 and 5 (Wolfgruber et al., 2009; Shi et al., 2010). By analogy, Weg may be an ancestral functional centromeric sequence in wheat whose replacement by CRW and Quinta is ongoing.
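Divergence values such as the 8.6% between CRM2 and Cereba2 are pairwise distances computed on an alignment. As a minimal sketch, assuming an uncorrected p-distance (the proportion of differing residues over gap-free columns), which may differ from the measure used for Figure 10, pairwise divergence can be computed as follows. The sequence names and residues are toy placeholders, not real integrase domains.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def pairwise_divergence(alignment):
    """Percent p-distance for every pair of named, aligned sequences."""
    return {
        (x, y): 100.0 * p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Toy aligned protein fragments (placeholders, not real integrase sequences).
    toy = {
        "elem1": "ACDEFGHIKLMNPQRSTVWY",
        "elem2": "ACDEFGHIKLMNPQRSTVWF",
        "elem3": "ACDEYGHIKL-NPQRSAVWF",
    }
    for pair, pct in pairwise_divergence(toy).items():
        print(pair, f"{pct:.1f}%")
```

Gap columns are excluded pairwise here; tree-building programs often apply substitution models (for example, Poisson or JTT corrections) that give somewhat larger distances for the same alignment.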
DNA components of wheat centromeric regions are relatively unique among the grass family
Eukaryotic centromeres are composed of large blocks of highly repetitive DNA that are not easy to sequence (Henikoff et al., 2001; Henikoff and Malik, 2002; Jiang et al., 2003; Lamb and Birchler, 2003; Tek et al., 2010). In this study, we sequenced a 1.14-Mb BAC contig physically located in the centromeric region of chromosome 3B from the complex polyploid common wheat genome. Our results provide evidence for a putative functional role of centromere-specific retrotransposons. Transposable elements have been identified at the centromere in many grass species. The centromeres usually contain genus-specific regions with highly repetitive short tandem repeats, each about 180 bp in length, interspersed with long stretches of moderately repeated retrotransposons that are shared among the grass family (Kurata et al., 2002; Jiang et al., 2003). In barley, a close relative of wheat, the centromeric regions are shaped by both Cereba (CRB) and satellite tandem repeats (AGGGAG)n that act as functional centromeric sequences (Presting et al., 1998; Hudakova et al., 2001). Here, we show that satellite tandem repeats are still present in the centromeric region of wheat and its diploid relatives, but that they are no longer associated with CENH3. Instead, we found evidence that they might have been replaced by the CRW and Quinta elements at the functional centromere. Whereas CRW was already known to be a transposable element specifically found in centromeric regions, our results show that the Quinta family is another important retrotransposon in wheat centromere function. Sequence annotation, FISH and Southern analyses indicated that both CRW and Quinta elements are conserved in the three genome donor species of common wheat, thereby possibly favoring the genomic stabilization and fertility of the newly formed polyploid wheat (Liu et al., 2008).
Centromere differentiation of the wheat subgenomes
A key question is what drives the differentiation of subgenomes at the pericentromeric regions in polyploids. In the genus Coix, Han et al. (2010) found a 153-bp satellite repeat that discriminates the two subgenomes of Coix aquatica at the centromeric regions. CRB is a major component of all centromeres in three diploid Brassica species and their allotetraploid relatives. However, CentBr (176 bp), found in B. rapa (AA), was not detected in Brassica nigra (BB), the most distantly related diploid species analyzed, but it provided strong subgenomic specificity in Brassica juncea (AABB) and Brassica carinata (BBCC) (Lim et al., 2007). In the present study, we found that Weg, which is involved in centromere function, is more abundant in the pericentromeric regions of the D genome than in those of the A and B genomes. In a previous study, we showed that another retroelement, Wgel, was more abundant in the pericentromeric regions of A-genome chromosomes than in those of the B and D genomes (Liu et al., 2008). This suggests that each subgenome of hexaploid wheat has its own specific pericentromeric elements, in agreement with the view that centromeres and pericentromeres are critical areas for the differentiation of subgenomes in polyploids (Zhang et al., 1996; Jiang et al., 2003; Dawe, 2005; Lee et al., 2005). FISH experiments clearly showed that the Weg element is much more dispersed than either CRW or Quinta at interphase. As Weg is still partially involved in CENH3 binding, we suggest that it is an ancestral functional centromeric sequence whose replacement by the newer elements CRW and Quinta is ongoing (Topp and Dawe, 2006).
Polyploidization caused a CRW and Quinta burst in the centromeric regions, but they are still in a dynamic state in the diploid species
Divergence between the A, B and D genomes (Triticum from Aegilops) is estimated to have occurred around 2.5–4.5 Ma, whereas tetraploidization occurred about 0.5 Ma (Huang et al., 2002; Dvorak et al., 2004; Gill et al., 2004). According to the estimated insertion times, centromere-specific retrotransposons in hexaploid wheat, as well as in the diploid A and D species, are younger (less than 1 Ma) than those in non-centromeric regions (Table 5; Choulet et al., 2010). Based on sequences of 3B fragments, Charles et al. (2008) and Choulet et al. (2010) proposed that most TEs in wheat have been inactive for the past 1 Myr, that is, that their amplification and transposition were essentially complete by 1 Ma. However, centromere-related TEs such as CRW (Cereba) and Quinta appear to turn over much faster at the centromeres than in other regions, irrespective of the level of polyploidy. For Quinta, two burst times were detected on chromosome 3B: an older one about 1.5 Ma and a recent one less than 0.5 Ma. For CRW, only one burst time, less than 0.5 Ma, was detected on chromosome 3B (Choulet et al., 2010). The burst times of both CRW and Quinta coincide with the origin of tetraploid wheat (Huang et al., 2002; Dvorak et al., 2004; Gill et al., 2004). Southern hybridization of CRW, Quinta, Wgel, Erika, Sukkula and Daniela in diploid and polyploid wheats strongly supported this hypothesis. The apparent bursts of CRW and Quinta (Liu et al., 2008; Figure S3a,b) might be related to the homogenization of centromere sequences on non-homologous chromosomes among the subgenomes in polyploid wheat (Birchler and Presting, 2012; Tsukahara et al., 2012), whereas the other four, non-centromere-specific retroelements remained highly conserved during polyploidization (Liu et al., 2008; Figure S3c). Nevertheless, CRW and Quinta remain active in all Triticum species and their Aegilops diploid relatives, as revealed by the estimated insertion times.
The youngest CRW and Quinta elements were dated at about 0.1 Ma, much later than tetraploidization (Figure 5; Table 5; Charles et al., 2008), and are among the few retroelements to have appeared in the wheat genome in the past 0.5 Myr (Liu et al., 2008; Choulet et al., 2010).
New centromeric retrotransposons take a role in centromere function
Although several studies have reported tandem repeats associated with wheat centromeres, almost all of them map within an autonomous CRW (Liu et al., 2008). This may reflect the fact that there are many more CRW than Quinta elements in wheat centromeric regions, as supported in this study by the sequencing of BAC clones from chromosome 3B and by Southern hybridization. However, the ChIP results showed that Quinta elements were more strongly enriched by the TaCENH3 antibody. An alternative explanation is that histone H3 was more extensively replaced by CENH3 in chromatin regions rich in Quinta elements, as indicated by immunostaining and sequential chromatin fiber FISH. The estimated insertion times revealed that Quinta and CRW are the most recent retrotransposons found in wheat centromeres, with Quinta more recent than CRW. ChIP experiments indicated that Weg is also partially involved in centromere function, but it is much older than both Quinta and CRW, and its dispersed distribution on metaphase chromosomes indicates that it no longer controls the basic functions of centromeres. This hypothesis is supported by the phylogenetic analysis of the integrase core domains of grass centromeric retrotransposons, in which Weg elements clustered within the CR element clade but at a greater distance from the other CRs. Based on the estimated insertion dates, ChIP experiments, phylogenetic tree and sequential detection of CENH3 and retroelements on chromatin fibers, we conclude that new centromeric retrotransposons, especially Quinta, have taken dominant roles in wheat centromere function.
BAC library screening and marker design
The 7440 BAC clones comprising the first version of the minimal tiling path of the 3B physical map (Paux et al., 2008) were hybridized with probes derived from the cereal centromeric universal repetitive sequence (CCS1) (Aragón-Alcaide et al., 1996; Dong et al., 1998) and a CRW LTR sequence (p-365) (Liu et al., 2008), as described in Choulet et al. (2010). Twelve BAC clones from the minimal tiling path of ctg0355.1 were identified, and eight of them [in italics, followed by their European Molecular Biology Laboratory (EMBL) accession numbers] were fully sequenced: TaaCsp3BF153P02 (accession number HF541876), TaaCsp3BF078D03, TaaCsp3BF088N13 (HF541872), TaaCsp3BF107E11, TaaCsp3BF124N06 (HF541874), TaaCsp3BF111F24 (HF541873), TaaCsp3BF173L21, TaaCsp3BF068N07, TaaCsp3BF033D06 (HF541870), TaaCsp3BF037C18 (HF541871), TaaCsp3BF017E12 and TaaCsp3BF147D05 (HF541875) (http://urgi.versailles.inra.fr/cgi-bin/gbrowse/wheat_FPC_pub).
To further check the reliability of the BAC contig assembly, the second version of the minimal tiling path of the 3B physical map (Rustenholz et al., 2011) was screened by PCR using markers derived from genes suspected to be located at the 3B centromere on the basis of collinearity with the rice and Brachypodium genomes. Genes annotated between Os01g36860 and Os01g37800 on rice chromosome 1 were aligned by BLAST (Altschul et al., 1997) to the corresponding syntenic region in Brachypodium to identify conserved genes. Wheat ESTs were also searched by BLASTN, using a threshold of 70% identity over 70% of the EST length. Overlapping ESTs were clustered using phrap (http://www.phrap.org), and the resulting contigs were mapped to the corresponding rice genomic region using gmap (Wu and Watanabe, 2005). Finally, primer pairs were designed with Primer3 (Rozen and Skaletsky, 2000) so that each PCR amplicon encompassed a single exon. Primer sequences for these markers are provided in Table S1.
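The EST filtering criterion (at least 70% identity over at least 70% of the EST length) can be sketched as a filter on BLASTN tabular output. This is a minimal sketch, assuming default `-outfmt 6` columns and a separately supplied table of EST lengths; the helper names, example identifiers and the one-HSP-per-hit simplification are hypothetical and do not describe the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    est_id: str
    subject_id: str
    pident: float  # percent identity reported by BLASTN
    qstart: int
    qend: int


def parse_outfmt6(lines):
    """Parse default BLAST tabular output (-outfmt 6, 12 columns)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen qstart qend ...
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        int(fields[6]), int(fields[7])))
    return hits


def passes_threshold(hit, est_lengths, min_identity=70.0, min_coverage=0.70):
    """Keep a hit only if identity and EST coverage both reach the thresholds.

    Coverage is taken from a single HSP (an assumption); hits split into
    several HSPs would need their query intervals merged first.
    """
    covered = abs(hit.qend - hit.qstart) + 1
    coverage = covered / est_lengths[hit.est_id]
    return hit.pident >= min_identity and coverage >= min_coverage


# Hypothetical BLASTN rows and EST lengths, for illustration only
rows = [
    "est_A\tsubj_1\t85.0\t450\t60\t2\t11\t460\t100\t549\t1e-80\t300",
    "est_B\tsubj_2\t92.0\t200\t16\t0\t1\t200\t5\t204\t1e-50\t200",
    "est_C\tsubj_3\t65.0\t480\t160\t4\t1\t480\t1\t480\t1e-20\t90",
]
est_lengths = {"est_A": 500, "est_B": 600, "est_C": 500}

kept = [h.est_id for h in parse_outfmt6(rows) if passes_threshold(h, est_lengths)]
print(kept)  # ['est_A']
```

Here est_B fails on coverage (200 of 600 bp) and est_C fails on identity (65%), so only est_A is retained.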
BAC sequencing and sequence assembly
We carried out standard high-throughput sequencing following the protocol described in Kong et al. (2004). Briefly, shotgun libraries were constructed for the eight BAC clones and sequenced at random using dye-terminator chemistry on ABI 3730 sequencers (Applied Biosystems, http://www.appliedbiosystems.com). Inserts were sequenced from both ends with T7 and T3 primers using BigDye v3.1 terminator chemistry (Applied Biosystems). Gaps were filled by primer walking; for gaps with a high G+C content, dGTP BigDye terminator chemistry (Applied Biosystems) was used in the sequencing reaction. Each BAC was sequenced to approximately fivefold coverage. A BAC library of Ae. tauschii (DD) was also constructed and screened with CCS1 and p-365, and one centromere-associated BAC clone, 5M14, was sequenced, assembled and annotated.
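The fivefold coverage target can be related to the amount of shotgun sequencing through the standard Lander-Waterman relation c = NL/G (N reads of length L over a target of size G), under which the expected fraction of bases never sequenced is roughly e^(-c). The sketch below assumes a hypothetical 120-kb BAC insert and 700-bp reads; neither value is given in the text, and real shotgun data deviate from the idealized random-read model, which is one reason gaps still have to be closed by primer walking.

```python
import math


def reads_needed(target_bp, read_bp, coverage):
    """Smallest number of reads N with N * read_bp / target_bp >= coverage."""
    return math.ceil(coverage * target_bp / read_bp)


def expected_unsequenced_fraction(coverage):
    """Lander-Waterman estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


# Hypothetical BAC insert size and read length (not stated in the text)
insert_bp, read_bp = 120_000, 700
n = reads_needed(insert_bp, read_bp, coverage=5)
print(n, round(expected_unsequenced_fraction(5), 4))  # 858 0.0067
```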
The Lasergene SeqMan II module (DNAStar, http://www.dnastar.com) was used to assemble the sequence data into contigs. For annotation of protein-coding genes, CDS predictions were combined with similarity searches using NCBI-BLAST (Altschul et al., 1997) against full-length cDNAs (FL-cDNAs), unigenes and ESTs from wheat, Triticeae and other Poaceae species, and specifically against the SwissProt, rice and Brachypodium distachyon proteomes. Matching transcripts were mapped onto the genomic DNA using gmap (Wu and Watanabe, 2005). For TEs, RepeatMasker (http://www.repeatmasker.org) was used to find similarities to the TREP databank (http://wheat.pw.usda.gov/ITMI/Repeats), and the dotter program was used to annotate the exact borders of each TE by identifying LTRs or terminal inverted repeats and target site duplications. Nested TE structures were reconstructed and manually curated in Artemis (Rutherford et al., 2000). Retrotransposons were classified and named following the procedure described in Wicker et al. (2007).
Fluorescence in situ hybridization of BACs
Young root tips were collected from T. aestivum (cv. Chinese Spring) seedlings. BAC clones associated with the 3B centromere were labeled with digoxigenin-dUTP or biotin-dUTP by nick translation. FISH analysis was performed as described by Liu et al. (2008), using probes prepared by PCR amplification with the following primer pairs: CRW2-LG-3L, 5′-ACCAATTACTAGAGCTCGCGCA-3′; CRW2-LG-3R, 5′-AGCAGGAGCCACAGAAGTAGCA-3′; Quinta-LTR-L3, 5′-ACTTTGACGATCCGACTACAAAC-3′; Quinta-R5, 5′-CCTGCTGCATGGTAAGAACTTG-3′; Weg 1-LTR-1L, 5′-TTGCACGATTGTAGGCGTACTC-3′; Weg 1-LTR-1R, 5′-GCACCTACGACTCCATGAGTGA-3′.
Digoxigenin- and biotin-labeled probes were detected with rhodamine-conjugated anti-digoxigenin antibody (Roche) and FITC-conjugated avidin (Vector Laboratories, http://www.vectorlabs.com), respectively. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in Vectashield antifade solution (Vector Laboratories). FISH images were captured with a CCD camera (AxioCam HRM; Zeiss, http://corporate.zeiss.com).
Preparation of CENH3 antibodies, western blot and immunostaining
Wheat CENH3-coding genes were identified based on homology with rice and maize CENH3. Three highly similar genes (amino acid identity greater than 96%; accession numbers JF969285, JF969286 and JF969287) were identified in Chinese Spring wheat and mapped to chromosomes 1A, 1B and 1D by PCR amplification in nullisomic-tetrasomic lines. Sequence conservation with the rice, maize, sorghum, sugarcane and Brachypodium homologs was assessed by aligning the protein sequences. A peptide antigen, ‘CARTKHPAVRKTK’, was synthesized and used to immunize rabbits at CWBio (http://www.cwbiotech.com). The specificity of the antibody was checked by western blotting and immunostaining of root tip cells of grasses.
Immunostaining was performed as previously described (Zhong et al., 2002; Jin et al., 2004), with minor modifications. Root tips were fixed for 20 min in 4% paraformaldehyde in fresh full-strength PBS (0.13 M NaCl, 0.007 M Na2HPO4, 0.003 M NaH2PO4). After washing with full-strength PBS containing 0.2% Triton X-100, the root tips were squashed directly under coverslips on slides. The coverslips were removed after freezing in liquid nitrogen for 15 s. Approximately 100 μl of rabbit anti-wheat CENH3 antibody [diluted 1:200 in TNB buffer (0.1 M Tris-HCl, pH 7.5, 0.15 M NaCl, and 0.5% blocking reagent; Sigma-Aldrich, http://www.sigmaaldrich.com)] was added to each slide. After incubation in a humid chamber at 37°C for 3 h, the slides were washed three times in full-strength PBS before 100 μl of FITC-conjugated anti-rabbit secondary antibody (Jackson ImmunoResearch, http://www.jacksonimmuno.com; 1:200 in TNB buffer) was added. The incubation (1 h) and washes were the same as described for the primary antibody. After fixation in 4% paraformaldehyde for 20 min, the slides were dehydrated sequentially in 70, 95 and 100% ethanol, each for 5 min at 20°C, prior to FISH. Immunostaining images and FISH signals were captured separately with the CCD camera and then merged. Strong immunostaining signals were obtained in wheat, barley and tall wheatgrass (Thinopyrum ponticum), but not in rice or maize.
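The buffer concentrations and 1:200 dilutions above translate into simple bench arithmetic. The sketch below assumes anhydrous salts (hydrated phosphate salts, which are also common, have larger molar masses) and reads "1:200" as 1 volume of antibody in 200 volumes of final solution; both are assumptions rather than details from the protocol, and the function names are hypothetical.

```python
# Molar masses (g/mol) of the anhydrous salts; hydrates would need other values
MOLAR_MASS = {"NaCl": 58.44, "Na2HPO4": 141.96, "NaH2PO4": 119.98}

# Full-strength PBS as given in the text (mol/L)
PBS = {"NaCl": 0.13, "Na2HPO4": 0.007, "NaH2PO4": 0.003}


def grams_for(recipe, volume_l=1.0):
    """Mass of each salt (g) needed for volume_l liters of the recipe."""
    return {salt: conc * MOLAR_MASS[salt] * volume_l for salt, conc in recipe.items()}


def dilution_volumes(final_ul, ratio):
    """Split a final volume into antibody and buffer for a 1:ratio dilution."""
    antibody_ul = final_ul / ratio
    return antibody_ul, final_ul - antibody_ul


for salt, grams in grams_for(PBS).items():
    print(f"{salt}: {grams:.3f} g per liter")
print(dilution_volumes(100, 200))  # (0.5, 99.5): µl antibody, µl TNB buffer
```

Under these assumptions, 1 liter of full-strength PBS needs about 7.597 g NaCl, 0.994 g Na2HPO4 and 0.360 g NaH2PO4.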
Sequential detection of CENH3 and CRW/Quinta on chromatin fibers
For the sequential detection of CENH3 and CRW/Quinta repeat sequences on chromatin fibers, the stretched chromatin fibers were prepared according to Han et al. (2010), with minor modifications. Unfixed fresh root tips were intensively squashed on a slide coated with poly-Lys (Sigma-Aldrich) and covered with a coverslip. After soaking in liquid nitrogen and removing the coverslip, the slides were fixed in 4% paraformaldehyde in fresh full-strength PBS for 15 min before immunostaining. The following procedures were performed as described above.
Immunoprecipitation of chromatin
ChIP was performed essentially as described by Nagaki et al. (2004) using the anti-wheat CENH3 antibody. ChIP experiments were conducted in duplicate. Mock experiments using pre-immune rabbit serum served as a non-specific binding control for each ChIP experiment. Quantitative real-time PCR was used to determine the relative fold enrichment (RFE) of CENH3-associated sequences in the bound fraction over the mock control. PCRs were performed in triplicate according to the method described by Yan et al. (2005). The cycle threshold (CT) was determined with the fluorescence baseline manually set at a value between 0.01 and 0.05. For each primer pair, the RFE of the amplified product was calculated using the comparative CT method (Saffery et al., 2003). The negative control was 5.8S ribosomal DNA (located far from the centromere), amplified with the primers 5.8S-F, 5′-TCGGCAACGGATATCTCGGCTCTC-3′ (forward), and 5.8S-R, 5′-CCCAGGCAGGCGTGCCCTC-3′ (reverse). Primers for fragments in other regions of interest are listed in Table S3. RFE was calculated as $\mathrm{RFE} = 2^{-\Delta C_T}$, where $\Delta C_T = C_T(\text{bound DNA}) - C_T(\text{mock DNA})$ (Yan et al., 2005). The RFE value of each sequence was normalized to that of 5.8S rDNA, whose RFE was set at 1. To determine whether the RFE of each target sequence differed significantly from that of 5.8S rDNA, one-tailed Student's t-tests were performed at a significance level of α = 0.05.
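The comparative CT calculation can be written out explicitly. This minimal sketch assumes perfect amplification efficiency (one doubling per cycle), RFE = 2^(-ΔCT) with ΔCT = CT(bound) - CT(mock), and that triplicate CT values are averaged before ΔCT is taken; the function names and CT values are hypothetical. The one-tailed t-test against 5.8S rDNA is not reimplemented here; with SciPy 1.6 or later, `scipy.stats.ttest_ind(target, reference, alternative="greater")` is one way to run it on replicate RFE values.

```python
from statistics import mean


def relative_fold_enrichment(ct_bound, ct_mock):
    """RFE of the bound fraction over the mock control, 2 ** (-delta_ct).

    delta_ct = mean CT(bound) - mean CT(mock); assumes 100% PCR efficiency.
    """
    delta_ct = mean(ct_bound) - mean(ct_mock)
    return 2 ** (-delta_ct)


def normalized_rfe(target, reference):
    """Express a target's RFE relative to the reference (5.8S rDNA set to 1)."""
    return relative_fold_enrichment(*target) / relative_fold_enrichment(*reference)


# Hypothetical triplicate CT values: (bound, mock)
centromeric_target = ([24.0, 24.0, 24.0], [28.0, 28.0, 28.0])
rdna_reference = ([20.0, 20.0, 20.0], [21.0, 21.0, 21.0])

print(relative_fold_enrichment(*centromeric_target))       # 16.0
print(normalized_rfe(centromeric_target, rdna_reference))  # 8.0
```

A bound fraction that crosses the threshold four cycles earlier than the mock is 2^4 = 16-fold enriched; dividing by the reference's 2-fold enrichment gives a normalized RFE of 8.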
The DNA of each species was completely digested with HindIII, BamHI and DraI (New England Biolabs, http://www.neb.com). Five, 10 and 15 μg of DNA were used for diploid, tetraploid and hexaploid species, respectively. The digested DNAs were separated on 0.8% agarose gels and transferred to Hybond-N+ nylon membranes (Amersham Biosciences, now GE Healthcare, http://www.gelifesciences.com) by alkaline transfer. Southern hybridization was performed following the procedures of Liu et al. (2008), with stringent wash conditions (0.2 × SSC, 72°C).
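To anticipate which restriction fragments a probe might detect, the three enzymes can be applied in silico to an assembled BAC sequence. This sketch assumes the standard recognition sites and top-strand cut positions (HindIII A^AGCTT, BamHI G^GATCC, DraI TTT^AAA) and a linear sequence; it is a hypothetical aid rather than part of the reported protocol, and genomic Southern bands also depend on sites outside any sequenced region.

```python
import re

# Recognition site and top-strand cut offset within the site
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "DraI": ("TTTAAA", 3),     # TTT^AAA (blunt)
}


def cut_positions(seq, enzyme):
    """0-based indices at which each downstream fragment starts (top strand)."""
    site, offset = ENZYMES[enzyme]
    # The lookahead also finds overlapping sites; all three sites are
    # palindromic, so scanning one strand covers the reverse strand too.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_lengths(seq, enzyme):
    """Lengths of the linear fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two HindIII sites and no DraI site
seq = "AAAA" + "AAGCTT" + "CCCC" + "AAGCTT" + "GG"
print(fragment_lengths(seq, "HindIII"))  # [5, 10, 7]
print(fragment_lengths(seq, "DraI"))     # [22]
```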
LTR retrotransposon insertion time
The insertion dates of LTR retrotransposons were estimated by aligning the 5′ and 3′ LTRs of each element with ClustalW2 (Larkin et al., 2007) and applying a substitution rate of $1.3 \times 10^{-8}$ substitutions per site per year (SanMiguel et al., 1998; Ma et al., 2004). The insertion time ($T$) of each intact LTR retrotransposon was calculated as $T = N/(2LK)$, where $K$ is the substitution rate, $N$ is the number of substitutions observed between the two LTRs and $L$ is the length of the LTR.
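The insertion-time formula can be applied directly to an LTR pair that has already been aligned. In this minimal sketch, N counts mismatched aligned columns and L counts aligned columns, with gap columns excluded (an assumption, since the text does not state how gaps were treated), and no multiple-hit correction such as Kimura's two-parameter model is applied. This matches the uncorrected formula T = N/(2 × L × K) with K = 1.3 × 10^-8 substitutions per site per year. The function name and sequences are hypothetical.

```python
def ltr_insertion_time(ltr5, ltr3, rate=1.3e-8):
    """Estimate insertion time (years) from aligned 5' and 3' LTRs.

    T = N / (2 * L * rate): N substitutions over L compared sites.
    Columns with a gap in either LTR are skipped (an assumption).
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no aligned sites to compare")
    n_subs = sum(1 for a, b in sites if a != b)
    return n_subs / (2 * len(sites) * rate)


# Hypothetical 1,000-bp LTR pair differing at 2 sites
ltr5 = "A" * 1000
ltr3 = "A" * 998 + "CC"
years = ltr_insertion_time(ltr5, ltr3)
print(f"{years / 1e6:.3f} Ma")  # 0.077 Ma
```

Two differences over 1,000 bp correspond to roughly 0.08 Ma, the same order as the youngest CRW and Quinta elements discussed above.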
Identification of centromeric satellite repeats in wheat and its diploid ancestors
To check for the presence and localization of centromeric satellite repeats in wheat, we amplified genomic DNA of wheat and its diploid ancestors, Triticum urartu and Ae. tauschii, using primers (F, 5′-AAACGGAAGGGTCGTAGAGG-3′; R, 5′-CGTCCGTGCGATGAAGTTAC-3′) designed from the centromeric satellite repeats of Thinopyrum ponticum, the closest perennial relative of wheat. The PCR product was sequenced and physically mapped to wheat chromosomes by FISH, with CRW as a control. Its association with CENH3 was tested by ChIP-PCR using the primers F, 5′-AGTTTGTAATGGAAGGATGG-3′, and R, 5′-AGACGATTTTATGCTTGAGG-3′. Sequence conservation and changes in abundance among wheat and its close relatives were analyzed by Southern hybridization. To further confirm the existence of centromeric satellite repeats in wheat, we also searched the 5× Chinese Spring genome sequence database (http://www.cerealsdb.uk.net/CerealsDB/Documents/DOC_search_reads.php) for homologous sequences. The Dot Matrix function of DNAMAN 4.0 (http://dnaman-software-download.fyxm.net) was used to reveal the repeat structure of the identified contigs.
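The dot-matrix approach used to reveal repeats in the contigs can be approximated programmatically: exact k-mer matches of a sequence against itself form diagonals, and a tandem array produces strong diagonals offset by its monomer length. This is a hypothetical sketch, not DNAMAN's algorithm; the word size k and the test sequence are arbitrary choices.

```python
from collections import Counter


def self_dot_matrix(seq, k=8):
    """Return (i, j) pairs where the k-mers starting at i and j are identical."""
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return [(i, j) for hits in positions.values() for i in hits for j in hits]


def dominant_period(seq, k=8):
    """Most frequent off-diagonal offset (j - i), a candidate repeat unit length."""
    offsets = Counter(j - i for i, j in self_dot_matrix(seq, k) if j > i)
    return offsets.most_common(1)[0][0] if offsets else None


# Hypothetical 20-bp monomer repeated five times in tandem
monomer = "ACGTTGCAAGCTAGGCTTAC"
array = monomer * 5
print(dominant_period(array))  # 20
```

Plotting the (i, j) pairs would give the familiar picture of parallel diagonals spaced one monomer apart; counting offsets simply summarizes that picture as a number.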
Construction of a phylogenetic tree among retrotransposons associated with grass centromeres
To study the evolutionary relationships among centromeric retrotransposons (CRs) of wheat, barley, rice and maize, CR elements of barley [Cereba1 (Cereba-AY040832-1) and Cereba2 (Cereba-AY040832-2)], rice [CRR1 (CRR-AC022352) and CRR2 (DQ458289)] and maize [CRM1 (CRM-AC116034) and CRM2 (AY129008)] were compared with the autonomous CRWs CRW1 (EF624064-2) and CRW2 (DQ904440-2), as well as with two Weg elements detected in BAC 5M14 (Weg-5M14), using MEGA5 (Tamura et al., 2011). A neighbor-joining tree was constructed from the amino acid sequences of the integrase core domains (Neumann et al., 2011). The CR element of Brassica rapa, CRB1 (CRB-1-AC166739), was used as an outgroup (Lim et al., 2007). The integrase is one of the most conserved core domains in the pol gene region of retroelements associated with grass centromeres (Liu et al., 2009). Because all Quinta elements detected in our study are non-autonomous and lack an intact integrase core domain, they were not included in the phylogenetic tree.
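MEGA5 was used for the tree itself; to illustrate what the neighbor-joining step does, the sketch below implements the textbook Saitou-Nei algorithm on a precomputed pairwise distance matrix (for example, p-distances between aligned integrase core-domain sequences). It is not MEGA5's implementation: the distance model, internal node names and the five-taxon example matrix (a standard textbook case, not data from this study) are assumptions, and the output is an unrooted list of edges that would still need to be rooted on an outgroup such as CRB1.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns (node, neighbor, branch_length) edges of an unrooted tree;
    internal nodes are named node1, node2, ...
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all remaining nodes
        r = {a: sum(d[a][k] for k in nodes) for a in nodes}
        # Join the pair that minimizes the Q criterion
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"node{counter}"
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new internal node to the remaining nodes
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Textbook five-taxon example (hypothetical, not data from this study)
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, dist)
for edge in tree:
    print(edge)
```

The first join pairs a and b with branch lengths 2 and 3; because neighbor joining does not assume a molecular clock, branches such as Weg's can legitimately be longer than those of the other CRs in the same clade.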
This research was supported by a CAAS-INRA wheat joint research project, Chinese Ministry of Science and Technology (2010CB125900), the Natural Science Foundation of China (31025018, 30960172), the Agence Nationale de la Recherche grant ANR-05-BLANC-0258-01 and the European Community's Seventh Framework Program (FP7/2007-2013) under the grant agreement FP7-212019. We would like to thank D Boyer for excellent technical assistance in the screening of the 3B BAC library. We thank Dr A Houben, IPK-Gatersleben, Germany, for fruitful discussions. We also gratefully acknowledge help from Prof. Robert A McIntosh, University of Sydney, with English editing.