Correspondence: Scott A. Jackson (identification and characterization of CentPv1 and CentPv2); Kiyotaka Nagaki (CENH3); Valerie Geffroy (analysis of the Nazca repeat).
In higher eukaryotes, centromeres are typically composed of megabase-sized arrays of satellite repeats that evolve rapidly and homogenize within a species' genome. Despite the importance of centromeres, our knowledge of them is limited to a few model species. We conducted a comprehensive analysis of common bean (Phaseolus vulgaris) centromeric satellite DNA using genomic data, fluorescence in situ hybridization (FISH), immunofluorescence and chromatin immunoprecipitation (ChIP). We identified two unrelated centromere-specific satellite repeats, CentPv1 and CentPv2, and the common bean centromere-specific histone H3 (PvCENH3). FISH showed that CentPv1 and CentPv2 are predominantly located at subsets of eight and three pairs of centromeres, respectively. Immunofluorescence- and ChIP-based assays demonstrated the functional significance of CentPv1 and CentPv2 at centromeres. Genomic analysis revealed several interesting features of CentPv1 and CentPv2: (i) CentPv1 is organized into a 528-bp higher-order repeat structure, named Nazca, whereas CentPv2 is composed of tandemly organized monomers; (ii) CentPv1 and CentPv2 have undergone chromosome-specific homogenization; and (iii) CentPv1 and CentPv2 are unlikely to be intermingled in the genome. These findings suggest that two distinct sets of centromere sequences have evolved independently within the common bean genome, and provide insight into centromere satellite evolution.
Centromeres are specialized chromosomal regions where spindle microtubules attach via the kinetochore during cell division. In higher eukaryotes, centromeres typically contain large arrays of satellite repeats (Henikoff et al., 2001; Jiang et al., 2003). Although centromere function is conserved across eukaryotes, the sequences of centromeric DNA and kinetochore proteins evolve rapidly, a contradiction known as the ‘centromere paradox’ (Henikoff et al., 2001; Malik and Henikoff, 2002). The rapid evolution of centromeric satellite repeats involves homogenization of the satellite sequences (Henikoff and Malik, 2002). As a result, all centromeres within a species are typically dominated by a single, species-specific satellite repeat, such as alpha-satellite in humans (Willard and Waye, 1987), pAL1 in Arabidopsis thaliana (Martinez-Zapater et al., 1986; Murata et al., 1994), CentC in Zea mays (maize; Ananiev et al., 1998) and CentO in Oryza sativa (rice; Cheng et al., 2002a). However, recent studies in Solanum tuberosum (potato) and Gallus gallus domesticus (chicken) found unusual centromeric structures in which subsets of centromeres lack satellite repeats, suggesting that satellite repeats may not be necessary for centromere function (Shang et al., 2010; Gong et al., 2012). Moreover, research on human neocentromeres (ectopic centromeres formed at previously non-centromeric chromosomal regions) has shown that most neocentromeres are non-repetitive and devoid of satellite repeats (Marshall et al., 2008). These studies raise questions about the functional roles, formation, accumulation and evolution of centromeric satellite repeats.
Although centromeric satellite repeats have been identified in numerous eukaryotic species (Henikoff et al., 2001; Jiang et al., 2003; Melters et al., 2013), detailed characterization of the satellite repeats within the genome of an individual species is still limited to a few model species such as human, rice and Arabidopsis (Lee et al., 1997, 2006; Hall et al., 2005; Ma and Jackson, 2006). This limitation reflects the technical difficulty of sequencing and, in particular, assembling highly repetitive regions (Yan and Jiang, 2007), as well as the laborious work needed to process and analyze large volumes of genomic data (Macas et al., 2010).
Functional centromeres are marked by the presence of a centromere-specific histone H3 (CENH3) protein that helps the kinetochore associate with DNA (Henikoff et al., 2001). CENH3 is distinguished from canonical histone H3 by an N-terminal tail that is highly variable in both sequence and length, and by a slightly longer loop-1 region, one of the principal DNA-interaction domains (Henikoff et al., 2001). CENH3 homologs have been identified in all eukaryotic genomes studied so far (Jiang et al., 2003). Recently, immunofluorescence and chromatin immunoprecipitation (ChIP) assays using anti-CENH3 antibodies have identified satellite repeats at functional centromeres in several plant species, confirming the functional significance of these repeats (Zhong et al., 2002; Nagaki et al., 2003, 2004, 2012a; Nagaki and Murata, 2005; Houben et al., 2007; Tek et al., 2010, 2011).
The common bean, Phaseolus vulgaris L., is an important legume crop with a relatively small genome of 630 Mbp and 2n = 2x = 22 chromosomes (Arumuganathan and Earle, 1991). The common bean belongs to the Phaseoloid clade, which includes other important legumes such as Vigna unguiculata (cowpea), Glycine max (soybean) and Cajanus cajan (pigeonpea) (Doyle and Luckow, 2003; Choi et al., 2004). Natural populations of P. vulgaris can be assigned to three centers of diversity: the Mesoamerican, the South Andean and the North Andean centers. Major domestication events took place independently in the Mesoamerican and South Andean centers, giving rise to two gene pools of cultivated beans, referred to as the Andean and Mesoamerican gene pools (Velasquez and Gepts, 1994; Gepts, 1998). The genome of the Andean cultivar G19833 was recently sequenced (http://www.phytozome.org) using Sanger sequencing and a combination of next-generation sequencing technologies (Table S1). Using the draft genome sequence and a set of bacterial artificial chromosome (BAC) end sequences (BESs), we identified two highly abundant tandem repeats: CentPv1 and CentPv2. A combination of fluorescence in situ hybridization (FISH), immunofluorescence and ChIP-based assays with an anti-CENH3 antibody was used to confirm that these two repeats underlie functional centromeres. Detailed analysis of these repeats shows evidence of locus- or chromosome-specific homogenization. Moreover, analysis of a fully sequenced BAC revealed that CentPv1 is organized into higher-order repeat (HOR) structures, forming a larger 528-bp tandem repeat referred to as Nazca. Our findings provide new insights into the unique organization and evolutionary dynamics of functional centromeres.
Identification of potential centromeric repeats in the common bean
To identify centromeric repeats in the common bean, we ran Tandem Repeats Finder (TRF; Benson, 1999) on an 8× draft genome sequence and on BESs, both derived from the Andean accession G19833 (BAC library Pv_GBa; Schlueter et al., 2008; Table S1). Centromeric repeats are typically the most abundant tandem repeats in plant genomes (Gill et al., 2009); we therefore focused on high-copy tandem repeats. We first analyzed the TRF results from the 8× draft genome sequence. TRF reports the copy number of a tandem repeat only within individual scaffolds or contigs, whereas we wanted to determine the abundance of the top tandem repeats across the entire genome. We had previously identified all 25-bp sequences (25-mers) occurring more than 3000 times in a 2–3× draft genome sequence (Table S1), and we used these 25-mers as blast queries against the top tandem repeats. This blast analysis indicated that the 29th and 30th highest-copy tandem repeats in the TRF output from the 8× draft sequence were in fact the most abundant tandem repeats in the genome. These two repeats were identical except for a single-nucleotide insertion/deletion, which gave unit lengths of 100 and 99 bp, respectively (Figure 1a); they therefore correspond to a single 99- or 100-bp tandem repeat.
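The genome-wide abundance step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the study matched high-frequency 25-mers to TRF repeats with blast, whereas this sketch swaps in exact k-mer matching, merges each k-mer with its reverse complement, and scores a repeat consensus by summing the genome counts of the frequent k-mers it contains. The function names are hypothetical.

```python
from collections import Counter

# Complement table used to canonicalize k-mers (A<->T, C<->G).
_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def count_kmers(seq: str, k: int = 25) -> Counter:
    """Count canonical k-mers in seq, skipping windows that contain N or other ambiguity codes."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def high_copy_kmers(counts: Counter, threshold: int = 3000) -> dict:
    """Keep only k-mers seen more than `threshold` times (the study used >3000)."""
    return {kmer: n for kmer, n in counts.items() if n > threshold}


def repeat_abundance(consensus: str, frequent: dict, k: int = 25) -> int:
    """Score a tandem-repeat consensus by the genome counts of the frequent k-mers it contains.

    The consensus is treated as circular (it is one unit of a tandem array), so
    k-mers spanning the junction between adjacent units are included.
    """
    consensus = consensus.upper()
    circ = consensus * (k // len(consensus) + 2)
    return sum(frequent.get(canonical(circ[i:i + k]), 0) for i in range(len(consensus)))
```

Ranking TRF consensus units by a genome-wide score of this kind, rather than by their copy number within a single scaffold, is what allows a repeat fragmented across many scaffolds to rise to the top of the list.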
The TRF results from the BESs differed from those for the 8× draft sequence: in the BESs, a 110-bp tandem repeat was the most abundant element, accounting for approximately 20% of the identified tandem repeats (Figure 1b). The 110-bp tandem repeat appeared unrelated to the 99- or 100-bp tandem repeat, as the two shared <40% sequence identity.
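Percent identity between two repeat units depends on how they are aligned and on how identity is defined. The Python sketch below uses Needleman-Wunsch global alignment with arbitrary scores (match +1, mismatch -1, gap -2) and defines identity as identical columns over all alignment columns, gaps included; these choices are assumptions, since the method behind the <40% figure is not stated here. Because a tandem-repeat consensus can start anywhere in the unit, the sketch also takes the best identity over all rotations of one unit. Function names are hypothetical.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch global alignment; returns percent identity over all alignment columns."""
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns


def best_rotation_identity(a: str, b: str) -> float:
    """Tandem-repeat units have arbitrary start points, so try every rotation of `a`."""
    return max(global_identity(a[r:] + a[:r], b) for r in range(len(a)))
```

For ~100-bp units the rotation search runs about a hundred alignments, which is still fast enough in pure Python for pairwise checks like the CentPv1 versus CentPv2 comparison.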
To investigate the chromosomal distribution of the 99- or 100-bp and 110-bp tandem repeats, we designed 25-bp oligonucleotide probes specific to each repeat. FISH with these probes localized both repeats at, or around, the centromeric regions of mitotic chromosomes (Figure 1c). The 99- or 100-bp and 110-bp satellite sequences were named CentPv1 and CentPv2, respectively. Interestingly, of the 22 chromosomes (2n = 22), CentPv1 signals were localized on 16 centromeres, eight with strong and eight with weak signals. CentPv2 signals localized to the remaining six centromeres, four with strong and two with weak signals (Figure 1c).
To ensure that the 25-bp probes were not unintentionally targeting diverged portions of the tandem repeats, which could confound interpretation of our results, we amplified CentPv1 and CentPv2 from G19833 genomic DNA by PCR and performed FISH with the PCR-amplified probes, which capture more of the variation within these repeats. The PCR-derived CentPv1 probe gave the same FISH signals as the oligonucleotide probe (Figure S1c). The PCR-derived CentPv2 probe showed the same four strong signals as before, but also very weak signals on other chromosomes, including the two chromosomes that had weak signals in oligonucleotide FISH (Figure S1d). On chromosomes carrying both repeats, the minor CentPv2 signals either overlapped or lay close to the CentPv1 signals. These weak signals were inconsistent between chromosome preparations. Based on all the FISH results, we conclude that CentPv1 is located on eight of the eleven chromosome pairs, and that CentPv2 has major loci on three chromosome pairs and minor loci on other chromosome pairs.
Characterization of a common bean centromere-specific histone H3 variant
To identify the CENH3 homolog in the common bean, designated PvCENH3 (P. vulgaris centromere-specific histone H3), we used a rapid amplification of cDNA ends (RACE)-PCR cloning strategy with common bean cDNA. First, we designed a PCR primer (PvCENH3r) corresponding to the end of the gene encoding the G. max centromere-specific histone H3 (GmCENH3). This primer yielded a cDNA covering the 5′ untranslated region (5′-UTR) and the 5′ coding region of PvCenH3. Second, a gene-specific primer, PvCENH3f, was designed from the internal portion of the PvCenH3 coding region and used to recover the 3′-UTR of the PvCenH3 cDNA. All products amplified in the 5′ and 3′ RACE-PCR experiments were identical in the overlapping region. We thereby determined an 876-bp PvCENH3 cDNA (GenBank number: KC491791) that encodes a predicted 163-amino acid (aa) protein. A blastp search of transcript peptide sequences (http://www.phytozome.net) using PvCENH3 as the query indicated that PvCenH3 is a single-copy gene.
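To illustrate how a predicted protein length follows from a cDNA, the longest forward-strand open reading frame (ORF) can be found and translated as below. This is a minimal sketch assuming the standard genetic code and a sense-oriented cDNA; it is not the annotation procedure used in the study, and the function names are hypothetical. A 163-aa protein implies a 492-nt coding sequence including the stop codon (163 × 3 + 3).

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [x + y + z for x in _BASES for y in _BASES for z in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate(nt: str) -> str:
    """Translate complete codons; codons not in the table (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def longest_orf(cdna: str) -> str:
    """Longest ATG-initiated, stop-terminated ORF on the forward strand, as protein (stop omitted)."""
    seq = cdna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")
            if start is None and aa == "M":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and aa == "*":
                protein = translate(seq[start:i])
                if len(protein) > len(best):
                    best = protein
                start = None  # close the ORF and look for the next ATG
    return best
```

ORFs that run off the end of the sequence without a stop codon are ignored, which is appropriate for a full-length cDNA such as the 876-bp PvCENH3 clone but would miss the coding region of a truncated one.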
A multiple sequence alignment revealed several prominent features of the deduced PvCENH3 in comparison with soybean CENH3, Arabidopsis CENH3 (HTR12) and the common bean canonical histone H3.1 protein (Figure 2). First, PvCENH3 shares 88.4% sequence identity (84 of 95 residues) with GmCENH3 in the histone fold domain. As expected, the N-terminal tail of PvCENH3 is more divergent (58.9% identity). Second, as described previously (Henikoff et al., 2001; Malik and Henikoff, 2003), the loop-1 region of PvCENH3 is one amino acid longer than that of canonical histone H3.1. In addition, 15 of the 20 residues of the peptide used to develop the anti-GmCENH3 antibody are identical in PvCENH3 (boxed in Figure 2). Overall, these results suggest that the deduced PvCENH3 is the authentic CENH3 homolog in common bean, and that the anti-GmCENH3 antibody could be used for specific recognition of PvCENH3 in situ and in vitro.
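Both identity values above are ratios of identical residues to compared residues. As a worked check, assuming each is computed over the stated number of aligned positions only:

$$
\text{identity}_{\text{HFD}} = \frac{84}{95} \approx 0.884 = 88.4\%,
\qquad
\text{identity}_{\text{epitope}} = \frac{15}{20} = 0.75 = 75\%
$$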
PvCENH3 at centromeres is recognized by the anti-GmCENH3 antibody
An O. sativa-derived CENH3 antibody was previously used to capture CENH3-associated centromeric sequences in Oryza species that diverged up to approximately 15 Mya (Lee et al., 2005), indicating that such antibodies can remain useful across long divergence times. To test experimentally whether the anti-GmCENH3 antibody recognizes PvCENH3 at functional centromeres, we used an immunofluorescence assay. Our previously raised anti-GmCENH3 antibody showed centromeric signals on all 22 common bean chromosomes (Figure S2). Although the antibody consistently bound to PvCENH3 at interphase, prometaphase and telophase (Figure S2), we could not detect signals at metaphase. By contrast, the anti-GmCENH3 antibody recognized centromeres throughout the cell cycle, including metaphase, in soybean and Astragalus sinicus (Tek et al., 2010, 2011). This suggests that binding of the soybean antibody may be blocked by a metaphase-specific modification of PvCENH3 in the common bean. The centromeric immuno-signals produced by the anti-GmCENH3 antibody in interphase nuclei indicated that ChIP with this antibody could be used to determine the DNA components underlying common bean functional centromeres.
CentPv1 and CentPv2 are co-localized with the anti-GmCENH3 antibody
Our first strategy to confirm the functional role of the satellite repeats was to perform an immunofluorescence assay followed by FISH on the same set of cells. This would allow us to unequivocally determine the localization of CentPv1 and CentPv2 in relation to PvCENH3 immuno-signals.
An immunofluorescence assay followed by FISH with CentPv1 revealed co-localization of PvCENH3 immuno-signals (Figure 3b) with CentPv1 signals (Figure 3c) in interphase nuclei (Figure 3a–e). As expected, FISH signals overlapped with most of the immuno-signals (Figure 3a–i). A similar experiment with CentPv2 showed that four strong and two weak FISH signals completely overlapped with immuno-signals in common bean interphase nuclei, although the weak signals were not always reproducible (Figure 3k–o).
Collectively, the sequential immunofluorescence and FISH assays indicated that CentPv1 and CentPv2 occupied largely separate sets of centromeres, but that each co-localized with PvCENH3 immuno-signals, suggesting the existence of two distinct functional centromeric DNA elements in the common bean genome.
Quantitative analyses of the association of CentPv1 and CentPv2 with PvCENH3
A quantitative ChIP slot blot was used to assess the association of the two satellite repeats with CENH3. Common bean chromatin was digested with micrococcal nuclease, and the resulting nucleosomes were captured with the anti-GmCENH3 antibody. Using a set of standards and comparison samples, we determined the relative enrichment of CentPv1 and CentPv2. Two repetitive sequences known to localize outside centromeres in the common bean genome, rDNA and khipu (David et al., 2009), were also hybridized to the extracted samples as non-centromeric control DNAs (Figure 4a).
The average percentages of immunoprecipitation (IP%) of the non-centromeric controls, rDNA and khipu, were similar at 24.0 ± 0.8% (SE; n = 5) and 25.8 ± 2.2% (SE; n = 5), respectively (Figure 4b). In contrast, the average IP% values of CentPv1 and CentPv2 were 41.9 ± 1.9% (SE; n = 5) and 34.5 ± 2.7% (SE; n = 5), respectively (Figure 4b). Because the antibody shows basal non-specific binding to DNA (Tek et al., 2011), all sequences were elevated in the pellet fractions (Pel). Such non-specific binding has been more conspicuous in ChIP using magnetic beads than in ChIP using sepharose beads, as magnetic beads have lower non-specific binding than sepharose (Nagaki et al., 2003, 2012a,b; Tek et al., 2011). Nevertheless, the non-centromeric controls and the candidate centromeric repeats were enriched to significantly different degrees. Khipu was not enriched in Pel compared with rDNA [P = 0.48, Tukey's honestly significant difference (Tukey's HSD) test], whereas CentPv1 and CentPv2 were both significantly enriched in Pel (P = 0.0001 and 0.0058, respectively). Similar rDNA backgrounds and similar enrichment of centromeric DNA sequences were also observed in Astragalus sinicus with the anti-GmCENH3 antibody (Tek et al., 2011). These results indicate that CentPv1 and CentPv2 were significantly enriched in the Pel fraction carrying the anti-GmCENH3 antibody.
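The replicate summaries above (mean ± SE, n = 5) can be computed from slot-blot signals with a short Python sketch. The IP% definition used here, pellet signal as a percentage of pellet plus supernatant signal, is an assumption for illustration, since the formula is not given in this section; the standard error is the sample standard deviation divided by √n. The function names and the replicate signals are hypothetical.

```python
import math
import statistics


def ip_percent(pellet_signal: float, supernatant_signal: float) -> float:
    """ASSUMED definition: pellet signal as a percentage of total (pellet + supernatant) signal."""
    total = pellet_signal + supernatant_signal
    if total <= 0:
        raise ValueError("total hybridization signal must be positive")
    return 100.0 * pellet_signal / total


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)) of replicate IP% values."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical (pellet, supernatant) signals for five replicates of one probe.
replicates = [(42.0, 58.0), (40.0, 60.0), (45.0, 55.0), (39.0, 61.0), (43.5, 56.5)]
ips = [ip_percent(p, s) for p, s in replicates]
mean_ip, se_ip = mean_and_se(ips)
```

For the pairwise comparisons between probes, SciPy (1.8 and later) provides `scipy.stats.tukey_hsd`, which accepts one sample of IP% values per probe.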
Higher-order repeat structure of CentPv1 (Nazca structure)
In parallel with the tandem repeat analysis of BESs and the draft genome, we searched fully sequenced BAC clones for the presence of the two centromeric repeats. BAC Pvm1-249m1 contained 77 units of CentPv1-like sequences (70–87% identity with CentPv1; Table S2). Because this BAC was sequenced to 30× coverage, its repeats are more fully assembled than those in the draft genome, making it useful for investigating the higher-order organization of CentPv1-like sequences. The CentPv1-like units were organized in groups of four tandem units, each group separated by an unrelated sequence of approximately 159 bp. These groups of four CentPv1-like units and the 159-bp sequences were in turn tandemly arranged into an HOR. This HOR structure, approximately 528 bp in length, is referred to as Nazca. We named the CentPv1-like sequences ‘NazcaA’ and the 159-bp sequence ‘NazcaB’ (Figure 5a). Twenty complete Nazca units were identified in BAC Pvm1-249m1. Manual annotation revealed that the NazcaA repeats within a single Nazca unit can be classified, in their physical order, as NazcaA1, NazcaA2, NazcaA3 and NazcaA4, with consensus sequences of 80, 102, 105 and 82 bp, respectively. NazcaA1–NazcaA4 shared 50–80% sequence identity (Table S2). We also identified a highly conserved internal region of 57 bp (>90% nucleotide identity among NazcaA1, A2, A3 and A4).
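One way to see why a 528-bp periodicity emerges is to compare a sequence with a shifted copy of itself: in an array of identical HOR units, the fraction of matching positions peaks at the HOR length rather than at any monomer length. The Python sketch below illustrates this on a synthetic array built from the consensus lengths reported above (80 + 102 + 105 + 82 + 159 = 528 bp). The random sequences, the subunit order, the function names and the search range are assumptions for illustration; this is not the annotation method used for BAC Pvm1-249m1.

```python
import random


def shift_identity(seq: str, p: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + p]."""
    n = len(seq) - p
    if n <= 0:
        raise ValueError("shift must be shorter than the sequence")
    return sum(seq[i] == seq[i + p] for i in range(n)) / n


def best_period(seq: str, min_p: int, max_p: int) -> int:
    """Shift in [min_p, max_p] at which the sequence best matches itself."""
    return max(range(min_p, max_p + 1), key=lambda p: shift_identity(seq, p))


def synthetic_hor(subunit_lengths: list[int], copies: int, seed: int = 1) -> str:
    """Array of `copies` identical HOR units built from random, unrelated subunits (hypothetical data)."""
    rng = random.Random(seed)
    unit = "".join(
        "".join(rng.choice("ACGT") for _ in range(length)) for length in subunit_lengths
    )
    return unit * copies


# NazcaA1-A4 consensus lengths followed by NazcaB, as reported for BAC Pvm1-249m1.
NAZCA_LENGTHS = [80, 102, 105, 82, 159]
array = synthetic_hor(NAZCA_LENGTHS, copies=4)
```

In this toy array the subunits are unrelated, so only the full HOR shift scores highly. In real Nazca arrays, where NazcaA1–NazcaA4 share 50–80% identity, shifts near the ~100-bp monomer length would also score above background, but still below the HOR period.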
FISH was performed on the Mesoamerican accession BAT93 using a clone containing NazcaA from BAC Pvm1-249m1 as the probe. When the NazcaA clone was hybridized at regular stringency, only two major and a few minor signals were observed (Figure S3a,b). When the same probe was hybridized at lower stringency, signals of various sizes were observed at 16 centromeres (Figure S3c,d), in agreement with the CentPv1 distribution. These results confirm that NazcaA is present at eight pairs of centromeres, and suggest chromosome-specific homogenization of the sequence.
Genomic structure of CentPv1/NazcaA, NazcaB and CentPv2
In most eukaryotes, centromeric repeats have high copy numbers. To estimate the copy numbers of the centromeric repeats in the common bean genome, we performed blast analysis of the CentPv1/NazcaA, NazcaB and CentPv2 regions against short-read DNA sequences representing approximately 16-fold coverage of the common bean genome. We used the short-read data set because repetitive sequences are often lost from genome assemblies. Because the short reads averaged 275 bp in length and might not contain complete repeat units, blastn was run with two different cut-offs: 80% length and 80% sequence identity, and 60% length and 80% sequence identity (Table S3). Consistent with the FISH data, CentPv1 had the highest copy number, approximately 10 times that of CentPv2. NazcaB was less abundant than the CentPv1/NazcaA region, which is consistent with the Nazca structure (Figure 5a). Of the 169 006 reads that contained either CentPv1 or CentPv2, only 26 contained both CentPv1/NazcaA and CentPv2.
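The cut-off logic can be sketched in Python over BLAST tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity and alignment length). This minimal sketch assumes that each repeat is the query and each read a subject, that "80% length" means an alignment covering at least 80% of the repeat unit, and that genomic copy number is approximated as passing hits divided by sequencing depth. The last step is an assumption, not the estimator used in the study, and it undercounts units truncated at read ends. Function names are hypothetical.

```python
def parse_outfmt6(lines):
    """Yield the fields used here from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield {"qseqid": f[0], "sseqid": f[1], "pident": float(f[2]), "length": int(f[3])}


def passing_hits(hits, query_len, min_ident=80.0, min_cov=0.8):
    """Keep hits meeting both the identity and the length (query coverage) cut-offs."""
    return [h for h in hits if h["pident"] >= min_ident and h["length"] >= min_cov * query_len]


def reads_hit(hits, query):
    """Set of read IDs with at least one hit for the given repeat."""
    return {h["sseqid"] for h in hits if h["qseqid"] == query}


def estimate_copies(n_hits, depth):
    """ASSUMED estimator: hits in reads sequenced at `depth`-fold coverage, divided by depth."""
    return n_hits / depth
```

Intersecting `reads_hit` sets for the two repeats gives the count of reads carrying both, the quantity reported above as 26 of 169 006.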
We further analyzed the organization of CentPv1/NazcaA and CentPv2 in the genome using an assembled version of the genome sequence (11 pseudo-chromosomes; Table S1) and fiber-FISH experiments. CentPv1 and CentPv2 were used as blastn queries against the 11 pseudo-chromosomes with a cut-off of 60% identity and 80% length. Significant matches to CentPv1 were found on eight pseudo-chromosomes (1–4 and 7–10), always in a head-to-tail arrangement. Significant matches to CentPv2 were found on pseudo-chromosomes 5, 6 and 11, which lack CentPv1, but also on pseudo-chromosomes 1, 2, 3 and 7, which may explain the minor CentPv2 signals, some of which co-localized with CentPv1. A few copies of CentPv2 sequences were found on other pseudo-chromosomes, but in numbers probably too low to be detectable by FISH. CentPv2 was also organized in a head-to-tail orientation. A caveat to these analyses is that genome assemblies built from short reads often omit tandem-repeat arrays, so these analyses may not reflect the true genomic structure and organization.
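Orientation can be read directly from blastn subject coordinates: a hit with send ≥ sstart lies on the plus strand. The Python sketch below calls a set of hits head-to-tail when all hits share one orientation and consecutive hits abut or overlap within a tolerance, as expected for a tandem array. The function names and the 20-bp gap tolerance are assumptions for illustration.

```python
def strand(sstart: int, send: int) -> str:
    """blastn reports minus-strand subject hits with send < sstart."""
    return "+" if send >= sstart else "-"


def is_head_to_tail(hits: list[tuple[int, int]], max_gap: int = 20) -> bool:
    """True if all (sstart, send) hits share one orientation and consecutive hits are contiguous."""
    if not hits:
        return False
    if len({strand(s, e) for s, e in hits}) != 1:
        return False  # mixed orientations: inverted units present
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    return all(
        next_start - prev_end - 1 <= max_gap
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])
    )
```

Requiring contiguity as well as a shared orientation keeps two separate same-strand arrays from being mistaken for one head-to-tail array.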
Fiber-FISH was performed on extended DNA fibers using PCR-amplified CentPv1/NazcaA, NazcaB and CentPv2 probes. CentPv1 and CentPv2 signals were found on molecules up to 182.82 and 121.04 μm in length, corresponding to approximately 606.11 and 388.54 kb, respectively, with some gaps (Figure S4a,b). No co-localization of CentPv1 and CentPv2 signals was observed, indicating that the two repeats are not intermingled in the genome. Fiber-FISH with CentPv1 and NazcaB showed both overlapping and intermingled signals (Figure S4c), consistent with the HOR structure of these sequences.
Sequence variability of the centromeric repeats within the common bean genome
Next, we investigated the sequence variability of these centromeric repeats within the genome. First, a multiple alignment was made from 493 CentPv1 and 1241 CentPv2 sequences found by blast searches of BESs and a 20× draft genome sequence (Table S1). Several polymorphic and conserved regions were identified in CentPv1 and CentPv2, and oligonucleotide probes specific to the polymorphic regions were designed (Figure 6a). CentPv1_A, a variant of CentPv1, is 86 bp in length and shares approximately 74% identity with CentPv1. The 25-bp oligonucleotide CentPv1_A targeted a region containing two transitions and two inserted bases (Figure 6a). FISH with oligonucleotide CentPv1_A showed that this variant was present at two centromeres, overlapping with two of the weak CentPv1 signals (Figure 6b). From the multiple alignment of CentPv2, we found a relatively abundant 111-bp variant, CentPv2_A, with more than 90% sequence identity to CentPv2. The 25-bp oligonucleotide CentPv2_A targeted a region that included two transversions and a single-nucleotide insertion accounting for the difference in unit length (Figure 6c). FISH with oligonucleotide CentPv2_A showed that it localized to centromeres overlapping with two of the four strongest CentPv2 signals (Figure 6d). Thus, FISH with CentPv1_A and CentPv2_A revealed locus- or chromosome-specific homogenization of the repeats.
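Placing a variant-specific oligonucleotide can be framed as a sliding-window search over an alignment. The Python sketch below assumes that the variant and its consensus are already aligned (gaps as '-') and returns the 25-column window containing the most differing columns. How the study chose its windows is not stated, and a practical design would also check melting temperature and off-target matches; the function name is hypothetical.

```python
def best_probe_window(consensus_aln: str, variant_aln: str, width: int = 25) -> tuple[int, int, str]:
    """Return (start column, number of differing columns, probe sequence) for the best window.

    Windows are measured in alignment columns, so the probe (gaps removed) can be
    shorter than `width` if the variant has gaps inside the chosen window.
    """
    if len(consensus_aln) != len(variant_aln):
        raise ValueError("sequences must be aligned to the same length")
    if len(variant_aln) < width:
        raise ValueError("alignment is shorter than the probe width")
    diffs = [c != v for c, v in zip(consensus_aln.upper(), variant_aln.upper())]
    best_start, best_n = 0, -1
    for start in range(len(diffs) - width + 1):
        n = sum(diffs[start:start + width])
        if n > best_n:  # keep the leftmost window among ties
            best_start, best_n = start, n
    probe = variant_aln[best_start:best_start + width].replace("-", "")
    return best_start, best_n, probe
```

Maximizing the number of differing columns inside the window is what lets a short probe discriminate a variant such as CentPv2_A from a parent repeat with which it shares more than 90% identity.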
Using the 11 pseudo-chromosomes, we analyzed the diversity, chromosome-specific variants and phylogeny of these repeats. blast searches for CentPv1, Nazca and CentPv2 were performed, and multiple alignments were generated with ClustalW. We were unable to find CentPv1_A in the assembled sequence but did find it in the 16× unassembled short reads, indicating that this variant was not assembled into the genome. blast analysis of CentPv2_A showed that this variant is present only on chromosome 11, together with CentPv2, which is congruent with the FISH results.
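Checking whether a short variant-specific sequence occurs in unassembled reads, as was done for CentPv1_A, amounts to a search on both strands. This Python sketch assumes exact matching of the oligonucleotide sequence; the search method actually used is not stated here, and the function names are hypothetical.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.upper().translate(_COMP)[::-1]


def reads_containing(reads, oligo: str) -> list[str]:
    """IDs of reads containing the oligo on either strand; `reads` is an iterable of (id, sequence)."""
    fwd = oligo.upper()
    rev = revcomp(fwd)
    hits = []
    for read_id, seq in reads:
        s = seq.upper()
        if fwd in s or rev in s:
            hits.append(read_id)
    return hits
```

Searching both strands matters because unassembled reads come from either strand of the repeat array with roughly equal probability.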
Phylogenetic analysis was performed using a subset of Nazca and CentPv2 monomer sequences spanning 95–100% of the unit length, from the eight and three pseudo-chromosomes with major loci, respectively. A neighbor-joining tree revealed chromosome-specific homogenization of CentPv2 on chromosomes 5, 6 and 11 (Figure 7a). The average sequence distances among CentPv2 monomers within chromosomes 5, 6 and 11 were 0.147 ± 0.022, 0.161 ± 0.031 and 0.115 ± 0.023, respectively. These were smaller than the average distances between chromosomes: 0.22 ± 0.039 (chr5–chr6), 0.17 ± 0.031 (chr5–chr11) and 0.242 ± 0.045 (chr6–chr11). Interestingly, chromosome 6 only contained a CentPv2 variant with a 9-bp insertion, resulting in a 119-bp unit length, named CentPv2_B (Figure S5). This 9-bp insertion occurs in the same region as the changes found in CentPv2_A, indicating that this region may be unstable. Phylogenetic analysis was also performed on the complete Nazca repeats identified on chromosomes 1–4 and 7–10. Chromosome-specific homogenization was also observed for Nazca (e.g. chromosomes 2, 3 and 7), but to a lesser extent than for CentPv2 (Figure 7b).
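The within- and between-chromosome averages above are means over pairwise distances. The Python sketch below uses the uncorrected p-distance (proportion of differing sites, ignoring gap columns) and reports mean ± sample SD. Both choices are assumptions, since the distance model and the dispersion measure behind the reported values are not stated here; the function names are hypothetical.

```python
import statistics
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def within(group: list[str]) -> list[float]:
    """Distances for all pairs of monomers from the same chromosome."""
    return [p_distance(a, b) for a, b in combinations(group, 2)]


def between(group1: list[str], group2: list[str]) -> list[float]:
    """Distances for all cross-chromosome pairs of monomers."""
    return [p_distance(a, b) for a, b in product(group1, group2)]


def summarize(distances: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (assumed dispersion measure)."""
    return statistics.mean(distances), statistics.stdev(distances)
```

Chromosome-specific homogenization predicts that the mean of `within` for each chromosome is smaller than the mean of `between` for any pair of chromosomes, which is the pattern reported above for chromosomes 5, 6 and 11.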
Conservation of CentPv1/NazcaA, NazcaB and CentPv2 among species
To investigate the origin of CentPv1/NazcaA, NazcaB and CentPv2, we performed Southern analysis of wild and domesticated common beans from the Andean and Mesoamerican gene pools, of various Phaseolus species and of other species (Figure S6a–c). CentPv1/NazcaA, NazcaB and CentPv2 were abundant in wild and domesticated common beans from both gene pools and in Phaseolus dumosus. They were also present in Phaseolus coccineus and Phaseolus acutifolius, indicating that these repeats originated within the Vulgaris group (Figure S6d; Delgado-Salinas et al., 2006). CentPv2 hybridization signals showed the ‘ladder’ pattern typical of tandem repeats, produced by monomers, dimers and higher multimers of the repeat unit. CentPv1/NazcaA and NazcaB did not show this pattern, confirming that these repeats are organized into HOR structures within the genome. Signals of CentPv1/NazcaA and CentPv2 were also seen in G. max, V. unguiculata and O. sativa, indicating that elements with sequence similarity to CentPv1/NazcaA and CentPv2 exist in these genomes; however, these elements are probably not organized as tandem repeats and have low copy numbers, as the signals were much weaker than those from the common bean. To confirm this, FISH experiments were performed with these sequences in G. max and V. unguiculata; no clear signals of the kind expected from a high-copy tandem repeat were detected.
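The ladder pattern follows from simple arithmetic: if a restriction enzyme has one site per monomer and digestion is partial (or some sites have been lost by mutation), every fragment bounded by two cut sites is an integer multiple of the unit length. The Python sketch below illustrates this for a 110-bp unit; the per-site cleavage probability, the array size and the function names are assumptions for illustration.

```python
import random
from collections import Counter


def fragment_lengths(unit_len: int, cut_sites: list[bool]) -> list[int]:
    """Lengths of fragments bounded by two cleaved sites; cut_sites[i] is True if unit i's site was cut.

    Fragments running off either end of the array are ignored.
    """
    cut_positions = [i for i, cut in enumerate(cut_sites) if cut]
    return [(b - a) * unit_len for a, b in zip(cut_positions, cut_positions[1:])]


def simulate_ladder(unit_len: int, n_units: int, p_cut: float, seed: int = 0) -> Counter:
    """Band sizes and their counts from one simulated partial digest of a tandem array."""
    rng = random.Random(seed)
    cuts = [rng.random() < p_cut for _ in range(n_units)]
    return Counter(fragment_lengths(unit_len, cuts))
```

With `p_cut = 0.5`, the expected frequency of a k-unit fragment is 0.5^k, so the monomer band is the strongest and each successive multimer is about half as abundant, giving the decreasing ladder seen on a Southern blot.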
In this article, we report the identification and characterization of satellite repeats that underlie the functional centromeres of common bean. We found two unrelated centromeric satellite repeats: (i) CentPv1, a 99-bp tandem repeat organized into HORs called Nazca and localized at eight pairs of centromeres; and (ii) CentPv2, a 110-bp tandem repeat localized primarily at three pairs of centromeres. The interaction of these repeats with CENH3, the centromere-specific histone, was confirmed by ChIP experiments.
Identification of centromeric satellite repeats in other species has shown that a single major satellite repeat typically dominates all centromeres in most diploid species (Henikoff et al., 2001; Jiang et al., 2003). Centromeric satellite repeats are, however, highly diverged across species and often share no sequence similarity. Their only commonality is a unit length of 150–180 bp, which is close to the range of nucleosomal unit lengths (Henikoff et al., 2001). This rule does not hold for the common bean. Despite being a diploid species, the common bean has two unrelated centromeric satellite repeats distributed across subsets of centromeres, and both monomers are shorter than the nucleosomal unit length. Given other examples of centromeric satellite monomers with non-nucleosomal unit sizes [e.g. 91-, 92- and 411-bp CentGm families in soybean; 126-bp CentO-C1 and 366-bp CentO-C2 in Oryza rhizomatis; 48-, 68-, 91- and 92-bp repeats in Nicotiana tabacum (tobacco)], conservation of centromeric satellite monomer length around 150–180 bp is not universal across species (Lee et al., 2005; Gill et al., 2009; Tek et al., 2010; Nagaki et al., 2012a; Melters et al., 2013).
During satellite repeat evolution, adjacent polymorphic monomers can be amplified together as a larger repeating unit, forming HOR arrays in which the original monomers become subunits. For example, the 171-bp monomers of human alpha-satellite are organized into HORs that are repeated hundreds to thousands of times. The monomer subunits of alpha-satellite within HORs exhibit around 60–80% identity (reviewed in Lee et al., 1997). To date, there are only a few reports of HORs in plant centromeres, either because plant centromere sequences are typically not organized into HORs, or because HORs are technically difficult to detect in the short reads produced by next-generation sequencing (Melters et al., 2013). A fully sequenced repetitive BAC, Pvm1-249m1, enabled us to identify HOR structures called Nazca, which typically contain one NazcaB region and four NazcaA regions corresponding to CentPv1 (Figure 5a). The NazcaA regions were further classified into four subtypes, NazcaA1, A2, A3 and A4, which share a 57-bp conserved region, confirming the existence of conserved and variable regions in the repeat during its evolution. Compared with other centromeric HORs (Willard and Waye, 1987; Melters et al., 2013), Nazca is remarkable for the presence of NazcaB, which is highly diverged from NazcaA (30% identity) yet located between groups of four NazcaA regions.
Based on the analysis of these repeat structures, we propose that CentPv1 originated as a monomeric repeat, and that mutation and homogenization processes produced the current HOR Nazca organization, with selective pressure maintaining the conserved 57-bp regions. The origin of NazcaB, however, is unclear, as it appears to be unrelated to NazcaA. Two variants of CentPv2, CentPv2_A and CentPv2_B, are polymorphic in the same region, which further supports the existence of conserved and variable regions in CentPv2. Together, these observations provide evidence of dynamic evolution of CentPv1, Nazca and CentPv2 within the common bean genome. Centromeric satellite repeats likely evolve by balancing mutation against selective pressure on DNA sequences that may be structurally essential for centromeres and/or required for epigenetic modifications that maintain interactions with CENH3 (Hall et al., 2003, 2005; Lee et al., 2006).
In general, repetitive sequences evolve by concerted evolution, resulting in higher sequence similarity of the repeat family within a species than between species (Elder and Turner, 1995). This is the consequence of a ‘molecular drive’ process, whereby mutations spread throughout a repeat family (homogenization) and are consequently fixed within a population (Dover, 1982, 1986). Molecular drive is caused by several mechanisms, such as unequal crossing over, gene conversion, replication slippage, rolling circle replication and retrotransposon-mediated transposition (Smith, 1976; Dover, 1982, 1986; Stephan, 1986; Charlesworth et al., 1994; Birchler and Presting, 2012).
The spread of CentPv1 and CentPv2 in subsets of common bean centromeres confirms the concerted evolution of the satellite repeat monomers, and suggests the interaction of these repeats between non-homologous chromosomes. In addition, the FISH and phylogenetic analysis of CentPv1, Nazca and CentPv2 clearly showed evidence of locus- or chromosome-specific homogenization, which is postulated to be driven by unequal crossing over (Smith, 1976; Schueler et al., 2001). In general, satellite repeat monomers show higher degrees of similarity within a chromosome than between chromosomes because of different rates of local and global homogenization (Dover, 1986). The evidence of chromosome-specific homogenization of CentPv1 and CentPv2 confirms that they have undergone a process typical of satellite repeat evolution.
Questions remain, however: how did individual centromeric repeats become selectively homogenized within subsets of centromeres? Based on FISH, individual centromeres contain either CentPv1 or CentPv2 as a major component, and Fiber-FISH results show that CentPv1 and CentPv2 are not intermingled in the genome. This indicates that two types of centromeres, containing either CentPv1 or CentPv2, have been evolving independently within a single nucleus.
Another question is how two different types of centromeric repeats can evolve independently while maintaining interaction with CENH3. Divergence of centromeric satellite repeats has been explained by the centromere drive hypothesis (Henikoff et al., 2001; Malik and Henikoff, 2002). This model proposes that centromeric satellite repeat variants compete, through microtubule attachment, for transmission through asymmetric female meiosis, leading to the fixation of the ‘winning’ centromere satellite variant. This process requires that certain centromere satellite variants possess selective advantages over others, such that they interact better with DNA-binding kinetochore proteins that give the variant better access to the reproductive cells (eggs) in meiosis. Given this, we hypothesize that CentPv1 and CentPv2 are both still in the process of selection, with similar strengths of interaction with DNA-binding kinetochore proteins, and that neither is more favored to be fixed at functional centromeres. It is possible that one of them may eventually gain a greater advantage in its interaction with DNA-binding kinetochore proteins and ultimately dominate all the centromeres of the common bean.
Southern analysis showed that CentPv1/NazcaA, NazcaB and CentPv2 are conserved in the species of the Vulgaris group, which is about 2–4 million years old (Delgado-Salinas et al., 2006). In comparison with the common bean, the copy number of these repeats was dramatically reduced in other species of the Vulgaris group, supporting the idea that the satellite repeats contract or expand over a very short evolutionary time frame.
The origin of CentPv1/Nazca and CentPv2 centromeric satellite repeats can be explained by two hypotheses (Figure 8). First, these two centromeric repeats originated from a single progenitor repeat present in an ancestral species that underwent multiple mutations, divergence and homogenization, leading to two different ‘unrelated’ repeats: CentPv1 and CentPv2 (Figure 8a). If this is the case, these two opposing processes, sequence divergence and homogenization, occurred in a relatively short evolutionary time frame within the Vulgaris group.
Alternatively, CentPv1/Nazca and CentPv2 may have had two different origins (Figure 8b). One progenitor repeat may have dominated the centromeric region in an ancestral species, and the other may have been non-centromeric. Insertion and expansion of the non-centromeric repeat may have caused a contraction of the original centromeric repeat, resulting in the current organization of different proportions of CentPv1 and CentPv2. In O. rhizomatis, the satellite repeat, CentO-C2/TrsC was detected in both subtelomeres and centromeres, and is likely to have gained functional importance at centromeres (Lee et al., 2005; Bao et al., 2006). We did not observe any non-centromeric distribution of CentPv1 or CentPv2, but it is also possible that CentPv1/Nazca and CentPv2 derived from non-centromeric progenitor repeats, and diverged rapidly because of the rapid evolution of centromeric repeats. Our analyses also showed that some chromosome pairs contain CentPv1/Nazca as the major component and CentPv2 as a minor component. This could be because of continuing chromosome-specific homogenization processes and/or interchromosomal transfer of the two repeats (Dover, 1986).
Evolution is a complex and continuous process, and an experimental limitation is that we see only one time frame of a continuum; however, progress in genome sequencing will accelerate centromere identification and characterization in related species, which will lead to a better understanding of the evolutionary dynamics at centromeric regions. It will be necessary to investigate the chromosomal distribution of CentPv1, CentPv2 and their variants in a wide range of common bean accessions and Phaseolus species in the Vulgaris group in order to trace interchromosomal exchanges, divergence, homogenization and amplification/deletion.
In summary, we report a peculiar structure of common bean centromeric satellite repeats: two distinct sets of centromere sequences evolving independently, one of which is organized into HORs. In addition to satellite repeats, retrotransposons have been shown to be important structural components of functional centromeres in plant species (Nagaki et al., 2004, 2011; Nagaki and Murata, 2005; Tek et al., 2010). Further ChIP experiments and genome analyses will be necessary to define centromere-specific retrotransposons and to gain more insight into the functional centromeres of the common bean.
Identification of centromeric satellite repeats
To identify and characterize centromeric satellite repeats, we used multiple sequence data sets; detailed information on the data sets used in this study is given in Table S1. A total of 89 017 PV_GBa BESs (Schlueter et al., 2008; NCBI GI 134154885–134244144) and an 8× draft genome sequence of G19833, containing 20 067 scaffolds and 6136 contigs, were searched for tandem repeats using Tandem Repeats Finder (Benson, 1999). The Tandem Repeats Finder output was extracted using a custom Perl script. Tandem repeats with a consensus size greater than 60 bp and a copy number greater than three were retained for further analysis. The tandem repeats identified in the 8× draft genome were sorted by copy number within individual scaffolds or contigs, and the consensus sequences of the 30 tandem repeats with the highest copy numbers (the top-30) were obtained. To estimate the genome-wide copy number of the top-30 tandem repeats, BLAST analysis was performed using a set of random 25-mer sequences that occur more than 3000 times in the 2–3× draft genome sequence as queries against the top-30 tandem repeats.
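The filtering and ranking step can be illustrated with a short sketch. This is not the authors' Perl script: it is a hypothetical Python analogue that assumes the standard Tandem Repeats Finder ".dat" layout (a "Sequence:" header per scaffold, then one line per repeat beginning with start, end, period, copy number and consensus size, with the consensus sequence in the 14th column). The thresholds (consensus > 60 bp, copy number > 3, top 30) are the ones stated above; the names `TandemRepeat`, `parse_trf_dat`, `filter_repeats` and `top_by_copy_number` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    seq_id: str          # scaffold or contig in which the repeat was found
    start: int
    end: int
    copies: float        # copy number reported by TRF
    consensus_size: int  # consensus monomer length (bp)
    consensus: str


def parse_trf_dat(text):
    """Yield TandemRepeat records from TRF .dat text (column layout assumed)."""
    seq_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Sequence:"):
            seq_id = line.split(None, 1)[1]
            continue
        fields = line.split()
        # Repeat lines start with a numeric start coordinate and have >= 14 columns;
        # this skips blank lines, "Parameters:" lines and program banners.
        if seq_id is None or len(fields) < 14 or not fields[0].isdigit():
            continue
        yield TandemRepeat(seq_id, int(fields[0]), int(fields[1]),
                           float(fields[3]), int(fields[4]), fields[13])


def filter_repeats(repeats, min_size=60, min_copies=3):
    """Keep repeats with consensus size > min_size bp and copy number > min_copies."""
    return [r for r in repeats
            if r.consensus_size > min_size and r.copies > min_copies]


def top_by_copy_number(repeats, n=30):
    """Rank retained repeats by copy number (highest first) and keep the top n."""
    return sorted(repeats, key=lambda r: r.copies, reverse=True)[:n]
```

A typical use would be `top_by_copy_number(filter_repeats(parse_trf_dat(open(path).read())))`, where `path` points at a hypothetical TRF output file for one assembly.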
Common bean accessions (G19833, BAT93, G23580 and PI535416) and other plant species (P. coccineus, Phaseolus lunatus, P. acutifolius, P. dumosus, Phaseolus leptostachyus, Phaseolus hintonii, Phaseolus maculatus, G. max, V. unguiculata, C. cajan, Medicago truncatula, A. thaliana and O. sativa) were obtained from the US Department of Agriculture (USDA) or International Center for Tropical Agriculture (CIAT), Colombia, and were grown in a glasshouse for chromosome and DNA isolations.
Fluorescence in situ hybridization (FISH)
Mitotic chromosome preparations and oligonucleotide-based FISH were conducted as described in Gill et al. (2009) and Findley et al. (2010), with the following modifications. Root tips from the common bean (accession G19833) were treated with pressurized nitrous oxide for 90 min, fixed in Carnoy's solution (3:1 ethanol and glacial acetic acid) for 24 h at room temperature (∼25°C) and then stored at 4°C until used. Root tips were digested with an enzyme solution containing 1% (w/v) Pectolyase (MP Biomedicals, http://www.mpbio.com) and 2% (w/v) Cellulase (MP Biomedicals) in citrate buffer (10 mM sodium citrate, 10 mM sodium EDTA, pH 5.5) for 80 min at 37°C. The following fluorochrome-labeled oligonucleotides (Integrated DNA Technologies, http://www.idtdna.com) were used as FISH probes: CentPv1, cyanine 5 (Cy5)-CACATGAAATTGTTTTTCAAAGATA; CentPv2, TEX615-CAATAAATTCATGCAACTACCACAA; CentPv1_A, fluorescein-GGTTTTTCAAGGGTGTATCATAGGT; CentPv2_A, fluorescein-CCAATGTCTATCACTACTCTTTGACA. FISH using PCR products and plasmid clones was performed according to the method described by Jiang et al. (1995). CentPv1 and CentPv2 were amplified from genomic DNA by PCR with the primer sets P1 and P2, respectively (Table S4); NazcaA was amplified and cloned from BAC Pvm1-249m1 using the primer set P3 (GenBank accession number KC990411). The PCR products of two copies of CentPv1 (approximately 198 bp) and two copies of CentPv2 (approximately 220 bp), and the NazcaA clone, were labeled with biotin-dUTP, digoxigenin-dUTP or Cy3-dUTP (GE Healthcare, http://www.gelifesciences.com) using Nick Translation Mixes (Roche, http://www.roche.com). Indirectly labeled probes were detected with streptavidin-conjugated Alexa Fluor 488 (Invitrogen, http://www.invitrogen.com) or rhodamine-conjugated anti-digoxigenin antibodies (Roche). Leaf nuclei isolation and fiber-FISH were performed as described by Jackson et al. (1998).
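Because each oligonucleotide probe can hybridize to either DNA strand, a quick in silico check of where a probe would bind a repeat sequence must scan both the probe and its reverse complement. The sketch below is a hypothetical illustration (the helper names `reverse_complement` and `probe_sites` are not from the study) that reports exact matches only; real probe design would also tolerate mismatches.

```python
# Map each base to its Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sites(target, probe):
    """Return sorted (0-based position, strand) tuples for exact probe matches.

    '+' means the probe sequence itself occurs in `target`; '-' means its
    reverse complement occurs, i.e. the probe would bind the opposite strand.
    Overlapping matches are reported.
    """
    target = target.upper()
    probe = probe.upper()
    hits = []
    for strand, pattern in (("+", probe), ("-", reverse_complement(probe))):
        start = target.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = target.find(pattern, start + 1)
    return sorted(hits)
```

For example, scanning a repeat consensus with the CentPv1 oligo listed above would show whether each monomer carries one binding site, which is one reason a short oligo can give strong signals on long tandem arrays.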
Images were captured with an Axio Imager M2 microscope (Zeiss, http://corporate.zeiss.com) equipped with an AxioCam MRm camera and controlled by AxioVision 40 software (Zeiss). DNA fiber lengths were measured using the measurement tool in AxioVision and converted into kilobases using a conversion rate of 3.21 kb μm⁻¹ (Cheng et al., 2002b). Images were adjusted using Adobe Photoshop CS5.1 (Adobe, http://www.adobe.com).
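The length conversion is a single multiplication by the stated stretching rate of 3.21 kb per μm. A minimal sketch, with hypothetical function names, for converting measured signal lengths (including signals interrupted into several segments) into kilobases:

```python
KB_PER_UM = 3.21  # conversion rate cited in the text (Cheng et al., 2002b)


def fiber_length_kb(length_um, kb_per_um=KB_PER_UM):
    """Convert a measured fiber-FISH signal length (micrometres) into kilobases."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    return length_um * kb_per_um


def total_array_kb(segment_lengths_um, kb_per_um=KB_PER_UM):
    """Sum several measured segments of one signal and convert the total to kb."""
    return sum(fiber_length_kb(x, kb_per_um) for x in segment_lengths_um)
```

For instance, a 50 μm signal corresponds to about 50 × 3.21 = 160.5 kb. Whether interrupted segments should be summed depends on how gaps are interpreted, which the text does not specify.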
Cloning of PvCENH3 cDNA
Total RNA was extracted from leaves of the commercially available common bean cultivar ‘green mild’ using the RNeasy Plant Mini kit (Qiagen, http://www.qiagen.com). First-strand cDNA synthesis and rapid amplification of cDNA ends (RACE) PCR reactions were performed using the SMARTer RACE cDNA Amplification kit (Clontech, http://www.clontech.com). First, 5′-RACE PCR was performed using the reverse primer PvCENH3r (5′-TCACCAAGGCCTTCCTATTCCTC-3′) to obtain the 5′ coding region and untranslated region (UTR). Second, 3′-RACE PCR was conducted using the coding region-specific forward primer PvCENH3f (5′-GGAACTGTGGCGCTTCGTGAGAT-3′) to recover the 3′-UTR. Amplified fragments were cloned, sequenced and analyzed as described previously (Tek et al., 2010, 2011). Multiple sequence alignment was carried out using MUSCLE (Edgar, 2004).
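A quick sanity check on RACE primers is their GC content and an approximate melting temperature. The sketch below applies the widely used "basic" Tm approximation for oligos longer than 13 nt, Tm = 64.9 + 41 × (G + C − 16.4) / N, which ignores salt and primer concentration. It is an illustrative calculation, not part of the reported protocol, and the function names are hypothetical.

```python
def gc_count(seq):
    """Number of G and C bases in a primer sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_percent(seq):
    """GC content as a percentage of primer length."""
    return 100.0 * gc_count(seq) / len(seq)


def basic_tm(seq):
    """Approximate Tm (deg C) with the basic formula for oligos > 13 nt.

    Tm = 64.9 + 41 * (G + C - 16.4) / N; no salt or concentration correction,
    so treat the result only as a rough guide.
    """
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


PRIMERS = {
    "PvCENH3r": "TCACCAAGGCCTTCCTATTCCTC",
    "PvCENH3f": "GGAACTGTGGCGCTTCGTGAGAT",
}
```

Printing `gc_percent` and `basic_tm` for each entry of `PRIMERS` gives a rough comparison of the two primers; vendor calculators with nearest-neighbor models will give somewhat different values.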
Immunofluorescence was carried out using the rabbit anti-GmCENH3 antibody as described previously (Nagaki et al., 2009; Tek et al., 2010). Briefly, common bean root tips were fixed in 3% (w/v) paraformaldehyde and digested with 1% (w/v) Cellulase Onozuka RS (Yakult Pharmaceutical Ind. Co., http://www.yakult.co.jp/ypi/en/index.html) and 0.5% (w/v) Pectolyase Y23 (Kikkoman, http://www.kikkoman.com). The digested tissue was squashed in phosphate-buffered saline (PBS) on poly-L-lysine-coated slides (Matsunami Glass Ind., Ltd, http://www.matsunami-glass.co.jp). Slides were incubated with a 1:100 dilution of the anti-GmCENH3 antibody in TNB buffer (0.1 M Tris-HCl pH 7.5, 0.15 M NaCl and 0.5% (w/v) Roche Blocking Reagent). Bound antibody was detected with Alexa Fluor 555-conjugated anti-rabbit antibody (Molecular Probes, now Invitrogen, http://www.invitrogen.com). After the immunofluorescence images were recorded, the same slides were processed for FISH localization of CentPv1 and CentPv2 (clones CentPv1–9 and CentPv2–21). CentPv1 and CentPv2 sequences were amplified using primer sets P4 and P5, respectively (Table S4).
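The TNB buffer and antibody dilution above reduce to simple mass and volume arithmetic. The following bench calculator is a hypothetical sketch: the formula weights (Tris-HCl 157.6 g/mol, NaCl 58.44 g/mol) are assumptions, Tris buffers are often prepared instead by titrating Tris base to pH 7.5 with HCl, and "1:100" is interpreted here as 1 part antibody in 100 parts total volume.

```python
# Assumed formula weights (g/mol); adjust to the reagents actually used.
FORMULA_WEIGHT_G_PER_MOL = {"Tris-HCl": 157.6, "NaCl": 58.44}


def grams_for_molarity(molarity, volume_ml, formula_weight):
    """Mass (g) of solute giving `molarity` (mol/L) in `volume_ml` of solution."""
    return molarity * (volume_ml / 1000.0) * formula_weight


def grams_for_percent_wv(percent, volume_ml):
    """Mass (g) for a % (w/v) component: 1% (w/v) = 1 g per 100 mL."""
    return percent * volume_ml / 100.0


def tnb_recipe(volume_ml):
    """Grams of each TNB component (0.1 M Tris-HCl, 0.15 M NaCl, 0.5% w/v blocking reagent)."""
    return {
        "Tris-HCl": grams_for_molarity(0.1, volume_ml, FORMULA_WEIGHT_G_PER_MOL["Tris-HCl"]),
        "NaCl": grams_for_molarity(0.15, volume_ml, FORMULA_WEIGHT_G_PER_MOL["NaCl"]),
        "Blocking Reagent": grams_for_percent_wv(0.5, volume_ml),
    }


def antibody_volume(total_volume_ul, dilution_factor=100):
    """Stock antibody volume (uL) for a 1:dilution_factor dilution in total_volume_ul."""
    return total_volume_ul / dilution_factor
```

Under these assumptions, 100 mL of TNB needs about 1.576 g Tris-HCl, 0.877 g NaCl and 0.5 g Blocking Reagent, and 200 μL of diluted antibody solution needs 2 μL of antibody stock.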
Quantitative ChIP slot-blot hybridization
ChIP with the anti-GmCENH3 antibody and quantitative ChIP slot-blot hybridization were performed as described previously (Nagaki et al., 2003, 2009; Tek et al., 2010, 2011). DNA from the pellet and supernatant fractions of the ChIP and mock experiments was extracted with phenol/chloroform and precipitated in ethanol. The DNA samples were then transferred onto a Biodyne Plus nylon membrane (PALL Co., http://www.pall.com) using a slot-blot apparatus. Common bean centromeric clones (CentPv1–9 and CentPv2–21), the common bean non-centromeric control clone khipu (David et al., 2009) and the wheat rDNA clone pTa71 (Gerlach and Bedbrook, 1979) were labeled with DIG High Prime (Roche). The khipu sequence was amplified using primer set P6 and cloned (Table S4). After hybridization to the membranes, the probe DNA was detected with a DIG Luminescent Detection kit (Roche). Luminescent signals were captured and analyzed using the LAS1000 Plus system (Fujifilm, http://www.fujifilm.com). In each case, the percentage immunoprecipitation [defined as Pel/(Pel + Sup)] of the mock experiment was subtracted from that of the anti-CENH3 treatment [IP% = percentage immunoprecipitation (CENH3) − percentage immunoprecipitation (Mock)]. Each ChIP reaction had five technical replicates. The probability that the extra-centromeric control (rDNA) and the other repetitive DNA sequences belonged to the same group was determined by analysis of variance (ANOVA) using ezANOVA software. Pairwise comparisons between groups were conducted by Tukey's HSD test using ezANOVA.
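The IP% calculation and the one-way ANOVA can be written out explicitly. This sketch, with hypothetical function names, computes the mock-corrected IP% from pellet and supernatant signal intensities and the one-way ANOVA F statistic across probe groups (for example, CentPv1, CentPv2, khipu and rDNA, each with five technical replicates). It does not reproduce ezANOVA: obtaining a p-value would additionally require the F distribution (for example `scipy.stats.f.sf`), and Tukey's HSD is not reimplemented here.

```python
from statistics import mean


def percent_ip(pellet, supernatant):
    """Percentage immunoprecipitation: 100 * Pel / (Pel + Sup)."""
    return 100.0 * pellet / (pellet + supernatant)


def corrected_ip(cenh3_pel, cenh3_sup, mock_pel, mock_sup):
    """IP% = %IP(CENH3 antibody) - %IP(mock), as defined in the text."""
    return percent_ip(cenh3_pel, cenh3_sup) - percent_ip(mock_pel, mock_sup)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of replicate IP% values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations of replicates from their group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

For example, a pellet signal of 30 and supernatant signal of 70 gives 30% immunoprecipitation; if the mock gives 5%, the corrected IP% is 25.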
Identification of Nazca
The sequencing and assembly of BAC clone Pvm1-249m1 (accession number FO681296) were performed at Genoscope (http://www.genoscope.cns.fr). CentPv1 sequences in BAC clone Pvm1-249m1 were identified using blast2sequence with a cut-off threshold of 80% identity over 70 bp. Structural annotation and characterization of Nazca repeats in BAC clone Pvm1-249m1 were performed manually using MEME (Bailey and Elkan, 1994) and ClustalW (Thompson et al., 1994). The resulting information was imported into the annotation platform Artemis for further manual analysis (Rutherford et al., 2000).
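Applying an identity/length cut-off such as "80% identity over 70 bp" amounts to filtering alignment hits. The sketch below assumes, for illustration only, that the pairwise comparison was exported in BLAST+ tabular format (`-outfmt 6`, whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the study itself used blast2sequence, whose output settings are not stated. The names `Hit`, `parse_outfmt6` and `passes` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity of the alignment
    length: int    # alignment length (bp)
    sstart: int    # start on the subject (e.g. the BAC sequence)
    send: int      # end on the subject; send < sstart means minus strand


def parse_outfmt6(text):
    """Yield Hit records from BLAST+ tabular (-outfmt 6) text, skipping comments."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[8]), int(f[9]))


def passes(hit, min_identity=80.0, min_length=70):
    """True if the hit meets the identity and aligned-length cut-offs."""
    return hit.pident >= min_identity and hit.length >= min_length
```

Hits that pass can then be sorted by `sstart` to map CentPv1 copies along the BAC before the manual Nazca annotation.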
Sequence analysis of centromeric repeats
To estimate the copy numbers of CentPv1/NazcaA, NazcaB and CentPv2 in the genome, BLAST searches were conducted against 16-fold coverage Illumina short reads (Table S1). To analyze diversity and chromosome-specific homogenization in the genome, BLAST analysis was performed against the 11 pseudo-chromosomes using CentPv1 and CentPv2 as queries, with a cut-off of 60% identity and 80% of the query length. Only full-length CentPv2 sequences were used to construct the CentPv2 phylogenetic tree. To analyze the diversity of Nazca, a 528-bp consensus sequence derived from the multiple alignment of the 20 Nazca copies identified on BAC Pvm1-249m1 was used as a BLAST query, with a cut-off of 80% identity and 95% of the query length. Sequences were aligned using ClustalW (Thompson et al., 1994), and the multiple alignments were manually edited using Jalview (Waterhouse et al., 2009). Consensus sequence logos were created using the WebLogo server (Crooks et al., 2004). Neighbor-joining trees were constructed using the Kimura two-parameter model with 1000 bootstrap replicates in MEGA 5 (Tamura et al., 2007, 2011).
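The neighbor-joining trees rest on pairwise Kimura two-parameter (K2P) distances, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitions and transversions among compared sites. The sketch below computes d for one pair of aligned sequences; it skips gap and ambiguous sites (pairwise deletion), which is an assumption since the MEGA 5 gap settings are not stated, and it does not build the tree or run bootstrap replicates.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT character are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A matrix of these distances over all CentPv2 or Nazca copies is the input a neighbor-joining algorithm would cluster; bootstrap support comes from repeating the procedure on resampled alignment columns.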
Genomic DNA from each species was digested with the restriction enzyme EcoRI (Invitrogen), fractionated on a 0.8% (w/v) agarose gel and transferred onto a membrane (Roche). Probe labeling and hybridization were performed as previously described (Gill et al., 2009). PCR products of two copies of CentPv1 (approximately 188 bp), two copies of CentPv2 (approximately 220 bp) and one copy of NazcaB (approximately 159 bp) were used as probes. NazcaB was amplified using primer set P7, designed from the genome-wide consensus sequence (Table S4). The membrane was exposed to X-ray film (GE Healthcare), and the film was scanned using a film processor (Konica Minolta, http://www.konicaminolta.com).
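The fragment sizes expected on a Southern blot can be anticipated with an in silico digest. EcoRI recognizes GAATTC and cuts between G and A (G^AATTC); because the site is palindromic, scanning one strand finds every site. This hypothetical sketch treats the input as linear DNA and returns fragment lengths in order.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the first base: G^AATTC


def ecori_fragments(seq):
    """Return lengths (bp) of the fragments from a complete EcoRI digest of linear DNA."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + ECORI_CUT_OFFSET)
        pos = seq.find(ECORI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

As an illustration, if every monomer of a perfect tandem array carried one EcoRI site, a complete digest would release fragments of monomer length, whereas sites present in only some monomers would give larger multimer-sized fragments; real arrays are not perfect, so this only frames how blot patterns might be read.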
This work was supported by a USDA-NIFA grant 2009-01860 to S.A.J., the Fellowship Program of the Japan Society for the Promotion of Science (JSPS) to A.L.T. and K.N., by INRA and IFR87 to V.G., and partly by the Turkish Higher Education Council and Harran University, Turkey, to A.L.T. A.F. was supported by a grant from Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco (FACEPE), Brazil, and A.P.–H. by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.