We have analysed the centromere 1 (CEN1) of Arabidopsis thaliana by integrating genetic, sequence and fluorescence in situ hybridisation (FISH) data. CEN1 is considered to include the centromeric core and the flanking left and right pericentromeric regions, which are distinguished by structural and/or functional properties. CEN1 pericentromeres are composed of different dispersed repetitive elements, sometimes interrupted by functional genes. In contrast, the CEN1 core is more uniformly structured, harbouring only two different repeats.
The analysis reveals aspects of the distribution and effects of the apparently uniform heterochromatin that covers all CEN1 regions. A lethal mutation tightly linked to CEN1 enabled us to measure recombination frequencies within the heterochromatin in detail. In the left pericentromere, the transition from eu- to heterochromatin is accompanied by a gradual change in sequence composition but by an extreme change in recombination frequency (from normal to a 53-fold decrease), which takes place within a small region spanning 15 kb. Heterochromatin is generally known to suppress recombination. However, the same analysis reveals that the left and right pericentromeres, though similar in sequence composition, differ markedly in suppression (53-fold versus 10-fold). The centromeric core exhibits at least 200-fold, if not complete, suppression. We discuss whether differences in (fine) composition reflect quantitative and qualitative differences in binding sites for heterochromatin proteins and, in turn, confer different functional properties. Based on the presented data we estimate the sizes of Arabidopsis centromeres. These are typical of regional centromeres of higher eukaryotes and range from 4.4 Mb (CEN1) to 3.55 Mb (CEN4).
The term heterochromatin was originally coined by Heitz (1929), who described a portion of the nuclear chromatin that maintained a condensed state throughout interphase. Although heterochromatin is difficult to define, a number of properties have been described (Karpen and Allshire, 1997), including suppression of both gene activity and genetic recombination.
The present work aimed to analyse a plant centromere with respect to structure, function and (hetero-)chromatin status. For several reasons the CEN1 of Arabidopsis presents an ideal model for this study. Using the closely linked gurke mutation (Torres et al., 1996; this study), it is possible to detect recombination breakpoints within this region. The Arabidopsis genome project (The Arabidopsis Genome Initiative, 2000) provides considerable sequence information, which, together with FISH analysis data, allows sequence structure and heterochromatin to be linked at CEN1. This work pays special attention to this latter point and investigates whether heterochromatin has the same effects in all centromere regions and whether suppression of recombination increases linearly when moving towards the centromere core. By integrating sequence, recombination and FISH data, a working model of the centromere is presented, which might contribute to the development of plant artificial chromosomes (PLACs).
Physical distances at the CEN1 region
The CEN1 region is, with the exception of the centromeric core, well covered by BAC contigs (Marra et al., 1999; Mozo et al., 1998; 1999; http://www.mpimp-golm.mpg.de/101/bac.html). Based on the sequence and FISH data (see below), we estimate that the region analysed in this study comprises about 660 kb of non-pericentromeric region plus approximately 1.06 Mb of pericentromere on the left (top) chromosome arm, approximately 2.26 Mb of centromere core and about 1.06 Mb of right pericentromere, which is as large as the left pericentromere (Figure 1; see below). The left outermost marker TT1/tt1 (transparent testa) has been mapped by telotrisomic analysis to the top arm of chromosome 1 (Koornneef, 1983) and has recently been localised within BAC F11O6 (B. Weisshaar and coworkers, pers. comm.). NIA2 and f6J23-t7 are the most proximal polymorphic markers of the left and right pericentromere, respectively; the centromeric core lies between them. The left and right pericentromeres are almost completely sequenced (The Arabidopsis Genome Initiative, 2000; Theologis et al., 2000). The right outermost (bottom arm) polymorphic marker used in this study is GAPB, located in BAC F13A11 (Figure 1).
Genetic distances at CEN1
The gene GURKE (GK), which had previously been mapped near CEN1 (Torres et al., 1996), turned out to be tightly associated with CEN1. We used this marker and two different populations to measure genetic distances within this region. NIA2 was the only marker without any recombinant with respect to GK among the 6704 meiotic events analysed. For convenience, all distances in our maps are therefore given relative to NIA2.
The positions of 12 suitable polymorphic markers were verified by sequence comparison (Figure 1a; Table 1). The first population originated from gurke (gk) segregating F1 plants, which had been generated by crossing tt1/tt1 mutants (in Ler background, wild-type for GK) with a heterozygous gk/GK line (in Col background, wild-type for TT1). The detected recombinants are distributed along 1.4 Mb and account for over 2 cM of genetic distance (Figure 1b). However, subsequent evaluation of these recombinants revealed that almost all of them represent recombination events between TT1/tt1 and f4A3-t7. According to these data, TT1/tt1 is 1.7 cM from mi342, which in turn is about 0.66 cM from f4A3-t7 (Table 2). A comparison with the physical distances immediately shows that the recombination frequency in this interval fits well with the average standard frequency of 1 cM per 200 kb. Two additional markers enabled us to assess the standard recombination frequency in different subregions between TT1 and f4A3-t7 (see Figure 1b; Table 2). Between f4A3-t7 and NIA2/GK, a stretch of approximately 765 kb, only two recombinants remained. This means that a sudden and extremely pronounced (53-fold) suppression of recombination occurs within less than 15 kb immediately proximal to f4A3-t7; 15 kb is roughly the distance over which two recombination events would be expected at the standard frequency.
Table 1. Markers and clones used for fine mapping and FISH analysis
BAC F28L22 and other BACs + 200 Kb/BAC T4I21 + 200 Kb/BAC T4I21 + 510 Kb/BAC F7F22
FISH probe mapping mapping mapping
IGF BAC RFLP
BAC F7F22 + 580 Kb/BAC T8D8
FISH probe test mapping
RFLP and CAPS
+ 740 Kb/BAC F13A11
YAC CIC 5C9
Table 2. Physical and genetic distances of polymorphic markers
Marker interval: genetic distance
NIA2 and GK: no recombinant detected (see text)
TT1/tt1 and NIA2: 2.45 ± 0.35 cM
TT1/tt1 and f3N5-t7: 0.16 ± 0.13 cM
TT1/tt1 and mi342: 1.7 ± 0.29 cM
mi342 and f1C13-t7: 0.62 ± 0.17 cM
f1C13-t7 and f4A3-t7: 0.036 ± 0.04 cM
f4A3-t7 and NIA2: 0.072 ± 0.06 cM
f4A3-t7 and α17A20: 0.036 ± 0.04 cM
α17A20 and NIA2: 0.036 ± 0.04 cM
CEN1 core (physical distance approximately 2.26 Mb): 0.05 ± 0.07 cM
f6J23-t7 and mi133: 0.155 ± 0.11 cM
mi133 and GAPB: 0.116 ± 0.09 cM
f6J23-t7 and GAPB: 0.271 ± 0.14 cM
GK and GAPB: 0.348 ± 0.16 cM
For details see Figure 1 and Experimental procedures; n.d. = not determinable
The right border of CEN1 also exerts significant suppression of recombination, but to a considerably lower extent than the left side (Figure 1b; Table 2). The remaining recombinant(s) between f6J23-t7 and NIA2 give a genetic distance across the centromere of 0.05 cM. Assuming a core size of approximately 2.26 Mb (see below), this implies at least a 226-fold suppression of recombination across the CEN1 core.
Sequence structure of the CEN1 region
We investigated the structure of the CEN1 region by analysing the sequence data provided by The Arabidopsis Genome Initiative (2000). Across the whole of chromosome 1, (retro-)transposons increase and genes decrease towards the centromere (Theologis et al., 2000). Within the CEN1 region itself, this tendency is not immediately visible. Similar gene and repetitive element classes are represented on both pericentromeric sides. However, slight biases in the distribution of particular classes were found (Figure 1c). Retrotransposons tend to accumulate towards the CEN1 core, whereas transposons are more evenly distributed along the regions considered. The genes were subdivided into 'unique', 'putative' and 'unknown' classes. The last class represents the uncharacterised and heterogeneous group of hypothetical genes, which are mostly evenly distributed in the CEN1 region. Putative genes, that is ORFs with significant similarity to known genes, are slightly concentrated at different locations in both pericentromeres. The number of 'unique' genes, which show high homology to characterised genes, decreases towards the CEN1 core. This underlines the exceptional position of NIA2 (and GURKE), which reside almost at the border of the CEN1 core. Moreover, the number of predicted genes in this region might be higher than the actual number: by hybridising BACs F17A20 and F9D18 to different cDNA libraries we could detect only NIA2. This indicates that expressed genes are already rare in the pericentromeres, although we do not exclude the possibility that some predicted genes are transcribed at low levels. The region between TT1 and f1C13-t7 shows an increase in putative genes and a slight increase in 'unique' genes.
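The three gene classes are defined by BLASTX expectation values (see Experimental procedures). A minimal sketch of that binning follows, assuming each ORF is summarised by the E-value of its best hit; how hits falling exactly on a threshold were assigned is not stated, so the boundary choices below are assumptions, the separation of (retro-)transposon ORFs is omitted, and the E-values in the usage line are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Thresholds from the Experimental procedures; treating each boundary as
# inclusive (<=) is an assumption, not stated in the text.
UNIQUE_MAX_E = 1e-50    # best-hit E-value up to this value -> 'unique'
PUTATIVE_MAX_E = 1e-3   # best-hit E-value up to this value -> 'putative'


def classify_orf(best_evalue: Optional[float]) -> str:
    """Assign an ORF to 'unique', 'putative' or 'unknown'.

    best_evalue is the smallest BLASTX expectation value among the ORF's
    hits; None means no hit and is treated as 'unknown'.
    """
    if best_evalue is None or best_evalue > PUTATIVE_MAX_E:
        return "unknown"
    if best_evalue <= UNIQUE_MAX_E:
        return "unique"
    return "putative"


def count_classes(best_evalues: Iterable[Optional[float]]) -> Dict[str, int]:
    """Tally the three classes, e.g. for one BAC or one map interval."""
    counts = {"unique": 0, "putative": 0, "unknown": 0}
    for e in best_evalues:
        counts[classify_orf(e)] += 1
    return counts


# Hypothetical best-hit E-values for five ORFs on one BAC.
example = [0.0, 3e-72, 2e-15, 0.4, None]
print(count_classes(example))  # {'unique': 2, 'putative': 1, 'unknown': 2}
```

Counting per BAC in this way gives the kind of per-interval class distribution summarised in Figure 1c.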
There is a strong tendency for repetitive elements other than (retro-)transposons to accumulate towards the CEN1 core. Two further analyses most clearly reflect the complex repetitive architecture of the pericentromeric regions in the context of other elements and genes. First, dotplot analysis (at a cut-off of 100) revealed that all BACs except F7F23 and T8D8 contain short dispersed repeats. Some also possess a few (dispersed) duplications of 2–4 kb. Two BACs, F7F22 and F1I21, exhibit larger duplications of 14 and 9 kb, respectively. Imperfect tandem repeats were identified in F11O6 and F1I21, which reside at the outermost left and right borders of the CEN1 region, respectively (Figure 1). The tandem array in F11O6 is conspicuous, consisting of imperfect 2.6 kb units spread over 45 kb, but it shares no similarity with known repeats in the CEN1 region. Second, the complete sequences of BACs F28L22 and F7F22 were each compared with the sequences of all other sequenced BACs in this region. This shows that BACs share several different dispersed repeats, in more or less perfect copies, over long distances (Figure 1d). The density of these repeats increases towards the CEN1 core.
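How a dot plot exposes tandem repeats can be illustrated with a minimal exact-word sketch. This is not the online tool used here: its cut-off of 100 is not reproduced, the word length of 12 is an arbitrary illustrative choice, and real (imperfect) repeats would need a mismatch-tolerant comparison. In a self dot plot, a tandem array produces off-diagonal lines at multiples of the unit length, so the most frequent positive offset estimates the unit size; the repeat unit below is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def dotplot_points(seq_a: str, seq_b: str, word: int = 12) -> List[Tuple[int, int]]:
    """Return (i, j) for every exact match seq_a[i:i+word] == seq_b[j:j+word].

    Plotting these points gives a dot plot: repeats shared by the two
    sequences appear as diagonal runs of dots.
    """
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    points = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            points.append((i, j))
    return points


def tandem_period(seq: str, word: int = 12) -> int:
    """Most frequent positive offset j - i in a self dot plot.

    In a perfect tandem array the longest off-diagonal line lies at one unit
    length. Returns 0 if there are no off-diagonal matches.
    """
    offsets = Counter(j - i for i, j in dotplot_points(seq, seq, word) if j > i)
    return offsets.most_common(1)[0][0] if offsets else 0


unit = "ACGTTGCAAGGCTTAC"   # hypothetical 16 bp repeat unit
array = unit * 5            # five perfect tandem copies
print(tandem_period(array))  # 16
```

The same functions applied to two different BAC sequences would show dispersed shared repeats as scattered diagonal segments rather than regularly spaced lines.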
The complex structure of the pericentromeres stands in sharp contrast to the uniform CEN1 core. The sequence and FISH data presented here, together with further molecular and FISH studies, indicate that the gap between BACs F28L22 and T4I21 represents the CEN1 core, which is mainly filled with two repetitive elements, pAL1 and 106B (see below).
FISH analysis of CEN1
In order to relate the recombination frequency and sequence data to the heterochromatin structure of the CEN1 region, we performed FISH on pachytene chromosomes using several DNA clones.
BAC F17A20 maps to the left region close to the CEN1 core (Figure 1) and contains sequences that hybridise to the centromeric regions of all Arabidopsis thaliana chromosomes (Figure 2a; Fransz et al., 2000; Fransz et al., 1998). We subcloned fragments of F17A20 and detected one 2.8 kb clone (named A2–4) with homology to reverse transcriptase. A2–4 hybridises to the pericentromeric regions of all chromosomes (Figure 2b). However, the FISH signals of F17A20 are clearly due to the cumulative hybridisation of many different repeats: BAC F17G21, which is located in the left pericentromere (Figure 1) but harbours neither clone A2–4 nor at least one other F17A20 repeat, gives essentially the same FISH signals as F17A20 (Figure 2a). FISH with BAC F12A16, which is located in the right pericentromere, revealed a similar pattern (not shown). In conjunction with the sequence data, this shows that the majority of copies of the repetitive elements from the pericentromeric regions of chromosome 1 reside in pericentromeres. Only a few elements are exceptions. For instance, element 106B occurs predominantly in the centromere core, but a few copies are found in several of the innermost pericentromeric BACs (not shown).
Intense DAPI staining of pachytene chromosomes marks highly heterochromatic portions, namely the centromeres and the nucleolar organising regions (Fransz et al., 1998). We noticed that the FISH signals of F17A20/F17G21 in particular and the DAPI staining clearly decrease simultaneously at the same positions (Figure 2c,d,e). We transferred this coincidence to the physical map as follows. BACs F17A20 and F17G21 are completely covered by the sequenced BAC F28L22. We searched for homologies outside the position of F28L22 (= F17A20/F17G21) by comparing the complete F28L22 sequence with all other sequenced BACs in the CEN1 region. These homologous segments represent numerous, partly overlapping dispersed repeats (vertical lines in Figure 1d). In our experience, a stretch of at least approximately 2 kb (with contiguous homology) is able to hybridise in FISH analysis (e.g. clone A2–4; Figure 2b). We designate such segments FISH competent regions. Since the positions of intense DAPI staining and of the F17A20 FISH signals coincide, it follows that nearly the complete region represented in Figure 1(a) is heterochromatic, except the region between BACs F11O6 and F10O5, which does not provide sufficient F17A20 FISH competent sequences (the repeats present are too short). The sequenced BAC F7F22, which covers F12A16, exhibits a similar, although not identical, distribution of FISH competent regions (Figure 1d). However, F12A16 FISH signals did not coincide with DAPI staining as precisely as F17A20 signals and were therefore not used to localise heterochromatin.
Estimating the size of centromeres
The highly repetitive composition of centromeres impedes physical size estimation, since it is so far impossible to reliably order overlapping clones across these regions. We therefore estimated the size of Arabidopsis centromeres by relating the extent of FISH competent regions to the cytological extent of FISH signals at the centromeres. We took into account that BACs such as F2C1 and F9D18 (Figure 1) link the CEN1 pericentromere and core, because they possess typical repeats of both regions. The distances F28L22–F2C1 and F9D18–T4I21 are not completely sequenced but should not greatly exceed 200–300 kb, given the dense coverage with unsequenced BACs from F28L22 to F2C1 (see for instance http://www.arabidopsis.org/cgi-bin/maps/Pmap?chr=1&beg=12600&end=13100&nsg=&compress=0). The last potential F17A20 FISH competent regions in the left pericentromere are located at either end of T22A15 (Figure 1). We conclude that the left pericentromeric heterochromatin spans the distance F2C1–T22A15, which is approximately 1.06 Mb. Since the right pericentromere has a similar size (Figure 2e), the entire pericentromeric heterochromatin of CEN1 is 2.12 Mb in size, leading to a condensation factor of 1.05 Mb µm−1 (Table 3). Assuming a similar condensation factor for the core and for the other centromeres, we can estimate the physical sizes of all centromere regions (Table 3).
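The size estimate reduces to two steps: derive a condensation factor (Mb per µm) from a region whose physical size is known, then multiply other cytological lengths by it. A minimal sketch follows. It assumes that the cytological length of the CEN1 pericentromeric signal equals total CEN minus core in Table 3 (an inference, not stated in the text); the µm values are the Table 3 means.

```python
from typing import Dict, Tuple

MB_PER_UM = 1.05  # condensation factor given in the text (Mb per µm)

# Mean cytological lengths (µm) from Table 3: (CEN core, total CEN).
CYTO_UM: Dict[str, Tuple[float, float]] = {
    "CEN1": (2.15, 4.17),
    "CEN2": (1.52, 4.14),
    "CEN3": (1.33, 4.00),
    "CEN4": (1.46, 3.38),
    "CEN5": (1.88, 4.20),
}


def condensation_factor(physical_mb: float, cytological_um: float) -> float:
    """Mb of DNA packed per µm of pachytene chromosome."""
    return physical_mb / cytological_um


def to_mb(length_um: float, factor: float = MB_PER_UM) -> float:
    """Convert a cytological length (or its s.d.) into an estimated size in Mb."""
    return length_um * factor


# Assumption: CEN1 pericentromeric signal length = total minus core (2.02 µm).
peri_um = CYTO_UM["CEN1"][1] - CYTO_UM["CEN1"][0]
print(round(condensation_factor(2.12, peri_um), 2))  # 1.05

# Estimated core and total sizes in Mb for every centromere.
for cen, (core_um, total_um) in CYTO_UM.items():
    print(cen, round(to_mb(core_um), 2), round(to_mb(total_um), 2))
```

Because the conversion is a single multiplication, the standard deviations scale by the same factor (for example 0.46 µm corresponds to about 0.48 Mb), which reproduces the Mb values of Table 3 to within rounding.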
Table 3. Estimated sizes of Arabidopsis thaliana centromeres
Values are cytological length (µm) / estimated physical size (Mb), based on 1 µm = 1.05 Mb.
Chromosome: CEN core (= pAL1 signal size); total CEN (= pericentromeres + core)
chr 1: 2.15 ± 0.46 µm / 2.26 ± 0.48 Mb; 4.17 ± 0.86 µm / 4.4 ± 0.9 Mb
chr 2: 1.52 ± 0.33 µm / 1.6 ± 0.35 Mb; 4.14 ± 1.33 µm / 4.35 ± 1.4 Mb
chr 3: 1.33 ± 0.26 µm / 1.4 ± 0.27 Mb; 4.00 ± 0.7 µm / 4.2 ± 0.74 Mb
chr 4: 1.46 ± 0.41 µm / 1.53 ± 0.43 Mb; 3.38 ± 0.48 µm / 3.55 ± 0.5 Mb
chr 5: 1.88 ± 0.31 µm / 1.97 ± 0.33 Mb; 4.20 ± 0.6 µm / 4.41 ± 0.63 Mb
All five chromosomes: 8.34 ± 1.3 µm / 8.76 ± 1.37 Mb; 19.89 ± 2.89 µm / 20.88 ± 3.03 Mb
The present study integrates genetic data, publicly available sequence data and FISH data in order to gain insight into the architecture and heterochromatin of the Arabidopsis chromosome 1 centromere (CEN1). We consider centromere 1 to be the entire central heterochromatin of chromosome 1, which covers large repetitive DNA regions of different but characteristic structural composition, called the pericentromeres and the centromere core.
Determining CEN1 extension: impact on size estimation of Arabidopsis centromeres
We conclude from our data that the stretch from TT1 to f1C13-t7 (Figure 1a) is euchromatic and exhibits the genetic and structural characteristics of non-centromeric regions, although it contains a few dispersed repeats and a block of tandem repeats in the region covered by BAC F11O6. The size of the left pericentromere is estimated to be about 1.06 Mb. The left pericentromere ends in F2C1, which carries both pericentromeric and centromeric core repeats. The right pericentromere very probably reaches the same size, as evidenced by FISH and sequence data. Comparing the physical length of FISH competent regions with the size of FISH signals reveals a condensation value of 1.05 Mb µm−1 (Table 3). This value lies between that of the NOR regions (2.2 Mb µm−1) and that of the heterochromatic knob hk4S (0.7 Mb µm−1) of chromosome 4 (Fransz et al., 2000). We conclude that CEN1 is the largest centromere (4.4 Mb) and also has the largest core (2.26 Mb), while CEN4 is the smallest centromere (3.55 Mb) and CEN3 has the smallest core (1.4 Mb; see Table 3). These sizes are characteristic of so-called regional centromeres in many multicellular eukaryotes (Hemleben et al., 2000; Tyler-Smith and Floridia, 2000; Willard, 1998). Do these estimates (Table 3) correspond to known data? In a few pachytene preparations the centres of the CEN cores, which possibly cover the kinetochore regions, exhibit fainter DAPI staining, indicating less condensed packaging (P. Fransz, unpublished). As a corollary, the indicated sizes would be slightly overestimated. Round et al. (1997) demonstrated the existence of large, uninterrupted 180 bp repeat tandem arrays between 400 kb and 1 Mb in size. However, the same work detected many additional fragments between 100 kb and 400 kb in size that contain 180 bp repeats. These latter portions have to be added to the larger fragments to estimate the size of the CEN cores.
In the case of CEN1, for which two large fragments of 810 kb and 570 kb were detected (ibid.), the addition of smaller fragments could easily lead to a size of approximately 2 Mb. All centromeres except CEN4 are similar in size. Note, however, that an inversion event has probably taken place at CEN4 in the analysed ecotype Columbia. This displaced part of its pericentromere and generated the heterochromatic knob hk4S (Fransz et al., 2000). It follows from these estimates that about 9 Mb of the Arabidopsis genome, representing the centromere cores (and additional parts of the pericentromeres), have not been sequenced.
Architecture of Arabidopsis CEN1
The structural sequence information combined with FISH using pAL1 clearly shows that the core heterochromatic region is quite different from its flanks (Figures 1 and 2). The core consists of long tandem arrays of 180 bp pAL1 repeats, also termed Atcon (Heslop-Harrison et al., 1999), which are interrupted by copies of the Athila retrotransposon LTR 106B (Brandes et al., 1997; Fransz et al., 2000; Fransz et al., 1998; Maluszynska and Heslop-Harrison, 1993; Martinez-Zapater et al., 1986; Murata et al., 1994; Pelissier et al., 1996; Round et al., 1997; this study). This structural pattern is reminiscent of other cases in plants and Drosophila (Presting et al., 1998; Sun et al., 1997). CEN1 pericentromeres are also repetitive in structure but quite different from the CEN1 core. They consist of numerous dispersed repeats, sometimes interspersed with functional genes (Figure 1). At present, the number of active genes in these regions is not known. For instance, F28L22 harbours three gene candidates, but hybridisation of F17A20 to different cDNA libraries detected only NIA2 (not shown). However, it is clear that at least some active genes reside within these highly heterochromatic regions, for instance NIA2, GURKE and GAPB within CEN1 (this study); others might lie at the borders of CEN2 and CEN4 (Copenhaver et al., 1999). Their stable transcription within centromere regions may be ensured by the presence of locus control regions (LCRs; Festenstein et al., 1996). Transcription might even depend on a heterochromatic environment, as is the case for the Drosophila genes rolled and light (Eberl et al., 1993 and references therein). This raises the question of whether genes exist in the centromeric core. At present there is no sequence information on 'core' BACs. However, cumulative evidence suggests that the CEN cores harbour even fewer genes than the pericentromeres, if any.
All FISH analyses performed in this study and in all studies cited indicate a monotonous architecture based mainly or exclusively on two repeat elements, although large probes containing many different repeats and complex DNA stretches have been applied. The density of (predicted) genes in the pericentromeres decreases towards the core. To our knowledge, no gene embedded in regions consisting of 180 bp tandem repeats has been reported. Even GURKE may still be located in one of the border regions. Most convincingly, Round et al. (1997) demonstrated very large uninterrupted 180 bp tandem arrays using different restriction enzymes with 6 bp recognition sites. This excludes the presence of complex DNA sequences in the core. Another functional feature, recombination, is not completely absent in the pericentromeres but probably is absent in the core. We take these facts into account in our model of CEN1 (Figure 3).
The basic architecture of CEN1 (Figure 3) seems to be conserved among all Arabidopsis centromeres. This also follows from a comparison of the presented data with the analysis of CEN2 and CEN4 (pericentromeric) sequences (Copenhaver et al., 1999) and the cytological analysis of CEN4 (Fransz et al., 2000). The core probably harbours the region that contacts the kinetochore (Figure 3), because the tips of metaphase chromosomes, which lead all other chromosome parts and point towards the poles, give FISH signals exclusively with pAL1 or 106B (Fransz et al., 1998). The pericentromeres could have an as yet unknown (stabilising) function, for example in sister chromatid cohesion, or alternatively represent by-products arising during the establishment of centromeres.
Properties of CEN1 heterochromatin and its impact in the CEN1 region
Judging from DAPI staining, the heterochromatin of the Arabidopsis CEN1 region largely has an undifferentiated appearance. Differences are uncovered by analysis of the architecture, as discussed above, and are reminiscent of Drosophila α-heterochromatin (centromeric core) and β-heterochromatin (pericentromeres; Miklos and Cotsell, 1990). In Drosophila, α-heterochromatin contains several highly repeated satellite DNA sequences and is genetically inert, whereas β-heterochromatin, which is less pronounced and only visible in polytene chromosomes, contains middle repetitive DNA interspersed with actively transcribed genes (Lohe and Hilliker, 1995). It remains to be proven whether the (peri)centromeric domains of Arabidopsis are structural and functional equivalents of Drosophila α- and β-heterochromatin. Since core and pericentromere regions are structurally different, models invoking DNA binding sites for chromatin proteins that co-operate to induce heterochromatization (Fanti et al., 1998) have to take this into account. Whether the core and the pericentromeres carry different and/or similar binding sites for the recruitment of heterochromatin proteins is not known. Retrotransposons, which occur in all centromeric regions, have been discussed as inducers of heterochromatin (The CSHL/WUGSC/PEB Arabidopsis Sequencing Consortium, 2000). In this context, further analysis will be directed to the left pericentromeric border of CEN1, because it shows a gradual change in sequence composition accompanied by a change from eu- to heterochromatin and a drastic change in function, namely recombination.
The suppressive effects of heterochromatin on gene expression and recombination have been known for decades (Baker, 1958; Beadle, 1932). In the case of CEN1, taking advantage of lethal gurke alleles, we could measure recombination within the CEN1 heterochromatin. Surprisingly, this reveals that the structurally similar left and right pericentromeres affect recombination to very different degrees, showing 53-fold versus 10-fold suppression. This effect might be influenced or determined by varying compositions of different heterochromatin-associated proteins, some of which are already known (Lohe and Hilliker, 1995; Pidoux and Allshire, 2000; Sullivan, 2001). Recent data indicate that heterochromatin can differ qualitatively: analysis of mutants of the heterochromatin-associated histone H3 methyltransferase Clr4 in fission yeast suggests that different factors control Clr4 activity at centromeres and at the mat2/3 locus, another heterochromatic region (Nakayama et al., 2001). The observations at CEN1 (peri)centromeres immediately raise the question of whether quantitative and qualitative differences in sequence composition itself cause 'different' heterochromatin, since the two pericentromeres are certainly not identical. For instance, repeats like A2–4 show a biased distribution (A2–4 is prominent in the left pericentromere). Does this mean that sequence composition, that is the abundance of binding sites for heterochromatin proteins, (fine-)tunes heterochromatin packaging and, in turn, suppression? We know of one clear case of such a correlation: the CEN1 core structure is quite different from that of the pericentromeres, and suppression of recombination is also markedly different between the CEN1 core and the pericentromeres, namely 226-fold versus 53-fold and 10-fold, respectively (Table 2).
This difference might be even greater, since the recombination events between f6J23-t7 and NIA2 probably took place within the innermost BACs of the right pericentromere rather than within the core. If so, the core would completely suppress recombination. At present, few cis elements with corresponding heterochromatin proteins are known. The CENP-B box is a consensus motif in plants and animals that binds a conserved centromeric protein, CENP-B (Aragon-Alcaide et al., 1996; Masumoto et al., 1993; Willard, 1990). The formation of telomeric core heterochromatin in yeast depends on the repressor/activator protein 1 (RAP1), which binds to a terminal 300 bp region containing C1–3A repeats (Grunstein, 1997; Wright et al., 1992).
The Arabidopsis CEN1 appears as a large repetitive structure with at least three or four domains of different complexity (Figure 3). Besides the kinetochore, which serves as a module for spindle fibre attachment, the other CEN1 regions might support the establishment, stabilisation and further functions of the centromere. This study shows that these domains have remarkable similarities but also remarkable differences. Their heterochromatic status does not allow prediction of identical (suppressive) effects. Suppression of recombination can occur abruptly instead of showing gradual increments. Moreover, centromeric heterochromatin exhibits striking functional differences in its different domains. Future work will be directed towards uncovering further (cis and trans) elements that establish centric heterochromatin and towards revealing whether subtle differences in composition gradually influence the functional properties of heterochromatin, as reflected by recombination.
Plant material, growth conditions and genetic crosses
For genetic analysis we used the Arabidopsis thaliana ecotypes Landsberg erecta (Ler) and Niederzenz (Nd), a tt1/tt1 mutant line in the Ler background (yellow seed phenotype; kindly provided by ABRC, Ohio State University, Columbus, OH, USA), a heterozygous gk/GK line in the Ler background and a gk/GK mutant line in the Columbia background (kindly provided by D. Meinke, University of Oklahoma, USA). Plants were grown and crossed as previously described (Torres et al., 1996).
Probes used for mapping or FISH analysis are listed in Table 1. Markers were used as conventional RFLP hybridisation probes, as CAPS/PCR probes or as duplex markers on high-resolution gels according to Hauser et al. (1998). For FISH analysis we used whole BACs (as indicated in Table 1), clone A2–4 (a 2.7 kb clone homologous to reverse transcriptase, present in several copies in BAC F28L22), the pAL1 repeat (Martinez-Zapater et al., 1986), a 5S ribosomal gene clone (Campell et al., 1992) and the amplified BAC ends f5B13-t7 and f4N16-t7.
Mapping and recombination frequencies
Genetic distances at CEN1 were obtained by taking advantage of lethal mutations in the GURKE gene. To establish the first population, a gk/GK heterozygous line (TT1/TT1) was crossed to a tt1/tt1 homozygous line (GK/GK). F1 plants segregating gk mutants were selected. Between TT1 and GK we identified 78 recombinant F2 plants (among 2056), representing five recombinant classes. The recombination frequency (p) between GK/gk and TT1/tt1 (or any other marker) is calculated according to the formula:
where x is the frequency of recombinants in the generated F2 population. DNA for PCR marker analysis was prepared from leaves of F2 plants. F3 progenies of single F2 plants provided DNA for analysis with RFLP markers that could not be converted to PCR/CAPS markers (Table 1). Twelve of the 78 recombinants produced too few progeny to yield enough DNA for Southern analysis. This results in a slightly lower genetic distance between TT1 and mi342, since only 66 individuals could be used for all analyses. However, this does not affect any of the observations and conclusions. The second population originated from a cross of a GK/gk line in the Ler background with a GK/GK line in the Niederzenz background. All recognisable recombinant classes (7) were evaluated and p was calculated according to:
The resulting recombinants (9 from 1296 plants) were subjected to segregation analysis of the markers between GAPB and GK/NIA2, including NIA2 itself. The genetic distance across CEN1 is based on 1000 plants of the second population, for all of which NIA2 segregation additionally had to be assessed, because the selection applied only detects recombinants between GK and GAPB, not between NIA2 and GK. The n-fold suppression of recombination in Table 2 is calculated as follows:

$$n = \frac{x}{200\,\mathrm{kb\,cM^{-1}} \times y}$$

where x is the measured physical distance (in kb) and y the measured genetic distance (in cM) between two markers. n = 1 represents the standard recombination frequency of 1 cM per 200 kb. The physical distance that would maintain n = 1 at a given genetic distance can be calculated directly: e.g. for y = 0.072 cM between f4A3-t7 and NIA2, the physical distance between both markers would have to be 14.4 kb to hold n = 1.
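The n-fold suppression described here (physical distance divided by the distance expected at 1 cM per 200 kb) can be sketched in a few lines. The function names are hypothetical, and the input values are the ones quoted in the Results (765 kb and 0.072 cM for f4A3-t7 to NIA2; 2.26 Mb and 0.05 cM across the CEN1 core). Raising an error for a zero genetic distance is a design choice of this sketch: without any recombinant only a lower bound on n can be stated.

```python
STANDARD_KB_PER_CM = 200.0  # standard frequency: 1 cM per 200 kb


def fold_suppression(physical_kb: float, genetic_cm: float) -> float:
    """n = x / (200 kb/cM * y); n = 1 means the standard frequency."""
    if genetic_cm <= 0:
        # With no recombinants only a lower bound on n can be given.
        raise ValueError("genetic distance must be > 0")
    return physical_kb / (STANDARD_KB_PER_CM * genetic_cm)


def kb_at_standard_rate(genetic_cm: float) -> float:
    """Physical distance that a given genetic distance would span at n = 1."""
    return STANDARD_KB_PER_CM * genetic_cm


print(round(fold_suppression(765, 0.072)))   # 53  (f4A3-t7 to NIA2)
print(round(fold_suppression(2260, 0.05)))   # 226 (CEN1 core)
print(round(kb_at_standard_rate(0.072), 1))  # 14.4
```

The right-pericentromere value (10-fold) is not recomputed here because the physical span used for it is not stated in this section.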
Standard deviation of recombination frequencies (s.d.) was calculated according to the formula
(Koornneef and Stam, 1992), where N is the number of individuals analysed. The exact calculation of s.d. for the first population (TT1/tt1), in which five of seven recognisable recombinant classes were evaluated, is given by Servitova and Cetl (1984). However, for small values of p the above equation gives almost identical s.d. values. For the same reason, recombination percentages (= p × 100) could be transformed directly into centimorgans without correction by the Kosambi function.
FISH analysis was performed essentially as described in Fransz et al. (1998) and Fransz et al. (2000). The Arabidopsis ecotypes Wassilewskija and Columbia were used. CEN1 could be identified unambiguously in FISH preparations with well-spread chromosomes. In these, CEN2, CEN4 and CEN5 could be identified with other established probes (Fransz et al., 2000; Fransz et al., 1998), so that only CEN3 and CEN1 remained to be distinguished. This was achieved by using the submetacentric position of CEN3 in preparations where the chromosomes were spread well enough to be traced (Figure 2). In these cases the pAL1 signal of CEN1 is larger than that of CEN3 or of any other centromere; thus CEN1 exhibits the largest pAL1 signal and CEN3 one of the smallest. F17A20 and pAL1 signals were measured, as shown in Table 3, for 13 clear cases. Relating these measurements to the size of the FISH competent regions in CEN1, and taking into account the comparable sizes of the CEN1 pericentromeres in all preparations, results in a condensation factor of 1.05 Mb µm−1.
Computer analysis of sequenced BACs
The sequences of all BACs shown in Figure 1 were obtained either from MIPS (http://www.mips.biochem.mpg.de/cgi-bin/proj/thal/clonelist?chr1/k; ftp://warthog.mips.biochem.mpg.de/pub/cress/chr1/clones/) or from TIGR (http://www.tigr.org/tdb/at/atgenome/chr.I.status.html). Gene candidates were predicted with the gene prediction program GENSCAN (http://ccr-081.mit.edu/GENSCAN.html). The resulting ORFs were subjected to similarity searches using BLASTX, which identifies sequences with the potential to encode a protein similar to entries in a protein database. We also used BLASTN for similarity searches at the DNA sequence level (http://www.ncbi.nlm.nih.gov/BLAST/). ORFs were separated into potential (retro-)transposons, identified by similarity to transposases, reverse transcriptases and polyproteins, and other proteins. The proteins were further divided into classes according to their similarity values: 'unique' proteins show high similarity (expectation values from 0 to 10^-50) to a known gene or function; 'putative' proteins show lower similarity (expectation values between 10^-50 and 10^-3); all others (expectation values > 10^-3) are classified as unknown. Hybridisation screening of three different lambda libraries with total F28L22 or F9D18 DNA (innermost pericentromeric BACs, see Figure 1) detected only NIA2 cDNAs (not shown). Dotplots were generated online at the Virtual Genome Centre, University of Minnesota (http://alces.med.umn.edu/rawdot.html; http://alces.med.umn.edu/bin/newwebdot), with a cut-off of 100. The complete F28L22 and F7F22 sequences were compared (BLAST 2 sequences option) with all sequenced BACs shown. Detected homologies were plotted according to size to produce the scheme indicating possible FISH competent stretches (see text and Figure 1). Thin bars indicate short (< 2 kb) stretches, which according to our experience are very likely not FISH competent.
FISH competent regions are found at locations where many overlapping dispersed repeats build up thick lines or boxes (Figure 1).
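Turning the homologous segments into candidate FISH competent regions amounts to a length filter followed by interval merging. The sketch below assumes each BLAST 2 sequences hit is reduced to half-open (start, end) coordinates on the target BAC, and that each hit must itself span at least 2 kb (following the "contiguous homology" criterion); whether overlapping short hits could jointly suffice is not stated, so that reading is an assumption. The input coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in bp on the target BAC


def fish_competent_regions(hits: Iterable[Interval], min_len: int = 2000) -> List[Interval]:
    """Merge homologous segments long enough to hybridise into regions.

    Segments shorter than min_len (the 'thin bars', < 2 kb) are dropped; the
    remaining ones are sorted and overlapping or touching segments merged.
    """
    long_hits = sorted((s, e) for s, e in hits if e - s >= min_len)
    merged: List[Interval] = []
    for start, end in long_hits:
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the current region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical F28L22 hits on one BAC (bp coordinates).
hits = [(0, 1500), (1000, 4000), (3500, 6000), (10000, 10800), (20000, 23000)]
print(fish_competent_regions(hits))  # [(1000, 6000), (20000, 23000)]
```

Summing the lengths of the merged regions along a pericentromere would give the physical extent that is compared with the cytological FISH signal length.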
We thank M. Frey, U. Genschel and K. F. X. Mayer for critical reading of the manuscript and B. Weisshaar and coworkers for communicating results prior to publication. We are especially indebted to A. Gierl for generous and consistent support of our work and to K. F. X. Mayer (MIPS) for help with the sequence analysis, and we gratefully acknowledge the Deutsche Forschungsgemeinschaft for support to R. A. Torres-Ruiz (grant To134/1–4).