Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of chromosomes 4 and 5 of Arabidopsis thaliana

Authors


*For correspondence (fax +44 1603 259882; e-mail ian.bancroft@bbsrc.ac.uk).

Summary

Due to their relatedness to Arabidopsis thaliana (Arabidopsis), the cultivated Brassica species represent the first group of crops with which to evaluate comparative genomics approaches to understanding biological processes and manipulating traits. We have constructed a high-quality binary BAC library (JBo) from genomic DNA of Brassica oleracea var. alboglabra, in order to underpin such investigations. Using the Arabidopsis genome sequence and clones from the JBo library, we have analysed aspects of gene conservation and microsynteny between a 222 kb region of the genome of Arabidopsis and homoeologous segments of the genome of B. oleracea. All 19 predicted genes tested were found to hybridize to clones in the JBo library, indicating a high level of gene conservation. Further analyses and physical mapping with the BAC clones identified allowed us to construct clone contig maps and analyse in detail the gene content and organization in the set of paralogous segments identified in the genome of B. oleracea. Extensive divergence of gene content was observed, both between the B. oleracea paralogous segments and between them and their homoeologous segment within the genome of Arabidopsis. However, the genes present show highly conserved collinearity with their orthologues in the genome of Arabidopsis. We have identified one example of a Brassica gene in a non-collinear position and one rearrangement. Some of the genes not present in the discernible homoeologous regions appear to be located elsewhere in the B. oleracea genome. The implications of our findings for comparative map-based cloning of genes from crop species are discussed.

Introduction

Continuing development of molecular markers is facilitating rapidly increasing resolution of plant genome structure. Using common sets of DNA markers, genetic linkage maps developed for different species have been compared. Such studies have revealed a high degree of linkage conservation both between the genomes of several monocot species ( Moore et al. 1995 ; Tikhonov et al. 1999 ) and between a range of dicots ( Cavell et al. 1998 ; Lagercrantz, 1998). From this work, collinear chromosome regions and orthologous loci have been identified. Even the possibility of synteny with practical applications bridging the monocot–dicot divide has been predicted ( Paterson et al. 1996 ). Comparative studies between Oryza sativa (rice) and Arabidopsis thaliana (Arabidopsis) found five genes whose synteny was altered by a single inversion, but these were interspersed by non-conserved genes ( van Dodeweerd et al. 1999 ). Comparative physical mapping and sequence analysis of genetically identified collinear regions will clarify the extent to which macrosynteny is maintained at the inter- and intragenic levels. Sequencing data comparing orthologous regions in maize, sorghum and Arabidopsis indicate that a much higher degree of diversity exists at the genome microstructural level than predicted from genetic mapping studies ( Tikhonov et al. 1999 ).

Brassica oleracea is a diploid species with many subspecies covering a wide range of commercially important vegetable crop forms such as broccoli, cauliflower, cabbage, kale and Brussels sprouts. To date four genetic linkage maps have been reported for B. oleracea, all of which demonstrate that this genome is highly duplicated ( Bohuon et al. 1996 ; Kianian & Quiros, 1992; Landry et al. 1992 ; Slocum et al. 1990 ). Its nuclear DNA content has been estimated to be approximately 600 Mb ( Arumuganathan & Earle, 1991).

Arabidopsis is used extensively in many areas of plant biology research. Its small genome (approximately 130 Mb) and low amounts of repetitive DNA mean that it is well suited for genetic and physical mapping studies. The complete genome sequence should be available by the end of the year 2000, with those of the first two chromosomes having been reported ( Lin et al. 1999 ; Mayer et al. 1999 ). Arabidopsis sequencing data and the accompanying information on the functions of genes identified will be important for an understanding of other species with more complex genomes, particularly related species. Brassica genes have been shown to share a high level of sequence conservation with their Arabidopsis orthologues, typically >85% nucleotide identity in coding regions ( Cavell et al. 1998 ). Brassicas are the most closely related group of crops to Arabidopsis, and are therefore the obvious choice for evaluating comparative genomic approaches to understanding and manipulating biological processes and traits in crops.

Comparative genetic mapping between Brassica species and Arabidopsis ( Cavell et al. 1998 ; Lagercrantz, 1998) has shown that the diploid Brassica genomes are extensively triplicated. The overall pattern of organization suggests that although many rearrangements have occurred, each unit is approximated by, and has extensive collinearity with, the Arabidopsis genome. These results raised the expectation that a general description of the collinearity between Brassicas and Arabidopsis would make it possible to use physically mapped Arabidopsis clones and DNA sequence data to directly assist Brassica genome analysis. The comparative information upon which this is based is, however, derived from the genetic mapping of a relatively small number of genes and anonymous DNA fragments, often widely separated along the chromosome. Establishing the extent to which microsynteny (i.e. gene-by-gene organization) is conserved will provide clues to structure–function relationships of plant genome organization and insights into the mechanisms of plant genome evolution, and will define vital parameters to guide researchers starting to conduct gene-isolation experiments based on comparative mapping approaches.

In order to understand the details of genomic organization and microsyntenic relationships for large, duplicated genomes such as the Brassicas, which are never likely to be fully sequenced, large insert clone libraries are an essential tool. Bacterial artificial chromosome (BAC) vectors can carry large inserts (up to 300 kb) and have lower frequencies of chimeric and rearranged clones (approximately 5%; Bent et al. 1998 ) than other clone types such as yeast artificial chromosomes. This has led to the current extensive use of BAC vectors for library construction. The Binary-BAC (BIBAC) vector ( Hamilton, 1997) incorporates two technologies: the BAC ( Shizuya et al. 1992 ), and the binary vector strategy for Agrobacterium-mediated plant transformation ( Hoekema et al. 1983 ). Transforming large fragments of DNA (>100 kb) into plants would make it feasible to introduce genomic fragments encoding quantitative trait loci or gene clusters (as is frequently the case for disease resistance genes), and to confirm that a particular clone contains the gene or allele of interest.

In this study a BIBAC library (33 792 clones, average insert size 145 kb) was constructed using genomic DNA from B. oleracea var. alboglabra and was probed with 19 genes from a 222 kb sequenced region on Arabidopsis chromosome 4. Homoeologous regions within the Brassica genome were identified and microsynteny was evaluated.

Results

Library construction

The concentration and quality of the DNA to be cloned are critical in any attempt to construct a large-insert, highly redundant library. In this report the method of preparation of high molecular-weight DNA largely followed established protocols ( Zhang et al. 1995 ), with some important modifications. The high molecular-weight DNA preparation is a high-loss procedure, therefore it is important to embed the isolated nuclei at maximum density without adversely affecting block stability or pulsed-field gel electrophoresis (PFGE) fractionation. To construct the B. oleracea library we aimed to achieve a final concentration of 5 μg genomic DNA embedded per block, with 18–20 blocks to give a total of 90–100 μg. It was important prior to digestion to analyse one of these blocks on a gel run under pulsed-field conditions in order to check both the concentration and integrity of embedded DNA. Small size or low concentration of embedded DNA necessitated the repetition of the preparation procedure.

The restriction endonuclease Sau3AI exhibits four-base-pair recognition specificity and produces termini compatible for ligation purposes with the BamHI cloning site of the pBIBAC 2 vector. It cleaves genomic DNA more frequently than BamHI, generating more random fragmentation. For these reasons it was chosen to generate B. oleracea high molecular-weight DNA suitable for the construction of the pBIBAC library. To construct a high-redundancy library efficiently, it is essential that the conditions for partial digestion are correct. By first calibrating a tube of Sau3AI through a dilution series, the two concentrations 0.25 and 0.5 U per block were identified as giving the maximum yield in the DNA size range 110–200 kb. The partially digested DNA was run out on a gel under PFGE conditions, and the regions containing DNA in the size ranges 110–130 and 130–200 kb were identified and excised. By turning the excised gel slices through 180° and running these out on a second gel under identical conditions, the desired DNA was compressed into sharp bands which improved the efficiency of DNA recovery. It has been reported that the integrity of recovered high molecular-weight DNA may be compromised by high-temperature melting and subsequent agarase digestion of gel-purified partial digests ( Osoegawa et al. 1998 ; Strong et al. 1997 ). Use of electroelution to recover the DNA avoids these problems and also allows routine high-strength agarose to be used for easier manipulation of gels. In this work we used the GeneCapsule electroelution devices to recover the DNA from the gel, and achieved a high final yield (approximately 1 μg per preparation) and quality as determined by PFGE analysis. This was vital in achieving good ligations and large numbers of high-quality insert clones following electroporation. The DNA was used in ligations immediately after dialysis to avoid deterioration. Only the 130–200 kb DNA size fraction prepared was used in ligations to generate the library. 
For ligation reactions the vector and insert DNA concentrations were optimized to achieve a high cloning efficiency for fragments 130–200 kb in size, with minimal numbers of chimeric clones. Transformation efficiency (after dialysis of the ligated mixture against 0.1 × TE) ranged up to 708 colonies per μl of ligation mixture. The library constructed was termed the John Innes Centre Brassica oleracea (JBo) library.

Analysis of clone inserts

To evaluate the average insert size over the entire library, one well was randomly chosen from each of the 88 microtitre plates in which the library was arrayed. DNA was isolated from the clones, digested with NotI to release the clone inserts, and resolved by PFGE. Of the 88 clones analysed, 13 (15%) apparently did not have an insert. This figure is in keeping with previous reports and may be due to degenerate cutting (star activity) of the enzyme used to prepare the cloning vector ( Osoegawa et al. 1998 ). A further four of the 88 clones analysed had insert sizes less than 20 kb. The insert size distribution of the remaining clones analysed is given in Fig. 1. Despite using only the DNA size fraction 130–200 kb, some clones with smaller inserts were found. This phenomenon has been reported previously, and is thought to be due to the entrapment of DNA molecules in the agarose gel mixture during PFGE. The proportion of small insert clones was, however, minimized by the two-step size-selection process. Further benefits of this approach were demonstrated by the large average insert size and the homogeneous clone size distribution. Overall, an estimated 85% of the clones in the 33792-clone library contain inserts, with an average insert size of 145 kb, which should provide a 6.9-fold redundant representation of the approximately 600 Mb genome of B. oleracea ( Arumuganathan & Earle, 1991).
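The redundancy figure quoted above follows from simple arithmetic on the library parameters. The sketch below reproduces it under the stated values (33792 clones, 85% with inserts, 145 kb mean insert, 600 Mb genome). The function name is hypothetical, and the probability estimate (the standard Clarke-Carbon formula) is an illustrative addition that was not part of the original analysis.

```python
def library_coverage(n_clones, insert_fraction, mean_insert_kb, genome_mb):
    """Return (fold coverage, probability that a given locus is represented)."""
    genome_kb = genome_mb * 1000
    effective_clones = n_clones * insert_fraction       # clones that carry an insert
    coverage = effective_clones * mean_insert_kb / genome_kb
    # Clarke-Carbon estimate: P = 1 - (1 - f)^N, with f = insert size / genome size.
    # Assumption: clones are a random sample of the genome.
    p_locus = 1 - (1 - mean_insert_kb / genome_kb) ** effective_clones
    return coverage, p_locus


coverage, p_locus = library_coverage(33792, 0.85, 145, 600)
print(f"coverage: {coverage:.1f}-fold, P(locus represented): {p_locus:.4f}")
```

Under the random-sampling assumption, a library of this depth would be expected to contain almost any single-copy locus at least once.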

Figure 1.

Insert size range for 71 clones randomly chosen from the JBo library.

Analysis of the library using Arabidopsis gene-specific probes

For this comparative study we chose a 222 kb sequenced segment of Arabidopsis chromosome 4, between gene models DL4665 and DL4935 ( http://www.mips.biochem.mpg.de). This region contains 55 predicted genes. A subset of 19, spread through the region, was selected for this work on the basis of being low-copy genes. All 19 genes were hybridized separately under low-stringency conditions to two high-density BAC colony hybridization filters representing the complete JBo library. The result of hybridization of one PCR-generated gene-specific probe, DL4885, to half of the library, representing plates 1–48, is shown in Fig. 2. The range of signal intensities observed illustrates the need to apply cut-off criteria when scoring autoradiographs, in order to differentiate between true positive BACs and spurious signals. Experience demonstrated that very weak signals should not be scored as representing positive clones. The stringency of the hybridizations was set as low as practicable to allow detection of even significantly diverged homologues. Strongly hybridizing positive clones were identified for all 19 gene-specific probes used, indicating extensive conservation of the gene repertoire between B. oleracea and Arabidopsis, and the efficacy of PCR generation of gene-specific probes from pooled Arabidopsis clones. JBo BACs hybridizing to the gene-specific probes were identified and collected into ‘bins’. It was clear from the large numbers of BACs hybridizing with the gene probes that multiple members of gene families were being identified. Clones common to different probes were noted for later use in assembling contiguous arrays of overlapping clones (contigs).

Figure 2.

The autoradiograph following hybridization of one PCR-generated gene-specific probe, DL4885, to the double-spotted colony filter representing half of the JBo library, plates 1–48.

Identification and clone contig representation of homoeologous Brassica regions

From existing knowledge of the structure of the Brassica genome, we expected to identify three regions in the B. oleracea genome paralogous to each other and homoeologous to the Arabidopsis region. In addition, it has recently been shown that part of the Arabidopsis chromosome 4 region studied in this work is duplicated on Arabidopsis chromosome 5 ( Bancroft, 2000). Therefore it was expected that six contigs would be necessary to represent the equivalent regions within B. oleracea. It was possible to distinguish Brassica regions homoeologous to the Arabidopsis chromosome 4 region from those homoeologous to the Arabidopsis chromosome 5 region by the presence or absence of eight genes: DL4665, DL4685, DL4725, DL4755, DL4860, DL4910, DL4915 and DL4935. These are present in the Arabidopsis chromosome 4 region, but not in the Arabidopsis chromosome 5 region.
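The discrimination rule described above can be sketched as a simple set test. The function name and labels are hypothetical; the two example gene sets are the probes to which BACs 61N3 and 7H9 hybridize according to Table 2. Absence of all eight genes is treated here as evidence for a chromosome 5-like segment, which assumes the clone or contig covers enough of the region for at least one of them to have been detected if present.

```python
# The eight genes present in the Arabidopsis chromosome 4 region but not in
# its chromosome 5 duplicate, as listed in the text.
CHR4_ONLY = {"DL4665", "DL4685", "DL4725", "DL4755",
             "DL4860", "DL4910", "DL4915", "DL4935"}


def classify_segment(gene_models):
    """Label a Brassica segment by the Arabidopsis region it resembles."""
    # Strip the strand suffix (c or w) so that DL4665c matches DL4665.
    genes = {g.rstrip("cw") for g in gene_models}
    return "chromosome 4-like" if genes & CHR4_ONLY else "chromosome 5-like"


# Probes hybridizing to two BACs, taken from Table 2.
bac_61N3 = {"DL4705w", "DL4785w", "DL4820c"}
bac_7H9 = {"DL4665c", "DL4710w", "DL4755w", "DL4765w", "DL4775c", "DL4840c"}

print("61N3:", classify_segment(bac_61N3))
print("7H9:", classify_segment(bac_7H9))
```

The two labels agree with the assignment of contig A (containing 61N3) to the chromosome 5 homologue and of contig D (containing 7H9) to the chromosome 4 homologue.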

To identify clones within each of the Brassica paralogous regions, DNA was prepared from all clones in the gene-specific bins generated from the colony hybridization data. These clones were digested with HindIII, and the fragments were resolved by agarose gel electrophoresis and Southern blotted onto nylon membranes. These Southern filters were probed with the 19 gene-specific probes under low-stringency hybridization conditions, as before. After hybridization it was observed that clones from the same gene bin displayed different banding patterns. This reflects HindIII fragment pattern polymorphisms in the different Brassica paralogous regions. The number of loci containing homologous genes for each Arabidopsis gene tested was thus identified. This number ranged from two to 11, as shown in Table 1.

Table 1.  Numbers of BACs hybridizing to gene-specific probes
Gene      Number of clones  Number   Average
model     hybridizing       of loci  redundancy
DL4665c   12                2        6.0
DL4675c   24                4        6.0
DL4685w   16                2        8.0
DL4705w   22                3        7.3
DL4710w   18                3        6.0
DL4725w   2                 2        1.0
DL4740w   12                2        6.0
DL4755w   7                 2        3.5
DL4765w   24                4        6.0
DL4775c   36                5        7.2
DL4785w   15                4        3.8
DL4820c   23                6        3.8
DL4840c   12                4        3.0
DL4860w   14                3        4.7
DL4885w   18                4        4.5
DL4900w   48                11       4.4
DL4910c   22                4        5.5
DL4915w   12                2        6.0
DL4935c   11                2        5.5
Total     348               69       5.0
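The average redundancy column in Table 1 is the number of hybridizing clones divided by the number of loci. A minimal sketch of that calculation follows, using the clone and locus counts from the table; the variable and function names are illustrative only.

```python
# (gene model, clones hybridizing, number of loci), copied from Table 1
TABLE1 = [
    ("DL4665c", 12, 2), ("DL4675c", 24, 4), ("DL4685w", 16, 2),
    ("DL4705w", 22, 3), ("DL4710w", 18, 3), ("DL4725w", 2, 2),
    ("DL4740w", 12, 2), ("DL4755w", 7, 2), ("DL4765w", 24, 4),
    ("DL4775c", 36, 5), ("DL4785w", 15, 4), ("DL4820c", 23, 6),
    ("DL4840c", 12, 4), ("DL4860w", 14, 3), ("DL4885w", 18, 4),
    ("DL4900w", 48, 11), ("DL4910c", 22, 4), ("DL4915w", 12, 2),
    ("DL4935c", 11, 2),
]


def redundancy(clones, loci):
    """Average number of clones recovered per locus."""
    return clones / loci


per_gene = {gene: redundancy(clones, loci) for gene, clones, loci in TABLE1}
total_clones = sum(clones for _, clones, _ in TABLE1)
total_loci = sum(loci for _, _, loci in TABLE1)
overall = redundancy(total_clones, total_loci)

print(f"{total_clones} clones, {total_loci} loci, overall redundancy {overall:.1f}")
```

The overall value of about five clones per locus is somewhat lower than the 6.9-fold redundancy estimated for the library as a whole.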

Contigs were assembled by analysing clones common to two or more genes and their hybridization banding patterns. From initial analysis nine contigs were constructed, each displaying different coverage of the regions under investigation. To link smaller contigs to larger ones, suitable end BACs were identified and hybridized under high-stringency conditions to Southern blots of candidate contigs. We had previously determined (data not shown) that there is very little cross-hybridization between BACs from paralogous regions under these conditions. Using this approach, seven contigs were assembled as shown in Fig. 3(a,b). The final stages of assembly were as follows. Contig C, which initially spanned DL4705–DL4775, was extended to include DL4785–DL4840 by hybridizing BAC 33F23, from the smaller contig, to a Southern blot of the BACs in contig C; the last clone in contig C, 46B11, hybridized to 33F23. Contig F, which initially extended from DL4665 to DL4860, was linked to a smaller contig, DL4755–DL4775, by hybridizing BAC 84C3 to a Southern blot of the BACs in the shorter contig. This identified an inversion between DL4740 and DL4755, resulting in the new gene order DL4740, DL4860, DL4775, DL4765, DL4755. Despite hybridizing BACs from (i) the extended contig F and (ii) the unlinked DL4885 clone bin to a Southern filter of contig G, and also probing in the opposite direction, it was not possible to link contig G with any other contig. This is probably due to the large size of the region needing to be spanned, or to large-scale rearrangements. There were also unlinked groups of BACs that hybridized to only one gene-specific probe: one group for DL4685w, one for DL4725w, two groups for DL4820c, two for DL4860w, four for DL4885w, nine for DL4900w and four groups for DL4910c, as listed in Table 2.
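The first step of the assembly, grouping BACs that share clones across gene-specific hybridization patterns, can be illustrated with a small clustering sketch. This is not the procedure used in the study, which also relied on banding patterns, clone order and high-stringency BAC-to-BAC hybridization; it only shows how shared clones link pattern groups. The rows are a subset copied from Table 2, and the function name is hypothetical.

```python
from collections import defaultdict

# (gene model, HindIII pattern in kb) -> BACs showing that pattern (subset of Table 2)
GROUPS = {
    ("DL4665c", "3.0, 1.8"): ["7H9", "16O22", "77G15"],
    ("DL4710w", "1.4, 1.0"): ["7H9", "16O22", "77G15"],
    ("DL4765w", "3.5, 2.5, 1.0"): ["7H9", "8K11", "16O22", "77G15"],
    ("DL4840c", "4.6"): ["7H9", "8K11", "39P21", "77G15"],
    ("DL4740w", "12, 1.6"): ["26I24", "28C16", "52I9", "58K8", "60M20"],
    ("DL4765w", "3.3"): ["16G6", "26I24", "28C16", "52I9"],
    ("DL4785w", ">24, 2.0"): ["28C16", "76G14"],
}


def cluster_by_shared_clones(groups):
    """Merge pattern groups that share at least one BAC (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All BACs showing the same pattern for a gene belong to one cluster.
    for bacs in groups.values():
        root = find(bacs[0])
        for bac in bacs[1:]:
            parent[find(bac)] = root

    clusters = defaultdict(lambda: {"bacs": set(), "genes": set()})
    for (gene, _pattern), bacs in groups.items():
        cluster = clusters[find(bacs[0])]
        cluster["bacs"].update(bacs)
        cluster["genes"].add(gene)
    return list(clusters.values())


for cluster in cluster_by_shared_clones(GROUPS):
    print(sorted(cluster["genes"]), sorted(cluster["bacs"]))
```

With this subset the groups fall into two clusters, each carrying a different HindIII pattern for DL4765w, which is the kind of evidence used to separate paralogous regions.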

Figure 3.

BAC clone contigs assembled for regions homologous to Arabidopsis chromosome segments.

(a) BAC clone contigs assembled for regions homologous to the Arabidopsis chromosome 5 segment. ●, gene-specific probe hybridized to BAC; ○, no hybridization detected between gene-specific probe and BAC. Where the sizes of the inserts of the BAC clones have been determined, the size is indicated in italics beside the BAC clone coordinate.

(b) BAC clone contigs assembled for regions homologous to the Arabidopsis chromosome 4 segment. Annotation as for (a).

Table 2.  Grouping of BIBACs by hybridization pattern with gene-specific probes
Gene      Hybridization pattern     BIBACs showing pattern
model     (kb fragments)
DL4665c   3.0, 1.8                  7H9, 16O22, 77G15
          2.5, 1.7, 1.4             18D17, 41I3, 42J14, 45N16, 51J3, 63B24, 73K5, 77O7, 84H6
DL4675c   4.5                       8G4, 9H8, 58E11, 59l15, 70A17
          3.0                       18D17, 41I3, 51J3, 63B24, 73K5, 77O7, 84C3, 84H6
          2.5, 1.3                  29O13, 57D6, 63C18, 66J22, 71G23, 87K8
          1.3                       9B13, 26I24, 58K8, 60M20
DL4685w   7.0                       4P8, 18D17, 41I3, 51J3, 63B24, 73K5, 77O7, 84C3, 84H6
          4.0                       1E18, 17K9, 20A1, 46B8, 62C14
DL4705w   20                        29O13, 57D6, 61N3, 63C18, 66J22, 71G23, 87K8
          15, 3.0                   1F9, 2C1, 2D23, 17P14, 39B15, 39M20, 48F19, 66l7
          1.5                       18D17, 51J3, 63B24, 73K5, 77O7, 84C3, 84H6
DL4710w   3.2                       18D7, 51J3, 63B24, 73K5, 77O7, 84C3, 84H6
          3.0, 1.5                  1F9, 2C1, 2D23, 17P14, 39B15, 39M20, 48F19, 66l7
          1.4, 1.0                  7H9, 16O22, 77G15
DL4725w   14, 5.5                   51O19
          12                        87H24
DL4740w   12, 1.6                   26I24, 28C16, 52I9, 58K8, 60M20
          7.5, 1.7                  18D17, 51J3, 63B24, 73K5, 77O7, 84C3, 84H6
DL4755w   12                        7H9, 16O22, 77G15
          3.5                       30J8, 40J3
DL4765w   6.5                       12J12, 14I23, 30J8, 34l19, 40J3, 56E17, 56G10
          3.5, 2.5, 1.0             7H9, 8K11, 16O22, 77G15
          3.3                       16G6, 26I24, 28C16, 52I9
          2.5                       1F9, 5C1, 17P14, 39B15, 39M20, 46M9, 48F19, 55F15, 66l7
DL4775c   4.5, 2.5, 1.9             1F9, 5C1, 17P14, 39B15, 39M20, 46B11, 46M9, 48F19, 55F15, 66l7
          4.0, 3.4                  12J12, 14I23, 30J8, 34l19, 40J3, 56E17, 56G10
          3.5, 0.8                  16G6, 26I24, 28C16, 52I9, 76G14
          2.8                       32K7, 36I20, 45P4, 47K5, 58E11, 59E24, 59l15, 66B15, 87H24
          2.2, 0.7                  7H9, 8K11, 16O22, 77G15
DL4785w   >24, 2.0                  28C16, 76G14
          12, 7.0                   33F23
          6.0, 3.8                  32K7, 47K5, 58E11, 59l15, 59E24, 66B15
          1.2                       57D6, 61N3, 71G23, 75A15, 87E14
DL4820c   6.0                       8G4, 47K5, 58E11, 59l15, 66B15, 88G23
          1.8                       19I13, 61N3, 75A15, 87E14
          6.3, 1.8 (both weak)      11B14, 27l23, 31B13, 33F23, 69D6
          5.8, 0.8 (both weak)      32O7, 67H8, 84K21, 86K14
          4.5 (weak)                84D17
          1.85 (weak)               12B20, 76G14
DL4840c   6.5                       11B14, 31B13, 69D6
          4.6                       7H9, 8K11, 39P21, 77G15
          3.0, 1.9                  12B20, 76G14
          .5                        19I13, 25D1, 75A15
DL4860w   2.8                       6D5, 8G4, 9H8, 54J20, 55D1, 69D15, 70A17, 88G23
          15, 12, 1.8 (all weak)    51J3, 77O7, 73K5, 84C3, 84H6
          12, 5.0 (both weak)       14G7
DL4885w   6.0, 5.0, 3.5             51N22, 74C12
          5.0, 4.5                  85J9
          2.5                       5B23, 21I17, 25C9, 26K2, 31A6, 39P13, 52I23, 80K17, 81B22, 85F7
          2.0                       65G14
DL4900w   4.5, 1.2                  31O3
          3.9, 1.8                  41F5, 75l5
          2.8, 0.9, 0.8             78G6, 80P22
          1.7, 1.3, 0.8             2C22, 8K11, 23J21, 25O5, 39P21, 40O10, 41N20, 57D3, 65K14, 67A14, 77K13, 86F1
          9.0, 2.6 (both weak)      77D21, 85N19
          4.0, 1.3 (both weak)      23G5, 26E19, 41O13, 60M2
          3.9 (weak)                55H2, 58H20, 60C22, 68C5, 69K9
          2.7 (weak)                36G13, 44F8, 47D15, 58K24, 68E8, 77I13
          2.6, 1.2 (both weak)      69F1
          2.6, 1.0 (both weak)      24F14, 46C19, 55N8, 65M9, 69C13, 71J23, 83P23
          1.8, 1.2 (both weak)      78l6
DL4910c   >23                       3M14, 30N10, 67I21, 69N16, 72F8, 72G14, 80J24, 87G9
          5.5                       10F13, 23M4, 30l5, 37A12, 69l7, 70N15, 75D10, 79F10
          5.7 (weak)                16E12, 18K11, 35L17
          3.2 (weak)                18l3, 35M17, 83P18
DL4915w   14                        2C22, 8K11, 23J21, 25O5, 40O10, 41N20, 57D3, 65K14, 77K13, 86F1
          2.0                       59K13, 78G6
DL4935c   3.8                       18O1, 38M1, 44I17, 56C11, 59K13, 78G6, 81C17
          1.0                       23J21, 41N20, 56K14

Within each of the seven contigs A–G, only a subset of the 19 candidate genes was detected. The numbers and identities of the genes detected showed extensive variation between contigs. In contig D, the only contig to span the entire region DL4665–DL4935, nine of the 19 putative genes were found. In contig F, which spans DL4665–DL4860, 10 of the potential 14 genes assayed were detected. The candidate gene DL4910 hybridized to 22 clones and had four different hybridization pattern categories, but was not placed within any of the Brassica contigs. This raises the possibility that its presence in the Arabidopsis chromosome 4 region postdates divergence from the common Arabidopsis–Brassica ancestor. No consistent pattern of gene content was observed within the Brassica paralogous contigs apart from those genes on which the Arabidopsis chromosome 4–chromosome 5 discrimination was based. However, the gene order found in Arabidopsis was maintained in the B. oleracea contigs with only two exceptions: (i) in contig E the putative gene DL4675 was found downstream of its expected position, inserted between DL4820 and DL4840; and (ii) the inversion described above in contig F.

Comparison of the high-resolution Brassica physical maps generated in this study with the Arabidopsis physical maps suggests different levels of genome expansion/contraction in the two species ( Fig. 3a,b). Within contig A, BAC 61N3 (insert size 190 kb) contains the genes DL4705–DL4820. The equivalent region on its homologue, Arabidopsis chromosome 5, is 106 kb, implying a maximum Brassica–Arabidopsis size ratio of 1.79. In contig C, also homologous to Arabidopsis chromosome 5, BAC 5C1 (insert size 150 kb) fits between DL4710 and DL4785. In Arabidopsis this interval is spanned by 66 kb, giving a minimum Brassica–Arabidopsis size ratio of 2.27. In contig D, homologous to Arabidopsis chromosome 4, clone 7H9 (insert size 140 kb) contains DL4665–DL4840, as compared to 138 kb in Arabidopsis, a maximum size ratio of 1.01. Also in contig D, BAC 25O5 (insert size 142 kb) fits between DL4840 and DL4935; in Arabidopsis this interval is covered by 83 kb, indicating a minimum size ratio of 1.71. BAC 36I20 in contig E (insert size 200 kb) fits between DL4725 and DL4785, but in Arabidopsis only 45 kb are needed to span this region, giving a minimum size ratio of 4.44.
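The size ratios quoted above are the BAC insert size divided by the length of the corresponding Arabidopsis interval. When a BAC contains both flanking genes, the Brassica interval can be no larger than the insert, so the ratio is a maximum; when a BAC fits between two genes without containing them, the interval is at least as large as the insert, so the ratio is a minimum. The short sketch below reproduces the five values; the data structure and names are illustrative only.

```python
# (contig, BAC, insert size in kb, Arabidopsis interval in kb, type of bound)
REGIONS = [
    ("A", "61N3", 190, 106, "maximum"),   # BAC contains DL4705-DL4820
    ("C", "5C1", 150, 66, "minimum"),     # BAC fits between DL4710 and DL4785
    ("D", "7H9", 140, 138, "maximum"),    # BAC contains DL4665-DL4840
    ("D", "25O5", 142, 83, "minimum"),    # BAC fits between DL4840 and DL4935
    ("E", "36I20", 200, 45, "minimum"),   # BAC fits between DL4725 and DL4785
]


def size_ratio(bac_kb, arabidopsis_kb):
    """Brassica-to-Arabidopsis size ratio for one interval."""
    return bac_kb / arabidopsis_kb


ratios = {bac: round(size_ratio(bac_kb, ath_kb), 2)
          for _contig, bac, bac_kb, ath_kb, _bound in REGIONS}

for contig, bac, _bac_kb, _ath_kb, bound in REGIONS:
    print(f"contig {contig}, BAC {bac}: {bound} ratio {ratios[bac]:.2f}")
```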

Discussion

JBo library construction and analysis

We have constructed the first large-insert B. oleracea clone library using the pBIBAC2 vector. To achieve the 145 kb mean insert size, several improvements were made relative to standard protocols. The double-DNA size selection achieved a homogeneous insert size range and reduced often-cited problems with small DNA fragments being trapped within the larger fraction. Electroelution recovery of this size-selected high molecular weight DNA ensured integrity of the DNA for ligation purposes. Despite cloning large DNA fragments, high numbers of transformants per μl of ligation mixture and a low background of clones lacking inserts were achieved. These are major considerations in terms of both time and cost when constructing a clone library. Highly redundant random clone libraries with large inserts, as described here, are crucial for the physical mapping of species with large, complex genomes such as the Brassicas.

All 19 Arabidopsis predicted genes selected for use in the heterologous hybridizations to the JBo library successfully identified their Brassica counterparts. This result supports the earlier report of high gene sequence conservation between Arabidopsis and Brassica species ( Cavell et al. 1998 ). The use of these products from PCR amplifications of the modelled Arabidopsis genes has important implications for future Brassica–Arabidopsis comparative physical mapping. These rapidly generated markers can be produced at high density and used to anchor homoeologous Arabidopsis regions within Brassica genomes.

Brassica–Arabidopsis microsynteny

In this work none of the Brassica contigs constructed for the region under investigation was found to have the same gene content as in Arabidopsis. However, with few exceptions the gene order is the same. Transposition-like or deletion events with insertions elsewhere in the genome could be responsible for this high level of gene content variation. The multiple gene copies present in the Brassica replicated genome could have undergone rapid sequence divergence and so acquired new functions, either directly or following methylation. This work also uncovered aspects of Arabidopsis genome structure and evolution. The discovery that the 222 kb region of interest on Arabidopsis chromosome 4 is partially duplicated on chromosome 5 could indicate that the levels of Arabidopsis genome duplication have been underestimated. The extent of this will be clarified when the complete Arabidopsis genome sequence is available for analysis. Additionally, ongoing evolution of the Arabidopsis genome is suggested by gene DL4910: it is found only in the Arabidopsis chromosome 4 region and, despite having four loci in B. oleracea, was not placed in any of the contigs constructed. Its insertion at that position on chromosome 4 could postdate divergence from the common Arabidopsis–Brassica ancestor.

Comparison of the Brassica contig sizes presented here with the Arabidopsis physical maps suggests that the Brassica genome has undergone expansion relative to the Arabidopsis genome. By averaging the sizes calculated for the Brassica regions, a figure is reached that is in keeping with the triplicated 600 Mb Brassica genome being expanded approximately 1.5 times compared to that of the Arabidopsis genome. This is a workable level of genome expansion for Arabidopsis–Brassica comparative gene mapping research. It is clear that different Brassica chromosome regions display different levels of genome expansion, which emphasizes the need for using clone libraries with very large inserts to facilitate physical mapping with a minimum number of probes.
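The expansion factor of approximately 1.5 can be recovered from the genome sizes cited in this paper. The sketch assumes that the 600 Mb B. oleracea genome consists of three equally sized copies of an Arabidopsis-like unit, and that the Arabidopsis genome is approximately 130 Mb, as stated in the Introduction. The function name is hypothetical.

```python
def expected_expansion(brassica_mb, n_copies, arabidopsis_mb):
    """Average size of one Brassica genome copy relative to the Arabidopsis genome."""
    return (brassica_mb / n_copies) / arabidopsis_mb


factor = expected_expansion(600, 3, 130)
print(f"expected expansion per triplicated unit: {factor:.2f}")
```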

A further complication when studying species with replicated genomes is that many traits are likely to be controlled by gene families. In this work the gene DL4900 was found to have 11 different homologues in B. oleracea. An understanding of the interplay between loci is necessary if the genetics for improvement of complex traits controlled by gene families are to be tackled.

The results of this work confirm triplication of the B. oleracea genome at the microgenome level as previously demonstrated at the genetic map level, and the collinearity of these triplicated units with the Arabidopsis genome. However, the high frequency of apparent gene content variation found in this research had not been anticipated in previous Brassica–Arabidopsis comparative genetic mapping reports ( Cavell et al. 1998 ; Lagercrantz, 1998). Genetic mapping is less useful for assessing changes at the microgenome level, as regions which appear collinear at the genetic map level may contain many hundreds of genes between the mapped markers. To obtain an accurate overview of microsynteny between model species such as Arabidopsis and related species with large genomes such as the Brassicas, the use of large-scale physical mapping and high probe density is necessary. The number of markers needed should take account of the extent of divergence at the microsynteny level across all genomic regions and levels of genome expansion. Sequencing is a very precise tool for determining gene content and order; however, its application is limited when large genomes are being considered, and realistically only small regions can be studied in this way. The results from this work are very encouraging for the future of Arabidopsis–Brassica comparative genome mapping, and support the feasibility of such research. Through comparative study of gene function in the very closely related Arabidopsis–Brassica species it will be possible to elucidate key mechanisms in polyploid genome evolution.

Experimental procedures

Arabidopsis gene-specific probes

EST clones for the five Arabidopsis gene models DL4665c, DL4775c, DL4840c, DL4900w and DL4910c were ordered from the Arabidopsis Biological Resource Center (ABRC). Upon receipt of the clones the inserts were resequenced to confirm their identities. For the other 14 Arabidopsis putative genes used as probes, oligos were designed to predicted exonic regions of the genes according to the gene models developed at the Munich Information Centre for Protein Sequences (MIPS) ( http://www.mips.biochem.mpg.de). The genes were amplified by PCR from pooled DNA of BAC and cosmid clones: G11319, G19830, G2424, G7006, CC29F3, CC18E16, CC9C4, CC22C20, IB12F15, IB13H3, G3845, T3P6 and IB12E19. These clones represent the region of chromosome 4 containing the gene models DL4665–DL4935. Details of the gene-specific hybridization probes are given in Table 3. All DNA fragments (clone inserts or PCR products) were gel purified using QIAquick columns (Qiagen Ltd., Crawley, UK) before labelling.

Table 3.  Gene-specific hybridization probes
Gene      EST clone/accession no.     PCR product  Paralogue  Paralogue position      P-value for highest-
model     or PCR primers (5′−3′)      size (bp)    accession  within accession (bp)   scoring segment
DL4665c   186L17/R89950                            None
DL4675c   TGACTACTCTTTCCTTTGCTC,      918          MNJ7       41 479–42 414           1.2 e−109
          CTAGAGAGAAGAGAGCGATC
DL4685w   ATGGCGGCCACGTTTCTG,         988          None
          CCCAAAAGTGTTGGTTCTAGG
DL4705w   ATGGTGAAGATTGAGATAGG,       933          MNJ7       8485–9841               2.7 e−104
          TCAAGGGTAGCTTTCTGTGG
DL4710w   ATGGAGGAAGAGTTGGAAG,        1711         MNJ7       3517–5083               2.6 e−144
          ACGTGTGACCAGTGAGATTC
DL4725w   ATGGAAGACGACGGAGGAG,        1044         None
          AAACAATGACTCATTTTGCAAG
DL4740w   ATGGAGTCTGTACGGTGTCC,       1090         MQL5       84 161–86 599           6.5 e−99
          CTTCCGATCGTCGAATCTCG
DL4755w   ATGTCTGCGAATTCCAAGTCG,      573          None
          ATCGAGATGTTTCTGCATAGC
DL4765w   CAAAAGCAACAGTTTCTTAGG,      877          MQL5       64 913–65 964           3.6 e−67
          GACCTAGGACGCATCACATC
DL4775c   37C11/T04534                             MQL5       55 813–62 704           2.9 e−78
DL4785w   ATGTCGATGACGGCGGATTC,       803          MQL5       20 398–29 080           2.9 e−41
          TAAAACCAATAAACGATCGCC
DL4820c   CACTTGTGAACTACACACCG,       1442         K14A3      28 422–29 662           4.9 e−40
          CTTGAGGAGGTTATAGCTTTC
DL4840c   31F5/T04209                              K14A3      17 391–18 066           9.7 e−11
DL4860w   GAGCTGTATCTTCCTTACTTTC,     1007         None
          CATCACTCACTCAAATTCAAC
DL4885w   TAGTGATGGAGGAGTACGAG,       1277         MQD22      3491–3832               3.1 e−34
          GTAGAAAAATGCATTCCTGGC
DL4900w   F3C11/N95926                             MSD23      14 322–15 775           2.6 e−88
DL4910c   172M1/H36030                             None
DL4915w   TCACGAAATACGAGTATGGAG,      1065         None
          ATCAGCAGGCTCTTCAGGC
DL4935c   ATGGCTGGTCTTGATCTAGG,       879          None
          TCAGAAAGGACCTCTTCCAC

Vector preparation

pBIBAC2 plasmid DNA was prepared using a Nucleobond AX10 000 kit according to the protocol supplied by the manufacturer (Macherey-Nagel GmbH, Düren, Germany). Isolated plasmid DNA was linearized with BamHI and an aliquot checked by agarose gel electrophoresis. The linearized vector was dephosphorylated by standard methods using calf intestinal alkaline phosphatase.

Plant material, preparation and embedding of nuclei

Selfed seeds of single plant descent for the doubled haploid line B. oleracea var. alboglabra A12 were received from Penny Sparrow, Brassica and Oilseeds Department, JIC. Young, fully expanded leaf material (80 g) was collected from glasshouse-grown plants and immediately frozen in liquid nitrogen before being stored at −80°C. Nuclei were extracted and embedded in agarose blocks as described in the Texas A&M BAC Training manual (http.tamu.edu/8000). The blocks were incubated in lysis buffer ESP (100 mM EDTA pH 9, 1% Sarkosyl, 0.5 mg ml−1 Proteinase K) for 24 h at 50°C, followed by a further 24 h at 50°C after exchange with fresh lysis buffer. The blocks were washed 2 × 1 h with TE plus 1 mM phenylmethylsulfonylfluoride, 6 × 20 min with TE, and stored at 4°C in TE until use.

Enzyme calibration and partial digestion

A tube of restriction endonuclease Sau3AI was calibrated for partial digestion of the embedded plant genomic DNA using the enzyme dilution series 0, 0.125, 0.25, 0.5, 1, 2, 4 and 8 U per half block. Two enzyme concentrations, 0.25 and 0.5 U per full block, were identified as giving the maximum amount of digested DNA in the range 110–200 kb and a large-scale digest was carried out with 18–20 blocks.
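The calibration series described above is a no-enzyme control followed by two-fold steps. As a minimal sketch (the function name is ours, not part of the original protocol), the series can be generated as follows:

```python
def twofold_series(start, stop):
    """Return a two-fold dilution series from start up to stop (inclusive)."""
    series = []
    value = start
    while value <= stop:
        series.append(value)
        value *= 2  # each step doubles the enzyme amount
    return series


# Units of Sau3AI per half block: a no-enzyme control plus the two-fold series.
units_per_half_block = [0.0] + twofold_series(0.125, 8.0)
print(units_per_half_block)
```

The zero-enzyme lane serves as the undigested control against which the degree of partial digestion is judged.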

Size selection of insert DNA

A 1% SeaKem GTG agarose gel (FMC Bioproducts, Rockland, Maine, USA) in 0.5 × TBE buffer was prepared and a trough cut near one end. The 18–20 blocks of partially digested high molecular-weight DNA were loaded as two rows and lambda ladder markers were placed on both sides of the plant DNA. The trough was sealed with the remainder of the molten agarose. The DNA was subjected to pulsed field gel electrophoresis (PFGE) (Bancroft et al. 1992) using pulse parameters suitable to resolve fragments of approximately 250 kb (20 sec pulse times, 260 V, 130° field angle for 20 h). When the run was finished the gel was cut into three parts: two outer gel edges containing the lambda markers and approximately 2 mm of digested plant DNA, and the central part containing the bulk of the digested plant DNA. The flanking strips were stained with ethidium bromide. The DNA was visualized with UV illumination and the exact migration distances for the size ranges 110–130 and 130–200 kb were recorded. Based on these measurements, gel slices containing DNA fragments of these sizes were cut from the unstained portion of the gel. The excised gel slices were rotated 180° and placed in a fresh 1% SeaKem GTG agarose gel. This second gel was run with the same parameters as the first. At the end of this run the gel edges with 2 mm plant DNA were cut off, stained and visualized. The exact migration distance of the DNA was measured, and based on this figure the DNA was recovered from the unstained portion of the gel by electroelution using GeneCapsules (Geno Technology Inc, St Louis, MO, USA) following the manufacturer's instructions. Eluted DNA was pooled and a 10 μl sample analysed by PFGE to assay size and concentration. The remainder of the eluted DNA was float dialysed on a 0.025 μm VS Millipore filter against 50 ml of 0.1 × TE at 4°C overnight. Dialysed DNA was stored at 4°C until use.
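The step of converting size boundaries (110, 130 and 200 kb) into cut positions on the unstained gel can be illustrated with a short calculation. The sketch below assumes a lambda ladder whose bands are multiples of about 48.5 kb and interpolates linearly between the two flanking bands. The band distances are invented for illustration, and linear interpolation is only a rough approximation of PFGE mobility.

```python
LAMBDA_MONOMER_KB = 48.5  # approximate size of one lambda genome


def ladder_sizes(n_bands):
    """Sizes (kb) of the first n_bands concatemers of a lambda ladder."""
    return [LAMBDA_MONOMER_KB * i for i in range(1, n_bands + 1)]


def distance_for_size(size_kb, sizes_kb, distances_mm):
    """Interpolate the migration distance of size_kb between flanking bands."""
    pairs = sorted(zip(sizes_kb, distances_mm))
    for (s_lo, d_lo), (s_hi, d_hi) in zip(pairs, pairs[1:]):
        if s_lo <= size_kb <= s_hi:
            frac = (size_kb - s_lo) / (s_hi - s_lo)
            return d_lo + frac * (d_hi - d_lo)
    raise ValueError("size lies outside the range covered by the ladder")


# Hypothetical distances (mm from the wells) read from a stained flanking strip.
sizes = ladder_sizes(6)                            # 48.5 ... 291 kb
distances = [92.0, 78.0, 66.0, 56.0, 48.0, 42.0]   # made-up measurements

# Cut positions bounding the 110-130 kb and 130-200 kb slices.
cuts = {kb: distance_for_size(kb, sizes, distances) for kb in (110, 130, 200)}
for kb, mm in cuts.items():
    print(f"{kb} kb boundary at about {mm:.1f} mm")
```

Because larger fragments migrate more slowly, the 200 kb boundary lies closest to the wells and the 110 kb boundary furthest from them.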

Ligations

Ligation reactions were set up using 200 ng of insert DNA and a 10-fold molar excess of vector DNA (300 ng) and incubated overnight at 4°C. As a control one ligation was set up without insert DNA. The ligations were dialysed against 0.1 × TE as described for the size-fractionated high molecular-weight DNA.
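The stated 10-fold molar excess can be checked with a short calculation. This sketch rests on assumptions not given in the text: a vector size of about 23.5 kb for pBIBAC2, a mean insert size of 155 kb (the midpoint of the 110–200 kb fractions), and an average mass of 650 Da per base pair.

```python
DA_PER_BP = 650.0  # assumed average mass of one base pair of double-stranded DNA


def picomoles(mass_ng, length_kb):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    length_bp = length_kb * 1000.0
    return mass_ng * 1000.0 / (length_bp * DA_PER_BP)


VECTOR_KB = 23.5   # assumed size of pBIBAC2
INSERT_KB = 155.0  # assumed mean insert size (midpoint of 110-200 kb)

vector_pmol = picomoles(300.0, VECTOR_KB)
insert_pmol = picomoles(200.0, INSERT_KB)
molar_excess = vector_pmol / insert_pmol

print(f"vector:insert molar ratio is about {molar_excess:.1f}:1")
```

Under these assumptions the ratio comes out close to 10:1. A smaller mean insert size would lower it, and a larger one would raise it.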

Transformation, library construction and arraying

Aliquots (1 μl) of the dialysed ligation mix were electroporated into 20 μl Electromax DH10B cells (Gibco BRL, Gaithersburg, MD, USA), using a CellPorator (Gibco BRL) with voltage booster. Electroporation conditions were as recommended by Gibco (4000 Ω, 330 μF, fast charge). Electroporated cells were selected on LB agar plates containing 40 μg ml−1 kanamycin and 5% sucrose. The transformation efficiency was judged to be adequate for library construction when the ligation mixes including insert DNA produced at least 50-fold more transformants than ligations without inserts. Colonies were picked to 384-well microtitre plates containing 50 μl freezing broth and 40 μg ml−1 kanamycin per well. Grown plates were stored at −80°C.

The library was double spot gridded onto 22 cm2 Hybond N+ membrane (Amersham International plc, Amersham, UK) using a BioGrid Robot (BioRobotics, Cambridge, UK). After overnight growth on LB plates with 40 μg ml−1 kanamycin, the filters were processed as described by Bent et al. (1998).

Filter hybridizations

Dried filters were rinsed in 2 × SSC before being placed in large hybridization tubes with 30 ml hybridization buffer (5 × SSPE, 5 × Denhardt's solution, 0.5% SDS, 5 μg ml−1 salmon sperm DNA). Filters were prehybridized at 50°C for 4 h while slowly rotating. The buffer was then replaced with 10 ml fresh hybridization solution. Probe DNA (50 ng) was labelled with 32P using a random primer labelling kit (BRL) according to the manufacturer's instructions, denatured by boiling, and added to the hybridization tube without removal of unincorporated nucleotides. After overnight hybridization at 50°C the filters were washed three times for 20 min each in 2 × SSC, 0.1% SDS at 50°C. The washed filters were sealed in plastic bags and autoradiography was performed using X-OMAT-AR film (Kodak). For high-stringency hybridizations the filters were incubated at 65°C overnight and the next day given three 20 min washes in 0.1 × SSC, 1% SDS at 65°C. Sealed filters were exposed to film as above.
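The difference between the two wash regimes can be made concrete with a rough melting-temperature estimate. The sketch below uses a commonly quoted approximation for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 500/L. It takes 1 × SSC as about 0.195 M Na+, and it assumes an illustrative probe of 1000 bp with 40% GC. None of these figures comes from the text, so the output is indicative only.

```python
import math

NA_MOLAR_PER_1X_SSC = 0.195  # 0.15 M NaCl + 0.015 M trisodium citrate


def approx_tm(ssc_x, gc_percent, length_bp):
    """Approximate Tm (deg C) of a perfectly matched duplex in n x SSC."""
    sodium = ssc_x * NA_MOLAR_PER_1X_SSC
    return (81.5 + 16.6 * math.log10(sodium)
            + 0.41 * gc_percent - 500.0 / length_bp)


# Assumed probe properties, for illustration only.
GC_PERCENT = 40.0
LENGTH_BP = 1000

# (SSC concentration, wash temperature) for the two regimes in the text.
washes = {"low": (2.0, 50.0), "high": (0.1, 65.0)}

margins = {}
for name, (ssc_x, wash_temp) in washes.items():
    tm = approx_tm(ssc_x, GC_PERCENT, LENGTH_BP)
    margins[name] = tm - wash_temp  # how far the wash sits below the Tm
    print(f"{name}-stringency wash: Tm about {tm:.0f} C, "
          f"margin about {margins[name]:.0f} C")
```

A smaller margin between wash temperature and Tm means fewer mismatches are tolerated. This is consistent with using the 0.1 × SSC, 65°C wash to retain only closely matched hybrids.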

Acknowledgements

We wish to thank Liz Bent, Sam Johnson, Daryl Tacon and Caroline Hall for assistance during plating and arraying of the JBo library and for instruction in physical mapping technology, and Anne-Marie van Dodeweerd for resequencing clones received from ABRC. We would also like to thank BBSRC for financial support.
