The cultivated Brassica species are the group of crops most closely related to Arabidopsis thaliana (Arabidopsis). They represent models for the application in crops of genomic information gained in Arabidopsis and provide an opportunity for the investigation of polyploid genome formation and evolution. The scientific literature contains contradictory evidence for the dynamics of the evolution of polyploid genomes. We aimed to overcome the inherent complexity of Brassica genomes and to clarify the effects of polyploidy on the evolution of genome microstructure in specific segments of the genome. To do this, we constructed bacterial artificial chromosome (BAC) libraries from genomic DNA of B. rapa subspecies trilocularis (JBr) and B. napus var. Tapidor (JBnB) to supplement an existing BAC library from B. oleracea. These allowed us to analyse both recent polyploidization (under 10 000 years in B. napus) and more ancient polyploidization events (ca. 20 Myr for B. rapa and B. oleracea relative to Arabidopsis), as well as events occurring on an intermediate time scale (over the ca. 4 Myr since the divergence of the B. rapa and B. oleracea lineages). Using the Arabidopsis genome sequence and clones from the JBr library, we analysed aspects of gene conservation and microsynteny between six regions of the genome of B. rapa and the homoeologous regions of the genomes of B. oleracea and Arabidopsis. Extensive divergence of gene content was observed between the B. rapa paralogous segments and their homoeologous segments within the genome of Arabidopsis. A pattern of interspersed gene loss was identified that is similar, but not identical, to that observed in B. oleracea. The conserved genes show highly conserved collinearity with their orthologues across genomes, but a small number of species-specific rearrangements were identified. Thus the evolution of genome microstructure is an ongoing process. 
Brassica napus is a recently formed polyploid resulting from the hybridization of B. rapa (containing the Brassica A genome) and B. oleracea (containing the Brassica C genome). Using clones from the JBnB library, we have analysed the microstructure of the corresponding segments of the B. napus genome. The results show that there has been little or no change to the microstructure of the analysed segments of the Brassica A and C genomes as a consequence of the hybridization event forming natural B. napus. The observations indicate that, upon polyploid formation, these segments of the genome did not undergo a burst of evolution discernible at the scale of microstructure.
The cultivated Brassica species represent the group of crops most closely related to Arabidopsis. The lineages of Arabidopsis and Brassica species diverged ca. 20 Ma (Yang et al., 1999). The genomes of the Brassica species are extensively triplicated, and the microstructure of six regions of the genome of B. oleracea has been analysed comparatively (O'Neill and Bancroft, 2000) using assemblies of contiguously overlapping bacterial artificial chromosome (BAC) clones (BAC contigs). The results showed extensive divergence of gene content of related genome segments, both between the B. oleracea paralogous segments and between them and the corresponding homoeologous segments within the genome of Arabidopsis. Genes were found to have been lost from the duplicated Brassica genome segments in an interspersed pattern. This pattern of interspersed loss of genes from duplicated segments is widely observed in plants. It may, along with the frequently observed tandem duplication of genes, commonly result from unequal crossing over (Bancroft, 2001).
Brassica rapa contains the Brassica A genome and is closely related to B. oleracea, which contains the Brassica C genome. The lineages of B. rapa and B. oleracea diverged ca. 4 Ma (Inaba and Nishio, 2002). Although the chromosome numbers of the species differ (n = 9 for B. oleracea and n = 10 for B. rapa), genetic mapping confirmed the gross organization of their genomes to be highly collinear (Lagercrantz and Lydiate, 1996). The size of the genome of B. rapa is, at ca. 500 Mb, significantly smaller than that of B. oleracea, which is ca. 600 Mb (Arumuganathan and Earle, 1991). Brassica napus is an allopolyploid, arising from the hybridization of an A genome progenitor and a C genome progenitor (U, 1935). Its chromosome number is n = 19 and the size of its genome is ca. 1200 Mb (Arumuganathan and Earle, 1991). Genetic mapping confirmed that the progenitor A and C genomes are essentially intact and have not been rearranged (Parkin et al., 1995). The hybridization to form B. napus probably occurred during human cultivation, i.e. less than 10 000 years ago. Thus, Brassica species, in conjunction with Arabidopsis, provide an opportunity to study the evolution of genome structure over a wide range of time scales since polyploidization occurred.
The genomes of newly synthesized B. napus polyploids have been reported to undergo rapid change (Song et al., 1995), contradicting the results of genetic mapping in natural B. napus (Parkin et al., 1995). Did the gene loss observed in the triplicated segments of the genome of B. oleracea occur almost exclusively as part of the initial stabilization of the newly polyploid proto-Brassica genome? If so, the genome structure of extant polyploids should be stable. Or is interspersed gene loss an ongoing process? If the latter, we might expect significant differences in genome structure between related species, subspecies and even varieties. To address these questions, and to assess how reliably observations in newly synthesized B. napus polyploids reflect events in natural polyploids, we compared the physical microstructure of specific segments of the genomes of B. oleracea, B. rapa and B. napus.
BAC library construction and analysis
In order to conduct a comparative physical mapping analysis, we first had to construct large-insert libraries of genomic DNA of B. rapa and B. napus. We selected an inbred line, B. rapa var. trilocularis RO18 (a Yellow Sarson-type oil crop grown in Pakistan) and constructed a library in vector pBIBAC2 (Hamilton, 1997). The library was designated JBr and 36 864 clones were arrayed into 96 × 384-well microtitre plates. We selected a doubled haploid B. napus (oilseed rape) variety Tapidor and constructed a library in vector pBAC/SACB1 (Bendahmane, 1999). The library was designated JBnB and 73 728 clones were arrayed into 192 × 384-well microtitre plates. Both libraries were constructed using Sau3AI-digested DNA from purified nuclei, as previously described (O'Neill and Bancroft, 2000).
To evaluate the average insert size over each entire library, one well was chosen at random from each of the microtitre plates in which the libraries were arrayed. DNA was isolated from the corresponding clones, digested with NotI to release the clone inserts, and resolved by PFGE to estimate insert sizes. Of the 96 B. rapa clones analysed, 20 (21%) apparently did not have an insert. Of those that did, the mean insert size was 128 kb. Therefore, we estimate that the ca. 79% of clones in the 36 864-clone library that carry inserts provide ca. 7.5-fold redundant representation of the ca. 500 Mb genome of B. rapa. Of the 192 B. napus clones analysed, 23 (12%) apparently did not have an insert. Of those that did, the mean insert size was 145 kb. Therefore, we estimate that the ca. 88% of clones in the 73 728-clone library that carry inserts provide ca. 7.8-fold redundant representation of the ca. 1200 Mb genome of B. napus.
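The coverage estimates above follow from simple arithmetic: number of clones × fraction of clones carrying inserts × mean insert size, divided by genome size. The Python sketch below reproduces them from the figures given in the text; the function name `library_coverage` is ours and is not part of any published tool.

```python
# Hypothetical helper: genome equivalents ("fold redundancy") of a BAC library,
# computed as clones x fraction with inserts x mean insert size / genome size.

def library_coverage(n_clones: int, n_sampled: int, n_empty: int,
                     mean_insert_kb: float, genome_mb: float) -> float:
    """Return the estimated number of genome equivalents in a library."""
    recombinant_fraction = (n_sampled - n_empty) / n_sampled
    total_insert_kb = n_clones * recombinant_fraction * mean_insert_kb
    return total_insert_kb / (genome_mb * 1000)  # convert Mb to kb


# Plate arithmetic: each 384-well plate holds 384 clones.
jbr_clones = 96 * 384      # 36 864 clones
jbnb_clones = 192 * 384    # 73 728 clones

jbr = library_coverage(jbr_clones, n_sampled=96, n_empty=20,
                       mean_insert_kb=128, genome_mb=500)
jbnb = library_coverage(jbnb_clones, n_sampled=192, n_empty=23,
                        mean_insert_kb=145, genome_mb=1200)

print(f"JBr:  {jbr:.1f}-fold")   # ca. 7.5-fold
print(f"JBnB: {jbnb:.1f}-fold")  # ca. 7.8-fold
```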
Assembly of BAC contigs using Arabidopsis gene-specific probes
We aimed to assemble BAC contigs representing the regions of the genomes of B. rapa and B. napus that are homoeologous to the segments of the genome of B. oleracea reported previously (O'Neill and Bancroft, 2000). Eighteen Arabidopsis gene-specific hybridization probes, designed to genes in a 222-kb region of Arabidopsis chromosome 4 that is partially duplicated on chromosome 5, were used to screen high-density colony filters of the JBr and JBnB BAC libraries as previously described (O'Neill and Bancroft, 2000). The annotation nomenclature of the Arabidopsis genome has changed since the B. oleracea physical mapping work was reported. To permit comparison with that analysis, a key to the different nomenclatures is shown in Table 1.
Table 1. Number of bacterial artificial chromosomes hybridizing to gene-specific probes
Columns: Number of clones | Number of loci | Number of clones | Number of loci
DNA was prepared from the BAC clones identified by colony hybridization, digested with HindIII, resolved on replica 121-lane agarose gels and Southern blotted. The blots were then probed with the appropriate Arabidopsis gene-specific probes, and the clones were binned according to the sizes of the hybridizing restriction fragments. The numbers of BACs hybridizing to the gene-specific probes are summarized in Table 1. For eight B. rapa clones and eight B. napus clones, the hybridizing fragments were of novel size, but many bands of the restriction digest matched those of the other clones in the assigned bin. These observations are most likely due to the hybridizing fragment lying at the end of the BAC insert, so that hybridization was detected with a fragment spanning a vector–insert junction. The lists of clones in each bin are shown in supplementary data Table S1 for B. rapa BACs and Table S2 for B. napus BACs. Most of the BAC contig assembly was based on these data, with overlaps between BACs identified from common patterns of hybridizing fragments. For further illustration of the procedure for BAC contig assembly, see O'Neill and Bancroft (2000).
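The binning step can be sketched as grouping, for each probe, the clones whose hybridizing fragment has the same size within some tolerance; each bin then corresponds to one putative locus, and the number of clones per locus gives the redundancy of representation reported for each library. This is a simplified illustration, not the procedure actually used: it ignores the other restriction fragments that were also compared, and the 2% tolerance, clone names and fragment sizes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: put clones in the same bin (putative locus) when the
# fragment hybridizing to a given probe has the same size within a relative
# tolerance. Tolerance, clone names and sizes are illustrative assumptions.

def bin_clones(hits, rel_tol=0.02):
    """hits: iterable of (clone_id, probe, fragment_kb).
    Returns {probe: [[clone_id, ...], ...]}, one inner list per bin."""
    by_probe = defaultdict(list)
    for clone, probe, size_kb in hits:
        by_probe[probe].append((size_kb, clone))

    bins = {}
    for probe, items in by_probe.items():
        items.sort()                      # order by fragment size
        groups, ref_size = [], None
        for size_kb, clone in items:
            if ref_size is not None and abs(size_kb - ref_size) <= rel_tol * ref_size:
                groups[-1].append(clone)  # same size as the bin's first member
            else:
                groups.append([clone])    # start a new bin
                ref_size = size_kb
        bins[probe] = groups
    return bins


hits = [("clone_1", "probe_A", 4.10), ("clone_2", "probe_A", 4.15),
        ("clone_3", "probe_A", 6.30), ("clone_4", "probe_B", 2.00),
        ("clone_5", "probe_B", 2.02)]
bins = bin_clones(hits)
n_loci = sum(len(groups) for groups in bins.values())
redundancy = len(hits) / n_loci  # mean number of clones per locus
print(bins, n_loci, round(redundancy, 2))
```

Comparing each size with the first member of its bin, rather than with the previous clone, keeps a run of slightly different sizes from chaining into one oversized bin.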
To identify overlaps between B. napus clones that were not revealed by hybridization to common Arabidopsis gene-specific probes, we used an additional method for contig assembly: fingerprinting based on HindIII restriction digests, with overlaps between fingerprints identified using FPC (http://www.sanger.ac.uk/Software/fpc). This revealed overlaps between JBnB4G7 and JBnB143C12, JBnB110O5, JBnB89G18 and JBnB33K19. The fingerprint analysis also confirmed that clones JBnB15K6 and JBnB74N14 did not overlap clones JBnB145J20 or JBnB144G23. The assembled contigs are shown in Figure 1 for B. rapa and Figure 2 for B. napus.
Genome representation of the BAC contigs
The Arabidopsis gene-specific hybridization probes identified a total of 51 loci in the JBr BAC library, with a mean representation of 4.6-fold redundancy, and 89 loci in the JBnB BAC library, with a mean representation of 5.4-fold redundancy. The same probes had identified a total of 65 loci in the B. oleracea BIBAC library, JBo, with a mean representation of 5.0-fold redundancy (O'Neill and Bancroft, 2000). The similarity in the redundancy of representation indicates that all three Brassica libraries, as expected, provide about the same depth of coverage of the respective genomes. There are several possible reasons for a smaller number of loci being identified using the JBr library than had been found using the JBo library, but most likely this indicates that a greater proportion of weakly hybridizing clones (i.e. those containing more diverged homologues) were not detected above background in the colony hybridization experiments with the JBr library. For example, the probe designed to At4g17730 only detected three loci from the screening of the JBr BAC library, but had detected 11 loci (with weak hybridization reported for seven of those) when used to screen the JBo BAC library (O'Neill and Bancroft, 2000). The JBnB library might be expected to contain the sum of the number of loci identified in the JBo and JBr libraries, but fewer were found. Most likely the difference is again due to a greater proportion of weakly hybridizing clones not being detected above background in the colony hybridization experiments. As these differences did not affect the contigs assembled, they are not relevant to the questions being addressed.
Comparative analysis of genome microstructure in B. rapa
As was observed for the genome of B. oleracea, the genome of B. rapa exhibits extensive triplication. The seven JBr BAC contigs assembled (A–G in Figure 1) correspond to contigs A–G, respectively, previously reported for B. oleracea (O'Neill and Bancroft, 2000). Each region of the B. rapa genome shows an interspersed pattern of conserved and non-conserved genes. One rearrangement can be identified relative to the structure of the genome of Arabidopsis: a homologue of gene At4g17440 was detected in a location between homologues of At4g17730 and At4g17800 in contig G.
Analysis of the BAC contigs representing the homoeologous regions of the genomes of B. oleracea (O'Neill and Bancroft, 2000) and B. rapa provides insights into genome evolution. In B. rapa, we found 42 genes (62% of the genes potentially encompassed by the contigs) to be conserved, in collinear order, relative to Arabidopsis, with 26 interspersed genes (38%) lost. A very similar pattern of interspersed gene loss was observed in B. oleracea, where 44 genes (68% of the genes potentially encompassed by the contigs) were found to be conserved, with 21 interspersed genes (32%) lost. There were two differences in the identities of the conserved genes: a homologue of At4g17760 is present in contig G of B. oleracea but not in contig G of B. rapa, and a homologue of At4g17440 is present in contig E of B. rapa but not in contig E of B. oleracea. We can infer that the common ancestor of B. rapa and B. oleracea contained both a homologue of At4g17760 in contig G and a homologue of At4g17440 in contig E. Thus, each extant species has lost one gene homologue out of the 38 that can be inferred as present in the common ancestor in the regions represented in the contigs for both B. oleracea and B. rapa (i.e. 4, 6, 7, 9, 6, 3 and 3 of the genes represented in contigs A, B, C, D, E, F and G, respectively). Two rearrangements had been identified in the genome of B. oleracea, but neither was found in B. rapa. Conversely, the rearrangement identified in B. rapa (the homologue of At4g17440 in contig G) had not been identified in B. oleracea. Thus the genome rearrangements appear to be either species-specific or genome-specific. These results indicate that the evolution of genome microstructure is an ongoing process.
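The percentages and the inferred ancestral gene content in the paragraph above can be checked with a few lines of Python. The sketch assumes, as the argument does implicitly, that genes are lost but not regained, so a gene present in either species' contig is inferred to have been present in their common ancestor. Only the two genes that differ between the species are modelled, using labels of our own (contig letter plus Arabidopsis locus).

```python
# Sketch of the arithmetic and parsimony reasoning in the paragraph above.
# Assumption (for illustration): genes are lost but not gained, so any gene
# retained in either species' contig was present in their common ancestor.

def percent_split(conserved: int, lost: int) -> tuple[float, float]:
    total = conserved + lost
    return 100 * conserved / total, 100 * lost / total


def infer_ancestor(present_a: set[str], present_b: set[str]):
    """Return (ancestral genes, genes lost in A, genes lost in B)."""
    ancestral = present_a | present_b
    return ancestral, ancestral - present_a, ancestral - present_b


print(percent_split(42, 26))  # B. rapa
print(percent_split(44, 21))  # B. oleracea

# Only the two genes that differ are modelled here; shared genes are omitted.
oleracea = {"G:At4g17760"}
rapa = {"E:At4g17440"}
ancestral, lost_oleracea, lost_rapa = infer_ancestor(oleracea, rapa)
print(sorted(lost_oleracea), sorted(lost_rapa))

# Genes inferred in the ancestor per contig, as listed in the text (A to G).
per_contig = {"A": 4, "B": 6, "C": 7, "D": 9, "E": 6, "F": 3, "G": 3}
print(sum(per_contig.values()))  # 38
```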
Comparative analysis of genome microstructure in B. napus
The genome of B. napus contains extensive sixfold representation of the Arabidopsis genome, as both the C genome (from B. oleracea) and the A genome (from B. rapa) are present. Sixteen contigs were assembled (Figure 2). These contigs show an interspersed pattern of conserved and non-conserved genes. These patterns allow 15 of the contigs to be unambiguously aligned with the corresponding BAC contigs A–G (or portions thereof) previously reported for the genome of B. oleracea (O'Neill and Bancroft, 2000) and with those assembled here for B. rapa. Where the microstructures of the corresponding contigs in B. oleracea and B. rapa differ, the B. napus contigs can be assigned to a progenitor genome, i.e. Er, Fr and Gr to the B. rapa progenitor and Eo, Fo and Go to the B. oleracea progenitor. The contig that could not be aligned (H) appears to correspond to part of contig F as identified in B. oleracea. The evidence for joining the equivalent segment to contig F in B. oleracea is weak, as it rested only on cross-hybridization of BAC clones, so this join may be erroneous.
One of the rearrangements identified in the genome of B. oleracea, the occurrence in contig E of a homologue of At4g17280 adjacent to At4g17570, can be identified in B. napus. This rearrangement, relative to the Arabidopsis genome, is not present in the corresponding contig E of B. rapa, so is specific to the Brassica C genome. The other rearrangement previously identified in B. oleracea, the presence of an inverted segment at the end of contig F, is not present in the contig F of B. napus originating from the B. oleracea progenitor (Fo), so appears species-specific, if it is authentic. The only rearrangement identified in B. rapa, the occurrence in contig G of a homologue of At4g17440 adjacent to At4g17730, is not present in the contig G of B. napus originating from the B. rapa progenitor (Gr), so is also species-specific.
The pattern of interspersed gene loss in B. napus is very similar to those identified in the corresponding regions of the genomes of B. rapa and B. oleracea; apart from the variation in the rearrangements present, there are only two differences. The homologue of At4g17480 identified in contig D of both B. oleracea and B. rapa is absent from contig D1 of B. napus. A homologue of At4g17440 was identified in contig Eo of B. napus, whereas no homologue of this gene was detected in the corresponding contig E of B. oleracea. Thus, the results of our analysis of genome microstructure in B. napus support the conclusion from comparative genetic mapping that genome structure changed little upon polyploid formation (Parkin et al., 1995).
We constructed two new large-insert BAC libraries, JBr and JBnB, and used them to conduct detailed comparative physical mapping of defined segments of the genomes of B. rapa and B. napus. The JBr library is also being used for the systematic assembly of BAC contigs representing the genome of B. rapa (http://brassica.bbsrc.ac.uk/IGF/?page=body/project.htm) and the JBnB library is also being used for partial physical and genetic mapping of the oilseed rape genome (http://brassica.bbsrc.ac.uk/IMSORB/). These libraries, along with the JBo BAC library of B. oleracea that we constructed previously and the Arabidopsis genome sequence, provide the opportunity to study the effects of polyploidy on plant genome microstructure over a wide range of time scales, as summarized in Figure 3.
The comparative analyses we conducted allow us to confirm that the microstructure of the genome of B. rapa is very similar to that of B. oleracea and has a fundamentally triplicated structure with respect to an Arabidopsis-like genome. This was expected, as the lineages of B. oleracea and B. rapa diverged only ca. 4 Ma (Inaba and Nishio, 2002). The triplicated segments (paralogues) differ from each other, and from their Arabidopsis homoeologues, primarily through an interspersed pattern of gene loss. Interspersed gene loss appears to be a general mechanism by which conservation of genome microstructure degrades following polyploidy, being observed across the duplicated genome segments in Arabidopsis (Arabidopsis Genome Initiative, 2000; Ku et al., 2000) and in maize (Fu and Dooner, 2002), which has a tetraploid ancestry (Helentjaris et al., 1988). The microstructures of the genome segments studied in B. oleracea and B. rapa are not identical: there were two differences in the complement of conserved genes, and the one microstructural rearrangement identified in the genome of B. rapa differed from the two rearrangements identified in B. oleracea. A modest number of gross chromosomal rearrangements (16) have been found to differentiate the genomes of B. oleracea and B. rapa (Parkin et al., 2003). We can conclude from these observations that the evolution of genome microstructure, like that of genome macrostructure, is an ongoing process.
The present paradigm, which is based primarily on studies in newly synthesized polyploids (Song et al., 1995), is that there is rapid genome change upon polyploidization in Brassica. This was based on the identification of changes in size or presence of ca. 1% of bands on Southern blots probed with RFLP markers over three generations (from F2 to F5). A consequence of this paradigm is that essentially all microstructural changes observed in the Brassica lineage would have occurred shortly after polyploid formation, so the rate of change more recently would have been extremely low. However, we have shown that the evolution of genome microstructure is ongoing, with differences of gene conservation and microstructural rearrangements observed between B. oleracea and B. rapa. The results of our analysis of the microstructure of the genome of natural B. napus also support a change of paradigm. Contrary to the expected major changes in genome microstructure, our results show that, in the segments analysed, the genome microstructure of B. napus is very similar to that of its diploid relatives. The extent of microstructure variation between the Brassica A and C genomes, as represented in B. napus Tapidor compared with B. oleracea alboglabra and B. rapa trilocularis, is similar to that observed between B. rapa subspecies (unpublished). Therefore, the small differences in microstructure that were observed are most likely due to the ancestors of natural B. napus differing slightly in the microstructures of their genomes compared with the lines of B. oleracea and B. rapa we used for genome analysis, and not subsequent genome evolution.
Although we cannot rule out sequence-level changes, our results demonstrate that, in the segments analysed, genome microstructure has not been greatly affected by the induction of polyploidy. Although our data are limited to a small proportion of the genome (ca. 0.7%), they are in agreement with the lack of genome macrostructure changes in B. napus compared with B. oleracea and B. rapa observed by Parkin et al. (1995). In contrast to Song et al. (1995), Axelsson et al. (2000) found no evidence for rapid genome change in resynthesized B. juncea using restriction fragment length polymorphism linkage analysis, supporting the likely generality of our conclusions. The data obtained by Song et al. (1995) from resynthesized polyploid Brassica may result from homoeologous recombination events that have no impact on gene content or order, or from novel DNA methylation patterns in the hybrid (as have been observed in other systems, e.g. Xiong et al., 1999). The data may also be an artefact of the method used for the production of the plant lines studied, which included colchicine treatment. Thus, although polyploidy clearly provides scope for more extensive evolution of gene function, its role in genome remodelling may have been overstated. Ultimately, the greater understanding that is emerging of polyploid genome evolution should enable more rapid progress to be made towards the predictive development of polyploid crops.
BIBAC DNA isolation and restriction digestion
DNA from the hybridizing clones was extracted following methods designed for fingerprint analysis of BACs as described by Marra et al. (1997). The DNA in each well was digested with 20 U HindIII in a 15-μl reaction volume for 2 h. Then 2 μl of 6× loading dye (15% Ficoll, 0.06% bromophenol blue, 0.06% xylene cyanol, 10 mM EDTA) was added to the digest. The digest volume was reduced to 10 μl by centrifuging the plates without their lids at 1550 × g for 20 min.
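As a worked example of the volumes in this step: adding 2 μl of 6× dye to a 15-μl digest gives about 0.71× dye, and reducing the volume to 10 μl raises this to 1.2×. This assumes (our assumption, not stated in the text) that only water is lost during the volume reduction, so all of the dye is retained.

```python
# Worked arithmetic for the loading-dye step, assuming the volume reduction
# removes only water so all dye is retained (an assumption, not from the text).
STOCK_X = 6.0      # 6x loading dye
DYE_UL = 2.0       # volume of dye added
DIGEST_UL = 15.0   # digest volume
FINAL_UL = 10.0    # volume after reduction

dye_amount = STOCK_X * DYE_UL                        # dye in "x-microlitres"
strength_before = dye_amount / (DIGEST_UL + DYE_UL)  # after mixing: 12/17
strength_after = dye_amount / FINAL_UL               # after reduction: 12/10
print(f"{strength_before:.2f}x -> {strength_after:.2f}x")
```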
Gel preparation and loading
Agarose (SeaKem LE, FMC Bioproducts, Rockland, ME, USA) gels (1%) were run in a Gator Wide Format System model A3-1 (Owl Scientific, Portsmouth, NH, USA). Special 121-well gel combs were made locally following the Sanger Centre design. Marker DNA (Analytical Marker DNA, Wide Range, catalogue no. DG1931, Promega, and Roche molecular weight marker V, catalogue no. 0821705) was loaded in every fifth lane, starting from the first lane. Two microlitres of HindIII-digested BAC DNA was loaded in each well. The gels were run in 1× TAE in a cold room for 16–18 h. Gels were stained for at least 1 h in 250 ml of 20 mM Tris and 0.1 mM EDTA containing 25 μl Vistra Green (Amersham Life Sciences, Chalfont St Giles, UK). Gels were then scanned in a FluorImager 595 scanner (Molecular Dynamics, Sunnyvale, CA, USA). Full-size images of the gels were printed for subsequent analysis.
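With a marker in every fifth lane starting from lane 1, a 121-lane gel has 25 marker lanes and 96 sample lanes, and every sample lane lies within two lanes of a marker, which is relevant when bands are sized against the nearest marker lane. A short Python check of this lane arithmetic (the function name `nearest_marker` is ours):

```python
# Lane arithmetic for a 121-lane gel with a marker in every fifth lane from lane 1.
N_LANES = 121
MARKER_EVERY = 5

markers = list(range(1, N_LANES + 1, MARKER_EVERY))   # 1, 6, 11, ..., 121
samples = [lane for lane in range(1, N_LANES + 1) if lane not in markers]


def nearest_marker(lane: int) -> int:
    """Marker lane closest to a given sample lane."""
    return min(markers, key=lambda m: abs(m - lane))


max_gap = max(abs(nearest_marker(s) - s) for s in samples)
print(len(markers), len(samples), max_gap)  # 25 96 2
```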
Southern blot analysis of BAC clones
DNA fragments resolved on agarose gels were blotted onto Amersham Hybond-N+ membrane using 0.4 N NaOH as the transfer buffer, essentially as described by the manufacturer. Hybridization and washing were carried out at 50°C as previously described (O'Neill and Bancroft, 2000). Random-prime-labelled probe was generated with 32P according to the manufacturer's (GIBCO-BRL) directions. Fifty nanograms of gene-specific probe and 1 ng of marker DNA were used for labelling. The sizes of the hybridizing bands were calculated using the nearest marker lane as a guide. Each hybridizing band was verified against the corresponding band on the DNA gel image by overlaying the autoradiograph on a full-size print of the gel image.
Southern blots were prepared with larger amounts of BAC DNA for the probe–clone combinations giving signals that were too weak to detect using the 121-lane gels. These were probes specific to genes At4g17460, At4g17480, At4g17500 and At4g17570 against clones JBr44N24, JBr45O11, JBr97F18, JBr61A4, JBr87B14, JBr91K3, JBnB20M2, JBnB176O17, JBnB186M1, JBnB103N1, JBnB129H17 and JBnB134C11. The methods used were as described previously (O'Neill and Bancroft, 2000).
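Sizing a band against the nearest marker lane is commonly done by interpolating the logarithm of fragment size linearly with migration distance between the two flanking marker bands. The text does not state which interpolation was used, so the sketch below is an assumption; the function name and the marker distances and sizes are invented for illustration.

```python
import math

# Hypothetical sketch: size a band from one marker lane by interpolating
# log(size) linearly against migration distance between the two flanking
# marker bands. Marker values below are illustrative only.

def estimate_size_kb(distance_mm: float,
                     marker_bands: list[tuple[float, float]]) -> float:
    """marker_bands: (migration distance in mm, size in kb) for one marker lane."""
    bands = sorted(marker_bands)  # increasing distance = decreasing size
    for (d1, s1), (d2, s2) in zip(bands, bands[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            return math.exp(math.log(s1) + frac * (math.log(s2) - math.log(s1)))
    raise ValueError("band lies outside the marker range; do not extrapolate")


marker_lane = [(10.0, 20.0), (20.0, 10.0), (30.0, 5.0)]
print(round(estimate_size_kb(15.0, marker_lane), 2))  # 14.14
```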
Identification of BAC overlaps using FPC
Gel images were imported into Image v3.11, developed at the Sanger Centre (http://www.sanger.ac.uk/software/Image). The bands of each fingerprinted clone were edited manually, and the edited bands were then transferred to FPC v4.7, also developed at the Sanger Centre (http://www.sanger.ac.uk/Software/fpc). Vector bands were removed during the transfer. Contigs were first assembled automatically using a cut-off of 1 × 10⁻¹². To merge overlapping contigs, the cut-off was then relaxed to 1 × 10⁻¹⁰. The merged contigs were checked by manually examining the fingerprints of the clones joining them.
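FPC scores each candidate pair of clones with the Sulston coincidence score, the probability that the observed number of shared bands would arise by chance, and calls an overlap when the score falls below the cut-off; relaxing the cut-off from 10⁻¹² to 10⁻¹⁰ therefore accepts pairs with weaker evidence of overlap. The sketch below is our reading of the formula given in the FPC documentation, not FPC's own code, and the tolerance and gel-length values are hypothetical rather than the settings used in this work.

```python
from math import comb

# Sketch of the Sulston coincidence score: the probability that two unrelated
# clones share at least `shared` bands by chance, given band counts n1 and n2,
# a matching tolerance and a gel length (same units as the tolerance).

def sulston_score(n1: int, n2: int, shared: int, tol: float, gel_len: float) -> float:
    n_low, n_high = min(n1, n2), max(n1, n2)
    b = 2 * tol / gel_len          # chance that two random bands fall within tolerance
    p_a = (1 - b) ** n_high        # a band matches no band of the other clone
    p_b = 1 - p_a                  # a band matches at least one band
    return sum(comb(n_low, j) * p_b ** j * p_a ** (n_low - j)
               for j in range(shared, n_low + 1))


def overlap_called(score: float, cutoff: float) -> bool:
    return score <= cutoff


TOL, GEL_LEN = 7, 3300  # hypothetical settings, not those used in this study
strict, relaxed = 1e-12, 1e-10
for shared in (10, 20, 25):
    s = sulston_score(30, 30, shared, TOL, GEL_LEN)
    print(shared, f"{s:.1e}", overlap_called(s, strict), overlap_called(s, relaxed))
```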
Estimation of BAC insert sizes
The sizes of BAC inserts were estimated based on the sum of restriction fragment sizes identified for each clone by Image v3.11.
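Summing fingerprint fragments to size an insert can be sketched as follows, assuming vector bands are recognized by matching known vector fragment sizes within a tolerance. All sizes here are invented, and the function `insert_size_kb` is illustrative rather than part of Image or FPC.

```python
# Hypothetical sketch: estimate a clone's insert size as the sum of its
# restriction-fragment sizes after discarding bands that match known vector
# fragments. Fragment and vector sizes below are invented for illustration.

def insert_size_kb(fragments_kb, vector_kb, tol_kb=0.05):
    remaining = list(fragments_kb)
    for v in vector_kb:
        for i, f in enumerate(remaining):
            if abs(f - v) <= tol_kb:
                del remaining[i]   # remove one matching band per vector fragment
                break
    return sum(remaining)


fragments = [7.5, 3.2, 12.0, 0.9, 4.4]
print(round(insert_size_kb(fragments, vector_kb=[7.5]), 1))  # 20.5
```

Fragments that co-migrate as a doublet, or that are too small to resolve, are counted once or not at all, so estimates obtained this way would tend to be slightly low.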
This work was supported by the John Innes Centre Competitive Support Grant and UK Biotechnology and Biological Sciences Research Council grant 208/IGF12449. The work on Chinese cabbage was carried out as part of the Bio-Green project with financial support from RDA, Korea. The BAC libraries and clones are available from the JIC Genome Laboratory/GeTCID service (http://www.getcid.co.uk).