Author for correspondence: T. H. N. Ellis Tel: +44 (0) 1603 450243 Fax: +44 (0) 1603 450045 Email: firstname.lastname@example.org
Molecular marker maps have been generated for pea (Pisum sativum), but these have not been well integrated with previous genetic and cytogenetic maps. Here, these data sets are brought together to summarize the current state of knowledge, exploiting the common genetic markers used in many different studies. The pea linkage map was known to be simply related to the maps of lentil and chickpea, and these relationships are presented in the context of the present map. The genomes of these legumes are interpreted as having an equivalent gene content, but differing in their repetitive DNAs.
Pea genetics has been an object of study for a long time (Knight, 1799; Mendel, 1866), but surprisingly it is only relatively recently that a coherent picture has emerged of the overall structure of the genome, both in terms of karyotype and the organization of the genetic linkage groups. This review aims to provide a synthesis of pea genetic data into a common structural framework.
At present pea is unpopular as a model for genetic analysis because the plants are large and the genome is complex. However, the large genome provides an opportunity to study genome architecture, and the large size of the plants and their suitability for physiological and biochemical analysis may yet prove an advantage in some contexts. Pea is also a significant crop; it is both a vegetable and an arable crop. Here we will describe some features of the organization of the repeated sequences which bloat this genome. It should be clear that these sequences represent an opportunity for the understanding of recent and rapid genome divergence.
While there are current technical and financial obstacles to the pursuit of pea genetics and genomics, this organism has many interesting mutants available for study. These are a rich resource for biology and some of these, especially those involved in the regulation of compound leaf form (Hofer & Ellis, 1998), will be difficult to identify and study in other organisms. An important tool for the exploitation of pea genetics in the near future will be comparative genomics. The small genome legumes Lotus japonicus and Medicago truncatula are taxonomically closely related to pea (Doyle et al., 1997) and can provide a route for circumventing the problems of scale in the genome. At present the relationships between these three genomes are being studied actively; the indications are promising but as yet no consistent body of data has been published.
The pea karyotype comprises seven chromosomes. Five are acrocentric (chromosomes 3, 5 and 6, together with 4 and 7, which have a secondary constriction corresponding to the 45S rRNA gene cluster), and two are submetacentric (chromosomes 1 and 2, Fig. 1). The acrocentric chromosomes are distinguishable on the basis of arm length, centromere and nucleolus organiser position (Figs 1 and 2; Hall et al., 1997a). The two small submetacentrics have been separated by some authors (Blixt, 1958; Fuchs et al., 1998) on the basis of chromosome morphology alone. This can be achieved by a systematic allocation of the smaller chromosome, but the size difference between these two is within experimental error, so such systematic allocation is suspect.
The use of translocation stocks in pea genetics used to be common, as this allowed rapid assignment of loci associated with translocation points, through linkage to semisterility. These studies made extensive use of a set of translocation stocks generated by Lamm & Miravalle (1959). There was some dispute concerning these stocks between Lamm and Lamprecht. Lamprecht (1961) called Lamm’s conclusions ‘absurd’, in part for arguing that R and Gp were on the same chromosome. Lamm and Miravalle’s translocations were thought to involve all chromosomes in the karyotype, but on the basis of cosegregation of the translocation point and classical genetic markers only three linkage groups were shown to be involved (Fig. 2; Lamm & Miravalle, 1959). Subsequent assignment (Lamm, 1977, 1983; Folkeson, 1990) of the translocation points of the lines L108, L111 and L112 extended these assignments to include linkage groups IV, VI and VII (Fig. 2). The translocations in the lines L114 and L180 are not shown on Fig. 2, and remain obscure. Great confusion ensued from disputes concerning this translocation set, as the interpretation of these translocations was compounded by fragmentary linkage and karyotype analysis. However, the pea chromosomes are now identifiable on the basis of morphology and in situ hybridization (Fuchs et al., 1998). The data presented in Fig. 2 are consistent with the observations of Kosterin et al. (1999), but the chromosomal assignments in Fig. 2 differ from those of Weeden et al. (1998) regarding chromosomes 1 and 2 and the satellite chromosomes 4 and 7. The difficulty in distinguishing chromosomes 1 and 2 has been discussed above, but Fuchs et al. (1998) were able to distinguish these chromosomes with the aid of in situ markers and assigned LegK to chromosome 6; LegK corresponds to an RFLP mapped to linkage group II. The assignment of the satellite chromosomes is based on the genetic behaviour reported for L108 and L112 (Lamm, 1977, 1983; Folkeson, 1990).
Pea genome size
The pea nuclear genome comprises about 4 × 10⁹ bp (Michaelson et al., 1991), but estimates vary by c. 20%. Some of this variation has been attributed to inter (sub) specific variation (Baranyi et al., 1996). The modal GC content of pea is 37.7% (ρCsCl = 1.6969), but in common with many eukaryotes, this GC content varies and appears to be distributed in relatively long regions of similar base composition (Salinas et al., 1988). The average GC content is 37.4%; approx. 30% of C residues are 5-methylcytosine, and approx. 50% of 5meC are in the sequence C(A/T)G (Pradhan & Adams, 1995). The pea genome is composed mainly of high copy dispersed repeated sequences, and thermal denaturation studies have shown these to be distinct sequence classes (Murray & Thompson, 1982). Taxonomically, pea is within the tribe Vicieae of the legumes, and is thus closely related to Vicia and Lathyrus. These genera are notable because their diploid species range approx. 10-fold in genome size (1C value; Fig. 3).
A major issue to understand is the extent to which the repeated sequences within the pea genome are organized in contiguous arrays as would be suggested by the isochore profile (Salinas et al., 1988) where distinct components of the genome have different density in equilibrium density gradient centrifugation.
Within the Vicieae C-values vary considerably, and pea has an unexceptional genome size (Fig. 3). This variation is consistent with the idea that, for the large genomes, most of their DNA is of recent origin (i.e. since, or concomitant with, the divergence of the tribe). Within Vicia, C-value variation has been studied in relation to the abundance of retrotransposon families. The Ty1-copia group varies considerably, accounting for substantial amounts of the C-value variation, but the copy number of these elements is not correlated with C-value (Pearce et al., 1996), suggesting that variation in the abundance of other repeated sequences (e.g. Ty3-gypsy elements, LINEs and SINEs among the retrotransposons) accounts for this variation in DNA content. This leads to the expectation that a large proportion of the pea genome should be composed of retrotransposons that have undergone recent expansion and sequence diversification; these should be an ideal source of abundant genetic markers, because they will pepper the genome.
The Ty1-copia group of retrotransposons has been studied in pea; approx. 30 distinct families have been identified and these range in copy number from about 30 to 3000 (Lee et al., 1990; Ellis et al., 1998; Pearce et al., 2000). Several families of these Ty1-copia elements have been shown to be activated transcriptionally by protoplast formation or treatment with fungal elicitors (Kato et al., 1999). These elements may provide tools for transposon tagging experiments, in much the same way as Tos17 has been used in rice (Agrawal et al., 2001).
The survey of pea Ty1-copia group elements is incomplete and probably represents no more than approx. 1% of the genome. The Cyclops element (Chavanne et al., 1998) is a member of the Ty3-gypsy group, is conserved in sequence among legume species, and it alone represents approx. 1% of the pea genome. There are no reports of SINE or LINE elements known to the authors. DNA transposons are represented in the pea genome; the mutation in the R locus was caused by the insertion of Ips-r, a Ds-like element, but the corresponding autonomous Ac-like sequence has not been identified (Bhattacharyya et al., 1990). Sequences related to En/Spm have been identified in the upstream region of a legumin gene (Shirsat, 1988), and sequence specific amplified polymorphism (SSAP; Ellis et al., 1998) markers based on this sequence are highly informative and consistent with a copy number of approx. 50 000–100 000. These DNA transposons potentially provide gene tagging tools, but there is no known example of an active DNA transposon in pea. Clearly, however, the Ips-r family has been active relatively recently.
This brief description of the repeated sequences in the pea genome, together with other unidentified repeated sequences accounts for approx. 5% of the genome. Clearly, a great deal is missing from the inventory, but these dynamic components of the genome provide abundant genetic markers. Within the Ty1-copia class, elements of the PDR1 family are expected about once every 20 Mb, and these elements are known to be located close to genes (Lee et al., 1990). The Ty3-gypsy class element Cyclops and the pea Spm-like sequences should each occur about once every 100 kb. These are large distances in relation to markers for large insert clones, but if the pea genome is relatively easy to align with that of Medicago truncatula or Lotus japonicus, then the corresponding physical distances in these genomes should be proportionately smaller (c. 5–10-fold; Fig. 3). Multiplex markers based on these repeated sequences have the potential for high throughput genetic analysis (Flavell et al., 1998), and where the flanking sequences are low copy these should be transportable for the identification of large insert clones from the model legume species.
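The spacing estimates above follow from dividing genome size by copy number. The short Python sketch below works through that arithmetic. It assumes a genome of 4 × 10⁹ bp and, as a deliberate simplification, that copies are spread evenly along the genome; whether repeats are in fact dispersed or clustered is an open question. The function names are illustrative only, and the copy numbers it returns are implied by the quoted spacings rather than taken from the cited papers.

```python
GENOME_SIZE_BP = 4e9  # approximate pea nuclear genome size quoted in the text


def mean_spacing_bp(copy_number, genome_size_bp=GENOME_SIZE_BP):
    """Average distance between copies, assuming an even spread (a simplification)."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return genome_size_bp / copy_number


def implied_copy_number(spacing_bp, genome_size_bp=GENOME_SIZE_BP):
    """Copy number implied by a given average spacing between elements."""
    if spacing_bp <= 0:
        raise ValueError("spacing_bp must be positive")
    return genome_size_bp / spacing_bp


def scaled_spacing_bp(spacing_bp, fold):
    """Corresponding distance in a genome that is `fold` times smaller."""
    return spacing_bp / fold


# One element every 20 Mb corresponds to about 200 copies in a 4 Gb genome.
print(implied_copy_number(20e6))
# One element every 100 kb corresponds to about 40 000 copies.
print(implied_copy_number(100e3))
# The 5- to 10-fold scaling suggested for the model legumes.
for fold in (5, 10):
    print(fold, scaled_spacing_bp(100e3, fold))
```

On these assumptions a spacing of 100 kb in pea would correspond to roughly 10–20 kb in a genome 5–10-fold smaller, which is the sense in which the physical distances become proportionately smaller.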
The distribution of retrotransposon insertion site polymorphism within the genus is of some interest. P. sativum and P. abyssinicum are both cultivated types, but are very different as judged by their shared retrotransposon insertion sites. This has led to the suggestion that these were independently domesticated (Ellis et al., 1998; Pearce et al., 2000). Independent domestication is of interest per se, but implies that the gene pools of these two (intercrossable) groups are very different. Furthermore, very few retrotransposon insertion sites are shared between the (sub?) species. This means that an exceptionally high density of markers is readily available, and the genetic map positions of many of the P. sativum insertion sites are already known. This has implications for the introgression of traits between P. sativum and P. abyssinicum and also for fine mapping studies.
An integrated linkage map
The abundance of genetic markers for pea, whether AFLP (Vos et al., 1995), RAPD (Lacou et al., 1998), retrotransposon (Flavell et al., 1998; Pearce et al., 2000) or EST based (Gilpin et al., 1997) has allowed the development of moderate density linkage maps. The availability of common markers has permitted the integration of the maps derived from different crosses (Gilpin et al., 1997; Lacou et al., 1998), and the development of a consensus linkage map for pea (Weeden et al., 1998, http://hermes.bionet.nsc.ru/pg/30/map.htm). This map has the advantage of integrating a wide data set, but does not include many of the classical mutant loci (Blixt, 1972). In an attempt to rectify this deficiency we have re-calculated a linkage map (Fig. 4) integrating the data from markers shared among three recombinant inbred (RI) populations (JI15 × JI1194, JI15 × JI399 and JI281 × JI399; Hall et al., 1997b). This was done by combining the data for 120 markers shared between at least two of the three RI populations (with five exceptions) and the map was generated by JoinMap v2.0 (Stam & Van Ooijen, 1995). This skeleton map goes some way towards resolving map length difficulties discussed in Hall et al. (1997b) and by integration with the three separate RI maps provides the position for 20 mutant loci (Hall et al., 1997b). The location of Uni and Stp by virtue of the location of the corresponding structural genes has been included (Hofer et al., 1997; Taylor et al., 2001). The map presented in Fig. 4 uses these assignments to give points of reference to the many crosses from which the classical genetic map of pea was assembled (Blixt, 1972; Weeden et al., 1993, 1998; Rozov et al., 1999). Bulked segregant analysis (Mitchelmore et al., 1991) with SSAP retrotransposon markers was used to locate classical markers (e.g. assigning Sil to the end of linkage group IV, Fig. 4, opposite to Rrn1, Fig. 2).
In this procedure F2 (and F3) populations were generated where one parent was derived from an RI mapping population, and the other carried the mutant allele. Bulks of 10 segregants with the mutant phenotype were compared to the parents and wild types; SSAP bands, derived from the RI parent, that were missing from the mutant bulks were interpreted as being linked to the wild-type allele of the segregating mutant. This procedure gives an approximate position of the locus being investigated and hence gives a location for linked mutant loci (e.g. for Sil, Wsp is known to be linked (Marx, 1987); in turn, the translocation points in L111 and L112 are associated with Wsp (Folkeson, 1990; Figs 2 and 4)).
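The band-scoring logic of this bulked segregant procedure can be sketched with set operations. The Python example below is a minimal illustration, not the scoring method actually used: the band labels are hypothetical, and the extra requirement that a candidate band be present in the wild-type segregants is an assumption added here to exclude bands that simply failed to amplify.

```python
def candidate_linked_bands(ri_parent, mutant_parent, mutant_bulk, wild_type_bulk):
    """Return SSAP bands interpreted as linked to the wild-type allele.

    Each argument is a set of band labels scored as present in that sample.
    """
    # Informative bands: present in the RI parent, absent from the mutant parent.
    ri_specific = ri_parent - mutant_parent
    # Candidates: RI-derived bands that are missing from the mutant bulk.
    missing_from_mutants = ri_specific - mutant_bulk
    # Assumption (not from the text): keep only bands seen in wild-type segregants.
    return missing_from_mutants & wild_type_bulk


# Hypothetical band scores, for illustration only.
ri_parent = {"b1", "b2", "b3", "b5"}
mutant_parent = {"b1", "b4"}
mutant_bulk = {"b1", "b3", "b4"}
wild_type_bulk = {"b1", "b2", "b3", "b4"}

candidates = candidate_linked_bands(ri_parent, mutant_parent, mutant_bulk, wild_type_bulk)
print(sorted(candidates))
```

In this toy data set band b3 comes from the RI parent but also appears in the mutant bulk, so it is treated as unlinked; b5 is absent from both bulks and is discarded; only b2 remains as a candidate linked to the wild-type allele.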
These data, which indicate the position of classical genes on the RI map, have been combined with information from other crosses in published data where common markers can be positioned. For example, A2 can be located due to linkage with Uni and St (Gorel et al., 1997) and to Sym31 (Men et al., 1999); these further helped to define linkage group III markers. The map presented in Fig. 4 is both approximate and a summary. Some marker orders are undoubtedly incorrect, but this cannot be resolved without performing the appropriate crosses. Linkage intensity differs between pea crosses, and this compounds difficulties in deducing marker order where multiple crosses are involved. While some loci are undoubtedly misplaced relative to each other on the map presented in Fig. 4, their approximate position is supported by published linkage data.
For some markers such as Rb there are conflicting data from different crosses, in this case suggesting either a terminal or subterminal position on linkage group III (Hall et al., 1997b; Weeden & Boone, 1999). Similarly the position of Lf with respect to A is difficult to resolve on this map because the crossover frequency in the region of A is low in the populations from which this map is derived. The total length of the integrated map (937 cM) is shorter than those of the individual constituent populations, and is closer to the expectation from chiasma distribution (Hall et al., 1997b). One contribution to excess map length is mis-scoring due to DNA methylation (Ellis et al., 1992; Knox & Ellis, 2001). The pea linkage map presented by Lacou et al. (1998) was based on PCR markers, without the use of restriction enzymes, and was shorter than the JI281 × JI399 derived linkage map with which it was integrated.
Despite local difficulties in order, and map length differences between crosses, the map is consistent with a large body of data and is an attempt to integrate information from a wide range of sources. The map should be taken as a working hypothesis, suggesting which crosses to make, rather than a definitive map.
Linkage maps have been published for lentil (Weeden et al., 1992; Eujayl et al., 1998) and chickpea (Simon & Muehlbauer, 1997) and these can be aligned with this integrated pea map. The markers used for integration are presented in the relevant publications, and correspond mostly to isozyme markers with known locations on the pea map (Weeden et al., 1998). These comparisons show extensive regions of similarity (Fig. 5). Similarly an extensive map for alfalfa has been published (Kaló et al., 2000). Alignment between the pea and alfalfa maps is currently underway (Vincent et al., 2000; G. Kiss pers. comm. and unpublished).
Despite many problems, the representations of the pea linkage map have now converged (Fig. 4), and the sources of discrepancy between the present map and those published earlier can be identified.
Linkage group I
The association between what is now considered group II and group I (Fig. 4) comes from the attribution of linkage between A and D by Lamprecht (1956). This association is difficult to make because A (aa, plants lack anthocyanin pigment) is epistatic to D (dd plants lack anthocyanin in the leaf axils). Lamprecht (1956) made unwarranted assumptions concerning the proportions of unseen classes.
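The problem of unseen classes can be made concrete by enumerating an F2. The Python sketch below is illustrative only and rests on assumptions that are not in the text: an F2 derived from a double heterozygote (Aa Dd), complete dominance at both loci, and independent assortment. It shows that aa plants form a single phenotypic class whatever their genotype at D.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def f2_phenotype_ratios():
    """Expected F2 phenotype proportions when A is epistatic to D.

    Assumes an Aa Dd x Aa Dd cross, complete dominance and no linkage.
    """
    gametes = [a + d for a, d in product("Aa", "Dd")]  # AD, Ad, aD, ad
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):
        has_A = "A" in (g1[0], g2[0])
        has_D = "D" in (g1[1], g2[1])
        if not has_A:
            # aa plants lack anthocyanin, so their D genotype cannot be scored.
            counts["aa"] += 1
        elif has_D:
            counts["A_ D_"] += 1
        else:
            counts["A_ dd"] += 1
    total = sum(counts.values())
    return {phenotype: Fraction(n, total) for phenotype, n in counts.items()}


ratios = f2_phenotype_ratios()
for phenotype, proportion in ratios.items():
    print(phenotype, proportion)
```

Only three phenotypic classes can be scored (9/16 A_ D_, 3/16 A_ dd and 4/16 aa), so any estimate of linkage between A and D has to rest on an assumption about how the aa class divides between D_ and dd.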
Linkage group II
The S – Wb – K segment was identified early (group II, Fig. 4), but the association with the A – Lf (group II, Fig. 4) segment was not made until quite recently (Paruvangada et al., 1995). Part of the confusion comes from the difficulty in consistent naming of Wb and Wa. A locus called Bl was sometimes attributed to Wa and at others to Wb. For this reason Oh (linked to Wa on group VII, Fig. 4) had been mistakenly assigned to the S – Wb – K segment.
Linkage group III
For the most part this linkage group remains as it was assigned in early maps, except that the Le – V segment earlier assigned to linkage group IV is now known to be one part of this linkage segment. The assignment of Le to linkage group IV was one reason for supposing a translocation segregated in one of the early molecular mapping papers (Ellis et al., 1992). Interestingly the St – Le linkage can be found in the work of Lamm & Miravalle (1959), but because the location of Le on group IV was strongly believed, this association was taken to be a consequence of a translocated segment.
Linkage group IV
In early maps from Lamprecht this included the loci N – Z – Fn, but Lamprecht later corrected this to N – Z – Fa, so this group is essentially intact, but has been extended.
Linkage group V
Lamm maintained that the Bt – Tl – R segment was linked to the Gp – Cp – Fs segment, but this was opposed by Lamprecht, and his view prevailed (Fig. 4). Interestingly, the recombination frequency in the R – Gp interval varies widely, and chiasma number on this chromosome ranges between 1 and 3 depending on the cross (Hall et al., 1997a), which could account for the conflicting data concerning linkage between R and Gp (Ellis et al., 1992).
Linkage group VI
Early maps assigned the loci Fl – Pl – P – Wlo to this linkage segment, and these associations are consistent with the present map (Fig. 4) which has been expanded to include more loci.
Linkage group VII
The map shown in Fig. 4 depicts this linkage group as devoid of classical markers over much of its length, and the linkage segment Rrn2 – Pep3 – PgdP – PgmC identified by Weeden and coworkers was not initially assigned to a linkage segment with classical markers.
The consensus pea linkage map (Fig. 4) is consistent with what is known about the structure and genetic relationships among various translocation karyotypes. It appears that the pea linkage map can be aligned reasonably easily with its close relatives, lentil and chickpea. This alignment holds despite large differences in DNA content between these diploid species. This suggests that the genome size differences within the Vicieae are not a consequence of large segmental duplications, and that the basic genome structure is similar among these species, but that these genomes are differently populated by repetitive sequences. This bodes well for the potential to exploit the emerging genomics tools for Lotus japonicus or Medicago truncatula for the analysis of the pea genome. Gene isolation from pea has been possible despite the difficulties (Bhattacharyya et al., 1990; Hofer et al., 1997; Lester et al., 1997; Martin et al., 1997; Smith et al., 1997; Craig et al., 1998, 1999; Taylor et al., 2001), but comparative genomics opens the way to making this a well defined path.
Repetitive sequences have amplified and diverged within species, thus these sequences are highly polymorphic in location within these species and can provide abundant genetic markers. As yet we do not know whether all these repetitive sequences are equally dispersed in the genomes of these legumes; some repetitive elements such as PDR1 have been shown to be located adjacent to low copy sequences. This remains to be tested for other repeated sequences. If these are clustered (San Miguel et al., 1996) then repeated sequence markers could represent powerful tools for comparative genome analysis as they would mark the discontinuities in otherwise contiguous genomes. On the other hand, these repetitive elements may be closely interspersed with genes. This may seem a depressing prospect, but we know that P. abyssinicum and P. fulvum share very few retrotransposon insertion sites with P. sativum, so if this is the case we have a simple method for identifying markers in the immediate vicinity of genes of our choice.
The pattern of allele sharing among disparate Pisum accessions is beginning to emerge (Pearce et al., 2000), holding the prospect that a genealogy of Pisum germplasm may be deduced from marker data. In turn this implies that genetic inference from germplasm analysis may soon be a possibility.
Pea genetics now presents a clear view of the linkage map and hence provides ordered access to an abundance of interesting allelic variation accumulated over approx. 200 yr. The relationship between this structure and emerging tools from model legume genomics is becoming clear, suggesting that positional cloning, aided by an abundance of genetic markers, is likely to become easier. The genetic structure of the pea germplasm collections is beginning to be described, and this in turn will be related to the genetic map and the diversification of allelic variants within the genus. The structural genomics of pea is overshadowed by developments in other species, but is in a good position to draw on resources invested elsewhere.
Thanks are due to AV Vershinin, JMI Hofer, MR Knox and JA Downie for comments on this manuscript. THNE acknowledges the support of the EU funded projects MEDICAGO (QLG2-CT-2000–30676) and TEGERM (QLK-CT-2000–01502). SJP was supported by a BBSRC Case Studentship with Plant Breeding International. In the writing of this article, many references had to be omitted due to space constraints; a full set of references concerning the placement of markers on the pea genetic map can be found at http://www.jic.bbsrc.ac.uk/staff/noel-ellis/n.e.home/classicalmaptables/html.