Identification of perennial ryegrass (Lolium perenne (L.)) and meadow fescue (Festuca pratensis (Huds.)) candidate orthologous sequences to the rice Hd1(Se1) and barley HvCO1 CONSTANS-like genes through comparative mapping and microsynteny
I. P. Armstead,
Plant Genetics and Breeding Department, Institute of Grassland and Environmental Research, Aberystwyth, Ceredigion SY23 3EB, UK
• Microsynteny with rice and comparative genetic mapping were used to identify candidate orthologous sequences to the rice Hd1(Se1) gene in Lolium perenne and Festuca pratensis.
• A F. pratensis bacterial artificial chromosome (BAC) library was screened with a marker (S2539) physically close to Hd1 in rice to identify the equivalent genomic region in F. pratensis. The BAC sequence was used to identify and map the same region in L. perenne.
• Predicted protein sequences for L. perenne and F. pratensis Hd1 candidates (LpHd1 and FpHd1) indicated they were CONSTANS-like zinc finger proteins with 61–62% sequence identity with rice Hd1 and 72% identity with barley HvCO1. LpHd1 and FpHd1 were physically linked in their respective genomes (< 4 kb) to marker S2539, which was mapped to L. perenne chromosome 7.
• The identified candidate orthologues of rice Hd1 and barley HvCO1 in L. perenne and F. pratensis map to chromosome 7, a region of the L. perenne genome which has a degree of conserved genetic synteny both with rice chromosome 6, which contains Hd1, and barley chromosome 7H, which contains HvCO1.
Comparative genetic mapping between plant species has established that there has been a recognizable conservation of genomic organization which reflects evolutionary relationships. This can be most clearly seen in the genetic analyses that have aligned the genomes of various grass species (Dunford et al., 1995; Moore et al., 1995; Gale & Devos, 1998; van Deynze et al., 1998; Smilde et al., 2001; Jones et al., 2002). The publication of the complete genome sequences of rice (Goff et al., 2002) and Arabidopsis (Arabidopsis Genome Initiative, 2000) has allowed the description of these underlying syntenic relationships to be taken to a more detailed level through detailed annotations and comparative physical mapping of gene sequences (http://www.gramene.org; Sutton et al., 2003). While comprehensive microsyntenic relationships between model and crop species may soon become available for major crop species, this is unlikely to be true for other species, including the forage and amenity grasses of the Lolium/Festuca complex. For this latter group, the driver for the establishment of microsyntenic relationships with model species will be specific experimental aims. In forage and amenity grasses, one of these aims is to further our understanding of the processes that control the timing of the transition from vegetative to reproductive growth, as this has impacts both on ‘performance’ traits associated with the vegetative growth phase and seed yield.
Hd1 has been mapped to rice chromosome 6, where it is closely linked to marker S2539, and underlies a major quantitative trait locus for heading date (Yano et al., 2000; Yamamoto et al., 1998). Comparative mapping between rice and Lolium perenne has shown that this region of rice is syntenic with L. perenne chromosome 7 (Jones et al., 2002; Armstead et al., 2004, 2002) and by inference with chromosome 7 of the closely related forage grass Festuca (Schedonorus) pratensis (Alm et al., 2003; King et al., 1998). CO-like sequences have also been mapped in barley (Griffiths et al., 2003) and wheat (Nemoto et al., 2003). In barley, the CO-like sequences (HvCO1–9) were shown to map to several different chromosomes, with the most similar sequences to Hd1 (HvCO1 and 2) mapping to chromosomes 7H and 6H, respectively. The CO-like sequences derived from wheat (TaHd1-1–3) were assigned to the long arm of homeologous group 6. Recently, Martin et al. (2004) isolated CO-like sequences from L. perenne (LpCO) and Lolium temulentum (LtCO) and demonstrated that LpCO exhibited CO-like expression patterns in L. perenne and was able to complement the Arabidopsis co2 mutant. However, the relative genomic positions of these Lolium sequences have yet to be established. It has already been shown in the analysis of the L. perenne equivalent of the Hd3 region in rice that the relative physical position of sequences in the rice genome can be used as a template for identifying genetic markers for mapping in L. perenne (Armstead et al., 2004). The present study describes the use of microsynteny between L. perenne and rice to identify putative orthologous gene sequences of Hd1 and CO in L. perenne and F. pratensis.
Genetic mapping, quantitative trait locus analysis and heading date evaluation
Genetic mapping of chromosome 7 using JoinMap® 3.0 software (van Ooijen & Voorrips, 2001) and heading date evaluation were carried out on an F2 L. perenne population of 188 individuals, as described by Armstead et al. (2004). Additional markers ACA-CTA-321, CAC-CTA-254 and ACA-CAC-225 were described by Skøt et al. (2004). A sequence-tagged-site (STS) amplified by primer pair S2539.2F–S2539.5R, which was derived from the rice sequence AK060295 (which includes the expressed sequence tag (EST) marker S2539), was scored as a cleaved amplified polymorphic sequence (CAPS) marker after restriction digestion of the polymerase chain reaction (PCR) product using PshA1 (New England Biolabs, Hitchin, UK). Quantitative trait loci (QTL) detection by interval and multiple QTL mapping (MQM) was implemented using MapQTL 4.0 software (van Ooijen et al., 2002). Markers C764 and LtemCOa were used as the final cofactor selection for MQM mapping.
Festuca pratensis BAC library screening and BAC sequencing
The F. pratensis BAC library, containing 2.5 genome equivalents, was produced and PCR screened as described by Donnison et al. (2005). The BAC identified was sequenced directly (without subcloning) on an ABI 3100 (Applied Biosystems, Warrington, UK) automated fluorescent sequencer over the region described in GenBank Acc. AJ833018.
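As a rough guide to what 2.5 genome equivalents implies for a single-locus screen, the following sketch assumes that clones sample the genome uniformly at random, so that the number of clones covering any one locus is approximately Poisson distributed with a mean equal to the coverage. This is an idealization that is not stated in the source; real libraries show cloning bias, and the function names are illustrative only.

```python
import math


def prob_locus_hits(coverage: float, k: int) -> float:
    """Poisson probability that exactly k clones cover a given locus.

    Assumes uniform random sampling of the genome (an idealization).
    """
    return math.exp(-coverage) * coverage ** k / math.factorial(k)


def prob_locus_present(coverage: float) -> float:
    """Probability that at least one clone covers a given locus."""
    return 1.0 - prob_locus_hits(coverage, 0)


COVERAGE = 2.5  # genome equivalents, as reported for the library

print(round(prob_locus_present(COVERAGE), 3))  # 0.918
print(round(prob_locus_hits(COVERAGE, 3), 3))  # 0.214
```

Under these assumptions a single-copy sequence would be expected in roughly nine libraries out of ten of this depth, so a failed screen would not by itself show that the sequence is absent from the genome.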
Amplification, cloning and sequencing of PCR products
DNA extracted from L. perenne genotypes taken from the F2 mapping family was used as the template for PCR amplifications. The PCR amplifications of products < 2 kb using primer pairs S2539.2F–S2539.5R, Hd1.1F–Hd1.3R and Hd1.1F–Hd1.4R were performed using Taq DNA polymerase and the manufacturer's buffer systems (Roche, Lewes, UK). Thermal cycling for all three primer combinations was performed beginning with 1 min at 94°C, followed by 10 cycles of 1 min at 94°C, 1 min at 60°C (with the temperature reduced by 1°C per cycle), 3 min at 72°C, followed by 25 cycles of 1 min at 94°C, 1 min at 50°C and 3 min at 72°C. The PCR products were cloned into the pGEM-T Easy vector (Promega, Southampton, UK). Polymerase chain reaction amplifications of products > 2 kb using primer pairs S2539.1F–Hd1(ld).1R and Hd1all.1F–Hd1all.2R were performed using Elongase Enzyme Mix with the manufacturer's buffer systems and 2 mM Mg2+ (Invitrogen, Paisley, UK). Thermal cycling for both primer combinations was performed with an initial denaturation of 30 s at 94°C, followed by 10 cycles of 30 s at 94°C, 1 min at 60°C (with the temperature reduced by 1°C per cycle), 10 min at 68°C, followed by 35 cycles of 30 s at 94°C, 1 min at 50°C and 10 min at 68°C, followed by 7 min at 72°C. The PCR products were cloned into the pCR-XL-TOPO vector (Invitrogen). The PCR primer sequences (5′-3′) used were as follows: S2539.1F (AGGCCCTCGCCCTTGATGAG), S2539.2F (TCAGTGGCATCAAGGCCCTT), S2539.5R (CTGAGCCTGGAGCACTACTC), Hd1.1F (GCAGCAGATGCAAAAGGAGT), Hd1.3R (ACTATACCCGCTTCCATTGA), Hd1.4R (GAGAACATCTGGTCCACTTC), Hd1(ld).1R (CCTGCTCTGCCCCCACAAGT), Hd1all.1F (AAGATAGATGCAGCATCTTC), Hd1all.2R (CTTCTGTTATGATTCCAGTC). Automated fluorescent sequencing was carried out on an ABI 3100 (Applied Biosystems).
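The touchdown cycling described above can be written out cycle by cycle. The sketch below lists the annealing temperature for each cycle of the two programmes. It assumes that the first touchdown cycle anneals at 60°C and that each later touchdown cycle is 1°C lower, which is one reading of the description; the function name is hypothetical.

```python
from typing import List


def annealing_temperatures(start_c: int, step_c: int, touchdown_cycles: int,
                           final_c: int, fixed_cycles: int) -> List[int]:
    """Annealing temperature (°C) for every cycle of a touchdown programme.

    Assumption: cycle 1 anneals at start_c and each subsequent touchdown
    cycle is step_c lower; the remaining cycles all anneal at final_c.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * fixed_cycles


# Taq programme (products < 2 kb): 10 touchdown cycles + 25 cycles at 50°C
taq = annealing_temperatures(60, 1, 10, 50, 25)
# Elongase programme (products > 2 kb): 10 touchdown cycles + 35 cycles at 50°C
elongase = annealing_temperatures(60, 1, 10, 50, 35)

print(taq[:10])                 # [60, 59, 58, 57, 56, 55, 54, 53, 52, 51]
print(len(taq), len(elongase))  # 35 45
```

On this reading the last touchdown cycle anneals at 51°C, so the programme steps down smoothly into the fixed 50°C cycles.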
Protein and mRNA prediction from genomic sequences
Primers S2539.2F/5R amplified a single fragment of 875 bp from both haplotypes segregating in the L. perenne mapping family. Sequencing of these fragments and alignment with rice target sequence (AK060295) indicated that they spanned two introns and had c. 87% sequence homology over the equivalent exonic regions (Fig. 1). Sequence comparison of the two L. perenne haplotypes identified four single nucleotide polymorphisms, one of which affected the recognition sequence for restriction enzyme PshA1 and allowed for the development of a codominant CAPS marker (S2539). This was used to map the L. perenne equivalent of the AK060295 sequence to a region of L. perenne chromosome 7 that had previously been shown to have a degree of conserved synteny with rice chromosome 6 (Jones et al., 2002; Armstead et al., 2004, 2002). The S2539 CAPS marker was closely linked with markers generated from a heading-date linkage disequilibrium study ACA-CTA-321, CAC-CTA-254 and ACA-CAC-225 (Skøt et al., 2004) (Fig. 2). Quantitative trait locus detection using interval and MQM analysis produced results that were similar to those described by Armstead et al. (2004) and indicated the presence of a major quantitative trait locus for heading date, the peak of which was associated with the C764 region, which accounted for up to 64% of the variance associated with the trait (Fig. 2). The use of C764 and LtemCOa as cofactors in MQM analysis did not indicate the presence of a significant genetic effect associated with the putative LpHd1 region.
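The logic of a codominant CAPS marker can be illustrated with a short in silico digest. The sketch below assumes that the enzyme (PshA1 in the text, usually catalogued as PshAI) recognizes GACNNNNGTC and cuts bluntly in the middle of the site; this should be checked against the supplier's documentation. The two allele sequences are invented toy sequences, not the L. perenne haplotypes, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed recognition site: GACNN^NNGTC (blunt cut after the fifth base).
PSHAI_SITE = re.compile(r"(?=GAC[ACGT]{4}GTC)")
CUT_OFFSET = 5


def digest_fragment_lengths(seq: str) -> List[int]:
    """Lengths of the fragments produced by complete digestion of seq."""
    cuts = [m.start() + CUT_OFFSET for m in PSHAI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def caps_bands(allele_a: str, allele_b: str) -> List[int]:
    """Distinct band sizes expected on a gel for a diploid genotype."""
    return sorted(set(digest_fragment_lengths(allele_a))
                  | set(digest_fragment_lengths(allele_b)))


# Toy alleles differing by one SNP (C/T) inside the assumed recognition site.
allele_cut = "ATGCCGTTAGACTTAAGTCGGCATTACG"
allele_uncut = "ATGCCGTTAGACTTAAGTTGGCATTACG"

print(caps_bands(allele_cut, allele_cut))      # [14]
print(caps_bands(allele_cut, allele_uncut))    # [14, 28]
print(caps_bands(allele_uncut, allele_uncut))  # [28]
```

The three genotypes give three different banding patterns, which is what makes the marker codominant: heterozygotes can be distinguished from both homozygous classes in the F2 family.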
Isolation of F. pratensis BAC and identification of Hd1 candidate sequence
Primers S2539.2F/5R were used to PCR screen the F. pratensis BAC library and a PCR product of the correct size was found to be represented three times within the BAC library. A single BAC was isolated and used as a template for PCR amplification using primer pairs Hd1.1F–Hd1.3R and Hd1.1F–Hd1.4R; these latter primer pairs were based on homology between the rice Hd1 and the wheat TaHd1 sequences (see Table 1) and amplified PCR products of c. 1250 and 1500 bp, respectively (sizes of the equivalent regions from rice are c. 700 and 1000 bp). These primers had previously been used directly on L. perenne genomic DNA and had either given no amplification product or amplified nontarget sequence. The BAC clone was sequenced directly and a contiguous region of 9336 bp was identified which contained the putative F. pratensis orthologous regions for sequence AK060295 and Hd1 (Fig. 1). A comparison of this region derived from F. pratensis with that derived from rice (taken from rice BAC clone AP003044) showed that the orientation of the putative genes was the same in both species, but the intergenic region (distance between the genomic positions of predicted mRNA sequences) was longer in rice (c. 8400 bp) than in F. pratensis (c. 3000 bp).
Table 1. Percentage similarity of protein sequences for CONSTANS-like sequences from Lolium perenne (LpHd1), Lolium temulentum (LtCO), Festuca pratensis (FpHd1), rice (Hd1a and Hd1b), barley (HvCO1 and HvCO2) and wheat (TaHd1-1, TaHd1-2 and TaHd1-3)
Amino acid residues (n)
Percentage similarities are based upon the number of identical amino acid residues after sequence alignment. The different genotypic origins of sequences with the same name are denoted by the suffix a or b (see full GenBank Accession information).
Isolation of S2539-Hd1 candidate region from L. perenne
Primers based upon the F. pratensis BAC sequence were designed which amplified two overlapping sequences from a single haplotype of the L. perenne mapping family. The first sequence (c. 6000 bp, amplified by primer pair S2539.1F/Hd1(ld).1R) included the majority of the AK060295 region up to the first putative intron of the LpHd1 gene. The second sequence (c. 3400 bp, amplified by primer pair Hd1all.1F/Hd1all.2R) encompassed the entire LpHd1 gene and included a 1427 bp overlap with the first sequence (Fig. 1). This confirmed that the AK060295 and Hd1 regions were contiguous in L. perenne as well as F. pratensis. Attempts to amplify the entire genomic region including the complete sequences for AK060295 and LpHd1 as a single PCR product from L. perenne were unsuccessful.
Comparison of predicted mRNA and protein sequences for AK060295 from rice and its putative orthologue from F. pratensis
The mRNA and protein sequences derived from AK060295 were predicted using fgenesh. The derived mRNA sequence consisted of 939 bp and contained four exons. The protein translation of this sequence (312 amino acids (aa)), was used as a model for mRNA and peptide prediction using fgenesh+ for the F. pratensis sequence. The derived F. pratensis mRNA sequence also consisted of 939 bp and contained four exons (Fig. 1). The derived F. pratensis protein consisted of 312 aa and, based on identical aa residues, shared an 86% sequence similarity with the rice sequence. This indicates that the F. pratensis genomic sequence contained a candidate gene for an orthologue of rice AK060295 (Fig. 1).
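The reported sequence lengths are mutually consistent if each predicted mRNA length is taken to be the coding sequence including its stop codon. That interpretation is an assumption rather than something stated in the text. A minimal check, using the lengths reported in this section and in the following one (the helper name is hypothetical):

```python
def protein_length(cds_bp: int) -> int:
    """Amino acid count for a coding sequence that includes its stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds_bp // 3 - 1  # the stop codon encodes no residue


# Predicted mRNA lengths (bp) as reported in the text.
predicted = {
    "AK060295 (rice and F. pratensis)": 939,
    "LpHd1": 1134,
    "FpHd1": 1131,
}

for name, bp in predicted.items():
    print(name, protein_length(bp))
# AK060295 (rice and F. pratensis) 312
# LpHd1 377
# FpHd1 376
```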
Comparison of predicted mRNAs and proteins for the candidate Hd1 genes derived from L. perenne and F. pratensis with LpCO from L. perenne, LtCO from L. temulentum, Hd1 from rice and HvCO1 from barley
fgenesh+ was used to predict the mRNA and protein sequences derived from the Hd1 region in F. pratensis and from L. perenne; in these cases the barley HvCO1 protein sequence was used as the model (AF490468). The predicted mRNA sequences contained 1134 bp and 1131 bp from L. perenne and F. pratensis, respectively, and were derived from two exons; the predicted mRNA from L. perenne showed 100% sequence identity with the CO-like gene product (AY600919) derived from L. perenne described by Martin et al. (2004). The corresponding predicted proteins, LpHd1 and FpHd1, consisted of 377 aa and 376 aa. The percentage similarity based on identical aa residues between the F. pratensis, L. perenne and various CO-like sequences from L. temulentum, rice, wheat and barley is detailed in Table 1 and shows that the L. perenne sequence has 96% similarity with the F. pratensis and L. temulentum sequences. Compared with sequences derived from the other species, the L. perenne and F. pratensis sequences are most similar to barley HvCO1 (72%), rice Hd1 (61–62%), barley HvCO2 (54%) and wheat TaHd1 (53–55%) sequences. The similarity with other CO-like sequences derived from barley (HvCO3–8) was < 25% (data not shown). Detailed analysis of the protein sequences showed that both the L. perenne and F. pratensis sequences contained the B-box and CCT domains associated with CO-like sequences (Robson et al., 2001) (Fig. 3). The first B-box for the L. perenne and F. pratensis sequences was complete, containing all the expected conserved cysteine (C) and histidine (H) residues; the second B-box had serine and tyrosine substitutions for the first and fifth conserved C residues, respectively, as did the HvCO1 sequences; the second conserved C residue had a H substitution in the L. perenne and F. pratensis sequences and a tyrosine substitution in the HvCO1 sequence (Fig. 3a). The second B-box was generally less well conserved between L. perenne, F. pratensis, barley and, particularly, rice than the first B-box (Fig. 3b). The CCT domain was strongly conserved between L. perenne, F. pratensis, barley and rice (Fig. 3b), with only 3 out of 43 aa residues differing between L. perenne, F. pratensis and barley and 5 out of the 43 aa residues differing between L. perenne, F. pratensis and rice.
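The percentage similarities quoted here are counts of identical residues after alignment. A minimal sketch of that calculation follows. It assumes that alignment columns containing a gap are excluded from the denominator; the source does not say which denominator was used, and other conventions give slightly different values. The aligned toy sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment: 7 ungapped columns, 6 identical.
print(round(percent_identity("MKTAY-LV", "MKSAYGLV"), 1))  # 85.7

# CCT domain figures from the text, expressed as percentage identity.
CCT_LENGTH = 43
print(round(100.0 * (CCT_LENGTH - 3) / CCT_LENGTH, 1))  # 93.0 (vs barley)
print(round(100.0 * (CCT_LENGTH - 5) / CCT_LENGTH, 1))  # 88.4 (vs rice)
```

Expressed this way, the CCT domain is considerably more conserved than the full-length proteins (72% with HvCO1 and 61–62% with rice Hd1).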
The ability to use the comprehensive information that has become available for model plant species in a broader context is important for two different reasons. First, it allows for the transfer of information from model/major crop species to less intensively studied species via comparative genomics and related disciplines. Second, it allows for biological models and assumptions that may appear to be true for the model organisms to be tested in a wider variety of species, in different experimental contexts and towards different experimental ends. Previously, comparative mapping work between L. perenne, F. pratensis, rice and the Triticeae (Armstead et al., 2002; Jones et al., 2002; Alm et al., 2003) had demonstrated that general relationships could be identified in terms of genetic synteny and that the rice genome could be used as a template for fine-mapping for Lolium spp. (Armstead et al., 2004). The present work has demonstrated that the physical order of genes can also be conserved at the microsyntenic level.
Both CO and CO-like genes have been identified as being of key importance in the floral induction pathways of both dicots and monocots. Many such genes have been cloned and analysed from a variety of species and detailed comparisons have been made of this family of genes both in terms of structure and function. However, one problem in trying to make cross-species comparisons within families of genes is that it is difficult to be certain whether true orthologues are being compared. Additionally, optimizing PCR amplification conditions to distinguish between closely related genes within the genome of a particular genotype can be problematic. In the case described in the present work, the PCR primers designed to identify the Hd1 orthologous sequence from L. perenne failed to amplify the target sequence from genomic DNA, whereas they were effective when used directly on the BAC-library derived plasmid containing the FpHd1 sequence, indicating that dosage and/or competition were important in amplifying the correct sequence. However, it was apparent that in the rice genome, marker S2539 was closely linked with Hd1 both genetically and physically (Yano et al., 2000) and that if the microsyntenic relationship was conserved between rice, L. perenne and F. pratensis this might be an alternative genomics-based approach to identifying the Hd1 orthologous sequence. The use of the F. pratensis BAC library and the cloning of the PCR fragments from L. perenne indicated that the microsyntenic relationship did exist. Translations of the AK060295 and Hd1 putative genes from L. perenne and F. pratensis provided further evidence for this by showing that there was a high degree of similarity between these and the equivalent sequences from rice and barley.
Comparative mapping has shown that the region of L. perenne and F. pratensis chromosome 7 to which S2539-LpHd1 maps has a degree of conserved genetic synteny with rice chromosome 6 – including the region which contains the rice FT orthologue Hd3a and the Hd3 quantitative trait locus. This paper has now identified that L. perenne/F. pratensis chromosome 7 is likely to contain the equivalent of the rice Hd1 region, which is also associated with one of the major heading date QTL in rice (the Hd1 quantitative trait locus; Yano et al., 2000). However, while the order of the markers mapped in common between rice and L. perenne is conserved, there are considerable differences in the comparative genetic distances, particularly in the interval between the Hd3 and Hd1 regions (Fig. 4). This does not necessarily reflect the gain or loss of intervening physical distance, although it seems plausible that this has occurred. Chromosome 7 of L. perenne/F. pratensis has also been aligned with Triticeae 7 (Armstead et al., 2002; Jones et al., 2002; Alm et al., 2003), although there are not enough markers mapped in common to describe a detailed relationship. However, HvCO1, the sequence with which LpHd1 and FpHd1 have the greatest homology, has been shown to map to a region of barley chromosome 7H which is colinear with the region of rice chromosome 6 that contains Hd1 (Griffiths et al., 2003; van Deynze et al., 1998). Therefore, both on the basis of sequence homology and the available mapping information, LpHd1 and FpHd1 are strong candidates as orthologues to both rice Hd1 and barley HvCO1.
Nemoto et al. (2003) have suggested that TaHd1-1 from wheat is an orthologue of rice Hd1 on the basis of sequence similarity and the ability to complement a Hd1-deficient rice line. The same study assigned the TaHd1 genes to the long arm of wheat homeologous group 6 and explained the disparity between the expected position (on the basis of established syntenic relationships between the Triticeae and rice) on homeologous group 7 and its observed position on group 6 by the likelihood of there having been chromosomal rearrangements. While comparisons of the B-box and CCT domains of the TaHd1 sequences have indicated that they are clearly closely related to those of Hd1 (Nemoto et al., 2003), which is itself closely related to HvCO1 and HvCO2 (Griffiths et al., 2003), if the whole of the TaHd1 predicted proteins are considered, they are more closely related to the HvCO2 sequences (92–94% similarity) than they are to the HvCO1 sequences (52–53%; see Table 1). In addition, HvCO2 and TaHd1 have been mapped to equivalent linkage groups (in terms of conserved genetic synteny), namely barley 6H and wheat homeologous group 6, respectively (Griffiths et al., 2003). Therefore, the present study and that of Griffiths et al. (2003) suggest that there may be an Hd1 orthologous sequence yet to be identified on wheat homeologous group 7.
The relative closeness of the markers associated with the Hd3 (LpHd3) and Hd1 (LpHd1) equivalent regions in L. perenne (the distance from C764 to S2539 is c. 7 cM) presents a problem in terms of distinguishing between single and multiple QTL in this region. Armstead et al. (2004), using the same L. perenne mapping family as the present study, reported the position of a major quantitative trait locus for heading-date associated with the LpHd3 region of chromosome 7. A repeat of this QTL analysis including markers S2539, ACA-CTA-321, CAC-CTA-254 and ACA-CAC-225 using MQM analysis and C764 and LtemCOa as cofactors (and a variety of other cofactor combinations; data not shown) indicated that the peak of the QTL was still associated with the LpHd3 region and that there was no significant quantitative trait locus associated with the LpHd1 region (Fig. 2). Markers ACA-CTA-321, CAC-CTA-254 and ACA-CAC-225, which flanked S2539 in a c. 2-cM interval, were derived from amplified fragment length polymorphism (AFLP) bands produced in a study designed to identify molecular markers in linkage disequilibrium with heading-date across a range of L. perenne ecotypes (Skøt et al., 2004). The fact that all three of these markers tightly flank the LpHd1 region and not the LpHd3 region (Fig. 2) suggests that both the LpHd3 and LpHd1 regions can significantly affect heading-date phenotype in this region of chromosome 7. However, while the heading-date quantitative trait locus described in this paper seems to be closely associated with the LpHd3 region, it represents only a single year's field evaluation of this mapping family (Armstead et al., 2004). Consequently, a degree of caution should be shown when associating the quantitative trait locus only with the LpHd3 region as opposed to the closely linked LpHd1 region.
A number of the genes fundamental in the induction to flowering have been shown to have a high degree of conservation of molecular structure between Arabidopsis and monocot species, though precise modes of action may differ (Gocal et al., 2001; Cremer & Coupland, 2003; Hayama et al., 2003; Tadege et al., 2003; Andersen et al., 2004; Martin et al., 2004). How these genes function in controlling heading date in monocot species other than rice is, as yet, largely unknown and directly validating LpHd1 and FpHd1 as orthologues of Hd1 and HvCO1 through functional analysis is beyond the scope of this paper, although the work of Martin et al. (2004) would seem to indicate significant roles for these genes. Further experiments are under way in Lolium and Festuca species to try to assign precise functions to LpHd1 and FpHd1 and other flowering-time associated genes. The ability, as described in this paper, to associate sequences from different species on the basis of comparative genomics as well as sequence homology will allow the results from Lolium and Festuca species to be compared with other systems with greater reliability.
This work was supported by the Biotechnology and Biological Sciences Research Council, UK.