Orthologous comparison in a gene-rich region among grasses reveals stability in the sugarcane polyploid genome

Authors

  • Nazeema Jannoo,

    1. Centro de Biologia Molecular e Engenharia Genética (CBMEG), Universidade Estadual de Campinas, 13083-970, Campinas, SP, Brazil,
  • Laurent Grivet,

    1. Centro de Biologia Molecular e Engenharia Genética (CBMEG), Universidade Estadual de Campinas, 13083-970, Campinas, SP, Brazil,
    2. Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR 1096, Avenue Agropolis, TA40/03, 34398 Montpellier Cedex 5, France, and
  • Nathalie Chantret,

    1. Institut National de la Recherche Agronomique (INRA), UMR 1096, Avenue Agropolis, TA40/03, 34398 Montpellier Cedex 5, France
    • Present address: INRA, UMR 1097, Domaine de Melgueil, 34130 Mauguio, France.

  • Olivier Garsmeur,

    1. Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR 1096, Avenue Agropolis, TA40/03, 34398 Montpellier Cedex 5, France, and
  • Jean Christophe Glaszmann,

    1. Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR 1096, Avenue Agropolis, TA40/03, 34398 Montpellier Cedex 5, France, and
  • Paulo Arruda,

    1. Centro de Biologia Molecular e Engenharia Genética (CBMEG), Universidade Estadual de Campinas, 13083-970, Campinas, SP, Brazil,
  • Angélique D’Hont

    Corresponding author
    1. Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR 1096, Avenue Agropolis, TA40/03, 34398 Montpellier Cedex 5, France, and

*(fax +33 467 615 605; email dhont@cirad.fr).

Summary

Modern sugarcane (Saccharum spp.) is an important grass that contributes 60% of the raw sugar produced worldwide and has a high biofuel production potential. It was created about a century ago through hybridization of two highly polyploid species, namely S. officinarum and S. spontaneum. We investigated genome dynamics in this highly polyploid context by analyzing two homoeologous sequences (97 and 126 kb) in a region that has already been studied in several cereals. Our findings indicate that the two Saccharum species diverged 1.5–2 million years ago from one another and 8–9 million years ago from sorghum. The two sugarcane homoeologous haplotypes show perfect colinearity as well as high gene structure conservation. Apart from the insertion of a few retrotransposable elements, high homology was also observed for the non-transcribed regions. Relative to sorghum, the sugarcane sequences displayed colinearity, with the exception of two genes present only in sorghum, and striking homology in most non-coding parts of the genome. The gene distribution highlighted high synteny and colinearity with rice, and partial colinearity with each homoeologous maize region, which became perfect when the sequences were combined. The haplotypes observed in sugarcane may thus closely represent the ancestral Andropogoneae haplotype. This analysis of sugarcane haplotype organization at the sequence level suggests that the high ploidy in sugarcane did not induce generalized reshaping of its genome, thus challenging the idea that polyploidy quickly induces generalized rearrangement of genomes. These results also confirm the view that sorghum is the model of choice for sugarcane.

Introduction

The large monophyletic grass family (Poaceae) encompasses around 10 000 species (Clark et al., 1995), including important crops such as rice, maize, wheat, barley, sorghum, millet and sugarcane. Comparative genetic mapping and sequencing between grass species has highlighted global conservation of gene content and order (Feuillet and Keller, 2002; Moore et al., 1995) despite considerable genome size variations (Bennett and Leitch, 1995). Two trends are responsible for the increase in genome size and complexity. One is an increase in monoploid genome size with, for example, a 13-fold difference between barley (5000 Mb) and rice (390 Mb), which is related mainly to transposable element amplification (Bennetzen, 2005). The other involves duplications of the whole genome, i.e. polyploidization, with, for example, rice and sorghum as diploids and bread wheat as a hexaploid.

Sugarcane (Saccharum spp.), a non-cereal grass, probably has the most complex of all crop genomes studied to date, mainly due to the very high degree of polyploidy (>10x) together with its inter-specific origin (D’Hont, 2005; Grivet and Arruda, 2001). Modern sugarcane cultivars derive from the combination of the polyploid species S. officinarum, the domesticated sugar-producing species with x = 10 and 2n = 8x = 80, and S. spontaneum, a vigorous wild species with x = 8 and 2n = 5x = 40 to 16x = 128 and many aneuploid forms (D’Hont et al., 1998; Sreenivasan et al., 1987). Both species are thought to have an autopolyploid origin (Grivet et al., 1996; Sreenivasan et al., 1987). Breeders combined both genomes a century ago. S. officinarum was back-crossed to recover the thick sugar-containing stalks of this species, and this process was accelerated through the selection of hybrids derived from 2n transmission of S. officinarum chromosomes (Bremer, 1961). Modern cultivars are derived from these inter-specific crosses. They are highly polyploid (more than decaploid) and aneuploid, with around 120 chromosomes (reviewed by Sreenivasan et al., 1987) and a genome size of around 10 000 Mb (D’Hont, 2005). The meiosis of modern sugarcane cultivars mainly involves bivalent pairing (Burner, 1994; Price, 1963), and chromosome assortment results from a combination of polysomy and preferential pairing (Grivet et al., 1996; Hoarau et al., 2001; Jannoo et al., 2004). Molecular cytogenetics (Cuadrado et al., 2004; D’Hont et al., 1996; Piperidis and D’Hont, 2001) and genetic mapping studies (Grivet et al., 1996; Hoarau et al., 2001) have shown that modern cultivars typically have 70–80% of chromosomes entirely derived from S. officinarum, 10–20% from S. spontaneum, and a few chromosomes derived from inter-specific recombinations.

Between S. officinarum and S. spontaneum, genetic mapping has revealed extensive global synteny and colinearity, with a few exceptions (Grivet et al., 1996; Ming et al., 1998). Simple structural changes are expected in accordance with the difference in basic chromosome number between the two species (x = 10 and 8, respectively).

Both S. officinarum and S. spontaneum are considered to be ancient highly polyploid species (Sreenivasan et al., 1987). The understanding of genome evolution after polyploidization has markedly improved recently, due, in particular, to comparative genomic sequence analysis. Polyploidization is often followed by extensive and rapid genomic alterations with massive silencing and elimination of duplicated genes (Adams and Wendel, 2005).

One particular region, bearing Adh1 in sorghum, has been thoroughly studied (Ilic et al., 2003) within the Poaceae family, revealing a complex history of local rearrangement in rice, sorghum and maize (Ilic et al., 2003; Tikhonov et al., 1999). The latter two species belong to the Andropogoneae tribe (as does sugarcane). We sequenced this region for two homoeologous sugarcane haplotypes, one derived from S. spontaneum and one from S. officinarum, selected from at least 13 homo(eo)logous haplotypes that we found co-existing in a typical modern sugarcane cultivar. Their comparison has provided insight into homo(eo)logous haplotype organization within the highly polyploid sugarcane genome. We compared them in detail with sorghum (Sorghum bicolor), the closest sugarcane relative studied to date, which shares the simplest macro-synteny relationship with sugarcane (Asnaghi et al., 2000; Dufour et al., 1997; Glaszmann et al., 1997; Grivet et al., 1994; Guimaraes et al., 1997; Ming et al., 1998; Sobral et al., 1994). We also assessed these data relative to the model diploid grass genome of rice (Oryza sativa), and to the extensively rearranged paleopolyploid genome of maize (Zea mays) (Gaut et al., 2000; Lai et al., 2004) in order to further characterize the history of rearrangements of this region in the grass family and the different evolutionary dynamics within each species.

Results

Selection of two homoeologous sugarcane BAC clones orthologous to sorghum BAC 110K5

The sequence of sorghum BAC 110K05 (Tikhonov et al., 1999) was exploited to select homoeologous sugarcane BAC clones representing the two ancestral species involved in modern cultivars (i.e. S. spontaneum and S. officinarum). The sorghum sequence, on which 15 complete genes, including Adh1, were annotated (Ilic et al., 2003; Tikhonov et al., 1999), was compared to the SUCEST sugarcane EST collection using the BLAST program. ESTs homologous (≤ E−30) to all genes were found, and the corresponding cDNAs were used as probes to screen the sugarcane R570 BAC library, except for gene 9 as no relevant cDNA could be recovered. Twenty BAC clones on which a minimum of two cDNAs hybridized were identified (Figure 1). The cDNA corresponding to gene 8 hybridized to eight BACs to which no other cDNA hybridized. BAC HindIII fingerprints as well as RFLP hybridization patterns were used to assemble BACs into haplotypes representing various homo(eo)logous chromosomes. Nineteen BAC clones were shown to correspond to 13 haplotypes (I–XIII); the remaining BAC clone (XIV) could correspond to part of haplotype X, XI, XII or XIII or be a distinct haplotype (Figure 1).

Figure 1.

 Identification of sugarcane BAC clones orthologous to the sorghum BAC 110K05 through hybridization of sugarcane cDNAs homologous to the sorghum genes.
Sorghum genes are numbered according to the method described by Tikhonov et al. (1999). A black square indicates that the corresponding sugarcane cDNA hybridized to the BAC clone. A white square indicates no hybridization signal. A gray square indicates not tested. Only BAC clones hybridizing to at least two distinct probes are indicated. BAC clones corresponding to the same haplotype are assembled within the same box. The asterisk indicates that this BAC clone may correspond to haplotype X, XI, XII or XIII or be a distinct haplotype. The origin of the haplotype, S. officinarum (S. off.) or S. spontaneum (S. spont.), is given when known. The estimated size of the BAC clones is indicated. The two BACs that have been sequenced are highlighted in gray.

The species origin of the eight BAC clones containing Adh1 was assessed by comparing the sequence of part of the Adh1 gene with five S. officinarum and four S. spontaneum accessions involved in the genealogy of modern sugarcane cultivars. Depending on the accession, one to three distinct haplotypes were identified. Haplotype sequence diversity was lower in S. officinarum than in S. spontaneum, as expected from diversity studies (Lu et al., 1994). On the neighbor-joining tree (Figure 2), haplotypes from S. officinarum were all assembled on a separate branch. Most S. spontaneum haplotypes were grouped within another branch, and one haplotype from Mandalay (haplotype I) was separated from the rest of the S. spontaneum haplotypes. Among the haplotypes corresponding to the BAC clones, four were grouped with the S. officinarum haplotypes and two with the S. spontaneum haplotypes.

Figure 2.

 Neighbor-joining tree constructed with a 711 bp sequence of Adh1 for eight BAC clones (in italics), five S. officinarum accessions (white circle) and four S. spontaneum accessions (black circle).
For ancestral accessions, the haplotype number is indicated when more than one has been sequenced. The two BACs that have been sequenced are highlighted in grey. The structure of the maize Adh1 gene is shown with the primer positions.

Given the pedigree of R570, ancestral chromosomes have been disrupted by only four meioses since the first inter-specific hybridization. Moreover, it has been suggested that pairing between homoeologous chromosomes is infrequent (D’Hont et al., 1996; Jannoo et al., 2004). It is thus very likely that the BAC sequences studied here have not been rearranged by inter-specific cross-over, which makes the Adh1 sequences representative of the entire BAC origin. S. spontaneum BAC 265O22, which was the only single BAC that completely spanned the sorghum target region, and S. officinarum BAC 51L01, which was the clone presenting the next highest overlap with the sorghum BAC 110K5, were selected for sequencing.

Sugarcane sequence annotation

Gene repertoire. Sixteen complete genes were annotated on the 126 kb S. spontaneum BAC 265O22 and nine on the 97 kb S. officinarum BAC 51L01 (Figure 3 and Table 1). The genes were assigned numeric names based on the homology with their sorghum counterparts. In addition, a partial gene corresponding to sorghum partial gene 3.5 was identified between genes 3 and 4 in both S. spontaneum and S. officinarum. ESTs with high homology to all complete genes, except gene 8, were identified in the SUCEST database. For all genes except gene 3.5, the coding sequence could be translated into a complete protein sequence. Proteins from other species presenting high homology with the amino acid sequence deduced from the 18 genes were found in the database (Table 1). The function of twelve of them could be tentatively attributed.

Figure 3.

 Comparison of the physical organization of sorghum BAC 110K5 and two homoeologous sugarcane BACs, BAC 265O22 derived from S. spontaneum and BAC 51L01 from S. officinarum.
Genes are indicated by grey boxes, non-LTR retrotransposons by black boxes, LTR retrotransposons by striped boxes, solo LTRs by squared boxes, pseudogene by a lozenge, MITEs by vertical arrows, and microsatellites by stars.

Table 1.   List of genes found on BAC 265O22 and/or BAC 51L01
Gene | No. exons | Protein giving highest BLAST score (GenBank accession)
1 | 9 | Oryza sativa protein similar to guanine nucleotide binding protein (AAP03425)
2 | 9 | Oryza sativa unknown protein (AAP03423)
3 | 10 | Zea mays Adh1 (AAC34295)
4/5 | 20 | Oryza sativa unknown protein (AAP03421)
6 | 2 | Arabidopsis protein similar to pfkB family of carbohydrate kinases (At5g58730)
7 | 12 | Oryza sativa protein similar to cyclin H-1 (AAP03415)
7.5 | 4 | Oryza sativa unknown protein (AAP03409)
10 | 8 | Oryza sativa unknown protein (AAP03424)
11 | 5 | Arabidopsis protein targeted to mitochondria proteins (At5g10860)
12 | 1 | Sorghum bicolor protein targeted either to mitochondria or chloroplast proteins (T50848)
13 | 4 | Oryza sativa protein similar to ABA-responsive protein (AAP03417)
14 | 1 | Sorghum bicolor protein similar to H+-transporting ATP synthase-like protein (T50849)
15 | 1 | Oryza sativa protein similar to cytokinesis-specific syntaxin-related protein (AAP03411)
16 | 9 | Arabidopsis protein targeted to chloroplast protein (At3g28460)
17 | 6 | Hordeum vulgare protein similar to endo-1,4-β-glucanase (BAA94257)
18 | 1 | Oryza sativa fertility restorer-like protein (AAP03402)

Repetitive DNA. BAC 265O22: three LTR (long terminal repeat) retrotransposons were identified, two inserted into introns of gene 4/5 and part of a third one located at the extremity of the BAC clone. The first one may still be functional, whereas the second one was probably degenerate as the coding region included frame shifts and in-frame stop codons. Two solo LTRs were inserted in intergenic regions. For both of them, another partial LTR sequence was inserted in the 5′ region. A pseudogene similar to that encoding the mudrA protein of the maize mutator transposon was also identified. Altogether, large transposable elements (TEs) represented 18% of this BAC (Figure 3 and Table 2). Twenty-three miniature inverted-repeat transposable elements (MITEs) were detected, 15 of which were similar to Tourist elements (Figure 3). For some of them, we could identify the target duplication site (TDS). Seventeen MITEs were inserted into intergenic regions and six into gene introns. Nine microsatellites (≥10 bp) were detected, with two located in gene introns (Figure 3).

Table 2.   Description of large transposable elements identified on BACs 265O22 and 51L01
BAC clone | Transposable element | Homology* | Location | Size (bp) | LTR homology
265O22 (S. spontaneum) | LTR retrotransposon (Copia-like) | Hopscotch (SRSiTERTOOT00088, E-209) | Intron 11 of gene 4/5 | 5612 | 97%
265O22 (S. spontaneum) | LTR retrotransposon (Copia-like) | Hopscotch (SRSiTERTOOT00088, E0) | Intron 18 of gene 4/5 | 5303 | 100%
265O22 (S. spontaneum) | Solo LTR | Levithan LTR (SRSiTERTOOT00084, E-158) | Intergenic region between gene 4/5 and 6 | 3941 | -
265O22 (S. spontaneum) | Solo LTR(a) | Levithan LTR (SRSiTERTOOT00084, E-127) | Intergenic region after gene 18 | 4123 | -
265O22 (S. spontaneum) | Pseudogene | Mutator transposase (gb AAP03357.1, E-109) | Intergenic region after solo LTR | 1586 | -
265O22 (S. spontaneum) | LTR retrotransposon (Gypsy-like) | ORF1 Athila (gb AAL77115.1, E-25) | Intergenic region, extreme end of the BAC | 2401 | -
51L01 (S. officinarum) | Non-LTR retro-element (LINE) | Reverse transcriptase Oryza sativa (gb AAK52166.1, E0) | 12 bp after stop codon of gene 2 | 5417 | -
51L01 (S. officinarum) | Non-LTR retro-element (LINE) | LINE of Oryza australiensis (ORSiTERT00300002, E-139) | Intergenic region between gene 3 and 4/5 | 5596 | -
51L01 (S. officinarum) | Solo LTR | Levithan LTR (SRSiTERTOOT00002, E-173) | Intergenic region between gene 4/5 and 6 | 4915 | -
51L01 (S. officinarum) | LTR retrotransposon (Copia-like) | Hopscotch (AAP51971.1, E0) | Intron 8 of gene 7 | 5619 | 100%
51L01 (S. officinarum) | LTR retrotransposon (Copia-like) | Hopscotch (AAP51971.1, E0) | Intron 4 of gene 10 | 5967 | 98%

*Homology found by comparison with either the NCBI database or the TIGR Gramineae repeat database.

BAC 51L01: two long interspersed repetitive elements (LINEs) were detected, both oriented similarly and truncated in their 5′ region, as seems to be the case for LINEs in plants (Kumar and Bennetzen, 1999). Two LTR retrotransposons were inserted in introns (genes 7 and 10). The first one is potentially functional but not the second one. We also identified a solo LTR. Altogether, large transposable elements (TEs) represented 28% of this BAC (Figure 3, Table 2). Seventeen MITEs were identified, nine of which were similar to Tourist elements (Figure 3). More than half of the MITEs (11) were present in introns. Twelve microsatellites (≥10 bp) were detected, four of which were within gene introns (Figure 3).
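The TE percentages quoted for the two BACs can be roughly reproduced from the element sizes in Table 2. The sketch below is only a back-of-the-envelope check: the sizes are as read from Table 2, the BAC lengths are the rounded 126 kb and 97 kb figures from the text, and the names `TE_SIZES_BP`, `BAC_LENGTH_BP` and `te_fraction` are illustrative, not part of the original analysis.

```python
# Sizes (bp) of the large transposable elements, as read from Table 2.
TE_SIZES_BP = {
    "265O22": [5612, 5303, 3941, 4123, 1586, 2401],
    "51L01": [5417, 5596, 4915, 5619, 5967],
}

# Approximate BAC lengths (bp), taken from the rounded values in the text
# (126 kb and 97 kb); exact insert sizes are an assumption here.
BAC_LENGTH_BP = {"265O22": 126_000, "51L01": 97_000}


def te_fraction(bac: str) -> float:
    """Return the fraction of a BAC occupied by large TEs."""
    return sum(TE_SIZES_BP[bac]) / BAC_LENGTH_BP[bac]


if __name__ == "__main__":
    for bac in TE_SIZES_BP:
        print(f"{bac}: {te_fraction(bac):.1%}")
```

With these inputs the fractions come out close to the 18% and 28% reported above; small deviations are expected because the BAC lengths are rounded.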

Sequence comparison in sugarcane and sorghum

Global sequence conservation. Complete BAC sequences were aligned to each other (Figure 4 and). Globally, a high level of sequence conservation was observed among the two Saccharum BAC clones and the sorghum BAC clone. The overlapping region between the two Saccharum BAC clones corresponded to 79 689 bp in S. spontaneum and 97 616 bp in S. officinarum. Within these regions the aligned sequences represented 61 089 bp and displayed 96.5% identity. Among them, the non-coding parts (intergenic region + introns) displayed an average identity above 95.9%. The non-aligned sequences represented 23.3% and 37.4% of the overlapping region in S. spontaneum and S. officinarum, respectively, and they corresponded mainly to large transposable elements (18.6% and 28.1%, respectively) and to MITEs, insertions/deletions (indels) or non-assigned intergenic sequences. Between sorghum and sugarcane overlapping sequences, 56 895 bp aligned with 88.9% identity in S. spontaneum and 49 324 bp aligned with 88.9% identity in S. officinarum. Among them, the non-coding parts displayed an average of 86.8% and 87% identity, respectively. As expected from the phylogenetic relationships, the conservation of intergenic regions between sugarcane and sorghum was lower than within sugarcane, but still remained very high.
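The non-aligned percentages follow directly from the overlap and alignment lengths given above. The short sketch below recomputes them; pairing the 79 689 bp overlap with S. spontaneum and the 97 616 bp overlap with S. officinarum is inferred from the reported percentages, and the function name `non_aligned_fraction` is illustrative.

```python
# Length (bp) of the sequence that aligned between the two Saccharum BACs.
ALIGNED_BP = 61_089

# Length (bp) of the overlapping region on each BAC. The species assignment
# is inferred from the reported non-aligned percentages (23.3% and 37.4%).
OVERLAP_BP = {"S. spontaneum": 79_689, "S. officinarum": 97_616}


def non_aligned_fraction(overlap_bp: int, aligned_bp: int = ALIGNED_BP) -> float:
    """Fraction of the overlapping region that did not align."""
    return 1 - aligned_bp / overlap_bp


if __name__ == "__main__":
    for species, overlap in OVERLAP_BP.items():
        print(f"{species}: {non_aligned_fraction(overlap):.1%} non-aligned")
```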

Figure 4.

 Pairwise percentage identity between homoeologous BACs from S. officinarum, S. spontaneum and sorghum, scaled from 50 to 100%. The S. officinarum sequences are considered as reference.

Gene order and structure conservation.

The colinearity and orientation of genes were well conserved in the three species, except for genes 8 and 9, which were present in sorghum and absent in both sugarcane homoeologous sequences (Figure 3). The cumulative size of all exons for the nine genes common to the three BACs was well conserved. Two slight variations were observed, a nonsense mutation located 51 bp before the end of gene 6 in S. spontaneum and a missense mutation at the stop codon of gene 12 in S. officinarum. The other seven genes had exactly the same length and diverged only by single nucleotide polymorphisms. There was high sequence identity between gene coding sequences, i.e. above 98% between S. officinarum and S. spontaneum and 94.5% between S. officinarum or S. spontaneum and sorghum.

Intron sequences could be partially aligned. Their detailed comparisons are described in. The non-aligned part of the intron sequences mainly corresponded to insertions/deletions (indels). A third of the indels of four or more base pairs (excluding those corresponding to repetitive elements) included perfect short direct repeats along the indel borders. This suggests that particular events such as illegitimate recombination (replication slippage, etc.) and/or transposable element excision could be responsible for these indels.

Time of divergence based on a well-characterized gene.

Synonymous substitutions at the Adh1 gene were used to infer the time of divergence between sorghum and sugarcane, and between the two Saccharum species. The average synonymous substitution rate of Adh genes in grasses has been reported to be 6.5 × 10⁻⁹ substitutions per synonymous site per year (Gaut et al., 1996; Gaut et al., 1999). On this basis, sorghum and sugarcane are estimated to have diverged 8–9 million years ago, and S. officinarum and S. spontaneum are estimated to have diverged 1.5–2 million years ago.
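As a rough illustration of the molecular-clock reasoning, the sketch below assumes the standard relation T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between two sequences and r is the rate per site per year (the factor 2 accounts for both lineages accumulating changes). The Ks values themselves are not given in the text, so the code only back-calculates the Ks range implied by the reported divergence times; the function names are illustrative.

```python
# Synonymous substitution rate for grass Adh genes quoted in the text
# (substitutions per synonymous site per year).
RATE = 6.5e-9


def divergence_time_years(ks: float, rate: float = RATE) -> float:
    """Divergence time under a simple clock: T = Ks / (2 * rate)."""
    return ks / (2 * rate)


def implied_ks(time_years: float, rate: float = RATE) -> float:
    """Synonymous distance implied by a divergence time: Ks = 2 * rate * T."""
    return 2 * rate * time_years


if __name__ == "__main__":
    # Back-calculated Ks ranges for the divergence times reported above.
    for label, (t_min, t_max) in {
        "sorghum vs sugarcane": (8e6, 9e6),
        "S. officinarum vs S. spontaneum": (1.5e6, 2e6),
    }.items():
        print(f"{label}: Ks ~ {implied_ks(t_min):.3f} to {implied_ks(t_max):.3f}")
```

Under this assumption, a divergence of 8–9 million years corresponds to a synonymous distance of roughly 0.10–0.12, and 1.5–2 million years to roughly 0.02–0.03.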

Common repetitive elements.

One large transposable element, i.e. a solo LTR, was found in a common position in both Saccharum species. In S. spontaneum, the solo LTR contained the insertion of a partial sequence of another solo LTR. Two complete LTR retrotransposons were present in both Saccharum species but at different positions. The synonymous substitutions between their two LTR sequences were used to estimate their insertion time. In each Saccharum species, one retrotransposon was found to have inserted very recently (<0.7 million years ago) as its two LTRs had exactly the same sequence. The other retrotransposons presented differences of 2% and 3% between their two LTRs in S. spontaneum and S. officinarum, respectively, and thus were estimated to have been inserted about 1.5–2 million years ago (computation based on the method described by Gaut et al., 1996). No large transposable element was detected in the sorghum sequence.

Nine MITEs were found in exactly the same position in S. officinarum and S. spontaneum haplotypes, and thus correspond to insertions that occurred before the divergence of these species. Among the other MITEs, few were located in perfectly aligned regions. These MITEs clearly inserted after the divergence of S. officinarum and S. spontaneum. Comparison of the positions of identified MITEs between sorghum and sugarcane revealed that none of them could be definitely considered to be derived from the same insertion event. However, some MITEs were located in the same regions in the three BAC clones according to the dot plots, so these regions resemble MITE insertion ‘hot spots’.

Seven microsatellites were detected at identical positions in both sugarcane haplotypes, five of which were also found at identical positions in sorghum. One microsatellite was found at identical positions in the S. spontaneum haplotype and sorghum, and another one at identical positions in the S. officinarum haplotype and sorghum. They all display a variation in repetition number between the haplotypes, except one.

Comparison of gene content and order between sugarcane, rice and maize orthologous regions

The gene content and order displayed by the orthologous rice and maize regions studied by Ilic et al. (2003) were compared to those in sugarcane (Figure 5). The rice sequence shared 11 genes with the orthologous sugarcane region, with exactly the same order and orientation. In addition, rice displayed one gene (7.5b) that corresponded to a tandem duplication of gene 7.5a, and lacked two genes (3 and 3.5).

Figure 5.

 Colinear genomic regions in sugarcane (homoeologous BACs 265O22 and 51L01), rice (BAC 84L17), sorghum (BAC 110K05) and two homoeologous maize segments (Adh1 region and contig 276N13/123C01). Genes are indicated by gray boxes and shaded areas connect conserved genes. The figure is modified from Figure 2 of Ilic et al. (2003) and supplemented with the sugarcane data.

Relative to sugarcane, the two homoeologous maize sequences shared seven and four genes, respectively. Only one gene (10) was common to the two maize sequences. The conservation of gene content between each maize sequence and sugarcane was thus very partial, but when combined, both maize homoeologous regions yielded the same gene content, order and orientation as in sugarcane.

Discussion

Extensive sequence conservation in the highly polyploid sugarcane context

The two haplotypes compared were selected among 13 homoeologous haplotypes identified in a modern sugarcane cultivar and represented both the S. officinarum and S. spontaneum components of its genome. The two haplotypes showed perfect gene colinearity as well as high gene structure conservation and nucleotide similarity (98%). Moreover, apart from the insertion of a few retrotransposons, the non-transcribed regions displayed high homology. In addition, Southern hybridization of 15 contiguous gene probes scattered throughout the region indicated the complete conservation of gene content and order among all 13 homoeologous haplotypes. Although hybridization data do not allow differentiation of genes from rearranged genes or pseudogenes, at least they show that these genes have not been completely deleted. The high conservation of genome organization observed here between sugarcane haplotypes contrasts with most previous studies which showed extensive genomic rearrangements among haplotypes in both recent and old polyploid genomes (Chantret et al., 2005; Feldman and Levy, 2005; Feuillet et al., 2001; Ilic et al., 2003; Lai et al., 2004; Ma et al., 2005; Wicker et al., 2003).

The haplotypes analyzed in this study evolved successively in two different polyploid contexts. First, they evolved separately within two distinct highly polyploid species, i.e. S. officinarum (8x) and S. spontaneum (4x to 16x). We estimated the divergence time of these species to be 1.5 to 2 million years ago based on the molecular clock of the Adh gene sequences. Then, a century ago, these haplotypes were juxtaposed within modern cultivars by inter-specific hybridization. The haplotype organization conservation observed here may result from several factors. Species of the Saccharum genus are considered to be autopolyploid, resulting from the addition of chromosome sets derived from the same species. Genome evolution in autopolyploids has been less thoroughly described and they may not have undergone the same events as allopolyploids. First, they did not encounter any inter-specific hybridization associated with genomic stresses that are known to activate TEs (Kashkush et al., 2002) and induce genome rearrangements (Bennetzen, 2005). Secondly, the pairing behavior may have an impact on tolerance to genome disturbance. Pairing behavior is typically polysomic in autopolyploid species and disomic in allopolyploid species. In an allopolyploid, if one gene is deleted or rearranged such that its primary function is altered in one of the genomes, the disomic pairing behavior ensures that the homoeologous chromosome that still bears this gene will be consistently transmitted to the progeny. Consequently, there may be a progressive loss of paralogous gene functions without any decrease in viability or fitness. This phenomenon is in its early stages in hexaploid wheat and is almost complete in maize, which approaches a diploid genic state (Ilic et al., 2003; Lai et al., 2004; Langham et al., 2004). However, in autopolyploids such as S. officinarum and S. spontaneum, the random assortment of chromosomes at meiosis may counter-select gene deletions or rearrangements because they may give rise to gametes lacking a complete gene set.

Haplotype organization has also been conserved despite juxtaposition a century ago through inter-specific hybridization, which may be due to the fact that this co-existence is very short. This ‘recent hybrid’ condition is enforced by the vegetative propagation of sugarcane: only four to seven meioses have occurred since the original inter-specific cross. In addition, it may also be related to the relatively small size of the monoploid (basic) genome of sugarcane (between 750 and 950 Mb; D’Hont and Glaszmann, 2001). Indeed, most reports of rapid rearrangements among homoeologs involve allopolyploid species with large monoploid genomes, namely wheat and maize. In these large monoploid genomes composed mainly of TEs, the inter-specific hybridization event responsible for allopolyploidy must have led to high activation of TEs and thus to rapid and major rearrangements. In smaller genomes, such increases in transposon activity and rearrangements following allopolyploidization may also occur but probably on a much smaller scale. This lower level of rearrangements has recently been illustrated with synthetic allopolyploids among Arabidopsis genomes (Madlung et al., 2005). In cotton, Grover et al. (2004) compared a 100 kb region in two distinct genomes (A and D) that have evolved in isolation for 5–10 million years and together for around 1 million years after being reunited by allopolyploidy, and the results showed high sequence conservation in both genic and intergenic regions. The two cotton genomes studied are relatively small (980 and 1860 Mbp/1C), which is comparable to the size of the monoploid genome of sugarcane species.

Araujo et al. (2005), comparing wheat, maize and sugarcane, suggested that changes in TE copy number do not occur immediately after polyploidization but progressively after transcriptional activation. Indeed, polyploid wheats present more TE copies than the sum of their parental species genomes, while synthetic Triticeae amphidiploids present no significant change (Li et al., 2004). In sugarcane, Araujo et al. (2005) observed no significant difference in TE copy number between S. officinarum, S. spontaneum and hybrid cultivars, but noted that 2.3% of their transcriptome is composed of TEs (Vettore et al., 2003), as compared to 2.4% for the hexaploid bread wheat (Li et al., 2004) and 0.014% for maize (Meyers et al., 2001).

Note also that the sugarcane region studied here appears to correspond to a gene-rich region (one gene/8–10 kb), like the cotton region studied by Grover et al. (2004). The homology level of intergenic sequences may thus be particularly high as this region is poor in TEs and probably little affected by their movements.

The very high global homology observed between the two homoeologous BAC clones contrasts with earlier observations that S. officinarum and S. spontaneum chromosomes can clearly be labelled differentially by genomic in situ hybridization using total genomic DNA (D’Hont, 2005; D’Hont et al., 1996), as can the A and D genomes of cotton Gossypium hirsutum (D’Hont, unpublished data). The main difference between the two BAC clones concerned the TE content. This is in agreement with the general idea that differential labelling relies mainly on species-specific repeated sequences. The qualitative and/or quantitative differences in TE content, which are probably more abundant outside gene islands, may be the main factor responsible for this differential labelling. This could be further tested by FISH.

High sequence conservation between sugarcane and sorghum

At the gene level, the sequence comparison between sugarcane and sorghum revealed high colinearity, with the exception of genes 8 and 9 that are present only in sorghum, and high gene structure conservation and nucleotide similarity (close to 95%). Gene 8 has all the sequence features of a functional gene, whereas gene 9 seems partial. In addition, besides a few large retrotransposable elements present in sugarcane and absent in sorghum, the comparison uncovered a striking homology in most non-coding parts of the genome. Such sequence conservation, despite the divergence of sugarcane and sorghum 8–9 million years ago, may be related to the genomic stability of this gene-rich region. In addition, the monoploid genome sizes are similar in sorghum (730 Mb) and sugarcane (750 Mb for S. spontaneum and 930 Mb for S. officinarum; D’Hont and Glaszmann, 2001), suggesting that no major TE burst has occurred since they diverged, thus reducing the opportunity for important genomic remodeling.

At the macro-level, comparative genetic mapping (Asnaghi et al., 2000; Dufour et al., 1997; Glaszmann et al., 1997; Grivet et al., 1994; Guimaraes et al., 1997; Ming et al., 1998) showed that, among the species studied, sorghum has the simplest synteny relationships with sugarcane. Our microcolinearity results confirm this finding and suggest that sorghum is the model species of choice for sugarcane. The availability of its entire sequence in the near future will be of great interest for sugarcane genomics, in particular for map-based gene isolation.

History of rearrangements in the orthologous region in rice, sugarcane, sorghum and maize

The data presented here also enabled us to add sugarcane to the analysis of the complex history of rearrangements in the orthologous region based on rice, maize and sorghum described by Ilic et al. (2003). Maize, sorghum and sugarcane belong to the Andropogoneae tribe. Maize and sorghum diverged from a common ancestor around 20 million years ago (Gaut and Doebley, 1997) and sugarcane and sorghum 8–9 million years ago. Rice represents the model for grass species and diverged from a common Andropogoneae ancestor around 50–70 million years ago (Wolfe et al., 1989).

As in comparisons between rice, sorghum and maize, comparisons involving sugarcane revealed a general conservation of gene content and order, with a few rearrangements. The addition of sugarcane to the global comparison enabled us to specify the chronology and lineage of some of the observed rearrangements. The absence of genes 8 and 9 in sugarcane, as in rice and maize but in contrast to sorghum, indicates that their insertion in the sorghum region occurred less than 8–9 million years ago, after sorghum and sugarcane diverged from their common ancestor. The presence of two additional adjacent genes (Adh1 and gene 3.5) in sugarcane, as in sorghum and maize but not in rice, supports the assumption of Ilic et al. (2003) that the corresponding translocation occurred after the divergence between the common Andropogoneae ancestor and rice. The tandem duplication (gene 7.5b) observed in rice and absent in sorghum and maize was also found to be absent in sugarcane, thus confirming its rice lineage specificity. Ilic et al. (2003) estimated that it occurred 44 million years ago.
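The lineage assignments in the preceding paragraph follow from a simple presence/absence argument, which can be sketched as below. The presence sets are taken from the text; the function name and the classification labels are hypothetical and only illustrate the reasoning, not a method used by the authors.

```python
# Presence of each gene in the orthologous region, as stated in the text.
PRESENCE = {
    "gene 8": {"sorghum"},
    "gene 9": {"sorghum"},
    "Adh1": {"maize", "sorghum", "sugarcane"},
    "gene 3.5": {"maize", "sorghum", "sugarcane"},
    "gene 7.5b": {"rice"},
}

# The three Andropogoneae species in the comparison; rice is the outgroup.
ANDROPOGONEAE = frozenset({"maize", "sorghum", "sugarcane"})


def classify(species_with_gene):
    """Hypothetical helper: label a gene by the lineage its distribution implies."""
    species = frozenset(species_with_gene)
    if len(species) == 1:
        (only,) = species
        return f"{only} lineage-specific"
    if species == ANDROPOGONEAE:
        return "present in all Andropogoneae, absent from rice"
    return "unresolved"


for gene, species in PRESENCE.items():
    print(f"{gene}: {classify(species)}")
```

A gene found in a single species is most simply explained by an event in that lineage alone, whereas a gene shared by maize, sorghum and sugarcane but missing from rice points to an event after the split between rice and the Andropogoneae ancestor.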

Without genes 7.5b, 8 and 9, the sugarcane haplotypes align perfectly with the combination of both maize homoeologous regions, and may thus best represent the ancestral Andropogoneae haplotype. Multiple sequence comparisons of orthologous haplotypes among species are essential to provide insight into modes of local evolution over large periods of time and to highlight different global evolutionary dynamics between species. The case of the region studied here enabled Ilic et al. (2003) to provide a remarkable illustration of the instability of the maize genome, attributed to its redundancy as a paleopolyploid. Our data suggest that sugarcane has not undergone any major reshaping of its genome despite the high polyploidy level. Other genomic regions will have to be analyzed to confirm this result.

Experimental procedures

BAC clones, EST and cDNA resources

Fourteen genes were initially annotated on the sequence of sorghum BAC clone 110K05 (GenBank accession number AF124045; Tikhonov et al., 1999). Then Ilic et al. (2003) found two additional genes annotated as gene 3.5 and 7.5, with the former being partial, and they re-annotated two genes as a single one (4/5).

Sugarcane BAC clones were selected from the R570 sugarcane cultivar library (Tomkins et al., 1999). The BAC library contains 103 296 clones and covers the 2C sugarcane genome approximately 1.3-fold, which corresponds to about 14-fold coverage of the monoploid (basic) genome. For fingerprinting and RFLP analysis, 1 μg of BAC DNA was restricted with HindIII. Sugarcane ESTs and cDNAs were from the SUCEST database, which includes 237 954 ESTs assembled in 43 141 putative transcripts (Vettore et al., 2003; http://sucest.lbi.ic.unicamp.br).
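The relation between clone number and monoploid genome coverage can be illustrated with a short calculation. The clone count comes from the text. The mean insert size (130 kb) is an assumption made for illustration only, and the monoploid genome size is the 930 Mb figure quoted in this paper for S. officinarum. The recovery probability uses the standard Clarke–Carbon formula, which the authors do not mention.

```python
N_CLONES = 103_296           # from the text
MEAN_INSERT_BP = 130_000     # ASSUMPTION: illustrative mean insert size
MONOPLOID_BP = 930_000_000   # monoploid genome size quoted for S. officinarum


def fold_coverage(n_clones, insert_bp, genome_bp):
    """Total cloned DNA divided by genome size."""
    return n_clones * insert_bp / genome_bp


def recovery_probability(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon: chance that a given locus is in at least one clone."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


coverage = fold_coverage(N_CLONES, MEAN_INSERT_BP, MONOPLOID_BP)
p_found = recovery_probability(N_CLONES, MEAN_INSERT_BP, MONOPLOID_BP)
print(f"monoploid coverage: {coverage:.1f}-fold")
print(f"probability of recovering a locus: {p_found:.6f}")
```

Under these assumptions the coverage falls between 14- and 15-fold, in line with the figure given above. Because the library covers the 2C genome only about 1.3-fold, the probability applies to recovering at least one homoeologous copy of a locus, not every haplotype.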

Sequence comparison of the Adh1 gene

A region located between exons 4 and 8 of the Adh1 gene was sequenced for nine sugarcane accessions (four S. spontaneum: Mandalay, Glagah, SES14, Coimbatore; five S. officinarum: Kaludai Boothan, BNS3066, Badila, Crystalina, Mauritius Guinghan) and eight of the selected BAC clones. This region covers 1.2 kb in maize and 1 kb in sorghum. Primers were designed in exons 4 and 8 based on the alignment of the Adh1 gene from maize (AF123535), sorghum (BAC 110K05 sequence) and sugarcane ESTs. As expected, a fragment of around 1 kb was obtained by PCR in all cases. For BAC clones, the amplified fragment was directly sequenced and good quality sequences were obtained except for BAC 13, which was excluded from the analysis.

For the sugarcane accessions, the PCR products obtained represent a mixture of haplotypes due to the polyploid nature of the genome. Fragments were thus cloned from the PCR product, and 12 of them were sequenced for each accession. PHRED software (Ewing and Green, 1998) was used to assess the quality of amplified sequences. For each accession, data corresponding to the same haplotype were assembled into a consensus sequence using PHRAP (http://www.phrap.org) and visualized with CONSED (Gordon et al., 1998). A good quality 711 bp sequence common to all BAC clones and plant haplotypes was used to construct a neighbor-joining tree using the MEGA2 (Molecular Evolutionary Genetic Analysis) software package (Kumar et al., 2001; http://www.megasoftware.net). Sorghum and maize Adh1 sequences were included as out-groups.

BAC sequencing

DNA of BAC clones 51L01 and 265O22 was extracted using the Qiagen Large Construct Kit (http://www.qiagen.com/). Aliquots of 10 μg DNA were sonicated, end-repaired with T4 polymerase and Klenow, and fragments ranging from 1 to 3 kb were cloned into the pUC18 vector using the Ready-to-go kit (Amersham; http://www5.amershambiosciences.com/). Clones were sequenced and assembled from 2304 and 2496 shotgun sequences for BAC 51L01 and 265O22, representing an average redundancy of 14 and 12, respectively. PHRAP assembly of the shotgun clones alone did not result in a single contiguous sequence due to the presence of repetitive DNA elements or structures that were problematic for sequencing [stretches of poly(G)]. The gaps were small (<1000 bp) in all cases. We closed the gaps either by designing primers bordering the gap or by re-sequencing clones from both sides of the gap. The sequences of BACs 51L01 (97 616 bp) and 265O22 (126 104 bp) were submitted to GenBank under accession numbers AM403006 and AM403007, respectively.
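As a rough check on the redundancy figures, sequencing depth can be estimated as the number of reads multiplied by the mean read length and divided by the final BAC length. The read counts and BAC lengths below are from the text. The mean usable read length of 600 bp is an assumption introduced here, since the text does not state it.

```python
MEAN_READ_BP = 600  # ASSUMPTION: mean usable read length, not given in the text

# BAC name -> (number of shotgun reads, finished sequence length in bp)
BACS = {
    "51L01": (2304, 97_616),
    "265O22": (2496, 126_104),
}


def redundancy(n_reads, bac_length_bp, read_bp=MEAN_READ_BP):
    """Average number of times each base is covered by shotgun reads."""
    return n_reads * read_bp / bac_length_bp


for name, (n_reads, length) in BACS.items():
    print(f"BAC {name}: about {redundancy(n_reads, length):.1f}-fold")
```

With a 600 bp mean read length, the estimates round to 14 and 12, matching the reported values. A different read length would scale both estimates proportionally.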

Sequence annotation and analysis

For each BAC clone, CDS (coding DNA sequences) were predicted by comparing the BAC sequence to the SUCEST database and to the dbEST and non-redundant databases from NCBI using the blastn, blastx and tblastx algorithms (Altschul et al., 1997). Three gene-finding programs, i.e. GeneMark.hmm (Lukashin and Borodovsky, 1998), GENSCAN trained for maize (Burge and Karlin, 1997) and FGENESH (Solovyev and Salamov, 1997), were also used to resolve doubtful cases concerning exon/intron limits.

Repetitive elements were identified by homology searches against known transposable elements. The RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) and Censor (http://www.girinst.org/censor/index.php) programs were used, together with a combination of BLAST and WU-BLAST2 searches of the GenBank/EMBL non-redundant database and the TIGR rice and Gramineae repeat databases (http://tigrblast.tigr.org/euk-blast/index.cgi?project=plant.repeats).

Whole BAC sequence comparisons were carried out using the PipMaker program (http://pipmaker.bx.psu.edu/cgi-bin/pipmaker?basic). Detailed comparisons were performed using the DOTMATCHER, WATER, MATCHER, PALINDROME and EINVERTED applications of the EMBOSS package (http://bioinfo.pbi.nrc.ca:8090/EMBOSS/). BAC clones were aligned to each other using LAGAN (Brudno et al., 2003) with the default parameters.

Nucleotide substitution rate

Genes were aligned using ClustalX. Synonymous and non-synonymous substitution rates were estimated as described by Gaut et al. (1996) using the distance measures described by Nei and Gojobori (1986) and the Jukes–Cantor correction as implemented in MEGA2. Divergence time (T) was estimated for the Adh1 gene using k = Ks/(2T), i.e. T = Ks/(2k), where k is the absolute rate of synonymous substitution per site per year, and Ks is the estimated number of synonymous substitutions per site between homologous sequences (Gaut et al., 1996; Ramakrishna et al., 2002).
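The two calculations described above can be sketched as follows: the Jukes–Cantor correction converts an observed proportion of differing sites into an estimated number of substitutions per site, and the molecular clock relation converts Ks into a divergence time. The function names are hypothetical. The input proportion, the Ks value and the rate k are illustrative placeholders, not values reported in this study, and MEGA2 was the tool actually used.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time(ks, k):
    """Divergence time in years, from k = Ks / (2T) rearranged to T = Ks / (2k)."""
    if k <= 0:
        raise ValueError("the substitution rate k must be positive")
    return ks / (2.0 * k)


# ASSUMPTION: all three inputs are placeholders chosen only to show the arithmetic.
p_synonymous = 0.10   # observed proportion of synonymous differences
k_rate = 6.5e-9       # synonymous substitutions per site per year
ks = jukes_cantor(p_synonymous)
print(f"Ks = {ks:.4f}")
print(f"T  = {divergence_time(ks, k_rate) / 1e6:.2f} million years")
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after the split, so the total distance between two sequences corresponds to twice the divergence time.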

Acknowledgements

NJ was supported by a grant from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). We thank Magalie Lescot and Brigitte Courtois for critical reading of the manuscript.

Accession numbers: GenBank accession numbers AM403006 (BAC 51L01) and AM403007 (BAC 265O22).
