Chloroplast genome sequence confirms distinctness of Australian and Asian wild rice


Daniel L. E. Waters, Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia. Tel: 02-6620-3443; Fax: 02-6622-2080.
Funding Information: This work was funded partly by Grant-in-aid B (Oversea project, No. 21405016), Japan, and the Australian Research Council.


Cultivated rice (Oryza sativa) is an AA genome Oryza species that was most likely domesticated from wild populations of O. rufipogon in Asia. O. rufipogon and O. meridionalis are the only AA genome species found in Australia, where they occur as widespread populations across the north of the continent. The chloroplast genome sequences of O. rufipogon from Asia and Australia and of O. meridionalis and O. australiensis (an Australian member of the genus very distant from O. sativa) were obtained by massively parallel sequencing and compared with the chloroplast genome sequence of domesticated O. sativa. Oryza australiensis differed from each of the other samples at more than 850 sites (single nucleotide polymorphisms or indels). The other wild rice species had only around 100 differences relative to cultivated rice. The chloroplast genomes of Australian O. rufipogon and O. meridionalis were closely related, with only 32 differences. The Asian O. rufipogon chloroplast genome (with only 68 differences) was closer to O. sativa than were the Australian taxa (both with more than 100 differences). The chloroplast sequences emphasize the genetic distinctness of the Australian populations and their potential as a source of novel rice germplasm. Australian O. rufipogon may be a perennial form of O. meridionalis.


Oryza sativa, the dominant cultivated rice species, shares the genus Oryza with approximately 22 other species (Khush 1997). Genetically, the primary factors that differentiate the species within this genus are genome organization and ploidy, as determined by chromosome pairing behavior in hybrids (Lu et al. 1997). On the basis of this evidence, six diploid genomes (AA, BB, CC, EE, FF, and GG) and four allotetraploid genomes (BBCC, CCDD, HHKK, and HHJJ) have been identified (Lu et al. 2009). Oryza sativa belongs to the AA genome group along with the only other cultivated species, O. glaberrima, and the wild rice species O. rufipogon, O. nivara, O. longistaminata, O. glumaepatula, O. barthii, and O. meridionalis (Ge et al. 1999).

Within O. sativa there are two subspecies, indica and japonica. These subspecies are the product of either one (Molina et al. 2011) or two independent domestication events (Sweeney and McCouch 2007). Whatever the number of domestication events, domestication and, more recently, selection during plant breeding have left cultivated O. sativa resting on a relatively narrow genetic base. Because of this, AA genome wild rice species have been a valuable source of new genes and alleles for resistance to a range of pests and diseases (Brar and Khush 1997). However, Asian wild rice is in close contact with cultivated rice, and constant gene flow between cultivated and wild populations contaminates the Asian wild rice gene pool with cultivated alleles; one example is the cultivated allele of the shattering gene being found in wild rice (Sweeney and McCouch 2007). In contrast, apart from failed attempts to establish commercial rice growing in the Northern Territory and Western Australia in the 1950s and in the Burdekin irrigation area in the early 1990s (Anonymous 2005), and a more recent 650 ha crop in Western Australia, Australian wild rice has been largely genetically isolated from cultivated rice. As a result, the Australian wild rice gene pool has not been contaminated with cultivated Oryza alleles to the same extent as the Asian wild rice gene pool, making Australian wild rice a potential source of valuable alleles for rice breeding.

The AA genome Oryza species endemic to northern Australia are O. meridionalis and O. rufipogon (Figs. 1 and 2). These species are primarily distinguished by anther size (Australian O. rufipogon does not share the small anthers of O. meridionalis) and by life history (O. meridionalis is an annual species, whereas O. rufipogon is perennial). The two species grow in close proximity: O. rufipogon grows in transient pools and ponds where some water persists during the dry season, while O. meridionalis grows on the periphery of these same bodies of water and survives the dry season as seed. This is analogous to the relationship between Asian O. rufipogon and O. nivara. Oryza nivara has been variously described either as an annual species that grows in swamps that dry out during the dry season, unlike O. rufipogon, which grows in deep permanent water, or as an ecotype of O. rufipogon with which it shows a continuous distribution in location and morphology (Naredo et al. 1997).

Figure 1.

Oryza rufipogon growing in its native habitat near Mareeba, north Queensland, Australia. Anther length is the primary morphological feature used to discriminate between Australian O. rufipogon (>3–7.4 mm) and O. meridionalis (1.5–2.5 mm).

Figure 2.

Estimated distribution of Australian Oryza species: O. rufipogon in blue, O. meridionalis in green, and O. australiensis in red. Based on occurrence records provided by Australia's Virtual Herbarium, accessed through the Atlas of Living Australia website.

Despite many different approaches, the taxonomy of the AA genome Oryza remains a work in progress. The relationship among O. rufipogon, O. nivara, and O. meridionalis is unclear. Analysis of chromosome pairing has confirmed that O. meridionalis and O. rufipogon are AA genome species (Lu et al. 1997). As in most early molecular taxonomic treatments, however, the O. rufipogon samples used by Lu et al. (1997) were sourced only from Asia and therefore provided no evidence on the relationship between Australian O. rufipogon and O. meridionalis. Experimental crosses between Australian O. rufipogon and O. meridionalis produced interspecific hybrids, although the fertility and seed set of the hybrids were low (Naredo et al. 1997). Restriction fragment length polymorphism (RFLP) and short interspersed element (SINE) data derived from sample sets including both Australian and Asian O. rufipogon and O. meridionalis suggest that these species are distinct (Wang et al. 1992; Xu et al. 2005), and that Australian O. rufipogon is more closely related to Asian O. rufipogon than to O. meridionalis (Wang et al. 1992).

Phylogenies derived from nuclear data can be problematic because recombination may confound phylogenetic resolution and lead to the construction of inconsistent trees (Poke et al. 2006; Takahashi et al. 2008). Plastid sequence data, in contrast, are haploid and offer the advantages of high copy number and absence of recombination. As the number of informative characters increases, so does phylogenetic resolution. Next-generation (massively parallel) sequencing can cost-effectively sample large numbers of informative characters and hence dramatically increase phylogenetic resolution. Whole chloroplast genome sequencing for phylogenetic analysis without prior isolation or amplification is now relatively straightforward for plant species (Nock et al. 2011). This approach captures a large quantity of chloroplast sequence data, and whole plastome sequences can be used to resolve phylogenetic relationships even among closely related species (e.g., Parks et al. 2009; Zhang et al. 2011). We have applied this approach to the relationship between Australian and Asian AA genome wild rice populations and found the Australian wild rice species to be genetically distinct from closely related Asian AA genome wild rice.

Material and Methods

Plant materials

The Asian O. rufipogon strain was collected at GPS location N 9°59.376′, E 105°39.883′, Can Tho, Vietnam (Australian Plant DNA Bank Number AC11–1008369). Australian O. rufipogon was sourced from the Australian Tropical Crops and Forages Collection, Biloela (Ref. Number AusTRCF 309313; Australian Plant DNA Bank Number AC01–1002323; originally collected 23 March 1994, 0.9 km west of Gilbert River Bridge on Gulf Development Rd, latitude −18.206, longitude 142.865). Oryza meridionalis was sourced from the Australian Tropical Crops and Forages Collection, Biloela (Ref. Number AusTRCF 300118_B; originally collected in the Northern Territory, Australia). Anther length was the primary morphological feature used to discriminate between Australian O. rufipogon (>3–7.4 mm long) and O. meridionalis (1.5–2.5 mm long).
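To compare the two collection sites directly, the Vietnamese coordinates can be converted to the decimal degrees used for the Australian accession. The sketch below reads the original GPS notation ("N9 59.376 E105 39.883") as degrees plus decimal minutes. That reading is an assumption, and the function name `ddm_to_decimal` is hypothetical.

```python
def ddm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees + decimal minutes to signed decimal degrees.

    Assumes notation such as "N9 59.376" means 9 degrees 59.376 minutes.
    Southern and western hemispheres are returned as negative values.
    """
    value = degrees + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value


# Can Tho, Vietnam: collection site of the Asian O. rufipogon accession
can_tho_lat = ddm_to_decimal(9, 59.376, "N")
can_tho_lon = ddm_to_decimal(105, 39.883, "E")
print(f"{can_tho_lat:.4f}, {can_tho_lon:.4f}")  # 9.9896, 105.6647
```

The sign convention matches the Australian record, where latitude −18.206 denotes the southern hemisphere.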

DNA extraction and sequence analysis

DNA was extracted from leaf tissue of four individual plants from each accession using a Qiagen DNeasy Plant kit (Qiagen, Hilden, Germany). Approximately 3 µg of total DNA from each sample was prepared for sequencing according to the Illumina genomic paired-end sample preparation protocol (Part # 1005063 Rev. A). DNA was sheared by adaptive focused acoustics on a Covaris S2 device with the following settings: duty cycle 10%; intensity 5; 200 cycles per burst; 180 sec at 6°C.

Ligation products were purified by agarose gel electrophoresis (2% agarose, 120 V for 120 min). Fragments of predominantly 500 base pairs (bp) were excised from the gel, and the products were isolated without heating using a QIAquick Gel Extraction kit (Qiagen, Hilden, Germany). PCR products were further purified with a QIAquick PCR Purification kit (Qiagen, Hilden, Germany) and quantified using a DNA 1000 chip on an Agilent BioAnalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Approximately 4 pmol per individual and 3 pmol of the PhiX control lane were sequenced for 76 × 2 cycles on an Illumina Genome Analyser (GAIIx) (Illumina, San Diego, CA, USA). Base calling was performed with Illumina Pipeline software version 1.5.

Paired-end sequence reads were trimmed of low-quality bases (quality score limit 0.01) and of adaptor sequence in CLC Genomics Workbench 4.0.3. Reads shorter than 30 bp were discarded. Trimmed short reads were assembled by mapping them to a cultivated rice (O. sativa ssp. japonica var. Nipponbare) chloroplast genome reference sequence (Genbank accession GU592207). Read mapping was undertaken in CLC Genomics Workbench with the following long-read parameters: global alignment, length fraction 0.9, similarity index 0.9, mismatch cost 3, and deletion and insertion costs 3. Match mode was set to random so that reads could be assembled to both inverted repeat regions and to repetitive elements. To prevent the less abundant nuclear and mitochondrial reads from contributing to the final consensus sequence, conflicts were resolved by majority vote.
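The trimming and consensus steps above can be illustrated with a small sketch. It rests on three assumptions. First, quality trimming with a "limit" is taken to behave like the modified-Mott algorithm: each base scores `limit − P(error)` and the highest-scoring contiguous segment is kept. Second, quality strings are taken to be Phred+33. Third, majority-vote conflict resolution is taken to pick the most frequent base at each reference position. All function names are hypothetical, and this is not CLC's own implementation.

```python
from collections import Counter


def mott_trim(seq: str, qual: str, limit: float = 0.01, offset: int = 33) -> tuple[str, str]:
    """Keep the highest-scoring contiguous segment of a read (modified-Mott style).

    Each base scores limit - P(error), with P(error) = 10 ** (-Q / 10) and Q the
    Phred quality decoded from `qual` (Phred+33 assumed). With limit = 0.01 only
    bases above Q20 score positively. Returns empty strings if nothing scores > 0.
    """
    best_start = best_end = 0
    best_score = 0.0
    run_start, run_score = 0, 0.0
    for i, ch in enumerate(qual):
        p_error = 10 ** (-(ord(ch) - offset) / 10)
        run_score += limit - p_error
        if run_score <= 0:
            # Running sum fell to zero or below: start a new segment after this base
            run_start, run_score = i + 1, 0.0
        elif run_score > best_score:
            # Best segment seen so far ends at this base
            best_start, best_end, best_score = run_start, i + 1, run_score
    return seq[best_start:best_end], qual[best_start:best_end]


def passes_length_filter(seq: str, min_len: int = 30) -> bool:
    """Discard reads shorter than `min_len` bases after trimming."""
    return len(seq) >= min_len


def majority_consensus(columns: list[list[str]]) -> str:
    """Call one base per reference position by simple majority vote.

    `columns[i]` holds the bases from all reads aligned at reference position i.
    Positions with no coverage are called 'N'; ties go to the base seen first.
    """
    calls = []
    for bases in columns:
        if not bases:
            calls.append("N")
        else:
            calls.append(Counter(bases).most_common(1)[0][0])
    return "".join(calls)


trimmed, trimmed_qual = mott_trim("ACGTACGTAC", "##IIIIII##")
print(trimmed, passes_length_filter(trimmed))  # GTACGT False
print(majority_consensus([["A", "A", "G"], ["C"], []]))  # ACN
```

At most positions, plastid reads greatly outnumber nuclear and mitochondrial reads carrying plastid-like sequence. A per-position majority vote is therefore expected to return the chloroplast base.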

Consensus sequences for O. rufipogon and O. meridionalis were exported to Geneious 5.3 and aligned with chloroplast genome sequences from Genbank (Fig. 3) using Mauve (Darling et al. 2004). The Genbank accessions included in the alignment were O. sativa japonica GU592207, O. nivara AP006728, O. sativa indica AY522329, and O. australiensis GU592209.

Appropriate nucleotide substitution models were selected using Modeltest and MrModeltest (Posada and Crandall 1998). Aligned data were analyzed under maximum parsimony (MP) and maximum likelihood (ML) criteria using the TVM + I model (G = 0.92) in PAUP*; gaps were treated as missing data. Heuristic searches were conducted with 200 random addition replicates and tree bisection-reconnection (TBR) branch swapping. Oryza australiensis was used as the outgroup to root trees, and nodal support was evaluated with 2000 bootstrap replicates. Bayesian phylogenetic analysis was conducted in MrBayes 3.1 (Ronquist and Huelsenbeck 2004) under the GTR + I model. Two independent runs of 1 × 10⁶ Markov chain Monte Carlo (MCMC) generations, each starting from a different random tree, were performed following a burn-in of 1 × 10⁵ generations. Nodal support for the Bayesian consensus trees was evaluated by posterior probabilities. Consensus sequences were annotated using the Dual Organellar Genome Annotator (DOGMA) (Wyman et al. 2004) and manually adjusted as needed before submission to Genbank.
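Bootstrap support is obtained in three steps. Alignment columns are resampled with replacement, the tree search is rerun on each pseudo-replicate, and the frequency with which each clade is recovered is recorded. The sketch below covers only gap recoding, resampling, and tallying. The tree search itself (done here in PAUP*) is not reproduced. The function names, the `?` missing-data symbol, and the representation of each replicate tree as a set of clades are assumptions for illustration.

```python
import random


def gaps_to_missing(alignment: dict[str, str]) -> dict[str, str]:
    """Recode alignment gaps as missing data ('?' assumed as the missing symbol)."""
    return {taxon: seq.replace("-", "?") for taxon, seq in alignment.items()}


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


aln = gaps_to_missing({"a": "AC-T", "b": "ACGT", "c": "GCGA"})
replicate = bootstrap_replicate(aln, random.Random(1))
# In a real analysis each replicate is reanalysed; here four replicate trees are made up
trees = [{frozenset({"a", "b"})}] * 3 + [{frozenset({"a", "c"})}]
print(clade_support(trees, frozenset({"a", "b"})))  # 75.0
```

With 2000 replicates, a clade recovered in at least 1986 of the replicate trees would receive the ≥ 99.3% support reported in the Results.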


After trimming, 96.6%, 97.78%, and 96.43% of paired-end reads, with average lengths of 72.1 bp, 71.5 bp, and 71.7 bp, were retained for O. rufipogon (Aust), O. rufipogon (Asian), and O. meridionalis, respectively (Table 1). Consensus sequences generated by reference mapping of the Illumina reads were 134,558 bp long for O. meridionalis, 134,544 bp for O. rufipogon (Asian), and 134,557 bp for O. rufipogon (Aust).

Table 1. Summary statistics for wild rice chloroplast genome assembly. Sequence reads were mapped to a cultivated rice O. sativa ssp. japonica var. Nipponbare reference (Genbank accession GU592207).
| Species | Source | Paired-end reads | Average read length (bp) | Reads aligning to chloroplast genome (%) | Median coverage (×) | Consensus sequence (bp) | Accession number |
|---|---|---|---|---|---|---|---|
| O. rufipogon | Australia | 67,046,275 | 72.0 | 12.98 | 4870 | 134,557 | JN005833 |
| O. rufipogon | Vietnam | 62,321,354 | 71.5 | 2.27 | 781 | 134,544 | JN005832 |
| O. meridionalis | Australia | 46,409,181 | 71.7 | 4.81 | 1231 | 134,558 | JN005831 |
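As a rough consistency check on Table 1, the expected mean depth over the plastome can be approximated as (reads × fraction mapping × mean read length) / genome length. The sketch assumes that the read counts are individual reads rather than read pairs, and it ignores duplicates and uneven coverage. It is an illustration only, not the procedure used to produce the table. The function name is hypothetical.

```python
def expected_mean_coverage(total_reads: int, pct_on_target: float,
                           mean_read_len: float, genome_len: int) -> float:
    """Approximate mean depth = (mapped reads x mean read length) / genome length."""
    mapped_reads = total_reads * pct_on_target / 100.0
    return mapped_reads * mean_read_len / genome_len


# Values from Table 1: (reads, % aligning, mean length, consensus length, median coverage)
samples = {
    "O. rufipogon (Australia)": (67_046_275, 12.98, 72.0, 134_557, 4870),
    "O. rufipogon (Vietnam)": (62_321_354, 2.27, 71.5, 134_544, 781),
    "O. meridionalis (Australia)": (46_409_181, 4.81, 71.7, 134_558, 1231),
}

for name, (reads, pct, length, genome, median) in samples.items():
    estimate = expected_mean_coverage(reads, pct, length, genome)
    print(f"{name}: estimated mean {estimate:.0f}x vs reported median {median}x")
```

Under these assumptions, the estimates come to roughly 4,660×, 750×, and 1,190×. Each is within about 5% of the corresponding reported median.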

The alignment of seven chloroplast genomes was 134,701 bp long. One of the inverted repeats (IR) was excluded from the alignment before phylogenetic analysis, giving a modified alignment of 113,897 bp. Of the 978 variable characters, 90 were parsimony informative. Optimal phylogenetic trees obtained by MP (968 steps; consistency index = 1.00), ML (−lnL = 162,141.42), and Bayesian analysis were concordant. The relationship between O. sativa japonica and O. rufipogon (Asia) was unresolved. There was strong support (MP and ML bootstrap ≥ 99.3%; Bayesian posterior probability = 1.00) for an Australian A genome clade containing O. rufipogon (Aust) and O. meridionalis, and for an Asian A genome clade containing O. rufipogon (Asia), O. nivara, and the Asian cultivated rices O. sativa ssp. japonica and O. sativa ssp. indica (Fig. 3). The chloroplast genomes of Australian O. rufipogon and O. meridionalis differed at only 32 positions. The monophyly of Australian A genome wild rice was supported by 38 shared derived characters, or synapomorphic SNPs (Table 2). No homoplasy was detected in the dataset (homoplasy index = 0.00), and no derived characters were shared between the Asian and Australian O. rufipogon chloroplast genome sequences. The monophyly of wild and cultivated Asian rice was supported by 16 synapomorphic SNPs.
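The site counts and indices above follow standard definitions. A site is variable if it shows more than one character state. It is parsimony informative if at least two states each occur in at least two taxa. The consistency index (CI) is the minimum possible number of changes divided by the observed tree length, and the homoplasy index is HI = 1 − CI. The sketch below treats `-`, `?`, and `N` as missing data and treats every other symbol (including ambiguity codes such as K) as a distinct state. That is a simplifying assumption. The function names are hypothetical.

```python
from collections import Counter

MISSING = set("-?N")  # assumed missing-data symbols


def classify_sites(alignment: dict[str, str]) -> tuple[int, int, int]:
    """Count variable sites, parsimony-informative sites, and minimum possible steps.

    A site is variable if it has more than one non-missing state, and parsimony
    informative if at least two states each occur in at least two taxa. The minimum
    number of changes at a site is (number of states - 1).
    """
    n_variable = n_informative = min_steps = 0
    for column in zip(*alignment.values()):
        counts = Counter(base for base in column if base not in MISSING)
        if len(counts) < 2:
            continue  # invariant (or entirely missing) site
        n_variable += 1
        min_steps += len(counts) - 1
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            n_informative += 1
    return n_variable, n_informative, min_steps


def consistency_index(min_steps: int, tree_length: int) -> float:
    """CI = minimum possible changes / observed tree length (1.0 means no homoplasy)."""
    return min_steps / tree_length


def homoplasy_index(min_steps: int, tree_length: int) -> float:
    """HI = 1 - CI."""
    return 1.0 - consistency_index(min_steps, tree_length)


toy = {"t1": "AACT-", "t2": "AGCTA", "t3": "GGCAA", "t4": "GGCAA"}
print(classify_sites(toy))  # (3, 2, 3)
```

When CI = 1.00, as reported above, the tree length equals the minimum number of changes the data require. HI is then 0.00.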

Figure 3.

Bayesian phylogenetic tree illustrating the relationships among Oryza chloroplast genome sequences. The same topology was obtained by Bayesian analysis, maximum likelihood (ML), and maximum parsimony (MP). Nodal support is shown above branches as Bayesian (posterior probability)/ML/MP (percent bootstrap). Scale is substitutions per site. Oryza australiensis is the outgroup. Genbank accession numbers follow previously published sequences.

Table 2. Synapomorphic SNPs (in bold italics) for the Australian and Asian AA genome clades of wild and cultivated rice included in this study. Sequence positions refer to the reference sequence, O. sativa ssp. japonica var. Nipponbare (GU592207).
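| Position | Asian clade | Australian clade | Outgroup | Gene |
|---|---|---|---|---|
| 448 | A | G | A | psbA |
| 817 | A | G | G | psbA |
| 1522 | G | A | G | * |
| 2221 | T | G | G | matK |
| 3068 | T | G | A | matK |
| 3578 | A | T | T | * |
| 4547 | K | A | T | * |
| 6247 | C | T | C | * |
| 6673 | A | G | A | * |
| 8144 | C | T | C | * |
| 11735 | C | T | C | * |
| 11758 | C | T | C | * |
| 13768 | C | A | A | * |
| 13983 | A | C | A | * |
| 15330 | C | G | G | * |
| 15634 | C | G | G | * |
| 16060 | A | C | A | * |
| 16317 | T | C | T | * |
| 16842 | T | A | T | * |
| 17267 | G | A | T | * |
| 17351 | T | C | T | * |
| 18173 | C | G | C | * |
| 18863 | A | C | A | * |
| 22455 | A | C | A | rpoC1 |
| 24758 | G | T | T | * |
| 25016 | A | G | A | rpoC2 |
| 27965 | A | C | A | rpoC2 |
| 29901 | C | T | C | rps2 |
| 31858 | A | G | A | * |
| 32596 | C | T | C | * |
| 33680 | G | A | G | atpF |
| 43643 | A | G | G | * |
| 50078 | T | C | C | * |
| 50146 | T | C | C | * |
| 55344 | A | G | A | rbcL |
| 60191 | C | T | C | * |
| 61211 | G | T | G | * |
| 61382 | G | T | G | psbJ |
| 63315 | T | A | A | * |
| 64817 | T | A | T | * |
| 65577 | T | C | C | * |
| 90581 | T | G | T | * |
| 104189 | T | G | T | * |
| 105076 | G | A | G | * |
| 105922 | A | C | A | * |
| 106562 | T | G | T | ndhD |
| 106705 | T | G | T | ndhD |
| 108465 | T | G | G | * |
| 110607 | G | A | A | * |
| 110844 | A | G | A | * |
| 124575 | A | C | A | * |

1. *Intergenic region.
2. LSC = large single copy; SSC = small single copy; IR = inverted repeats.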
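Candidate synapomorphic SNPs of the kind listed in Table 2 can be screened for directly in an alignment. A site is kept for a clade when three conditions hold: every clade member shares one state, that state differs from the outgroup, and no other ingroup taxon carries it. The sketch treats `-`, `?`, and `N` as missing data and treats ambiguity codes such as K as ordinary states, both of which are simplifying assumptions. Because Table 2 positions are given relative to GU592207, the sketch also converts an alignment column into a reference coordinate. Function names are hypothetical.

```python
MISSING = set("-?N")  # assumed missing-data symbols


def clade_synapomorphies(alignment: dict[str, str], clade: set[str], outgroup: str) -> list[int]:
    """Return 0-based alignment columns where `clade` shares a derived, exclusive state."""
    others = set(alignment) - set(clade) - {outgroup}
    hits = []
    for col, out_state in enumerate(alignment[outgroup]):
        states = {alignment[taxon][col] for taxon in clade}
        if len(states) != 1:
            continue  # clade members disagree: not a shared character
        (state,) = states
        if state in MISSING or out_state in MISSING or state == out_state:
            continue  # missing data, or the clade carries the outgroup (ancestral) state
        if any(alignment[taxon][col] == state for taxon in others):
            continue  # state also found outside the clade: not exclusive to it
        hits.append(col)
    return hits


def reference_position(ref_seq: str, col: int) -> int:
    """1-based reference coordinate of alignment column `col`.

    Gaps in the reference are skipped; if the reference itself has a gap at `col`,
    the coordinate of the preceding reference base is returned.
    """
    return sum(1 for base in ref_seq[: col + 1] if base != "-")


toy = {"asian1": "ACGTA", "asian2": "ACGTA", "aus1": "GCGAC", "aus2": "GCGAC", "out": "ACTTG"}
print(clade_synapomorphies(toy, {"asian1", "asian2"}, "out"))  # [4]
print(clade_synapomorphies(toy, {"aus1", "aus2"}, "out"))      # [0, 3, 4]
```

Column 4 of the toy alignment resembles position 3068 in Table 2: both clades carry different derived states, so the site counts for each clade.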


Previous analyses suggest that the perennial Australian wild rice O. rufipogon is more closely related to Asian O. rufipogon than to the annual Australian wild rice O. meridionalis (Wang et al. 1992; Xu et al. 2005). Here we show that the plastome of Australian O. rufipogon is more closely related to that of O. meridionalis than to that of Asian O. rufipogon, demonstrating that Australian AA genome wild rices are distinct from Asian wild and cultivated rice. In addition, SNP haplotyping has shown that all O. meridionalis individuals examined so far carry the O. meridionalis type of chloroplast DNA reported here (data not shown). These individuals were sourced either directly from Australia or from accessions held by the National Bio-resources Project (NBRP) of Japan.

This study used whole chloroplast genome sequence data, which bring particular advantages to the analysis. Plastid genomes do not undergo recombination and are present in high copy number relative to nuclear loci (Takahashi et al. 2008). These attributes have been exploited in many studies, including plant barcoding (CBOL 2009). Until recently, however, only a relatively small number of nucleotides has been routinely sampled for chloroplast-based plant identification. For example, approximately 1450 bp from rbcL and matK were used as the foundation of a DNA barcode for land plants. Although useful, this approach allowed discrimination of only 72% of species in a sample set of 907 species. The complete chloroplast genome contains two orders of magnitude more sequence than the conventional rbcL and matK barcode loci, and accessing this larger number of characters provides greater phylogenetic resolving power.
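The "two orders of magnitude" comparison can be checked against the roughly 1450 bp rbcL + matK barcode. The first ratio below uses the Australian O. rufipogon consensus length from Table 1. The second uses the 113,897 bp alignment analyzed after one inverted repeat was removed.

```latex
\frac{134{,}557\ \text{bp}}{1{,}450\ \text{bp}} \approx 93,
\qquad
\frac{113{,}897\ \text{bp}}{1{,}450\ \text{bp}} \approx 79,
\qquad \text{both of order } 10^{2}
```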

Chloroplast DNA is maternally inherited in most angiosperms (Hagemann 2010). Interspecific hybridization can lead to "chloroplast capture," whereby the plastome of one species introgresses into another, and this has been used to explain inconsistencies between chloroplast and nuclear gene trees. Historical or more recent hybridization between sympatric populations of O. meridionalis and O. rufipogon in Australia therefore provides an alternative explanation for the close relationship observed here between their plastomes.

During domestication, Asian cultivated rice went through a significant bottleneck and retained only 10–20% of the genetic diversity found within its progenitor species, O. rufipogon (Kovach and McCouch 2008). The genetic diversity within wild rice has been exploited to enhance cultivated rice, primarily by improving yield and agronomic traits (Kovach and McCouch 2008). To exploit this diversity most effectively, the hybrid offspring need to be fertile. Oryza sativa is an AA genome species, and the other AA genome wild species, including O. rufipogon, O. nivara, O. barthii, O. longistaminata, O. glumaepatula, and O. meridionalis, are the most accessible for generating fertile hybrid offspring. Crosses between O. meridionalis, Australian O. rufipogon, and other AA genome Oryza species generate fertile hybrids, so the alleles within these species are available to O. sativa breeding programs through conventional crossing (Naredo et al. 1997). Australian AA genome wild rice has been largely isolated from O. sativa throughout its domestication and cultivation, and is therefore a valuable source of novel alleles for rice improvement.

Oryza nivara is variously described as an annual ecotype of Asian O. rufipogon or as a separate species (Zheng and Ge 2010). The relationship between Australian O. rufipogon and O. meridionalis is somewhat similar, with O. meridionalis until recently described as an annual form of Australian O. rufipogon (Wang et al. 1992). In both cases the key differentiating feature is the life history of these species or ecotypes. Our results suggest that the divergence of Australian and Asian AA genome rice predates both the divergence of O. nivara from Asian O. rufipogon and that of Australian O. rufipogon from O. meridionalis. If so, the annual and perennial habits of these species and/or ecotypes arose separately in Australia and in Asia. Genetic and genomic analysis of Asian and Australian O. rufipogon, O. nivara, and O. meridionalis may allow identification of the loci or gene networks that differentiate perennial from annual species or ecotypes in each case.


We thank Dr Lang of the Cuu Long Delta Rice Research Institute, Vietnam, for kindly providing the Asian O. rufipogon used in this experiment through a collaboration supported by Grant-in-aid B (Oversea project, No. 21405016), Japan. We also thank the Australian Research Council for supporting this research. The authors acknowledge the assistance of Sally Norton from the Australian Tropical Crops and Forages Collection in supplying seed samples. Illumina sequence data were produced at Southern Cross Plant Genomics, Southern Cross University, by Mark Edwards, Stirling Bowen, and Asuka Kawamata.