New grass phylogeny resolves deep evolutionary relationships and discovers C4 origins


  • Grass Phylogeny Working Group II

    • Sandra Aliscioni1, Hester L. Bell2, Guillaume Besnard3,4, Pascal-Antoine Christin5, J. Travis Columbus2, Melvin R. Duvall6, Erika J. Edwards5, Liliana Giussani7, Kristen Hasenstab-Lehman2, Khidir W. Hilu8, Trevor R. Hodkinson9, Amanda L. Ingram10, Elizabeth A. Kellogg11, Saeideh Mashayekhi2, Osvaldo Morrone7, Colin P. Osborne12, Nicolas Salamin13,14, Hanno Schaefer15, Elizabeth Spriggs5, Stephen A. Smith5,16 and Fernando Zuloaga7
      1Cátedra de Botánica Agrícola, Facultad de Agronomía, Universidad de Buenos Aires, Av. San Martín 4453, C1417DSE, Buenos Aires, Argentina; 2Rancho Santa Ana Botanic Garden and Claremont Graduate University, 1500 North College Avenue, Claremont, CA 91711-3157, USA; 3CNRS, UPS, ENFA, Laboratoire Evolution & Diversité Biologique, UMR 5174, 31062 Toulouse 4, France; 4Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot SL5 7PY, UK; 5Department of Ecology and Evolutionary Biology, Brown University, Box G-W, Providence, RI 02912, USA; 6Department of Biological Sciences, Northern Illinois University, 1425 W Lincoln Hwy, DeKalb, IL 60115-2861, USA; 7Instituto de Botánica Darwinion, Labardén 200, Casilla de Correo 22, B1642HYD, San Isidro, Buenos Aires, Argentina; 8Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA; 9Department of Botany, School of Natural Sciences, University of Dublin, Trinity College, Dublin D2, Ireland; 10Department of Biology, Wabash College, PO Box 352, Crawfordsville, IN 47933, USA; 11Department of Biology, University of Missouri-St Louis, St Louis, MO 63121, USA; 12Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK; 13Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; 14Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; 15Department of Organismic and Evolutionary Biology, Harvard University, 22 Divinity Avenue, Cambridge, MA 02138, USA;16Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Author for correspondence:
Erika J. Edwards
Tel: +1 401 863 2081

Summary


  • Grasses rank among the world’s most ecologically and economically important plants. Repeated evolution of the C4 syndrome has made photosynthesis highly efficient in many grasses, inspiring intensive efforts to engineer the pathway into C3 crops. However, comparative biology has been of limited use to this endeavor because of uncertainty in the number and phylogenetic placement of C4 origins.
  • We built the most comprehensive and robust molecular phylogeny for grasses to date, expanding sampling efforts of a previous working group from 62 to 531 taxa, emphasizing the C4-rich PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae and Danthonioideae) clade. Our final matrix comprises c. 5700 bp and is > 93% complete.
  • For the first time, we present strong support for relationships among all the major grass lineages. Several new C4 lineages are identified, and previously inferred origins confirmed. C3/C4 evolutionary transitions have been highly asymmetrical, with 22–24 inferred origins of the C4 pathway and only one potential reversal.
  • Our backbone tree clarifies major outstanding systematic questions and highlights C3 and C4 sister taxa for comparative studies. Two lineages have emerged as hotbeds of C4 evolution. Future work in these lineages will be instrumental in understanding the evolution of this complex trait.

Introduction


The grass family (Poaceae) includes > 11 000 recognized species with a cosmopolitan distribution and occupies an enormous range of habitats (Clayton & Renvoize, 1986; Osborne et al., 2011). Grasses also include the three most important crops in the world (wheat (Triticum aestivum), maize (Zea mays) and rice (Oryza sativa)) and several productive species with great biofuel potential (Byrt et al., 2011). Many grass lineages have evolved C4 photosynthesis, a complex and coordinated set of anatomical and biochemical modifications that act to concentrate CO2 at the site of fixation by Rubisco during the Calvin cycle (Sage, 2004; Edwards et al., 2010). The direct effect of the C4 pathway is to reduce photorespiration and saturate photosynthesis with CO2, which has allowed C4 grasses to colonize open and drier habitats in tropical and subtropical regions (Osborne & Freckleton, 2009; Edwards & Smith, 2010). Extant C4 grass diversity is upwards of 4500 species, and C4 grasses dominate many important ecosystems and contribute 20–25% of terrestrial primary productivity (Still et al., 2003).

Despite the enormous economic and ecological importance of grasses, the evolutionary history of the group is still only partially understood. Phylogenies have accumulated over the past 20 yr, but most studies focused on specific groups below the subfamily level. The few family-wide phylogenetic studies (e.g. Clark et al., 1995; GPWG, 2001; Duvall et al., 2007) identified three species-poor lineages that are successively sister to all other grasses (Anomochlooideae, Pharoideae and Puelioideae) and placed the bulk of grass diversity in two main clades, known by their acronyms as BEP (Bambusoideae, Ehrhartoideae (formerly Oryzoideae) and Pooideae) and PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae and Danthonioideae). More recently, the use of morphological traits (Bouchenak-Khelladi et al., 2008) as well as supermatrix approaches (Edwards & Smith, 2010) has allowed extensive taxonomic coverage. These strategies, however, have not resolved relationships among the subfamilies in either the BEP or the PACMAD clade, mainly because data gathering approaches were not optimal and led to large amounts of missing data. A concerted effort was thus needed to produce a molecular phylogenetic study of the family that combined dense taxon sampling with a large and sufficiently complete molecular data set.

All C4 grasses belong to the PACMAD group, but their polyphyly has been long recognized (Kellogg, 2000). Variations in the genetic basis and anatomical and biochemical details of the C4 pathway among phylogenetic groups strongly support the hypothesis of multiple C4 origins from C3 ancestors (Sinha & Kellogg, 1996; Christin et al., 2010). However, the exact number of C4 lineages has been constantly increasing with the addition of more taxa, ranging from the early estimates of four origins (Kellogg, 2000) to 17–20 in more recent studies (Christin et al., 2008; Edwards & Smith, 2010). Many genera of tropical grasses have only recently been analysed, preventing a precise evaluation of the number of C4 groups and their relationships to C3 grasses.

Here, we built a nearly complete data matrix of three chloroplast markers commonly used in grass phylogenetics to obtain a densely sampled and well-supported phylogeny for the grass family. Our first aim was to obtain a solid phylogenetic framework to study evolution in grasses. This new backbone tree will also provide the starting point for future work towards a complete, species-level phylogeny for the grasses. Our second aim was to improve the identification of photosynthetic transitions by drastically increasing taxon sampling in clades containing multiple C3 and C4 taxa. This phylogenetic information will be crucial for comparative and multidisciplinary studies addressing C4 ecology, evolution, and genetics.

Materials and Methods

Strategies for taxon sampling

We selected three genetic markers from the chloroplast genome: the coding genes rbcL (ribulose 1,5-bisphosphate carboxylase/oxygenase large subunit) and ndhF (NADH dehydrogenase subunit F), and the region encompassing the matK (maturase K) coding gene and the trnK (tRNA-Lys) introns (trnK/matK). These markers have been widely used in grass phylogenetics (e.g. Hilu et al., 1999; Hilu & Alice, 2001; Giussani et al., 2001; Christin et al., 2008) but not in concert. Our strategy was aimed at filling in the gaps to achieve a dense and relatively balanced sampling of species across the major grass lineages, particularly in the PACMAD clade. We screened GenBank for these markers and supplemented the available data by sequencing genomic DNA (gDNA) for selected taxa available from previous studies (Hilu et al., 1999; Hilu & Alice, 2001; Aliscioni et al., 2003; Christin et al., 2008; Vicentini et al., 2008; Taylor et al., 2011a; Morrone et al., 2011) or isolated from herbarium specimens.

The master data set includes 545 accessions representing 531 species and 311 genera, covering nearly two-thirds of currently recognized PACMAD genera. We focused sampling efforts in groups that were suspected to contain photosynthetic transitions, especially the Panicoideae, which encompasses the majority of putative C4 origins (Sinha & Kellogg, 1996; Giussani et al., 2001; Christin et al., 2009; Edwards & Smith, 2010). In this subfamily we included as many genera as possible. Because most genera contain only one photosynthetic pathway, and assuming that most genera are monophyletic, this genus-level sampling should make the count of photosynthetic transitions more accurate.

Genomic regions and DNA sequencing

For newly generated sequences, the three markers were PCR-amplified in 600–800-bp overlapping fragments with available primers (Taylor et al., 2011a). However, much of the genomic DNA extracted from herbarium specimens was of poor quality, and amplification of long fragments failed. We therefore developed a battery of primers to amplify the different markers in overlapping segments as short as 250 bp (Supporting Information Table S1).

PCRs were carried out in a total volume of 25 μl, including c. 40–100 ng of gDNA template, 5 μl of 5× GoTaq reaction buffer, 0.1 mM dNTPs, 0.1 μM of each primer, 1 mM of MgCl2, and 0.5 unit of Taq polymerase (GoTaq DNA Polymerase, Promega, Madison, WI, USA). The PCR mixtures were incubated in a thermocycler for 3 min at 94°C followed by 36 cycles consisting of 1 min at 94°C, 30 s at 48°C and 1 min at 72°C. This was followed by 10 min at 72°C. Successful amplifications were cleaned with an Exo-SAP-IT treatment (Affymetrix, Santa Clara, CA, USA) and sequenced using the Big Dye 3.1 Terminator Cycle Sequencing chemistry (Applied Biosystems, Foster City, CA, USA). All sequences have been deposited in GenBank (Table S2).
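
The quantities in this protocol can be checked with a little arithmetic. The following minimal sketch, which is an illustration and not part of the original protocol, computes the final buffer strength and the programmed hold time of the thermocycler run. Ramp times between temperatures are ignored, and all variable names are hypothetical.

```python
# Worked arithmetic for the PCR set-up described above (illustrative only).
TOTAL_VOLUME_UL = 25.0   # total reaction volume
BUFFER_VOLUME_UL = 5.0   # volume of reaction buffer added
BUFFER_STOCK_X = 5.0     # buffer stock strength (5x)

# Final buffer strength after dilution into the full reaction volume.
final_buffer_x = BUFFER_STOCK_X * BUFFER_VOLUME_UL / TOTAL_VOLUME_UL

# Thermocycler programme, in seconds.
INITIAL_DENATURATION_S = 3 * 60
CYCLES = 36
CYCLE_STEPS_S = (60, 30, 60)  # 94°C, 48°C and 72°C steps
FINAL_EXTENSION_S = 10 * 60


def programmed_hold_minutes() -> float:
    """Sum of all programmed holds, excluding ramp times (an assumption)."""
    total_s = (
        INITIAL_DENATURATION_S
        + CYCLES * sum(CYCLE_STEPS_S)
        + FINAL_EXTENSION_S
    )
    return total_s / 60


print(final_buffer_x)             # 1.0, i.e. a 1x final buffer
print(programmed_hold_minutes())  # 103.0 minutes
```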

Sequence analyses

Sequences were initially aligned in ClustalW (Thompson et al., 1994) and adjusted manually to account for gaps, particularly in trnK introns, matK and ndhF, following the criteria of Kelchner (2000). Homology assessment was difficult in some regions of the trnK alignment, so those regions, comprising 403 aligned bp total, were excluded from the analyses.

Phylogenetic trees were obtained from the three markers simultaneously through Bayesian inference as implemented in MrBayes 3.1 (Ronquist & Huelsenbeck, 2003). The substitution model was set to GTR + G + I, determined as the best-fit model through hierarchical likelihood ratio tests. To avoid over-parameterization and to reduce computational time, the data set was not partitioned among genes. Two different analyses, each of four parallel chains, were run for 11 717 000 generations, sampling a tree every 1000 generations after a burn-in period of 3 000 000 generations. The convergence of the MCMC (Markov chain Monte Carlo) run and the adequacy of the burn-in length were confirmed using the program Tracer (Rambaut & Drummond, 2007). A majority rule consensus tree was computed from the 17 434 sampled trees. Phylogenetic trees were also inferred under a maximum likelihood criterion using the software RAxML (Stamatakis, 2006), under the GTRCAT substitution model. Support values for the branches were obtained from 1000 standard bootstrap pseudoreplicates.
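
The number of sampled trees follows from the run length, burn-in and sampling interval. The short sketch below reproduces that count, assuming that each of the two analyses contributes one tree per sampling interval after the burn-in. The function name is hypothetical.

```python
def retained_trees(generations: int, burn_in: int, sample_every: int, runs: int) -> int:
    """Number of post burn-in trees pooled across independent runs."""
    per_run = (generations - burn_in) // sample_every
    return per_run * runs


# Values taken from the analysis settings described above.
n_trees = retained_trees(
    generations=11_717_000,
    burn_in=3_000_000,
    sample_every=1000,
    runs=2,
)
print(n_trees)  # 17434
```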

Reconstruction of photosynthetic transitions

We typed all species in our tree as C3 or C4 according to different sources, summarized in GrassPortal (Osborne et al., 2011). Steinchisma hians is a C3-C4 intermediate and was included in the C4 category. We implemented various approaches to infer transitions between C3 and C4 photosynthesis, including stochastic mapping (Minin & Suchard, 2008) and ancestral state estimation using likelihood and a Markov model of discrete trait evolution (Pagel, 1999). All analyses were run using our Bayesian consensus tree (Supporting Information Fig. S1). To evaluate the influence of phylogenetic uncertainty on our analyses, we repeated the likelihood reconstruction and stochastic mapping analyses on 1025 topologies sampled randomly from our post burn-in Bayesian posterior tree distribution. We estimated the number and placement of photosynthetic transitions on each tree, summarized the reconstructions across all trees, and identified two small but key regions where phylogenetic rearrangements affected our inferences of C4 evolution.
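
For a binary character, a continuous-time Markov model of the kind referred to above has a closed-form transition probability matrix along a branch. The following is a minimal sketch of that general two-state model, not the software used in the study. State 0 stands for C3 and state 1 for C4, and the rates and branch length in the example are hypothetical values chosen only to illustrate asymmetric rates.

```python
import math


def two_state_transition_probs(q01: float, q10: float, t: float) -> list[list[float]]:
    """Return P(t) for a two-state continuous-time Markov model.

    q01 is the instantaneous rate from state 0 (C3) to state 1 (C4),
    q10 the reverse rate, and t the branch length.
    """
    total = q01 + q10
    if total == 0.0:
        # No change is possible, so P(t) is the identity matrix.
        return [[1.0, 0.0], [0.0, 1.0]]
    changed = 1.0 - math.exp(-total * t)
    p01 = (q01 / total) * changed
    p10 = (q10 / total) * changed
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates and branch length, for illustration only.
P = two_state_transition_probs(q01=0.4, q10=0.0001, t=1.0)
print(P[0][1] > P[1][0])  # True: a gain of C4 is far more probable than a loss
```

Likelihood-based ancestral state estimation combines such per-branch matrices over the whole tree; the sketch covers only the single-branch building block.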

Finally, to determine the potential influence of differential diversification rates in C3 vs C4 lineages on our estimates of transition rates (e.g. Maddison, 2006), we also implemented a maximum likelihood approach that simultaneously estimates diversification rates and transition rates for binary characters (Maddison et al., 2007). We used only the PACMAD portion of our phylogeny for these analyses because it is far better sampled and because it contains all of the C4 taxa and suspected transitions between character states. An appropriate λ value was estimated using the cross validation procedure in r8s (Sanderson, 2003), and used to smooth our Bayesian consensus tree with a root age set at 1. We then multiplied the branch lengths by 100 to make the computational steps more feasible. We distributed as many PACMAD species as possible among the tips of our tree based on genus richness estimates from either the Grass Genera of the World or the Tropicos taxonomic database (Watson & Dallwitz, 1992; Tropicos, 2011). Where more than one member of a genus was present, we considered only a single representative, and where a genus was polyphyletic, several genera were combined and a composite richness value was assigned for the entire clade. The final phylogenetic data set contained 209 representatives, with c. 70% of all PACMAD species assigned as related to a particular included taxon. We used the adjustments provided by FitzJohn et al. (2009) to incorporate this information as unresolved clades at the tips of our tree and ran analyses using the ‘diversitree’ package in R.

Results and Discussion

Phylogeny of the grasses

Consistent with most previous studies, our analyses recover a grade of three lineages – Anomochlooideae, Pharoideae, and Puelioideae – subtending the BEP and PACMAD clades (Fig. 1). Both the BEP and PACMAD clades are strongly supported in our analyses, as is each of the constituent subfamilies. The branching order of the six PACMAD subfamilies is resolved with strong support for the sister taxon relationship of Arundinoideae plus Micrairoideae (AruM clade), and of Centropodia-Chloridoideae plus Danthonioideae (CD clade). There is less bootstrap support for the sister taxon status of these two clades, and for their relationship to Aristidoideae and Panicoideae, though Bayesian support is strong (Figs 1, S1 and S2). In general, the Bayesian and RAxML topologies and support values were quite congruent (Figs S1, S2).

Figure 1.

Relationships among the subfamilies of Poaceae, inferred from Bayesian analysis of three chloroplast markers. The BEP (Bambusoideae, Ehrhartoideae (formerly Oryzoideae) and Pooideae) clade is in black, and PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae and Danthonioideae) is in grey. Numbers alongside subfamilial names represent the proportion of species we sampled relative to the total species richness of each clade. All named lineages received 100% maximum likelihood bootstrap support (BS) and Bayesian posterior probabilities (PPs) of 1.0, with the exception of Arundinoideae; nodes receiving lower support are noted, with PP values above the line and BS below the line. Locations of C4 origins are indicated by numbers, which correspond to Table 1.

The position of Aristidoideae in our analyses is consistent with a number of studies (Clark et al., 1995; Mathews & Sharrock, 1996; Soreng & Davis, 1998; Hilu et al., 1999; Hsiao et al., 1999; GPWG, 2001; Duvall et al., 2007; Sánchez-Ken et al., 2007; Christin et al., 2008; Vicentini et al., 2008). The CD clade has been recovered in several studies (Barker et al., 1995; Soreng & Davis, 1998; Hilu et al., 1999; Duvall et al., 2007; Sánchez-Ken et al., 2007; Christin et al., 2008; Bouchenak-Khelladi et al., 2008; Edwards & Smith, 2010; Peterson et al., 2011), whereas the AruM clade appears in fewer (Duvall et al., 2007; Sánchez-Ken et al., 2007; Christin et al., 2008; Edwards & Smith, 2010; Peterson et al., 2011; Teerawatananon et al., 2011). The CD + AruM clade was recovered elsewhere only by Duvall et al. (2007) and Peterson et al. (2011), and, in fact, relationships among PACMAD subfamilies in their analyses mirror ours. We used SH tests (Shimodaira & Hasegawa, 1999) to determine whether any other previously suggested topologies could be rejected by the data. We were unable to reject other possible relationships, a somewhat surprising result given the high bootstrap and posterior probability values for several clades.

Although not a particular focus of this study, we also resolved the BEP clade, finding strong support for the sister relationship of Bambusoideae and Pooideae. However, our analysis here did not include Streptogyna, whose uncertain position in the BEP clade has often confused relationships. Our final data matrix, Bayesian consensus tree, and maximum likelihood tree are available for download on TreeBASE (TreeBASE accession #11973).

Evolution of C4 photosynthesis in grasses

Our analyses strongly rejected a symmetric model of photosynthetic transitions in the grasses. Using molecular or smoothed ultrametric branch lengths, the optimal likelihood model implied that reverse transitions from C4 to C3 photosynthesis are exceedingly unlikely (molecular: QC3,C4 = 3.45, QC4,C3 = 8e−4, logeL = − 88.426; smoothed: QC3,C4 = 0.42, QC4,C3 = 9e−5, logeL = − 91.56, where Q is the instantaneous rate of transition between character states, and L is the likelihood; Table S3). This asymmetry is also recovered when accounting for possible differences in diversification rates between C3 and C4 lineages. A six-parameter model, which allowed for unequal transition rates between character states, was strongly preferred to one enforcing equal transition rates using the Akaike Information Criterion (AIC; ΔAIC = 38.5, P = 1.9e−10). Under this model, we inferred an instantaneous rate of C3-to-C4 transition that was c. 50-fold higher than the rate of C4-to-C3 transition (QC3,C4 = 5.09e−3, QC4,C3 = 1.0e−4; Table S3). These results are consistent with other work emphasizing the prevalence of C4 origins over losses, and are further supported by variation in the anatomy, biochemistry and genetic determinism of the C4 pathway used by the different phylogenetic lineages (Christin & Besnard, 2009; Christin et al., 2010; Roalson, 2011). With the recent discovery of C3 Eragrostis walteri as a member of C3 Arundinoideae (Ingram et al., 2011), Alloteropsis semialata subsp. eckloniana stands alone as the sole remaining candidate for a loss of C4 photosynthesis in grasses (Ibrahim et al., 2009). While lengthy discussion of Alloteropsis is beyond the scope of this paper, it is becoming increasingly plausible that Alloteropsis includes multiple parallel transitions from C3 to C4 (Christin et al., 2010).
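
The reported rate asymmetry and P value can be checked numerically. The sketch below computes the ratio of the two transition rates and then, as an assumption that the text does not state explicitly, treats the P value as coming from a likelihood ratio test with one degree of freedom (six vs five free parameters). Under that assumption, ΔAIC converts to a likelihood ratio statistic because AIC = 2k − 2 loge L.

```python
import math

# Transition rates reported for the six-parameter model (Table S3).
Q_C3_TO_C4 = 5.09e-3
Q_C4_TO_C3 = 1.0e-4
rate_ratio = Q_C3_TO_C4 / Q_C4_TO_C3

# Assumption: the constrained model has exactly one fewer free parameter.
DELTA_AIC = 38.5
EXTRA_PARAMS = 1

# AIC = 2k - 2 ln L, so 2 * delta(ln L) = delta(AIC) + 2 * delta(k).
lr_statistic = DELTA_AIC + 2 * EXTRA_PARAMS


def chi2_sf_1df(x: float) -> float:
    """Upper tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_sf_1df(lr_statistic)
print(round(rate_ratio, 1))  # 50.9
print(lr_statistic)          # 40.5
print(f"{p_value:.2e}")      # about 1.97e-10
```

The resulting value is close to the reported P = 1.9e−10, which is consistent with, but does not prove, the one-degree-of-freedom reading.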

Depending on the topology, ancestral character reconstructions inferred between 22 and 24 origins of the C4 pathway (Fig. 2, Table 1). Although many of these origins have been identified in previous analyses (summarized in Christin et al., 2009), their stability in light of our expanded taxon sampling increases our confidence that we have now correctly placed most of the C4 grass lineages. These include Eriachne (+ Pheidochloa; see Morrone et al., 2011), Aristida (excluding Aristida longifolia), Stipagrostis, Chloridoideae, Centropodia, Tristachyideae, Andropogoneae, Paraneurachne muelleri, Steinchisma (C3-C4 intermediates), the large ‘MPC’ clade (Melinidinae + Panicinae + Cenchrinae), and the Paspalum and Axonopus groups of Paspalineae (two origins based on the position of C3 Streptostachys asperifolia). Digitaria stands as an additional strongly supported independent C4 lineage in our data set, although the nuclear marker phyB (phytochrome B) has placed Digitaria as sister to the MPC clade (Vicentini et al., 2008), and phosphoenolpyruvate carboxylase (ppc) places it within the MPC clade (Christin et al., 2007).

Figure 2.

Inference of C4 evolution in grasses. Histogram represents variation in the number of inferred C4 origins across all of PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae and Danthonioideae) using stochastic mapping and 1025 alternative phylogenies sampled from a Bayesian posterior distribution of trees. The scatterplot inset indicates that most of the uncertainty in reconstructing C4 evolution results from poor phylogenetic resolution of one small, mixed C3/C4 clade, the Arthropogoninae, shown above. C3 only, black; C4 only, red; mixed C3 and C4, yellow.

Table 1.   C4 lineages identified in this study, with recommendations for comparative studies of closely related C3/C4 species pairs

No. | C4 lineage | Clear C3 sister for comparative work?
1 | Aristida | Aristida longifolia
3 | Chloridoideae | No, and not likely
4 | Centropodia | Ellisochloa rangei
7 | Andropogoneae | No, and not likely
8 | Reynaudia | No, and not likely
9 | Axonopus | Streptostachys asperifolia
10 | Paspalum | No, and not likely
11 | Anthaenantia | Otachyriinae p.p.
12 | Steinchisma | Steinchisma laxa
13 | Arthropogon | Not yet, but likely
15 | Oncorachis | Not yet, but likely
16 | Coleataenia 1 | Not yet, but likely
17 | Coleataenia 2 | Triscenia
18 | Digitaria | No, and not likely
19 | Echinochloa | Not yet, but likely
22–24 | Alloteropsis | Alloteropsis eckloniana*

Bold indicates high confidence in that particular origin (both that it is an independent origin and that it is correctly placed; regular text indicates uncertainty in either or both). Numbers refer to Fig. 1. *Alternatively, this may represent a reversal to C3 photosynthesis. See text for details.
MPC, Melinidinae + Panicinae + Cenchrinae.

While our intensive sampling captures nearly all of the purported closely related C3 and C4 lineages in grasses (Table 2), we still lack a clear picture of photosynthetic evolution in two specific areas of the tree. The uncertainties in these two groups are responsible for all of the variation in our sensitivity analyses, with the number of inferred origins in each group ranging from two to five. The first is the Boivinellinae (sensu Morrone et al., 2011), a well-supported lineage in Paniceae that includes Alloteropsis and a second C4 group, Echinochloa. Because they have never been placed as sisters in any phylogenetic analysis known to us, we feel confident that these represent distinct C4 lineages. However, the identity of their closest C3 relatives remains unclear. Alloteropsis is often united with Entolasia, as is the case here, though with limited statistical support. This clade also presents additional uncertainty regarding C4 evolution within Alloteropsis, as discussed above. The nuclear gene ppc places Echinochloa as sister to the MPC clade (Christin et al., 2007), suggesting a possibly complex origin for this genus.

Table 2.   Unsampled genera in Paspaleae and Paniceae, their phylogenetic placement in Morrone et al. (2011), and their potential to represent an additional C3/C4 transition

Genus | No. of species | C3/C4 | Subtribe | Morrone placement | Would inclusion result in a new transition?
Cyphonanthus | 1 | C4 | Arthropogoninae | Sister to Streptostachys ramosa | No
Keratochlaena | 1 | C4 | Arthropogoninae | Sister to Mesosetum chaseae | No
Ocellochloa | 12 | C3 | Paspalinae | Sister to Echinolaena | No
Renvoizea | 10 | C3 | Paspalinae | In polytomy at the base of Paspaleae | Possibly
Spheneria | 1 | C4 | Paspalinae | Sister to Thrasyopsis | No
Lecomtella | 1 | C3 | Paspalinae | Sister to Gerritea (morphological data only) | No
Baptorhachis | 1 | C4 | Paspalinae | Sister to Ophiochloa (morphological data only) | No
Acostia | 1 | C4 | Paspalinae | Sister to Ophiochloa, Axonopus, and Centrochloa | No
Thrasya | 20 | C4 | Paspalinae | In Paspalum | No
Tarigidia | 1 | C4 | Anthephorinae | Sister to Chaetopoa (morphological data only) | No
Trachys | 1 | C4 | Anthephorinae | Sister to Chaetopoa (morphological data only) | No
Odontelytrum | 1 | C4 | Cenchrinae | Included in Cenchrus | No
Zygochloa | 1 | C4 | Cenchrinae | Sister to Spinifex and Pseudochaetochloa | No
Paratheria | 2 | C4 | Cenchrinae | Sister to Panicum antidotale | No
Holcolemma | 4 | C3 | Cenchrinae | Sister to Ixophorus unisetus (morphological data only) | Possibly
Streptolophus | 1 | C4 | Cenchrinae | Sister to Paratheria (morphological data only) | No
Tricholaena | 12 | C4 | Melinidinae | Sister to Leucophrys with both sister to Melinis repens | No
Moorochloa | 3 | C4 | Melinidinae | Sister to Melinis repens + Leucophrys + Tricholaena | No
Leucophrys | 1 | C4 | Melinidinae | Sister to Tricholaena | No
Chaetium | 3 | C4 | Melinidinae | Sister to Eriochloa | No
Megathyrsus | 3 | C4 | Melinidinae | Sister to Urochloa mutica | No
Arthragrostis | 3 | C4 | Panicinae | Base of Panicinae (morphological data only) | No

A second interesting and problematic area of the tree is the Arthropogoninae clade. This lineage is especially well sampled here, with 14 of 16 genera (19 of 50 species) included. The two missing genera are reported as C4 and have previously been tightly associated with C4 lineages that are included in our analyses: Keratochlaena has been aligned with Mesosetum, and Cyphonanthus has been reported as sister to Oncorachis (Morrone et al., 2011). In spite of this good taxon sampling, we have little confidence in our topology here. The current placement of Triscenia, a monotypic C3 taxon from Cuba, breaks C4 Coleataenia into two lineages, but Triscenia falls outside of Coleataenia in many trees from our posterior distribution. Furthermore, the relationships between the Mesosetum and Altoparadisium C4 clades and the C3 Homolepis lack support, blurring statistical inferences in this area of the phylogeny.

Both the Alloteropsis lineage and Arthropogoninae illustrate the complex nature of counting C4 origins, and how arriving at a single number may be misleading in the end. Even a perfectly resolved phylogeny will not overcome the difficulty of modelling past photosynthetic transitions. The predominance of C3 to C4 transitions and the extreme rarity of back transitions are strongly supported by different lines of evidence (this study; Christin et al., 2010), but the extreme clustering of C4 groups in certain areas of the phylogenetic tree also calls into question the true ‘independence’ of many C4 origins. Certain precursor traits that increased the accessibility of the C4 phenotype probably evolved early in these lineages, and certain elements of the C4 phenotype in these ‘independent’ lineages were probably inherited from their common ancestor. This pattern of extended, parallel evolution of the C4 pathway has been demonstrated in several eudicot groups (e.g. McKown et al., 2005; Christin et al., 2011). Grass lineages highlighted in the present study that comprise closely related C4 lineages separated by C3 taxa, such as Arthropogoninae or Boivinellinae, represent ideal systems in which to investigate these hypotheses. We anticipate that the well-resolved phylogeny produced in this study will stimulate new comparative research aimed towards an integrative understanding of the processes that led to repeated evolution of C4 photosynthesis in the grasses.

Conclusions


Combining the large amount of data generated during 20 yr of grass phylogenetics with a formidable and targeted new sequencing effort, we produced a family-wide phylogeny for grasses with a large amount of supporting DNA sequence data. The vast majority of grass species can now be assigned to clades, and the relationships among these groups received the strongest support obtained to date (Figs S1, S2). This new phylogenetic framework should facilitate comparative work on this important group of plants (e.g. Ghannoum et al., 2005; Cousins et al., 2008; Taylor et al., 2011a,b). In particular, our dense sampling of C3/C4 transitions should be especially beneficial to the C4 research community. We also structured our sampling within C4 and C3 lineages so as to produce a tree with a clade representation that is roughly proportional to extant grass diversity. Our phylogeny should thus be useful for research on various issues, such as morphological and ecological diversification, variation in speciation/extinction rates, genomic evolution, biological invasions, and domestication of the world’s most important crops.

Despite these very significant improvements, our phylogeny still covers only 5% of all recognized grass species. While we aimed to include as many genera as possible, some of these may not be monophyletic. Previous analyses have revealed many cases of nonmonophyly within grasses, including highly polyphyletic genera (e.g. Panicum, Aliscioni et al., 2003; Setaria, Kellogg et al., 2009; Calamagrostis, Saarela et al., 2010), and members of the same genus even placed in different subfamilies (e.g. Eragrostis and Merxmuellera, Barker et al., 1999; Ingram et al., 2011). While recent and ongoing taxonomic revisions are improving matters greatly (Zuloaga et al., 2006, 2007a,b, 2010, 2011; Morrone et al., 2007, 2008; Sede et al., 2008, 2009; Peterson et al., 2011), the exact relationships among the numerous grasses will remain only approximated until most species are sequenced.

Phylogenetic studies across the entire Tree of Life over the past decades have left us with improved understanding of how the major groups of organisms are related to one another. Arguably the greatest remaining challenge is one of ‘filling in the tips’; we see grasses as now poised to be a model lineage for finding the best approach to this difficult problem. The number of grass species investigated is continuously increasing thanks to numerous taxonomically motivated sequencing studies of specific, smaller groups (e.g. Schneider et al., 2009; Sungkaew et al., 2009; Pirie et al., 2010; Peterson et al., 2010; Salariato et al., 2010; Tang et al., 2010). These studies in part utilize fast-evolving noncoding markers that are frequently difficult to align between distant grasses, but the backbone phylogeny developed in this study could be used to combine these independently produced phylogenies. A supermatrix approach would allow simultaneous analysis of a high number of grasses (e.g. Salamin et al., 2002; Edwards & Smith, 2010), but studies using this approach typically only include loci that are widely sampled and can be aligned across the entire group, thus leaving out large amounts of available phylogenetic information. An alternative would be to use the present family-wide phylogeny as a backbone reference on which to graft more detailed phylogenies of specific groups. Setting topological and temporal constraints based on more deeply sampled phylogenies such as the one presented here would depict an evolutionary scenario congruent with phylogenetic and paleobotanical knowledge accumulated at larger taxonomic scales.

Acknowledgements


This paper is dedicated to the memory of Dr Osvaldo Morrone, who devoted his career to understanding the evolution of grasses. His expertise, particularly in subfamily Panicoideae, made this paper and many others possible. This study was conducted by members of the NESCent working group ‘Grass Phylogeny Working Group II: Inferring the Complex Evolutionary History of C4 Photosynthesis in Grasses’, led by E.J.E., N.S. and S.A.S. The sequencing effort was funded by the grants Marie Curie IOF 252568 to P.A.C., NSF DEB-0920147 to J.T.C., NSF IOS-0843231 to E.J.E., NSF DEB-0921203 to A.I., Swiss NSF 3100A0_122433 to N.S., and a grant from the Plant Molecular Biology Center, Northern Illinois University, to M.R.D. We thank Neil Snow (Bishop Museum) for providing plant material. Carrie A. Kiel (RSABG), Colin P. Grennan, Sean V. Burke, and Sam S. Jones (Northern Illinois University) helped with the sequencing process. Several DNAs were kindly provided to G.B. by the Royal Botanic Gardens, Kew, UK.