The C3, C4 and Crassulacean acid metabolism (CAM) photosynthetic pathways represent a fundamental axis of trait variation in plants, with importance at scales ranging from genome to biome. Knowing the distribution of these pathways among wild species is a crucial first step in understanding the patterns and processes of photosynthetic evolution and its role in ecological processes at large scales (e.g. changes in the composition of biomes under global change). C4 photosynthesis is most prevalent in the Poaceae (grasses), which account for about half of all C4 species (Sage et al., 1999a). Research on the evolution and ecology of these plants has undergone a renaissance during the last 7 yr, catalyzed by phylogenetic analyses showing multiple parallel C4 origins (e.g. Christin et al., 2007; Vicentini et al., 2008; GPWG II, 2012), insights into the distribution of C4 species and assembly of the C4 grassland biome (Edwards & Still, 2008; Edwards & Smith, 2010; Edwards et al., 2010), and efforts to introduce the C4 pathway into rice (Hibberd et al., 2008; von Caemmerer et al., 2012). C4 photosynthesis is an excellent model for investigating complex trait evolution because of the broad knowledge base describing its biochemical basis, evolutionary history, and ecological interactions (Christin et al., 2010).
Why do we need a C4 database?
Investigations of the evolution and ecological significance of C4 photosynthesis are increasingly turning to large-scale comparisons of C3 and C4 species. These are straightforward for well-characterized common or model species. However, when comparisons are extended to include large numbers of nonmodel species, two important challenges arise. First, there are > 62 000 published scientific names for grasses corresponding to over 11 000 accepted species (Clayton et al., 2002b onwards), an average of more than five published names for each accepted species. This creates problems when linking data recorded under alternative names for the same species concept, and leads to redundancy in published data surveys when values for synonyms are presented as independent data. Second, although there have been extensive previous surveys of the photosynthetic pathway spanning the diversity of wild species (Hattersley & Watson, 1992; Sage et al., 1999a), the rarity of most species means that this work is incomplete, and the synonymy problem makes it difficult to identify the gaps in these data.
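The redundancy problem can be illustrated with a minimal Python sketch, assuming a synonymy table in which every published name points to exactly one accepted name. All names in the sketch are invented placeholders, not real grass taxonomy. Pooling survey records by accepted name shows how two records published under different names collapse into one species.

```python
from collections import defaultdict

# HYPOTHETICAL synonymy: every published name maps to one accepted name.
# The names are invented placeholders, not real grass taxonomy.
SYNONYMY = {
    "Examplea major": "Examplea major",
    "Examplea grandis": "Examplea major",  # a synonym of E. major
    "Otheria minor": "Otheria minor",
}


def collapse_records(records):
    """Pool (published name, value) survey records under accepted names.

    Records published under a synonym are grouped with those published
    under the accepted name, so one species is not counted twice. Names
    missing from the synonymy are returned separately for checking.
    """
    by_accepted = defaultdict(list)
    unmatched = []
    for name, value in records:
        accepted = SYNONYMY.get(name)
        if accepted is None:
            unmatched.append(name)
        else:
            by_accepted[accepted].append(value)
    return dict(by_accepted), unmatched


survey = [
    ("Examplea major", "C4"),
    ("Examplea grandis", "C4"),  # same species as the record above
    ("Otheria minor", "C3"),
    ("Unknownia alba", "C3"),    # not in the synonymy
]
species, missing = collapse_records(survey)
print(len(survey), "records ->", len(species), "species")  # 4 records -> 2 species
print(missing)  # ['Unknownia alba']
```

Names that fail to match are kept rather than discarded, because they are exactly the records that reveal gaps in the synonymy or in the survey.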
Accounting for synonymy and spelling variants/mistakes has become one of the central challenges for the emerging fields of ecological and evolutionary informatics, in which data are synthesized across different sources on ever larger scales (Jones et al., 2006; Sidlauskas et al., 2009). In one infamous example, a 22.5 million record database of plant species occurrences and traits for the Americas contained more scientific names than there are thought to be plant species on Earth (Whitfield, 2011). However, this taxonomic impediment to data synthesis has been progressively broken down by a combination of new methodological developments for name matching (Patterson et al., 2010; Boyle et al., 2013; Chamberlain & Szocs, 2013; Kluyver & Osborne, 2013), and the compilation of nomenclatural databases by botanic gardens and natural history museums (e.g. The Plant List, 2010). Here, we showcase how such resources may be used to assemble and index databases of discrete traits for large numbers of species.
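Tolerance of spelling variants is the core of name matching. The sketch below is a simplified illustration using Python's standard `difflib` similarity ratio; it is not the algorithm used by Taxonome or by the other tools cited above. The 0.9 cutoff is an illustrative assumption. Exact matches after normalising case and whitespace are preferred, and a fuzzy match is accepted only when it is very close.

```python
import difflib


def match_name(query, known_names, cutoff=0.9):
    """Return the known name matching `query`, tolerating small misspellings.

    An exact match after normalising case and whitespace is preferred;
    otherwise the single closest name scoring at least `cutoff` is returned,
    or None when nothing is close enough. The cutoff is an assumption.
    """
    def normalise(s):
        return " ".join(s.split()).lower()

    index = {normalise(n): n for n in known_names}
    q = normalise(query)
    if q in index:
        return index[q]
    close = difflib.get_close_matches(q, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


names = ["Eragrostis walteri", "Stipagrostis paradisea", "Panicum ruspolii"]
print(match_name("eragrostis  walterii", names))  # Eragrostis walteri
print(match_name("Panicum virgatum", names))      # None
```

A high cutoff trades recall for precision: a misspelt epithet is recovered, but a genuinely different species in the same genus is not silently matched to a listed one.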
Compilation and overview of the data
Our database of C3 and C4 photosynthetic types in grasses is based principally on published anatomical and stable carbon isotope evidence. We followed previous authors in assuming that all species within each genus shared the same photosynthetic pathway, unless the evidence suggested otherwise. However, we also measured δ13C for 99 species that had not previously been surveyed, comprising 96 species of Panicum s.l., Acostia gracilis, Lophopogon tridentatus and Thedachloa annua (Supporting Information Table S1). We also obtained information on leaf anatomy and measured δ13C to check previous unverified reports of a C3 species (Stipagrostis paradisea) in an otherwise C4 genus (Sage et al., 1999a), and a C4 subspecies (Chaetobromus dregeanus ssp. involucratus) in an otherwise C3 subfamily (Danthonioideae; Watson & Dallwitz, 1992 onwards). In both cases, our data contradicted previous reports, showing that the photosynthetic pathway of these taxa matches that of their close relatives: S. paradisea is C4 and C. dregeanus ssp. involucratus is C3 (Table 1).
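The genus-level assumption with species-level exceptions amounts to a simple precedence rule, sketched here under the assumption that evidence is held in two lookup tables. The function name, the table layout and the genus "Examplea" with its C3 exception are hypothetical. Species-level evidence, where it exists, overrides the genus default.

```python
def assign_pathway(species, genus_pathway, species_evidence):
    """Assign a photosynthetic pathway to an accepted species name.

    Species-level evidence (e.g. a δ13C value or anatomical report) takes
    precedence; otherwise the species inherits the pathway recorded for
    its genus (the first word of the name). Returns None if neither exists.
    """
    if species in species_evidence:
        return species_evidence[species]
    genus = species.split()[0]
    return genus_pathway.get(genus)


# Illustrative tables; "Examplea" and its C3 exception are HYPOTHETICAL.
genus_pathway = {"Stipagrostis": "C4", "Examplea": "C4"}
species_evidence = {
    "Stipagrostis paradisea": "C4",  # new evidence confirms the genus type
    "Examplea exceptia": "C3",       # a species-level exception
}
print(assign_pathway("Stipagrostis paradisea", genus_pathway, species_evidence))  # C4
print(assign_pathway("Examplea exceptia", genus_pathway, species_evidence))       # C3
print(assign_pathway("Unplaced genus", genus_pathway, species_evidence))          # None
```

The weakness of the rule is visible in its structure: any undetected exception within a genus is silently assigned the genus type.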
Table 1. Photosynthetic type for genera previously reported to include a mixture of C3, C4 and C3–C4 intermediate species
Anatomical (H. P. Linder, pers. comm.) and δ13C evidence (Supporting Information Tables S1, S3) conflicts with a previous report that this subspecies is C4 (Watson & Dallwitz, 1992 onwards).
Anatomical evidence shows that in D. calviniensis most mesophyll cells are no more than one cell distant from bundle sheath cells, making it potentially a C3–C4 intermediate.
Genus known to be polyphyletic.
Note that a recent phylogenetic treatment (Ingram et al., 2011) places E. walteri outside the genus Eragrostis. However, its taxonomy has not yet been revised.
Anatomical evidence showing a concentration of chloroplasts within large bundle sheath cells suggests that this species is potentially a C3–C4 intermediate (Christin et al., 2013; P-A. Christin, pers. comm.).
Phylogenetic analysis places the genus Paraneurachne nested within the genus Neurachne (Christin et al., 2012).
Anatomical (Renvoize, 1986) and δ13C evidence (Tables S1, S3) conflicts with a previous report that this species is C3 (Sage et al., 1999a).
The photosynthetic pathways of Taeniorhachis repens, Veldkampia sagaingensis and 39 rare species of Panicum s.l. remain unclassified, because we were unable to take samples of type specimens from herbarium collections. Most of these species are endemic to Madagascar (26 species), and the remaining 13 are endemic to a small number of countries in Africa and Southeast Asia, and to oceanic islands (Table S2). The database is therefore complete for most countries of the world.
Our approach has been to map the photosynthetic pathway data onto accepted species names in the Poaceae taxonomy of Clayton & Renvoize (1986) and Clayton et al. (2002b onwards), which is the most comprehensive treatment of accepted names and synonymy for grasses (see Methods S1 for full methodology). Coupling our dataset with this synonymy allows users to return the photosynthetic type for all except 46 (corresponding to 41 accepted species) of the 62 678 published scientific names (accepted names and synonyms) for grasses (Clayton et al., 2002b onwards). We have developed software tools to facilitate this task for users, which are described below.
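Returning a pathway for any published name is a two-step lookup: published name to accepted name, then accepted name to pathway. The minimal sketch below assumes both tables are plain dictionaries; the function and all species names are hypothetical stand-ins for the GrassBase synonymy and our dataset.

```python
def pathway_for_names(names, synonymy, pathway):
    """Resolve published names to (accepted name, pathway) pairs.

    `synonymy` maps every published name (accepted names and synonyms)
    to an accepted name; `pathway` maps accepted names to a pathway label.
    Names that are unknown, or whose accepted species has no pathway
    record, are returned separately.
    """
    resolved, unresolved = {}, []
    for name in names:
        accepted = synonymy.get(name)
        ptype = pathway.get(accepted) if accepted is not None else None
        if ptype is None:
            unresolved.append(name)
        else:
            resolved[name] = (accepted, ptype)
    return resolved, unresolved


# HYPOTHETICAL inputs: invented names standing in for the grass synonymy.
synonymy = {
    "Examplea major": "Examplea major",
    "Examplea grandis": "Examplea major",
    "Rarea insularis": "Rarea insularis",  # accepted, but not yet sampled
}
pathway = {"Examplea major": "C4"}
resolved, unresolved = pathway_for_names(
    ["Examplea grandis", "Rarea insularis", "Nonexistia alba"], synonymy, pathway
)
print(resolved)    # {'Examplea grandis': ('Examplea major', 'C4')}
print(unresolved)  # ['Rarea insularis', 'Nonexistia alba']
```

For the full grass synonymy, the unresolved list corresponds to the 46 names reported above, about 0.07% of the 62 678 published names.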
The database covers 99.6% of the 11 087 grass species. It shows that 42% of these species use the C4 photosynthetic pathway and 57% the C3 pathway (Table S3; Notes S1). Six genera (Alloteropsis, Aristida, Eragrostis, Neurachne, Panicum s.l., and Streptostachys s.l.) contain both C3 and C4 species (Tables 1, S4). Seven C3–C4 intermediate species (Table 1) are distributed between the genera Neurachne (one species) and Steinchisma (six species). Within the genus Panicum s.l., 169 species are C4, 250 are C3, and 41 remain unknown, and the photosynthetic type of Panicum ruspolii is ambiguous on the basis of new δ13C measurements (Tables S1, S2, S4; Notes S1). The latter species may be a previously unrecognized C3–C4 intermediate, but further work is required to test this hypothesis. Several further potential C3–C4 intermediates have been identified on the basis of anatomical observations (Tables 1, S3), and these also need to be investigated physiologically. They are Dregeochloa calviniensis (most mesophyll cells are no more than one cell distant from bundle sheath cells; Watson & Dallwitz, 1992 onwards), Homolepis aturensis and Streptostachys asperifolia (concentration of chloroplasts in large bundle sheath cells; Christin et al., 2013; P-A. Christin, pers. comm.). In total, there are therefore 11 putative C3–C4 intermediates in the grasses.
A number of caveats are important when collating and using large trait databases of this kind. The assumption that all species within each genus share the same photosynthetic pathway is reasonable in most cases. However, significant and interesting exceptions, such as the C3 Aristida species in an otherwise C4 genus (Cerros-Tlatilpa & Columbus, 2009), raise the possibility of errors at the species level. Misclassification is most likely in lineages where multiple evolutionary transitions between photosynthetic pathways have occurred, especially in Paniceae and Paspaleae (Morrone et al., 2012). The polyphyly of many grass genera accentuates this problem, most acutely illustrated by Eragrostis walteri, which was previously considered to be a C3 species within a wholly C4 genus (Table 1). Recent phylogenetic work has demonstrated that this species is actually a member of the C3 Arundinoideae lineage and misplaced within Eragrostis (Ingram et al., 2011).
The polyphyly of grass genera means that Tables 1, S3 and S4 should be interpreted with caution. While they do catalogue the known distribution of C4 photosynthesis among taxa, they do not necessarily provide information about its evolutionary history. However, ongoing phylogenetic work is steadily resolving the polyphyly issue, which is most acute in the genus Panicum. We have used the conservative circumscription of Panicum s.l. adopted in GrassBase (Clayton et al., 2002b onwards) and recently carried over to the World Checklist of Poaceae (Clayton et al., 2012 onwards) and The Plant List (The Plant List, 2010), because these online resources provide the most comprehensive global list of accepted names and synonyms, and are regularly updated in the light of new publications. Using the software tools described below, it is straightforward to link the C3/C4 data listed for Panicum s.l. (see Table S4) to the new genus circumscriptions. The same applies to Streptostachys s.l. (Table 1).
How to access the database
Easy routes for users to access information are crucial determinants of the usefulness and usage of data. Our database may be accessed via three routes. The first is static, but the second and third will reflect updates to the database as we make them.
First, simple tables list photosynthetic pathway by accepted scientific name, and may be accessed in the Supporting Information (Tables S3, S4). These require the user to first prepare a list of accepted species names according to the taxonomy of Clayton et al. (2012 onwards) for the taxa of interest.
Second, the name-matching and data-linkage steps may be combined within the software package Taxonome (Kluyver & Osborne, 2013; http://taxonome.bitbucket.org; persistent URL http://purl.org/NET/taxonome). Taxonome links datasets using species names, handling both synonyms and spelling variants (including spelling mistakes). It deals rapidly with millions of names, and runs either via a simple Graphical User Interface (GUI) for basic functionality or via Python scripts for advanced users. A user first loads the Kew taxonomy and photosynthetic pathway database from a data file obtained from the Taxonome website. Custom lists comprising any published grass names may then be rapidly matched to this database, and the results exported in CSV format.
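The match-then-export workflow can be sketched with the Python standard library alone. This is not Taxonome's API or output format: the function, the column names and the 0.9 fuzzy-match cutoff are assumptions for illustration, and the species names are hypothetical. Each query is matched exactly or, failing that, to the closest published name, then resolved through the synonymy to an accepted name and pathway.

```python
import csv
import difflib
import io


def match_to_csv(queries, synonymy, pathway, cutoff=0.9):
    """Match user-supplied names against a synonymy and return CSV text.

    Each query is matched exactly, or else to the closest published name
    scoring at least `cutoff`; the matched name is then resolved to its
    accepted name and pathway. Column names are illustrative only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["query", "matched_name", "accepted_name", "pathway", "match"])
    known = list(synonymy)
    for q in queries:
        if q in synonymy:
            hit, how = q, "exact"
        else:
            close = difflib.get_close_matches(q, known, n=1, cutoff=cutoff)
            hit, how = (close[0], "fuzzy") if close else (None, "none")
        accepted = synonymy.get(hit, "") if hit else ""
        writer.writerow([q, hit or "", accepted, pathway.get(accepted, ""), how])
    return buf.getvalue()


# HYPOTHETICAL names and pathway labels.
synonymy = {"Examplea major": "Examplea major", "Examplea grandis": "Examplea major"}
pathway = {"Examplea major": "C4"}
output = match_to_csv(["Examplea grandis", "Examplea grandiss", "Otheria minor"],
                      synonymy, pathway)
print(output)
```

Recording how each name was matched lets users review fuzzy matches by hand before relying on them.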
Third, the photosynthetic pathway data are linked to the Kew taxonomy, together with morphological, phylogenetic, biogeographic and environmental data within the GrassPortal system (Osborne et al., 2011; www.grassportal.org). GrassPortal enables users to easily assemble large-scale, synthetic data products based on multiple original sources, and is accessed via an intuitive and simple GUI. Using this system, users are able to assemble a list of all grass species present in a particular geographic area, linked to photosynthetic pathway, growth form, and environmental niche data.
Large-scale data synthesis
By carrying out technically challenging bioinformatic steps of data processing and linkage, services like GrassPortal open up new possibilities for a broad biological community to explore large-scale synthetic data products. For example, linkage of the photosynthetic pathway dataset with species occurrence data (Clayton et al., 2002a) allows the distribution of C4 grass species to be mapped at the global scale (Fig. 1). This map improves the global coverage compared with previous data compilations, especially for Africa, South America and Southeast Asia (Sage et al., 1999b). It particularly highlights the prevalence of C4 photosynthesis among African grasses (Fig. 1a), and the importance of central-east Africa, India and northern Australia as hotspots of C4 grass species richness (Fig. 1b). The new dataset also facilitates large-scale macroevolutionary analyses. For example, the Grass Phylogeny Working Group II (2012) used our data in phylogenetic analyses to discover multiple new C4 lineages, and to infer that evolutionary gains prevail over losses of this trait. Another recent study used our data in a macroevolutionary analysis to show an association between C4 photosynthesis and salt tolerance in grasses (Bromham & Bennett, 2014).
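The two quantities mapped in Fig. 1, the proportion of grass species that are C4 and the number of C4 species, can both be derived by joining occurrence records to pathway labels. The sketch below is a minimal illustration under stated assumptions, not the GrassPortal implementation: occurrences are (region, accepted name) pairs, and only species labelled C3 or C4 enter the denominator. The regions and species are hypothetical.

```python
from collections import defaultdict


def c4_summary(occurrences, pathway):
    """Summarise C4 grasses per region from (region, accepted name) records.

    Returns {region: (n_c4, n_classified, proportion_c4)}. Species are
    de-duplicated within each region. Only species labelled "C3" or "C4"
    enter the denominator, so unknowns and C3-C4 intermediates are
    excluded (an assumption made for this sketch).
    """
    species_by_region = defaultdict(set)
    for region, name in occurrences:
        species_by_region[region].add(name)

    summary = {}
    for region, species in species_by_region.items():
        classified = [s for s in species if pathway.get(s) in ("C3", "C4")]
        n_c4 = sum(1 for s in classified if pathway[s] == "C4")
        proportion = n_c4 / len(classified) if classified else float("nan")
        summary[region] = (n_c4, len(classified), proportion)
    return summary


# HYPOTHETICAL occurrence records and pathway labels.
occurrences = [
    ("Region A", "Examplea major"),
    ("Region A", "Otheria minor"),
    ("Region A", "Examplea major"),  # duplicate record, counted once
    ("Region B", "Examplea major"),
    ("Region B", "Tertia media"),
]
pathway = {"Examplea major": "C4", "Otheria minor": "C3", "Tertia media": "C4"}
for region, (n_c4, n, prop) in sorted(c4_summary(occurrences, pathway).items()):
    print(region, n_c4, n, round(prop, 2))
# Region A 1 2 0.5
# Region B 2 2 1.0
```

The first returned value corresponds to C4 species richness and the third to the proportion of C4 species; keeping them separate matters because a species-poor region can have a high C4 proportion while contributing little to richness hotspots.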
The integration of our C4 pathway data with information on geographical distributions, environmental niche, and phylogenetic relationships promises important novel insights into the ecological significance and evolution of this complex physiological and anatomical trait. More generally, it offers biologists an example of how functional trait data may be used in large-scale synthesis and analysis to advance our understanding of the ecological and evolutionary processes acting on organisms.
We are grateful to Liliana Giussani and Pascal-Antoine Christin for their critical comments on the manuscript. This work builds on that of previous authors who have compiled comprehensive databases on grass leaf anatomy and photosynthetic type, including C. R. Metcalfe, Walter Brown, Roger Ellis, Paul Hattersley, Les Watson, and Rowan Sage, and owes a debt of gratitude to them. We thank Les Watson for his generosity in allowing us to use data from Grass Genera of the World (http://delta-intkey.com/grass/), and the following for financially supporting this work: C.P.O. was supported by a Royal Society University Research Fellowship and NERC grant number NE/I014322/1, A.S. by the European Union's Erasmus scheme, T.A.K. by a University of Sheffield Postgraduate Studentship, and V.V. by the GrassPortal project supported by the e-content programme of the JISC. The development of GrassPortal was funded by the JISC, with additional support from the University of Sheffield, the Royal Botanic Gardens, Kew, and Knowledge-Now Limited. We thank Peter Linder of the University of Zurich for information on the leaf anatomy of Danthonioid grasses, Tony Verboom (University of Cape Town) for material of Chaetobromus dregeanus for isotopic analysis, Marjorie Lundgren for her help in acquiring material and Heather Walker for running the analyses. We also thank the following herbaria and their staff for generous help with plant material for the isotope survey: Skye Coffey (Western Australian Herbarium), Olof Ryding (Botanisk Museum, Københavns Universitet), Mats Thulin (Uppsala University), Brendan Lepschi (Australian National Herbarium), Bryan Simon (Queensland Herbarium), Lyn Fish (SANBI) and the Missouri Botanical Garden.