A critical framework for the assessment of biological palaeoproxies: predicting past climate and levels of atmospheric CO2 from fossil leaves


Author for correspondence:
Greg Jordan
Tel: +613 62267237
Email: greg.jordan@utas.edu.au



II. Key concepts in the uncertainty of proxy evidence
III. Uncertainties in major foliar physiognomic proxies of MAT
IV. Stomatal density and stomatal index
V. Steps forward


This review uses proxies of past temperature and atmospheric CO2 composition based on fossil leaves to illustrate the uncertainties in biologically based proxies of past environments. Most leaf-based proxies are geographically local or genetically restricted and therefore can be confounded by evolution, extinction, changes in local environment or immigration of species. Stomatal frequency proxies illustrate how genetically restricted proxies can be particularly vulnerable to evolutionary change. High predictive power in the modern world resulting from the use of a very narrow calibration cannot be confidently extrapolated into the past (the Ginkgo paradox). Many foliar physiognomic proxies of climate are geographically local and use traits that are more or less fixed for individual species. Such proxies can therefore be confounded by floristic turnover and biome shifts in the region of calibration. Uncertainty in proxies tends to be greater for more ancient fossils. I present a set of questions that should be considered before using a proxy. Good proxies should be relatively protected from environmental and genetic change, particularly through having high information content and being founded on biomechanical or biochemical principles. Some current and potential developments are discussed, including those that involve more mechanistically sound proxies and better use of multivariate approaches.

I. Introduction

Because fossils do not provide direct evidence of past environments (Parrish, 2001), they can only provide proxy (surrogate) evidence. However, fossil-based proxies are critically important to our understanding of the origins and evolution of our environment, and provide much of the evidence for past climates (Parrish, 2001) and atmospheric composition (Ehleringer et al., 2005).

Any measurable characteristic of fossil material that can be assumed to reflect climatic, atmospheric and other environmental parameters has the potential to be used as a proxy (Parrish, 2001). Quantitative proxies make numerical estimates of specific aspects of past environments (such as mean annual temperature, MAT) from fossils. Such proxies typically work by quantifying the relationship between current or recent records of the target parameter and features of the living organism, such as morphology, anatomy or chemistry. These traits are then measured on fossils, and the relationship is used to estimate the values of the target parameters for the time and place of the deposition of the fossils.

This review provides a critical framework for an understanding of the uncertainties of quantitative fossil-based proxies of past environments. I consider in detail the use of foliar physiognomy (i.e. gross morphology of fossil leaves; see references cited by Peppe et al., 2011) and the relative frequency of stomata on leaves (Royer, 2001) to estimate MAT and past levels of atmospheric CO2, respectively. These proxies were chosen because they illustrate key principles, are major tools for inferring past environments and are under active research. As a means of understanding the principles involved, the discussion of specific proxies concentrates on the limits of proxies when used in isolation. The rationale is that understanding the uncertainties of the components of an inference aids in understanding its overall uncertainty. I also discuss how contextual information (including independent, corroborative evidence) can add strength to these proxies. Furthermore, each of the proxies should be considered as a work in progress, and I consider some possibilities for improved methods.

II. Key concepts in the uncertainty of proxy evidence

1. Uncertainty in palaeoproxies, inference space and the correlational nature of proxies

Quantitative estimates of past environments made from proxies can be considered as statistical predictions. Prediction involves the creation of a model based on a calibration dataset that is a representative sample of observations from a more general pool (technically known as the population), and the use of this model to predict values for other members of that population (Sokal & Rohlf, 1995). For analytical reasons (in particular, to allow the quantification of uncertainty), the calibration dataset is usually randomly sampled from the population, apart from unbiased external factors imposed on the sampling (fixed effects) (Sokal & Rohlf, 1995). The model is valid across the range of samples present in the population (Sokal & Rohlf, 1995). This range is called the ‘inference space’ of the model.
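The idea of a model being valid only across a sampled range can be illustrated with a minimal Python sketch, under stated assumptions: the calibration data are invented, the model is a simple least-squares line, and the names `Calibration`, `fit_calibration` and `predict` are hypothetical. The sketch captures only the simplest, one-dimensional aspect of an inference space (the range of trait values sampled); genetic, ecological and temporal limits are not represented.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    slope: float
    intercept: float
    x_min: float  # lowest trait value in the calibration sample
    x_max: float  # highest trait value in the calibration sample


def fit_calibration(trait, target):
    """Ordinary least-squares fit of a target parameter on a leaf trait."""
    n = len(trait)
    mean_x = sum(trait) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(trait, target))
    slope = sxy / sxx
    return Calibration(slope, mean_y - slope * mean_x, min(trait), max(trait))


def predict(cal, x):
    """Return (estimate, inside); inside is False when x is an extrapolation."""
    inside = cal.x_min <= x <= cal.x_max
    return cal.slope * x + cal.intercept, inside


# Invented calibration data: trait score vs target parameter (e.g. MAT, deg C)
trait = [0.2, 0.3, 0.4, 0.5, 0.6]
target = [5.0, 8.0, 11.0, 14.0, 17.0]
cal = fit_calibration(trait, target)

est_in, ok_in = predict(cal, 0.45)   # within the sampled trait range
est_out, ok_out = predict(cal, 0.9)  # outside the sampled trait range
```

In this sketch, a trait value of 0.9 lies outside the calibrated range, so the estimate is flagged as an extrapolation rather than returned without warning.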

Some examples can illustrate the inference spaces of proxies, and how they may be constrained in terms of physical and biotic environment, genetics and time. The inference space of a proxy calibrated using a single species sampled across a certain range of environments (e.g. many stomatal frequency proxies; see Section IV) is the genotypic range of the sample, but only across the environmental range sampled. Similarly, proxies calibrated using multiple species (for example, foliar physiognomic proxies; Section III) are valid across the genetic, phylogenetic and ecological range spanned by that sample of species.

As one cannot observe the past directly, the reconstruction of past environments depends on uniformitarian assumptions (Gould, 1965). When leaf-based proxies are used to estimate past environments, it is assumed, in effect, that the fossils fall within the inference space of the proxy. This carries the implicit assumptions that the leaf–environment relationship has not changed and that the fossils are unbiased representatives of the source plants (or that one can compensate for any biases). Because these assumptions may be violated, the application of the proxy to fossils must add uncertainty. Wolfe (1993) assumed that leaf traits are adaptations, and argued that this uniformitarian assumption is acceptable provided that there is a sufficiently strong association between the leaf traits and environment. However, the following discussion argues that the stability of the leaf–environment relationship is made more complex by the correlational nature of proxies and the effect of genetic change (evolution, extinction and movement of species).

The leaf-based proxies focused on in this review are calibrated using correlations between the distribution of the relevant leaf trait(s) in living or recent fossil floras and associated values of the target parameter (Wolfe, 1993). However, when the plants do not respond directly to the target parameter, but instead to a correlated aspect of environment, the proxy must assume that this correlation has not changed. Thus, in the modern world, seasonal temperatures may be correlated with MAT, so that a proxy that depends on a plant’s response to seasonal temperature may also predict MAT. However, changes in seasonality will alter the correlation between seasonal temperatures and MAT, leading to errors in the proxy (Jordan, 1997a,b). An understanding of the mechanism underlying the leaf–environment relationship is therefore fundamentally important for the identification of which environmental characteristic the leaves respond to directly.
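The seasonality argument can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the trait is taken to respond only to cold-month temperature, the trait response is linear with invented coefficients, and the difference between MAT and cold-month temperature is reduced to a single offset that differs between the calibration period and the past.

```python
# Invented values: MAT minus cold-month temperature (deg C)
MODERN_OFFSET = 10.0  # seasonality in the calibration dataset
PAST_OFFSET = 4.0     # weaker seasonality at the time of deposition


def trait_from_cold_month_temp(cmt):
    """Hypothetical trait that responds only to cold-month temperature."""
    return 0.3 + 0.02 * cmt


def mat_from_trait(trait):
    """Proxy calibrated in the modern world, where MAT = CMT + MODERN_OFFSET."""
    cmt = (trait - 0.3) / 0.02
    return cmt + MODERN_OFFSET


true_past_cmt = 8.0
true_past_mat = true_past_cmt + PAST_OFFSET

# The fossil trait records cold-month temperature, not MAT
fossil_trait = trait_from_cold_month_temp(true_past_cmt)
estimated_mat = mat_from_trait(fossil_trait)

# Error introduced solely by the change in seasonality
bias = estimated_mat - true_past_mat
```

The bias equals the change in the offset between the two periods: the proxy recovers the cold-month temperature correctly, but converts it to MAT using a correlation that no longer holds.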

There is also the potential that the relationship between the leaf trait and the environment has changed through evolutionary processes. To understand the likelihood of this having happened, one first needs to establish some concepts relating to the control of plant development. The leaf-based proxies depend on the response of the relevant leaf traits to changes in environment through phenotypic plasticity (responses without genetic change) and/or changes in genotypic composition. For proxies employing multiple species (such as the foliar physiognomic approaches; Section III), the changes in genotypic composition may involve changes in species composition and in genotypes within species. Genotypic composition at a site can change through evolutionary adaptation, or through immigration or extinction of genotypes. Furthermore, plastic responses are typically under genetic control, so that different species and, indeed, different genotypes within species can show markedly different plastic responses to environment (Scheiner, 1993; Agrawal, 2001). This means that changes in genotypic composition can affect proxies that involve plastic responses to environment.

2. Evidence indicating potential changes in the leaf–environment relationship

Validation tests can be used to identify the potential for biases in proxies resulting from changes in the leaf–environment relationships. Thus, proxies calibrated using a subset of the modern world (different regions, environments or species/genotypes) can be validated by testing how well they estimate for other subsets of the modern world, or whether they are internally consistent. Such tests include looking for regional variation in the relevant leaf–environment relationship, and whether the relationship varies according to the environmental factor that induces the variation in the underlying trait (see the discussion on foliar physiognomy, Section III.2). The presence of variation in the leaf–environment relationships among species or among genotypes within species can be evidence for important genetic effects on proxies (as discussed for stomatal frequency proxies, Section IV.2). Experimental manipulation provides evidence of other environmental effects on the leaf–environment relationships, and common garden experiments can be used to segregate genetic from plastic responses (Falconer & Mackay, 1996). The use of proxies to validate other proxies is logically different and is considered in Section V.1.
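A validation test of the first kind (calibrate on one subset of the modern world, test on another) might be sketched as follows. This is a minimal illustration with invented site data for two hypothetical regions; `fit_line` and `mean_bias` are illustrative helper names, not part of any published method.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_bias(model, x, y):
    """Mean of (predicted - observed) when the model is applied to sites."""
    slope, intercept = model
    return sum(slope * a + intercept - b for a, b in zip(x, y)) / len(x)


# Invented site data: trait score and MAT (deg C) for two regions
region_a_trait, region_a_mat = [0.2, 0.4, 0.6, 0.8], [7.0, 13.0, 19.0, 25.0]
region_b_trait, region_b_mat = [0.4, 0.6, 0.8], [9.0, 15.0, 21.0]

# Calibrate on region A only, then test on both regions
model_a = fit_line(region_a_trait, region_a_mat)
bias_within = mean_bias(model_a, region_a_trait, region_a_mat)
bias_transfer = mean_bias(model_a, region_b_trait, region_b_mat)
```

In these invented data the two regions share a slope but differ in intercept, so the model fits its own region without bias yet overestimates MAT systematically when transferred. A nonzero transfer bias of this kind is the signal that such a test is designed to detect.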

3. From living plant to fossil collection – taphonomic, diagenetic and sampling filters

The leaves in a fossil flora do not form a random sample of the leaves in the source vegetation (Spicer, 1989). The movement of leaves from the living plant to the site of deposition (taphonomic processes), the alteration of leaves after deposition (diagenetic or preservational processes), and collection and sample preparation can affect the species present and which leaves of a given species are present in a fossil flora (Spicer, 1989). These changes in composition can result in biases in the leaf–environment relationships (e.g. Greenwood, 1992; Wolfe, 1995).

III. Uncertainties in major foliar physiognomic proxies of MAT

The foliar physiognomic methods discussed here (leaf margin analysis, CLAMP (Climate–Leaf Analysis Multivariate Program) and digital leaf physiognomy; Wolfe, 1979, 1993; Royer et al., 2005) estimate MAT from average scores or proportions of leaf morphological traits for the woody dicot species. These approaches have been applied only from the Cretaceous to the Neogene, as they rely on angiosperms. Other foliar physiognomic proxies of various climatic or other environmental parameters (e.g. Christophel & Greenwood, 1989; Wilf et al., 1998; Spicer et al., 2003, 2004) are not considered.

Leaf margin analysis is based on the proportion of species with entire leaf margins present within sites (Fig. 1a). The concept originated with Bailey & Sinnott (1915), and was converted into a quantitative proxy using calibration sets from the Northern Hemisphere (Wolfe, 1979; Wilf, 1997; Traiser et al., 2005; Adams et al., 2008; Su et al., 2010). Different temperature–leaf margin relationships have been established for parts of the Southern Hemisphere (Kennedy et al., 2002; Kowalski, 2002; Greenwood et al., 2004; Hinojosa et al., 2011).
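The arithmetic of leaf margin analysis can be sketched in a few lines of Python. The species list is invented, and the slope and intercept are placeholder values chosen for illustration only; they are not taken from any of the calibrations cited above. The function names are hypothetical.

```python
def entire_margin_proportion(margins):
    """Proportion of species at a site with entire (untoothed) leaf margins.

    margins maps each woody dicot species to True (entire) or False (toothed),
    so each species counts once regardless of how many leaves were found.
    """
    return sum(margins.values()) / len(margins)


def estimate_mat(p_entire, slope, intercept):
    """Apply a linear calibration of MAT (deg C) on the proportion entire."""
    return slope * p_entire + intercept


# Invented fossil flora: five morphospecies scored for margin type
flora = {
    "morphotype A": True,
    "morphotype B": False,
    "morphotype C": True,
    "morphotype D": True,
    "morphotype E": False,
}

p = entire_margin_proportion(flora)
# Placeholder coefficients, not a published calibration
mat = estimate_mat(p, slope=30.0, intercept=1.0)
```

Scoring by species rather than by leaf means that the estimate depends on which species are present at a site, not on their abundance in the assemblage.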

Figure 1.

Proxies for estimating past climates based on the leaves of woody dicot species. Each graph shows values for a wide range of sites, and a regression that has been used to estimate mean annual temperature (MAT) from fossils. (a) Percentage of species (per site) with entire margined leaves from sites in east Asia vs MAT at that site, adapted from Wolfe (1979), courtesy of the U.S. Geological Survey. (b) Predictive relationship for MAT from Wolfe’s (1995) CLAMP (Climate–Leaf Analysis Multivariate Program). The x-axis is the score along a vector in multidimensional space identified by canonical correspondence analysis to have the strongest correlation with MAT based on site means of a suite of characters describing leaf size, shape and margin type.

CLAMP (Wolfe, 1993, 1995) makes estimates of MAT and other climatic and environmental parameters from multivariate analyses of leaf size, shape and margin type. CLAMP is based on a suite of variables representing the presence/absence of categories of the size, margins and shape of leaves (see http://clamp.ibcas.ac.cn/). These data have mostly been analysed using ordination followed by regression of MAT on the resulting axes (Fig. 1b), with some attempts at the multiple regression of raw data (Greenwood & Wing, 1995; Gregory-Wodzicki, 2000; Teodoridis et al., 2011). The principal datasets are largely made up of sites in the USA, Canada, China and Japan (http://clamp.ibcas.ac.cn/). Regions outside this geographical range have been sampled using the same protocols, but these data have not been incorporated into the main datasets and models.

Digital leaf physiognomy (Huff et al., 2003; Royer et al., 2005; Peppe et al., 2011), like CLAMP, estimates MAT and other climatic parameters from multivariate analyses of leaf form. It employs digitally measured, continuous variables and analyses them using multiple regression (Huff et al., 2003; Royer et al., 2005; Peppe et al., 2011). The most recent dataset uses many CLAMP sites, but includes other sites that give a more global representation than CLAMP (Peppe et al., 2011).

1. Underlying traits and control of foliar physiognomic traits

Leaf margin analysis, CLAMP and digital leaf physiognomy are all strongly empirical. This is because leaf margin type dominates the estimates of MAT from all of these methods (Wolfe, 1995; Wilf, 1997; Peppe et al., 2011), and no current explanation for the incidence of leaf teeth implies a direct relationship with MAT. Royer & Wilf (2006) argued that leaf teeth may be sites of elevated photosynthesis during leaf expansion, so that teeth may be favoured in cold climates where rapid expansion during spring is essential. Wolfe (1993) argued that teeth may increase transpiration, thus helping to maintain sap flow in expanding leaves. In addition, leaf teeth release root pressure through guttation from the hydathode tissue that they contain (Feild et al., 2005).

Leaf teeth and other aspects of foliar physiognomy are under strong genotypic control. Potts & Jordan (1994) showed strong quantitative genetic control of leaf shape and size characteristics in a eucalypt. Although Royer et al. (2009b) presented evidence that temperature change induced a plastic response in leaf margin characters in Acer rubrum, the response was only c. 15% of that expected from the temperature differences (allowing for up to c. 85% genetic control of these characters). Indeed, the key trait of the presence/absence of toothed leaf margins appears to be more or less fixed for given genotypes. Thus, species and even major groups of species often either have toothed leaf margins or not, regardless of climate. For instance, all species of Myrtaceae have entire margined leaves, even though they occur across a range of MAT of 23°C or more (Kubitzki, 2007). More generally, the phylogenetic composition of a flora strongly influences the incidence of species with toothed leaves even within regions (Little et al., 2011). The phylogenetic effect may be even greater between broad regions, as some lineages are unique to, or more common in, major regions. As a result, the observed leaf margin–climate relationship appears to be largely a consequence of community assembly processes bringing together the balance of species that creates the relationship.

2. Genetic and environmental impacts on the relationship

Large regional effects show that foliar physiognomy fails the validation test of comparing regional relationships. Temperate floras of different continents have markedly different leaf–climate relationships (Stranks & England, 1997; Gregory-Wodzicki, 2000; Greenwood et al., 2004; Aizen & Ezcurra, 2008; Steart et al., 2010; Hinojosa et al., 2011), resulting in differences in predicted MAT of 5°C or more (Jordan, 1997b). Even within broad regions, relationships can vary (Adams et al., 2008), and responses to MAT arising from altitude can differ from those arising from latitude (Halloy & Mark, 1996; Jordan, 1997b).

Variation in current environments may contribute to regional differences in the leaf–climate relationship. For instance, Southern Hemisphere temperate floras contain fewer deciduous species than floras at comparable northern latitudes (Axelrod, 1966; McGlone et al., 2004), and two of these southern regions (South Africa and Australia) are famous for the predominance of sclerophyllous plants with long-lived, evergreen leaves. These differences may be a result of thermal equability and typically low soil nutrient levels favouring long-lived, evergreen leaves (Turner, 1994; Wright et al., 2004a) in the Southern Hemisphere. Given that cool-climate deciduous species have a greater incidence of leaf teeth for a given climate than do evergreen leaves, these environmentally driven effects on morphology could have induced major differences in leaf–climate relationships between northern and southern floras (Jordan, 1997b; Peppe et al., 2011).

The second potential cause for regional variation in the leaf–climate relationship is a historical genetic signal (Jordan, 1997b). Such signals include phylogenetic effects, in which regional variation in the leaf–climate relationship is attributed to differences in historically determined lineage composition (Greenwood et al., 2004; Little et al., 2011). Thus, the entire margined family, Myrtaceae, dominates many Australian nonarid habitats (Groves, 1999). However, biases resulting from a strong phylogenetic influence on leaf–climate relationships and marked regional differences in lineage composition may be damped to some degree by the way in which foliar physiognomic approaches employ averages across many lineages. In addition, the phylogenetic differences in leaf form may be at least partly associated with differences in habitat through ecological lineage sorting, as discussed by Westoby et al. (1995). Thus, even in the extreme example given above, Myrtaceae are rare or absent from very cold environments, therefore limiting the bias induced by their entire margined leaves on estimates of palaeotemperature.

It has been argued that problems of regional variation in the leaf–climate relationship can be avoided by the use of geographically local physiognomic models (Stranks & England, 1997; Kowalski, 2002; Spicer, 2007; Hinojosa et al., 2011). Indeed, some models are implicitly geographically local – for example, the CLAMP dataset is strongly focused on the northern temperate zone (Wolfe, 1993, 1995; http://clamp.ibcas.ac.cn/). However, I next argue that the utility of local models is limited because they are inconsistent with the assumption that the leaf–climate relationship has remained constant. This limitation becomes progressively greater with the greater age of fossils, regardless of whether the regional variation in the leaf–climate relationship is a result of historical genetic effects or regional differences in environment.

The historical genetic contribution to interhemispheric variation in the leaf–climate relationship has, at times, been explained by putative Gondwanan origins of the Southern Hemisphere floras, compared with the more Laurasian heritage for the northern floras (e.g. Hinojosa et al., 2006). If this was the main factor, then regional models could arguably be extended back to Gondwanan times. However, as noted by Jordan (1997b) and Peppe et al. (2011), this view does not allow for compelling evidence of more recent and profound changes in the phylogenetic composition of temperate floras worldwide in response to climate change, landscape evolution and immigration of species from other regions (Momohara, 1994; Graham, 1999; Lee et al., 2001; Tiffney & Manchester, 2001; Hill, 2004; Hinojosa et al., 2006; Svenning & Skov, 2007; Sniderman & Jordan, 2011).

If geographical variation in the modern leaf–climate relationship is a result of regional differences in current environment, regional leaf–climate relationships cannot have remained constant through time. For example, if thermal equability or soil nutrients influence foliar physiognomy, it is perilous to extend Northern Hemisphere-specific models to the pre-Quaternary, when the Northern Hemisphere had more equable climates (Wing & Greenwood, 1993) and possibly poorer soils before the soil-renewing effects of Pleistocene glaciation. Given that prevailing leaf physiognomic models are largely Northern Hemisphere local models, Quaternary climates may have induced fundamental biases. Analogous problems apply across all regions.

3. Analytical, taphonomic and other biases

Multivariate proxies can incorporate aspects of leaf morphology that compensate for biases in univariate relationships. However, Peppe et al. (2010) showed that the use of categorical variables can result in significant systematic errors in CLAMP-based estimates. In addition, the correspondence analysis methodologies employed by CLAMP can distort relationships between dependent and independent variables (Minchin, 1987), which has the potential to bias the results. The alternative approach (multiple regression) can be biased if the relationships between leaf and environmental traits are not linear (as occurs within the CLAMP dataset; Wolfe, 1995).

Taphonomic effects on foliar physiognomic proxies are relatively large, but difficult to quantify (Greenwood, 1992; Spicer et al., 2005; Dilcher et al., 2009). Greenwood (1992) argued that taphonomic processes alone may have added an uncertainty of c. ±1°C to physiognomic temperature estimates using leaf size, although it is less clear how strong the effects would be on other traits. Some broad principles have become apparent. Fossil assemblages are biased towards riparian species, certain taxonomic groups over others (Tegelaar et al., 1991; Briggs, 1999) and, possibly, some morphotypes over others. Post-depositional processes may also be important, but are poorly studied for leaves. However, shrinkage caused by drying and heating (Cleal & Shute, 2007) may affect some important physiognomic features, such as leaf dimensions, size of teeth and numbers of teeth per length of leaf margin, but will have little or no effect on dimensionless measures of leaf shape, such as ratios and the presence/absence of toothed margins.

4. Overall uncertainties

Large uncertainties are associated with current leaf proxies of past climates. Even assuming no biases, globally calibrated leaf physiognomic proxies for temperature have standard errors of c. 4°C (Peppe et al., 2011). This broad uncertainty must widen when the application of the proxy to the past is considered. Phylogenetic, habitat-related, taphonomic, diagenetic and sampling effects can all introduce biases of several degrees and add uncertainty to the proxies (Burnham et al., 2001; Kowalski & Dilcher, 2003; Greenwood, 2005; Royer et al., 2009a,b; Little et al., 2011).

The degree to which phylogenetic and environmental impacts on the leaf–climate relationships affect estimates of past MAT can be expected to be time related. If regional differences in the leaf–climate relationship are mainly phylogenetic, Neogene estimates from geographically local models will be biased by Quaternary floristic restructuring. The biases will be even greater for the Palaeogene and Cretaceous fossils. If regional differences are mainly environmental, the environmental changes over the same periods will also induce biases. This means that, although geographically local models show smaller standard errors within their strict inference spaces than the global model mentioned above, such local models will become progressively less useful for pre-Quaternary periods.

IV. Stomatal density and stomatal index

Intense interest in the use of the frequency of stomata on leaves to estimate past atmospheric CO2 commenced with evidence that the number of stomata per unit leaf area (stomatal density, SDe) in a range of woody species increased as the partial pressure of CO2 (pCO2) decreased with altitude (Woodward, 1987). Initial attempts to develop stomatal palaeoproxies of atmospheric CO2 focused on SDe (Beerling & Chaloner, 1992, 1994). However, the numbers of stomata in a developing leaf are fixed well before complete leaf expansion (Schoch et al., 1980; Ticha, 1982), which means that SDe is affected by how much a leaf expands after stomatal fixation. Proxies based on the stomatal index (the ratio of stomata to stomata plus epidermal cells, SI; Salisbury, 1927) avoid this problem and have largely replaced SDe.
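The difference between the two measures can be shown with a short Python sketch. The cell counts and areas are invented, the function names are hypothetical, and the sketch assumes that the numbers of both stomata and epidermal cells in the counted patch are fixed while the patch expands through cell enlargement alone.

```python
def stomatal_density(n_stomata, area_mm2):
    """Stomata per unit leaf area (SDe)."""
    return n_stomata / area_mm2


def stomatal_index(n_stomata, n_epidermal):
    """Ratio of stomata to stomata plus epidermal cells (SI)."""
    return n_stomata / (n_stomata + n_epidermal)


# Invented counts for one patch of leaf surface, fixed before full expansion
n_stomata, n_epidermal = 40, 160

# The same patch in a leaf that expands little and in one that expands more
sde_small_leaf = stomatal_density(n_stomata, area_mm2=0.5)
sde_large_leaf = stomatal_density(n_stomata, area_mm2=1.0)

# SI depends only on cell counts, so it is the same in both cases
si = stomatal_index(n_stomata, n_epidermal)
```

Under these assumptions, doubling the area of the patch halves SDe but leaves SI unchanged, because area does not enter the calculation of SI.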

Stomatal frequency-based proxies from fossils from a range of extant groups of angiosperms and gymnosperms have been applied from the Holocene (the last c. 12 000 yr) back to the late Palaeozoic c. 300 million yr ago (e.g. Wagner et al., 1999; Retallack, 2001; McElwain et al., 2002; Kouwenberg et al., 2005; Van Hoof et al., 2005, 2006; Kürschner & Kvacek, 2009; Barclay et al., 2010). In addition, some estimates of changes in Mesozoic levels of CO2 have been based on fossils of extinct groups (Haworth et al., 2005; Vording et al., 2009; Bonis et al., 2010).

Most stomatal frequency-based proxies are based on the variation within individual species, and are calibrated in a range of different ways (Royer, 2001). The first is to analyse the temporal trends observed among herbarium specimens of different ages, exploiting the increase in atmospheric CO2 concentration from c. 280 to c. 396 ppm since the advent of the industrial era. The second is by experimental induction of changes in stomatal frequency with elevated or depressed CO2 levels. Given that almost all the plants involved are long lived (usually of the order of 100 yr or more), these two approaches mainly measure plastic responses within species and contain no information on adaptive responses (Royer, 2001). Other approaches incorporate some signal from adaptation by exploiting the altitudinal gradients in CO2 or by using well-dated subfossils that can be related to atmospheric CO2 recorded in ice-cores (Royer, 2001). Nearest living equivalent (NLE) proxies are calibrated using morphologically and putatively ecologically similar species (McElwain, 1998; Haworth et al., 2005; Bonis et al., 2010), and the fossils are scored for deviation in SI from the average in the calibration set.

The proxies are complicated because the calibrations reflect different aspects of atmospheric CO2. Data from ice-cores and herbarium specimens are related to changes in atmospheric CO2 concentration (CO2 as a proportion of the total composition, often expressed in ppm), whereas altitudinal clines reflect changes in pCO2, which is determined by absolute amounts of CO2 and temperature, with little change in concentration. This distinction is not important for most of the principles considered in this review because, for a given altitude, the two measures are equivalent. Where possible, the discussion is mostly expressed in terms of changes in pCO2. However, additional layers of complexity and uncertainty are added by the fact that proxies that estimate pCO2 need to be adjusted for altitude and some proxies that claim to estimate concentration may really be estimating pCO2.
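The distinction between concentration and partial pressure can be illustrated numerically. The sketch below uses Dalton's law (partial pressure equals mole fraction multiplied by total pressure) together with an isothermal barometric approximation for the decline of air pressure with altitude. The scale height of 8400 m is a rough assumed value, and the function names are hypothetical.

```python
import math

SEA_LEVEL_PRESSURE_PA = 101_325.0  # standard atmosphere
SCALE_HEIGHT_M = 8_400.0           # assumed; varies with air temperature


def air_pressure_pa(altitude_m):
    """Approximate total air pressure from an isothermal barometric formula."""
    return SEA_LEVEL_PRESSURE_PA * math.exp(-altitude_m / SCALE_HEIGHT_M)


def pco2_pa(co2_ppm, altitude_m):
    """Partial pressure of CO2: mole fraction times total air pressure."""
    return co2_ppm * 1e-6 * air_pressure_pa(altitude_m)


# Same concentration (ppm) at two altitudes gives different partial pressures
pco2_sea_level = pco2_pa(396.0, altitude_m=0.0)
pco2_mountain = pco2_pa(396.0, altitude_m=3000.0)
```

With the concentration held constant, pCO2 in this approximation falls by roughly 30% between sea level and 3000 m, which shows why a proxy calibrated against one measure cannot be applied to the other without an adjustment for altitude.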

1. Underlying traits and control of stomatal frequency

A relationship between stomatal density and atmospheric CO2 is expected because the main role of stomata is to regulate the permeability of leaves to gases (gas phase conductance), so that the leaves can absorb CO2 for photosynthesis without losing excessive water vapour. Maximum stomatal conductance is determined by the number and size of stomata on the leaf, and plants tend to match this parameter with the maximum demand for CO2 (Wong et al., 1979). Thus, to optimize their resource use and evade costs of having excess stomata, plants could be expected to adjust SDe according to pCO2 (Roth-Nebelsick, 2007). The costs of excess stomata are unclear, but could involve water loss through fully closed stomata, which appear to be more conductive than the general cuticle (e.g. Jordan & Brodribb, 2007), increased risk of fungal penetration (e.g. Manter et al., 2000) or some unknown developmental or maintenance costs. However, as an alternative or adjunct to changes in SDe, maximum conductance can be adjusted by altering stomatal dimensions (Maherali et al., 2002; Franks et al., 2009). The concept underlying the use of SI as a proxy for pCO2 is that plants may change SI as a means of modulating SDe.

2. Evolution, extinction, the Ginkgo paradox and NLE approaches

Although there are strong plastic (developmental) responses in both SDe and SI (Royer, 2001), there is clear evidence of genetic control of the pCO2 response. Thus, in Arabidopsis thaliana, the HIC gene codes for pCO2 responses in both SI and SDe (Gray et al., 2000), and a suite of other genes code for SI and SDe (Dong & Bergmann, 2010). Multiple genes code for pCO2 responses in SI and SDe in poplar and for SI and SDe in oak (Ferris et al., 2002; Gailing et al., 2008). In addition, responses vary markedly among species, not only in the absolute values of SI and SDe for a given pCO2, but also in how these parameters respond to changes in pCO2 (Fig. 2; see also Korner, 1988; Haworth et al., 2010). Some species show significant reverse trends (Atchison et al., 2000), and large differences can occur between closely related species (Fig. 3). Such variation among species suggests (contrary to assumptions) that the evolutionarily optimal relationship between stomatal frequency and environment is not constant, and that evolution has resulted in large differences in this optimum. Furthermore, there are large differences in stomatal function across major plant groups (Brodribb & McAdam, 2011). This makes single-species proxies increasingly dubious across evolutionary time because such proxies assume that there has been no evolutionary change in the relationship. This is a particularly severe problem for proxies calibrated using experimentally induced responses, because these typically employ only a few individual plants. However, a component of evolutionary adaptation increases the genetic inference space of proxies calibrated using recent fossils (although this space remains narrow because the calibrations are within species) (Royer, 2001).
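Responses of different species can be compared on a common scale using the standardized slope defined in the caption of Fig. 2 (the regression slope divided by the fitted value at 320 ppm CO2). The following Python sketch computes it for two invented species; the data are made up for illustration and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def standardized_slope(co2_ppm, trait, reference_ppm=320.0):
    """Regression slope divided by the fitted trait value at reference_ppm."""
    slope, intercept = fit_line(co2_ppm, trait)
    value_at_reference = slope * reference_ppm + intercept
    return slope / value_at_reference


# Invented herbarium series: CO2 concentration (ppm) and SI as a ratio
co2 = [290.0, 320.0, 350.0]
si_declining = [0.12, 0.10, 0.08]  # the expected direction of response
si_rising = [0.09, 0.10, 0.11]     # a reverse trend

s_declining = standardized_slope(co2, si_declining)
s_rising = standardized_slope(co2, si_rising)
```

Dividing by the fitted value at a fixed CO2 level expresses each response as a proportional change per ppm, so species with very different absolute stomatal frequencies can be compared directly, and the sign of the result separates expected from reverse trends.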

Figure 2.

Variation among species-specific linear regressions of stomatal density (a) and stomatal index (b) against atmospheric CO2 concentration since the early industrial era, based on data from herbarium specimens (crosses, angiosperms; squares, gymnosperms; triangles, ferns). Each point represents the estimated value at 320 ppm CO2 (representing the approximate mid-value for the datasets) vs the standardized slope (slope/estimated value at 320 ppm CO2). Note the high variability in responses among species for both stomatal density and stomatal index, and the presence of relatively large numbers of positive slopes (contrary to the expected relationship). Although relationships between stomatal frequency and levels of atmospheric CO2 are generally nonlinear (Beerling & Royer, 2002), the relationships shown here were approximately linear within the sampled range. Sources: Peñuelas & Matamala (1990); Beerling & Chaloner (1993); Kürschner et al. (1996); He et al. (1998); Rundgren & Beerling (1999); Atchison et al. (2000); Royer et al. (2001); Greenwood et al. (2003); Kouwenberg et al. (2003, 2004); Wagner et al. (2005); Eide & Birks (2006); Miller-Rushing et al. (2009); Gagen et al. (2010); Haworth et al. (2010).

Figure 3.

The markedly different relationships of stomatal frequency with atmospheric CO2 concentration in Betula nana (crosses; modified from Finsinger & Wagner-Cremer, 2009) and B. pubescens (squares; modified from Eide & Birks, 2006). The two species co-occur, hybridize and are likely to be sister species (Järvinen et al., 2004), but differ in ploidy level (B. pubescens is tetraploid, B. nana is diploid). (a) Stomatal density. (b) Stomatal index. Similar differences also occur within other genera, including Salix (McElwain et al., 1995; Rundgren & Beerling, 1999), Quercus (He et al., 1998) and Callitris (Haworth et al., 2010), without such differences in ploidy.

The problems resulting from proxies with narrow genetic inference spaces are well illustrated by the ginkgophytes, a group of gymnosperms. SI in this group has been a favoured stomatal proxy for pCO2 from c. 300 million yr ago to recent periods. Estimates have been made from extinct species from the Palaeogene, Mesozoic and even the late Palaeozoic (Retallack, 2001; Quan et al., 2009; Smith et al., 2010), and from recent fossils of the extant species. This proxy is calibrated using the only extant ginkgophyte, Ginkgo biloba, which shows a tight relationship between SI and pCO2 (Royer et al., 2001). However, this seemingly strong calibration may be a result of the narrow genetic and ecological range of G. biloba (Gong et al., 2008). The estimates using extinct ginkgophytes therefore include extreme extrapolation because the fossils came from a far more geographically widespread and presumably genetically diverse group of plants than G. biloba. Indeed, fossil ginkgophytes are found in many parts of the world and are mostly considered to be from extinct species, or even genera (Taylor et al., 2009). In this light, there is every reason to suspect that the SI–pCO2 relationships for extinct ginkgophytes may have differed from that of the extant species. Attempts to focus on fossils that are morphologically similar to G. biloba (e.g. Royer et al., 2001; Bonis et al., 2010; Smith et al., 2010) may reduce, but will certainly not eliminate, the potential for the fossil taxa to show different stomata–pCO2 relationships from the extant species. Thus, palaeoproxies derived from G. biloba appear to be highly vulnerable to the effects of evolution and extinction.

This problem is called the ‘Ginkgo paradox’. Ginkgophytes have been favoured as a proxy because of the rich and ancient fossil record of the group (Taylor et al., 2009) and the strong extant relationship in the calibration set (Royer et al., 2001). However, one likely reason for the strength of calibration (the limited genetic range in G. biloba) is also a reason to expect high uncertainty in pCO2 reconstructions based on fossil ginkgophytes. The significance of this problem should increase with the age of the fossils, through the accumulating effects of extinction and evolution. Thus, when genetic variation in the leaf–environment relationship forces one to restrict the proxy to a given taxon, the taxon that provides the best model in the modern world may be the least appropriate choice, especially for estimates into deep time. Parallel logic can be applied to other models (such as geographically local models) that gain precision by using a narrow inference space.

Because they employ multiple species, NLE models (McElwain, 1998) have the potential to exhibit wider genetic inference spaces than single species’ proxies. However, applications of the NLE approach can have limitations that illustrate the need to critically appraise the nature of the inference space. Thus, Haworth et al. (2005) estimated pCO2 from fossils of an extinct family of Mesozoic conifers using a NLE model calibrated using three members of the extant conifer family, Cupressaceae, and one angiosperm. The angiosperm (Salicornia virginica) should be considered as irrelevant because it is a succulent species of saline, semi-aquatic environments (Kubitzki et al., 1993), which is very likely to be functionally different from a Mesozoic conifer. The calibration therefore depends on the extinct conifer showing the same responses to pCO2 as a morphologically similar, but phylogenetically distinct, group of extant conifers. It does so by assuming that similarity in form and inferred ecology imply similarity in function (McElwain, 1998). However, this assumption is contradicted by the work of Haworth et al. (2010), which showed that the SI–pCO2 relationship of one of the species used in calibration, Callitris rhomboidea, is qualitatively different from that of Callitris oblonga, which is very similar in morphology (Offler, 1984) and occurs in comparable and geographically overlapping habitats (Hill, 1998).

3. Causes of variation among species and within plants

SI and SDe can be affected by irradiance level, atmospheric moisture, local water availability, nutritional status and leaf economic strategy (Beerling et al., 1992; Atchison et al., 2000; Hovenden & Vander Schoor, 2004, 2006; Lake & Woodward, 2008; Sekiya & Yano, 2008; Casson & Hetherington, 2010). Changes in any of these factors can therefore result in errors in stomata-based individual estimates of past pCO2. Large systematic differences in the stomatal frequency–pCO2 relationship may reflect the evolution of major ecological differences among species (notably the evolution of greater or lesser tolerance to shade, drought or poor soils). However, given the evidence for evolutionary niche conservatism (Losos, 2008), it is reasonable to expect that the development of such large biases will tend to be relatively slow. Furthermore, Beerling (1999) argued that SI was less biased than SDe by external factors.

Stomatal frequency–pCO2 relationships are affected by modification of the dimensions of stomatal apertures. Although such changes could result in large biases in the estimates of pCO2 from stomatal frequency, stomatal aperture dimensions can often be detected from the fossil preparations used to count stomata. As a result, it may be possible to identify some of the biases, and even use them in the development of improved proxies (see Section V.3). However, large differences in the stomatal frequency–pCO2 relationship have been observed between closely related species without comparable differences in stomatal aperture dimensions. For example, two sister species, Betula nana and B. pubescens, have similar stomatal pore lengths (Wagner et al., 2000), but show large differences in the relationships of both SI and SDe to pCO2 (Fig. 3). Whether such discrepancies are widespread is unknown because of a paucity of comparative studies of closely related species, but this discrepancy should create doubt about the evolutionary stability of the relationship between stomatal frequency–pore size and pCO2. This area is clearly ripe for investigation.

4. Environmental inference spaces and the problem of curvilinear relationships

The stomatal proxies for pCO2 are often adversely affected by having narrow environmental inference spaces. Calibrations from herbarium specimens fall within the range from early industrial to recent levels. Calibrations using experimental induction increase this span somewhat. However, there are virtually no data exploring relationships at very high pCO2, with upper values in experimentally induced responses mostly approximately twice contemporary values. Estimates for periods in which pCO2 was possibly higher than this range (e.g. much of the Palaeogene, Palaeozoic and Mesozoic) are extrapolations. Although the relationship for many species is approximately linear within the range of recent change in pCO2, it becomes increasingly nonlinear at higher levels (Beerling et al., 2009). At high pCO2, it is reasonable to expect that the stomatal frequency–pCO2 relationship will flatten towards a positive (non-zero) asymptote because of the need for a minimum number of stomata to maintain a transpiration stream for nutrient supply or to cool the leaf (Upchurch & Mahan, 1988). As noted by Royer (2001), the resulting flat relationship means that currently employed relationships not only have high uncertainties at high pCO2, but can fail to detect such levels – fossils from high-pCO2 environments may generate stomatal frequencies that estimate much lower pCO2 (Fig. 4). Beerling et al. (2009) attempted to address this problem using an empirical curve, but, because their relationship curved downwards at the highest calibration point, it also lacked power to estimate high pCO2. Such problems can be significant because the stomatal-based proxies sometimes conflict with estimates of much higher pCO2 from other proxies (Beerling et al., 2009). These issues with curvilinearity also affect proxies employing NLE approaches (McElwain, 1998).

Figure 4.

A comparison between ‘true’ and fitted curves for stomatal index vs atmospheric CO2 concentration, showing how the extrapolation of flattening curves can lead to misleadingly low estimates. Note that the estimate of c. 510 ppm CO2 is based on the hypothetical observation from 2000 ppm CO2. The estimated response curve follows the formula for stomatal index in Ginkgo biloba developed by Royer et al. (2001). The ‘true’ response is hypothetical, but approximates Beerling et al.’s (2009) empirical fit for observations up to 500 ppm CO2 and, at higher pCO2, follows an asymptotic curve parallel to the Royer et al. (2001) curve.
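The logic of Fig. 4 can be reproduced with a toy calculation. The sketch below is only an illustration under stated assumptions: the hyperbolic form and the parameters `s_min`, `a` and `offset` are hypothetical values chosen to keep the arithmetic simple, and they are not the published coefficients of Royer et al. (2001) or Beerling et al. (2009). For simplicity, the 'true' curve is assumed to run parallel to the fitted curve over its whole range.

```python
def fitted_si(co2_ppm, s_min=5.0, a=2000.0):
    """Calibration curve: SI falls hyperbolically towards the asymptote s_min."""
    return s_min + a / co2_ppm


def true_si(co2_ppm, s_min=5.0, a=2000.0, offset=3.0):
    """Hypothetical 'true' response: parallel to the fitted curve, but with an
    asymptote that is higher by `offset` SI units."""
    return fitted_si(co2_ppm, s_min, a) + offset


def estimate_co2(si_observed, s_min=5.0, a=2000.0):
    """Invert the fitted curve to turn an observed SI into a CO2 estimate."""
    if si_observed <= s_min:
        raise ValueError("SI at or below the fitted asymptote: no finite estimate")
    return a / (si_observed - s_min)


actual_co2 = 2000.0
si_fossil = true_si(actual_co2)          # SI the leaf would really show
estimated_co2 = estimate_co2(si_fossil)  # what the fitted curve infers from it
print(f"SI = {si_fossil:.1f}; actual {actual_co2:.0f} ppm, "
      f"estimated {estimated_co2:.0f} ppm")
```

With these assumed parameters, a leaf formed at 2000 ppm is assigned an estimate of 500 ppm. A modest difference between the asymptotes of the true and fitted curves is enough to make very high pCO2 undetectable.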

The stomatal frequency response at pCO2 below the calibration range (e.g. during glacials) is also poorly known. There are very few experimental investigations of below ambient pCO2 to provide empirical relationships (Hovenden & Schimanski, 2000; Gerhart & Ward, 2010), and the upper values for SI and SDe must presumably be constrained by the available space on the leaf lamina.

5. Other biases

Diagenetic shrinkage of cuticles (see Section III.3) will affect SDe, but will have little impact on SI. A few studies have investigated possible taphonomic effects on SDe and SI (Uhl & Kerp, 2005). Biases towards robust and possibly small leaves may be important.

6. Overall uncertainties

Considerable uncertainty in the stomatal-based proxies relates to evolutionary adaptation and extinction, as shown by their failure to pass the cross-validation test of comparison across species. However, the impacts of evolution and extinction on the proxies are difficult to quantify from observations of the modern or recent world. Extrapolation to high levels of pCO2, where the stomatal frequency–pCO2 relationship flattens, creates very high uncertainty for any estimates for pre-Neogene periods. Although SI shows better empirical relationships than SDe to pCO2, SI-based proxies are exposed to the same biases as SDe-based proxies.

These uncertainties appear to be large for all periods, except for the Holocene, and become much greater for more ancient periods. The extreme examples of this are the Mesozoic and Palaeozoic estimates from fossils of extinct lineages (Retallack, 2001; Haworth et al., 2005; Bonis et al., 2010).

V. Steps forward

Research towards the development of improved proxies should focus on the minimization of the risks involved in extrapolating into the past. I would like to highlight some prospective means of achieving these goals. Many of these involve the incorporation of more information into the models or better mechanistic bases (especially those in which the response has a clear biomechanical or biophysical basis), as well as employing other ways of minimizing phylogenetic/evolutionary impacts on the proxies.

1. Multiple proxy and cross-validation approaches

The use of multiple proxies provides considerable scope and has been applied to the estimation of both past climates (e.g. Yang et al., 2007) and levels of atmospheric CO2 (e.g. Roth-Nebelsick et al., 2004). A logically comparable approach to using multiple proxies is to cross-validate against existing estimates (e.g. Chen et al., 2001 for pCO2), although it is worth noting that estimates from different proxies are contradictory for many periods (Royer et al., 2001). Furthermore, successful cross-validation does not provide evidence that the methods are valid universally, just that the methods have worked under those conditions.

The components of individual proxies act in series, so that, if one step fails, the whole proxy comes into question. By contrast, multiple proxies work in parallel, so that congruent results from multiple proxies increase the chance of a valid answer. To understand the cumulative value of multiple proxies and cross-validation, one can consider the multiplicative way in which uncertainties combine (Sokal & Rohlf, 1995). As a hypothetical example, let us consider two convergent, but independent, proxies which have a 50% (i.e. P = 0.5) chance that the true value of the target parameter falls within a certain range. When these proxies are used together, the probability that the true value will fall in this range only improves to 75% (P = 1 − 0.5²). However, the probabilities rapidly become more favourable if the individual proxies have higher certainties (e.g. two 80% values combine to give 96%; i.e. P = 1 − 0.2²), or if there are a greater number of convergent proxies. Although it is rarely possible to make such precise calculations, these examples demonstrate that the combination of two poor proxies will lead to a relatively poor combined result.
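The arithmetic of this hypothetical example can be written out as a short sketch. The function name `combined_probability` is illustrative, and the calculation simply applies the rule used in the text (one minus the product of the individual failure probabilities), which assumes that the proxies are fully independent.

```python
from math import prod


def combined_probability(individual_probs):
    """Combine independent, convergent proxies using the rule in the text:
    1 minus the product of the individual probabilities of being wrong."""
    probs = list(individual_probs)
    if not probs or any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("supply at least one probability between 0 and 1")
    return 1.0 - prod(1.0 - p for p in probs)


two_poor = combined_probability([0.5, 0.5])         # the 50% example
two_good = combined_probability([0.8, 0.8])         # the 80% example
three_poor = combined_probability([0.5, 0.5, 0.5])  # adding a third proxy
print(round(two_poor, 3), round(two_good, 3), round(three_poor, 3))
```

If the proxies share a bias (for example, through similar responses to natural selection), the independence assumption fails and the combined value calculated in this way will be too optimistic.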

Multiple proxy approaches also provide an objective means of identifying erroneous estimates, through the presence of contrasting results from different proxies. However, the presence of such anomalies leads to the problem of choosing among alternatives. Workers must avoid favouring the proxy with which they are most familiar. In addition, multiple proxy approaches can generate misleading inferences if several proxies are biased in the same way. For example, natural selection may affect different proxies in similar ways.

An important source of proxies to supplement foliar physiognomic proxies is the widely used taxonomic approach, which includes indicator species and coexistence methods (Mosbrugger & Utescher, 1997). Typically, the taxonomic approach assumes that the relevant palaeoclimate fell within the observed bioclimatic range of the living relatives of the fossil, or at least that deviations from this rule can be identified (Wolfe, 1995). This approach shows limitations that are similar to those affecting foliar physiognomic approaches – the observed and potential ranges of living species may differ because the species’ range has been altered by changes in the abiotic or biotic environment, or by evolution or selective extinction (Jordan, 1997a). It also depends on the accurate identification of the fossils and nearest living relatives. However, the presence of many different taxa in many fossil assemblages means that this approach contains a large amount of potentially useful palaeoclimatic information, because each taxon identified provides a separate source of evidence (Mosbrugger & Utescher, 1997).

Multi-proxy approaches to the estimation of past levels of CO2 can employ nonstomatal proxies (e.g. Pagani et al., 1999, 2005; Hönisch et al., 2009), as well as stomatal frequency from a range of taxa, although attempts to date have involved a small range of species (e.g. Royer et al., 2001; Roth-Nebelsick et al., 2004).

2. Resolving the problem of regional differences in leaf–climate relationships using multivariate approaches

There may be leaf characters that can be incorporated into multivariate analyses to compensate for regional biases and therefore reduce the adverse effects of variation in leaf–climate relationships. One such opportunity is based on the concept that large differences in the degree of scleromorphy and the incidence of deciduousness may be major contributors to the regional differences in leaf–climate relationships (as discussed in Section III.2). Leaf mass per unit area, a widely used measure reflecting scleromorphy and leaf lifespan (Wright et al., 2004b), can be predicted from the area and petiole width of leaves (Royer et al., 2007). This model has a very high power to predict average leaf mass per unit area at the site level (r² > 0.9) (Royer et al., 2007), and can differentiate between deciduous and evergreen species to the same level (G. J. Jordan, unpublished). Furthermore, it has a clear biomechanical component and appears to apply globally (Royer et al., 2007). The incorporation of petiole characters in multivariate models thus has the potential to improve leaf physiognomic climate models. Although this approach cannot entirely eliminate the potential for evolutionary history to confound the models, it does increase the chance that biogeographical noise will be over-ridden by the signal from convergent evolution.

The limitations with multivariate analyses mentioned in Section III.3 can be overcome by alternative methods of analysis, especially when combined with the redefinition of characters. Stranks & England (1997) proposed a more robust methodology than any currently employed. They used resemblance functions, in which palaeoclimatic estimates are based on the multivariate similarity of the fossil assemblage to samples in the calibration set. The use of resemblance functions has several advantages over current approaches. It considers all physiognomic data in an unbiased way and does not assume linearity in leaf–climate relationships. It also facilitates operation at the individual taxon level (i.e. making separate estimates for each species in a flora, and then integrating these data to give overall estimates). This simplifies methods of phylogenetic adjustment, makes the identification and/or exclusion of anomalous taxa possible and facilitates the honest estimation of uncertainty. Resemblance functions have not been taken up by the physiognomic community, perhaps because Stranks & England’s (1997) implementation employed correspondence analysis of CLAMP-type data and generated larger standard errors than those claimed for the approach of Wolfe (1995). However, Jordan (1997b) showed that the latter greatly exaggerated the accuracy of the predictions. Furthermore, there is no need to use correspondence analysis. The availability of continuous leaf characters that show monotonic relationships to environment (Peppe et al., 2011) would allow the use of more powerful and robust similarity measures (such as the Euclidean distance of suitably transformed variables). An alternative approach is to use physiognomic evidence to choose among alternative models. Thus, Teodoridis et al. (2011) proposed a physiognomic rule for choosing between two alternative models in CLAMP. However, this approach is less general than the resemblance function approach and does not overcome the intrinsically local nature of the CLAMP data in its current manifestation.
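One simple way to realise a resemblance function based on Euclidean distance is a nearest-analogue estimate, sketched below. This is an assumption-laden illustration and not the implementation of Stranks & England (1997): the function name, the choice of z-score transformation, the use of the k closest samples, and all trait and temperature values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def resemblance_estimate(calib_traits, calib_climate, fossil_traits, k=3):
    """Estimate a climate value for a fossil assemblage as the mean climate of
    its k most similar calibration samples (Euclidean distance on z-scores).
    Also return the lowest and highest climate among those k samples."""
    if not 1 <= k <= len(calib_traits):
        raise ValueError("k must be between 1 and the number of calibration samples")
    columns = list(zip(*calib_traits))
    centres = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # guard constant columns

    def z(row):
        return [(v - c) / s for v, c, s in zip(row, centres, spreads)]

    target = z(fossil_traits)
    ranked = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(z(row), target))), climate)
        for row, climate in zip(calib_traits, calib_climate)
    )
    climates = [climate for _, climate in ranked[:k]]
    return mean(climates), min(climates), max(climates)


# Hypothetical calibration sites: (proportion of entire-margined species,
# log10 mean leaf area) with mean annual temperature in degrees C.
calib_traits = [(0.20, 2.0), (0.30, 2.4), (0.45, 2.9),
                (0.60, 3.3), (0.75, 3.8), (0.85, 4.1)]
calib_mat = [6.0, 9.0, 13.0, 18.0, 22.0, 25.0]
fossil = (0.62, 3.4)  # hypothetical fossil assemblage

estimate, lowest, highest = resemblance_estimate(calib_traits, calib_mat, fossil)
print(round(estimate, 1), lowest, highest)
```

Because the estimate is taken from the most similar calibration samples rather than from a fitted line, no linear leaf–climate relationship is assumed, and the spread among the nearest analogues gives a rough, honest indication of uncertainty.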

3. Leaf veins and stomatal modelling

The realization that changes in stomatal aperture dimensions can bias stomatal frequency proxies has led to research into improved proxies (Wynn, 2003). Many aspects of stomatal form can be measured on the cuticle preparations used to determine SI and SDe (the main exception is stomatal depth), which raises the possibility of developing proxies based on the modelling of stomatal responses to CO2. Wynn (2003) proposed a theoretical framework relating stomatal frequency to CO2, Roth-Nebelsick (2007) modelled gas exchange through stomata and Konrad et al. (2008) developed a mechanistic model of the response of stomata to CO2 incorporating stomatal dimensions. Until now, these models have mainly been used to assess the responses of vegetation to changes in atmospheric CO2 concentration (De Boer et al., 2011; Lammertsma et al., 2011). However, it may be possible to elaborate Konrad et al.’s (2008) model into a mechanistic predictive model that can be used to estimate past pCO2.

In addition, an increasing body of evidence shows that the leaf vein density (length of veins per unit lamina area) is closely linked to SDe through co-ordination of the capacity to supply water to the leaves (veins) with the maximum demand for water (stomata) (Brodribb et al., 2007). However, vein density is much less subject to some of the factors that limit the value of SDe as an environmental palaeoproxy. In particular, vein density is much more stable than SDe across genotypes and species, and therefore should be much less vulnerable than SDe to evolutionary changes (Sack & Frole, 2006; Noblin et al., 2008; Boyce et al., 2009). Vein density can also be measured on well-preserved impression fossils, on which it is usually impossible to count stomata. The potential for use as a proxy for levels of atmospheric CO2 is further supported by optimization modelling predicting the effect of CO2 on vein density (Brodribb & Feild, 2010). However, like stomatal characters, vein density responds to a range of parameters that affect assimilation, including light environment (Uhl & Mosbrugger, 1999), and more work is needed before either vein density or stomatal modelling can be used as an effective proxy either in isolation or in combination.

VI. Synthesis

This discussion of leaf-based proxies suggests that, before using a proxy, one should consider a series of inter-related questions (listed in Table 1) on the relationship between the leaves and the target parameter. These questions relate to the genetic and environmental inference spaces of the proxy, the assumptions necessary to make estimates, the likelihood of these assumptions being satisfied, and the consequences of their violation. Stomatal frequency and foliar physiognomic proxies illustrate some key points. The stomatal frequency proxies are mostly based on relatively direct responses to pCO2, but their inference spaces are strongly constrained genetically, so that the proxies are highly susceptible to adaptation and other evolutionary changes. By contrast, the foliar physiognomic proxies may be more buffered against evolutionary changes because they are based on considerably wider genotypic ranges. However, they represent much less direct responses to the relevant environmental parameters, and are therefore strongly empirical. Furthermore, because foliar physiognomic proxies largely depend on community assembly processes, they are vulnerable to phylogenetic effects resulting from assembly. In each case, the genetically and/or environmentally local nature of the proxies makes them vulnerable to environmental and/or evolutionary changes to the leaf–environment relationship. The Ginkgo paradox illustrates how proxies with narrow inference spaces can show lower statistical uncertainties than proxies with wider inference spaces, but can have high true uncertainties because of greater extrapolation.

Table 1. Key questions that should be considered in developing and using biological proxies, based on the discussion and examples in Sections III and IV
Which aspect of the environment is the plant responding to directly? Which plant trait responds directly to this environmental characteristic? | Understanding the mechanistic basis of the proxy may help answer these questions, which may aid in the determination of the likelihood of biases from changing correlations among traits
What are the environmental and genetic ranges of the calibration data? How much environmental and/or genetic extrapolation is involved in using the proxy? | Even if the values of the target parameter from the time and place of fossilization fall within the range of the calibration set, other aspects of the environment may have differed from modern conditions
Does the relationship between the leaf trait and the target parameter vary with other aspects of the environment? | Combined with environmental extrapolation, this can result in changes to the leaf–environment relationship, thus biasing the proxy
How much is the relationship between the leaf trait and the target parameter affected by genetic variation? | This helps determine the likelihood that the proxy will be affected by extinction, evolution and (for some proxies) immigration. It also encompasses phylogenetic effects
Are there methodological problems with data collection or analysis? | Consider this on a case-by-case basis, including problems with extrapolation of potentially nonlinear relationships
What taphonomic, diagenetic and collecting biases can affect the data? | These need to be considered on a case-by-case basis

The nature of the uncertainties discussed here means that leaf-based proxies tend to become progressively less reliable as a fossil becomes more ancient (e.g. Uhl, 2006). This is because extinction and biological and landscape evolution have irreversibly altered the biotic and physical environment. The uniformitarian assumption implicit in the application of proxies that ignore extinction and evolution (e.g. geographically local models and single species’ models) is that patterns were the same in the past, whereas the uniformitarian principle is most sound when assuming that the underlying processes have remained unchanged (Gould, 1965). This means that future work should concentrate on proxies with stronger links to biomechanical or biochemical processes, although the stomatal proxies show that even proxies underpinned by a clear mechanism connecting leaves to the target parameter need to be appraised carefully.

Although the magnitude of the increase in uncertainty with time is highly uncertain and idiosyncratic, it may be possible to provide some guidelines to assess whether these processes are likely to have had major impacts on specific proxies. Rules of thumb suggesting robustness in proxies based on plastic responses within single species are as follows: the species involved is the same extant species used in calibrating the proxy; closely related species show similar leaf–environment relationships; and the environmental range of the calibration set is sufficiently broad to be able to assume that the fossil came from an analogous environment. For proxies based on average scores for multiple species, the key test is that the proxy is consistent across different environments and community compositions. Failing this, the proxy is more likely to be valid if there is evidence of similar environment and floristic compositions at the times and places of calibration and fossilization.

The proxies considered here are quantitative. It can be argued that qualitative proxies may be more robust – in that they have the lower aspirations of showing trends and broad patterns rather than providing numerical estimates. However, nothing precludes the use of quantitative methods in a qualitative way – one may be sceptical of the numerical estimate provided, but may accept the presence of a qualitative difference.

All the major issues discussed here apply widely to biologically based palaeoproxies. However, the vulnerability of each proxy needs to be considered separately in the logical framework set out in this review. There appears to be potential to develop more robust proxies in each case by incorporating more information into the proxies. Furthermore, although the uncertainties of individual proxies are often great, congruent estimates from multiple proxies can be useful.


I thank Matt McGlone, Mark Hovenden, Tim Brodribb, Kale Sniderman, Ray Carpenter and three anonymous reviewers for constructive comments on this review.