•A wide range of factors (developmental, physiological, ecological) with unpredictable interactions control variation in leaf form. Here, we examined the distribution of leaf morphologies (simple and complex forms) across angiosperms in a phylogenetic context to detect patterns in the directions of changes in leaf shape.
•Seven datasets (diverse angiosperms and six nested clades, Sapindales, Apiales, Papaveraceae, Fabaceae, Lepidium, Solanum) were analysed using maximum likelihood and parsimony methods to estimate asymmetries in rates of change among character states.
•Simple leaves are most frequent among angiosperm lineages today, were inferred to be ancestral in angiosperms and tended to be retained in evolution (stasis). Complex leaves slowly originated (‘gains’) and quickly reverted to simple leaves (‘losses’) multiple times, with a significantly greater rate of losses than gains. Lobed leaves may be a labile intermediate step between different forms. The nested clades showed mixed trends; Solanum, like the angiosperms in general, had higher rates of losses than gains, but the other clades had higher rates of gains than losses.
•The angiosperm-wide pattern could be taken as a null model to test leaf evolution patterns in particular clades, in which patterns of variation suggest clade-specific processes that have yet to be investigated fully.
Leaf form in angiosperms varies from simple (unlobed and lobed) to complex (deeply divided or dissected, and compound). This variation is controlled by developmental and functional (physiological and ecological) factors. Unlike simple unlobed leaves, lobed and complex leaves undergo differential marginal growth, which is governed by the activity of KNOX proteins (Hagemann & Gleissberg, 1996; Efroni et al., 2010). The repeated evolution of complex leaves appears to be accompanied by the turning on of KNOX expression in leaf primordia (Bharathan et al., 2002; Hay & Tsiantis, 2006; Piazza et al., 2010). Lobed and complex leaves may possess physiological properties (e.g. increased boundary layer conductance preventing temperature extremes) that are beneficial under specific environments (e.g. under warm, dry conditions). However, there may be multiple optimal ‘solutions’ for a single set of environmental conditions in a given location, and the particular leaf forms present will depend on local ecological history (e.g. Givnish, 1987; Gurevitch & Schuepp, 1990; Niklas, 1994; Marks & Lechowicz, 2006). These studies suggest that complex leaves evolve readily and, without functional factors to favour particular leaf forms, we do not expect biases in the evolution of leaf form. However, a previous phylogenetic analysis of leaf shape variation in angiosperms showed that complex leaves arise from simple leaves (‘gains’) more often than the reverse (‘losses’) (Bharathan et al., 2002). That study did not investigate whether the greater number of gains translated into a higher rate of gains than of losses, which would indicate a bias towards the origin of complex leaves in angiosperms.
Here, we examine this hypothesis of unequal rates of change in leaf form, specifically that the rate of gains of complex leaves is greater than the rate of losses, assuming no developmental or physiological/ecological biases. We analyse the phylogenetic patterns of leaf shape variation employing several coding schemes and maximum likelihood (ML) methods, in addition to the limited ‘simple’ and ‘complex’ coding and parsimony methods used by Bharathan et al. (2002). ML methods have the advantage that they take into account branch lengths and opportunities for change when estimating relative rates of change (Sanderson, 1993). We analysed seven datasets: one broadly sampling all major angiosperm lineages and six focusing on different nested clades of the angiosperms.
Materials and Methods
Seven datasets were analysed (Fig. 1; Supporting Information Figs S1–S8): representatives of a broad sampling of angiosperms (560 taxa from Soltis et al., 2000), and more focused sampling of six angiosperm clades: Sapindales (Gadek et al., 1996), Apiales (Downie et al., 1998), Papaveraceae (Hoot et al., 1997; Gleissberg & Kadereit, 1999), Fabaceae (Wojciechowski et al., 2004), Lepidium (Brassicaceae: Mummenhoff et al., 2001, 2004) and Solanum (Solanaceae: Bohs, 2005). Angiosperm-wide analyses were conducted both including and excluding Ceratophyllum (controversially placed, e.g. Doyle, 2007); the results were similar, but only the latter analyses are presented. In addition, phylogenetically well-supported subsamples of the full angiosperm dataset (‘560 taxon’) were analysed: eudicots (‘409 taxon’), rosids including Saxifragales (‘195 taxon’) and asterids (‘147 taxon’). The study groups were chosen because they show variation in leaf form; within each group, taxa were sampled based on the availability of sequence data and were therefore assumed to be random with respect to leaf form. All analyses included appropriate outgroups based on previously published phylogenies.
Leaf forms of taxa within each dataset were scored from the literature, internet sources and direct observations (Table S1). The characterization of leaf form is complicated (Hickey, 1973; Ellis et al., 2009). Complex leaves are variably defined; they could be considered to be the opposite end of a continuum from simple unlobed leaves (e.g. Kaplan, 1975; Dengler et al., 1982; Kaplan et al., 1982; Lacroix & Sattler, 1994; Sattler & Rutishauser, 1997). The classical morphological view is that the compound leaf with leaflets articulated to a rachis is distinct from the unlobed, lobed or dissected leaf (e.g. Eames, 1961; Esau, 1965). Developmental morphology suggests that simple lobed leaves are more similar to complex leaves because lateral blastozones (absent in simple entire leaves) initiate both lobes and leaflets (Hagemann & Gleissberg, 1996). Molecular developmental data suggest that dissected and compound leaves are similar in being initiated from complex primordia (Bharathan et al., 2002). Physiological data distinguish between simple unlobed and other leaves as a result of the distinct functional correlates of these forms (e.g. Vogel, 1968; Nobel, 1983; Gurevitch & Schuepp, 1990). Here, coding schemes capturing disparate perspectives were adopted. Four mature leaf forms were recognized: simple and unlobed (‘u’), simple and lobed to less than one-third of the way to the midrib (‘l’), simple and lobed but dissected to greater than one-third to nearly all the way to the midrib (‘d’), and compound, with leaflets articulated to a rachis (‘c’). Serrated/toothed margins were not distinguished. Four different coding schemes (henceforth ‘coding’) using variously defined character states were devised to categorize the four forms. Each coding is listed below with references and factors that could be considered to justify that scheme. Polymorphic coding was used in the case of variation within genera and species (heteroblasty). 
Three additional schemes analysed for completeness yielded consistent results (Notes S1).
For the 560 taxon dataset, the published strict consensus tree was used to generate 99 trees through random resolutions of polytomies using MacClade (Maddison & Maddison, 2005). This dataset was too large for Bayesian analyses using the computing capacity available to us at the time. Branch lengths were estimated by ML on the three-gene dataset of Soltis et al. (2000) using PAUP* (Swofford, 1998).
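The random resolution of polytomies described above can be illustrated with a minimal Python sketch. It is not MacClade's procedure: the nested-tuple tree format, the function names and the six-taxon consensus tree are assumptions made for illustration, and joining two randomly chosen children at a time is only one simple way to obtain a bifurcating tree (it is not claimed to sample all resolutions uniformly).

```python
import random

def resolve_polytomies(node, rng):
    """Return a strictly bifurcating copy of a tree given as nested tuples.

    Leaves are strings; an internal node is a tuple of two or more children.
    A node with more than two children is resolved by repeatedly joining two
    randomly chosen children into a new pair until only two children remain.
    """
    if isinstance(node, str):
        return node
    children = [resolve_polytomies(child, rng) for child in node]
    while len(children) > 2:
        i, j = sorted(rng.sample(range(len(children)), 2))
        right = children.pop(j)  # remove the larger index first
        left = children.pop(i)
        children.append((left, right))
    return tuple(children)

def leaves(node):
    """List the leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def is_binary(node):
    """True if every internal node has exactly two children."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_binary(child) for child in node)

rng = random.Random(1)
consensus = ("A", ("B", "C", "D", "E"), "F")  # hypothetical consensus tree
resolved = [resolve_polytomies(consensus, rng) for _ in range(99)]
```

Each resolved tree keeps the same taxa as the consensus tree; branch lengths would then have to be estimated separately on each resolution.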
ML estimates of ancestral states were obtained for all seven datasets and subsamples of the angiosperm dataset (409 taxon, 195 taxon, 147 taxon) using BayesMultistate (Pagel, 1994, 1997; Pagel et al., 2004). The algorithm treats polymorphic states as equally probable in likelihood calculations.
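As a rough illustration of how a likelihood calculation can accommodate polymorphic taxa, the following Python sketch applies Felsenstein's pruning algorithm to a two-state character. It is not the BayesMultistate implementation: the tree format, the rate values, the flat root prior and the three-taxon example are assumptions, and giving a polymorphic tip a likelihood of 1 for each state it shows is one standard reading of ‘equally probable’.

```python
import math

def transition_probs(q01, q10, t):
    """Two-state continuous-time Markov transition matrix for a branch of length t."""
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)
    p10 = q10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def partial_likelihoods(node, tip_states, q01, q10):
    """Likelihood of the tips below `node`, conditional on each state at `node`.

    A leaf is a string; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        # A polymorphic tip such as {0, 1} gets likelihood 1 for every state it shows.
        return [1.0 if s in tip_states[node] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:
        child_like = partial_likelihoods(child, tip_states, q01, q10)
        p = transition_probs(q01, q10, branch_length)
        for s in (0, 1):
            result[s] *= sum(p[s][k] * child_like[k] for k in (0, 1))
    return result

def root_state_probs(tree, tip_states, q01, q10, prior=(0.5, 0.5)):
    """Root state probabilities and the log-likelihood (flat root prior assumed)."""
    like = partial_likelihoods(tree, tip_states, q01, q10)
    weighted = [prior[s] * like[s] for s in (0, 1)]
    total = sum(weighted)
    return [w / total for w in weighted], math.log(total)

# Hypothetical three-taxon tree; sp2 is polymorphic (shows both states).
tree = [([("sp1", 0.1), ("sp2", 0.1)], 0.2), ("sp3", 0.3)]
tip_states = {"sp1": {0}, "sp2": {0, 1}, "sp3": {0}}
probs, loglik = root_state_probs(tree, tip_states, q01=1.0, q10=1.0)
```

This returns state probabilities for the root only; probabilities at other nodes need an additional pass over the tree or re-rooting.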
Maximum parsimony (MP) estimates of ‘change’ (transition from one character state to another) and ‘stasis’ (no transition) were obtained across all 99 trees for the 560 taxon dataset. These calculations were performed using the ‘Chart’ function in MacClade 4.08 (Maddison & Maddison, 2005). The algorithm optimizes ancestral states across the tree and determines, for each node, whether its descendants have different states (‘change’) or not (‘stasis’). This provides estimates of the instances of change and stasis between pairs of states (ancestral and descendant). Because alternative optimizations are possible, we averaged the maximum and minimum numbers of estimated changes across all trees. We used these averages to estimate the proportion of changes as a fraction of the total number of nodes (change + stasis). These proportions of gains and losses were taken to represent approximate MP estimates of transition rates, whose inequality is indicated by their ratio. Ancestral reconstructions on individual trees were examined under two modes of resolution of equivocal reconstructions: delayed (DELTRAN) and accelerated (ACCTRAN) changes on the tree.
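The arithmetic behind these approximate MP rates can be sketched in a few lines of Python. The counts below are hypothetical, not values from Table 3, and the choice of denominator (changes plus stasis counted from the same ancestral state) is an assumption about how ‘the total number of nodes’ is tallied.

```python
def change_proportion(change_min, change_max, stasis_min, stasis_max):
    """Average the minimum and maximum counts, then return change / (change + stasis)."""
    change = (change_min + change_max) / 2
    stasis = (stasis_min + stasis_max) / 2
    return change / (change + stasis)

# Hypothetical counts averaged across trees (illustration only).
gain_proportion = change_proportion(30, 40, 960, 980)  # 0 -> 1 against 0 -> 0
loss_proportion = change_proportion(5, 7, 80, 90)      # 1 -> 0 against 1 -> 1
loss_to_gain_ratio = loss_proportion / gain_proportion
```

With counts of this kind, losses can be far fewer than gains in absolute number and still be proportionately more frequent, because complex ancestors are much rarer than simple ones.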
Transition rates (henceforth ‘rates’) between character states (e.g. 0, 1) were modelled by ML under a continuous time Markov model implemented in BayesMultistate. For all codings, unrestricted models (i.e. allowing unequal rates for all possible evolutionary transitions) were tested against restricted models (with some rates constrained to be equal; Table 1). For two-state coding, the hypotheses of equal rates of gains and losses of complex leaves (codings A and B) or compound leaves (coding C) were tested. Likelihoods were compared under the unrestricted model (Model 1) of unequal rates of gains and losses (two rate parameters: gain, q01; loss, q10) and the restricted model (Model 2) of equal rates (one rate parameter, q01 = q10). The significance of differences between likelihoods was assessed using the likelihood ratio test, where the statistic G = 2 × (LogLmodel1 – LogLmodel2) follows a chi-squared distribution with one degree of freedom, which is the difference in the number of free parameters between models (e.g. Sanderson, 1993). For the four-state coding scheme (D), selected restricted models were tested on the 560 taxon angiosperm dataset (Table 5). These tests allowed us to determine whether particular rates were equal (e.g. rate1→0/rate0→1 = 1) or unequal (e.g. rate1→0/rate0→1 ≠ 1); examination of the transition rate ratios (henceforth ‘rate ratios’) allowed us to determine the direction of inequality.
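For the two-state comparison, the likelihood ratio test reduces to a short calculation, sketched here in Python. G is computed as twice the log-likelihood gain of the unrestricted model over the restricted model, and the P value uses the closed form of the chi-squared survival function for one degree of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test(loglik_unrestricted, loglik_restricted):
    """Return (G, P) for nested models that differ by one free parameter.

    For a chi-squared distribution with one degree of freedom,
    P(X > g) = erfc(sqrt(g / 2)).
    """
    g = 2.0 * (loglik_unrestricted - loglik_restricted)
    g = max(g, 0.0)  # guard against small negative values from optimizer noise
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Hypothetical log-likelihoods: Model 1 (q01 and q10 free) and Model 2 (q01 = q10).
g_stat, p_value = likelihood_ratio_test(-250.0, -258.5)
```

The same function applies to each of the single-pair restrictions under four-state coding, because each of those tests also removes exactly one free parameter.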
Table 1. Summary of unrestricted models (separate rate estimates for all transitions) and restricted models (fewer rate estimates)
Coding schemes covered: two-state (A, B, C) and four-state (D).
Restricted models under four-state coding: rates between paired states equal, tested singly (q01 = q10; q12 = q21; q20 = q02; q03 = q30): four tests, 11 rate parameters in each test.
Likelihoods of models were compared using likelihood ratio tests (results in Tables 4, 5).
Randomization and simulation tests
We analysed two potential sources of systematic bias in our estimates of ML rate ratios using simulations of the 560 taxon dataset: relative frequencies of character states and reported inconsistency of ML optimizations.
Effect of relative frequencies of character states
We investigated whether the relative frequency of character states (independent of the identity of the species and its phylogenetic position) determined rate ratios by shuffling character states with respect to the terminals of the phylogenies. If relative frequencies of character states alone determined rate ratio estimates, the observed and shuffled rate ratios would be statistically similar and strongly correlated. Conversely, if both relative frequencies of character states and phylogeny determined rate ratio estimates, the observed and shuffled rate ratios would be statistically different and weakly correlated. If the latter, we would conclude that the effect of phylogeny is more important than the relative frequency of character states. First, we estimated the asymmetrical rates of change in character state and resulting rate ratios for codings A and B using one binary resolution of polymorphic taxa across each of the 99 angiosperm trees, employing the ace function of the Ape package (Paradis et al., 2004) in R (R Development Core Team, 2010). Second, we shuffled the character states of the terminals 100 times with respect to the phylogenies. Third, we estimated the rate ratios for the shuffled datasets across the 99 trees. Finally, we compared the distributions of estimated observed (i.e. nonshuffled) and shuffled rate ratios, and measured the correlation between them.
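The shuffling step can be sketched in Python as a permutation of states among terminals, which keeps the relative frequencies of states fixed while breaking their association with the phylogeny. The taxon names and state counts are invented, and `estimate_rate_ratio` is a hypothetical placeholder for an ML fit (such as the ace analysis described above), which is not reimplemented here.

```python
import random
from collections import Counter

def shuffle_tip_states(tip_states, rng):
    """Permute character states among terminals without changing state frequencies."""
    taxa = list(tip_states)
    states = [tip_states[taxon] for taxon in taxa]
    rng.shuffle(states)
    return dict(zip(taxa, states))

def shuffled_rate_ratios(tip_states, estimate_rate_ratio, n_shuffles=100, seed=0):
    """Apply a user-supplied estimator (hypothetical callable) to each shuffled dataset."""
    rng = random.Random(seed)
    return [estimate_rate_ratio(shuffle_tip_states(tip_states, rng))
            for _ in range(n_shuffles)]

# Hypothetical binary coding: 0 = simple, 1 = complex.
observed = {"taxon%d" % i: (1 if i < 3 else 0) for i in range(12)}
shuffled = shuffle_tip_states(observed, random.Random(7))
same_frequencies = Counter(shuffled.values()) == Counter(observed.values())
```

Because only the assignment of states to terminals changes, any difference between observed and shuffled rate ratios reflects the phylogenetic distribution of states and not their frequencies.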
Effect of polymorphisms
Differences in the resolution of polymorphic taxa into binary states might affect the rate ratio estimates. To investigate this issue, we compared the distributions of rate ratios estimated from a random sample of possible binary resolutions of the polymorphic states in a taxon with those from single binary resolutions. To obtain the distributions, we estimated the rates of change in character states sampled from 1000 random binary resolutions for codings A and B on 99 original trees using the ace function. These analyses yielded a distribution of rate ratios that summarized the range of results possible from the billions of binary resolutions of the polymorphic taxa. We compared the rate ratio distributions obtained from single and multiple resolutions of polymorphic characters with each other and with a null distribution of rate ratios (based on equal rates of change in both directions, described in the following section, ‘Inconsistency of ML estimates’) to determine the impact of alternative resolutions of polymorphisms.
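A minimal Python sketch of random binary resolution follows. Each polymorphic taxon is assigned one of its observed states at random, and the number of distinct resolutions is the product of the numbers of states per taxon, so that 30 taxa polymorphic for two states already allow more than a billion resolutions. The taxon names, the number of polymorphic taxa and the uniform choice among states are assumptions for illustration.

```python
import random

def resolve_polymorphisms(tip_states, rng):
    """Pick one observed state per taxon; monomorphic taxa keep their single state."""
    return {taxon: rng.choice(sorted(states)) for taxon, states in tip_states.items()}

def count_resolutions(tip_states):
    """Number of distinct resolutions: the product of states per taxon."""
    total = 1
    for states in tip_states.values():
        total *= len(states)
    return total

# Hypothetical dataset: 30 polymorphic taxa and 70 monomorphic taxa.
tip_states = {"poly%d" % i: {0, 1} for i in range(30)}
tip_states.update({"mono%d" % i: {0} for i in range(70)})

rng = random.Random(3)
resolutions = [resolve_polymorphisms(tip_states, rng) for _ in range(1000)]
n_possible = count_resolutions(tip_states)  # 2 ** 30
```

Rates would then be estimated on each of the sampled resolutions to obtain the distribution of rate ratios.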
Inconsistency of ML estimates
If ML estimates spuriously converge on very different rates of change in either direction, the ML rate ratios obtained would not approximate the actual rate ratio, and the actual rate ratio might not differ significantly from a value of unity (i.e. equal rates). We used simulations to compare the distributions of the rate ratio parameter from observed and simulated data, and to generate a null distribution of rate ratios modelled using equal rates of change in both directions. We followed a three-step protocol to generate and analyse these simulations. First, a sample of 100 two-rate matrices of change in character states obtained from analyses of the observed data (codings A, B) was used to parameterize simulations employing the sim.char command of the Geiger R package (Harmon et al., 2008). Second, a sample of 100 equal-rate matrices was generated based on rates drawn from a normal random distribution (mean = 5, SD = 1) and used as input to simulate the null distribution of rate ratios. The root was assigned state 0 with a probability of 0.7–0.8 (estimated from the data) in the simulations. Third, three distributions of rate ratios were calculated (in the original 99 trees), plotted and compared: observed, simulated (data simulated using estimated rate ratios as input) and null (data simulated using a rate ratio of unity as input).
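The null simulation can be sketched in Python as the evolution of a binary character down a tree under equal rates in both directions. This is an illustrative stand-in for sim.char, not its implementation: the tree, its branch lengths and the function names are assumptions, and the root probability of 0.75 is an arbitrary value within the 0.7–0.8 range given above.

```python
import math
import random

def simulate_tips(node, state, q01, q10, rng, tips=None):
    """Evolve a binary character from `node` (currently in `state`) down to the tips.

    A leaf is a string; an internal node is a list of (child, branch_length) pairs.
    """
    tips = {} if tips is None else tips
    if isinstance(node, str):
        tips[node] = state
        return tips
    total = q01 + q10
    for child, branch_length in node:
        rate_out = q01 if state == 0 else q10
        # Probability that the child end of the branch is in the other state.
        p_change = rate_out / total * (1.0 - math.exp(-total * branch_length))
        child_state = 1 - state if rng.random() < p_change else state
        simulate_tips(child, child_state, q01, q10, rng, tips)
    return tips

def simulate_null_dataset(tree, rng, p_root_zero=0.75):
    """Equal-rate dataset: one rate drawn from Normal(5, 1) is used in both directions."""
    rate = max(rng.gauss(5.0, 1.0), 1e-9)  # keep the rate positive
    root_state = 0 if rng.random() < p_root_zero else 1
    return simulate_tips(tree, root_state, rate, rate, rng)

# Hypothetical tree with arbitrary branch lengths.
tree = [([("sp1", 0.1), ("sp2", 0.1)], 0.2), ("sp3", 0.3)]
rng = random.Random(42)
null_datasets = [simulate_null_dataset(tree, rng) for _ in range(100)]
```

Fitting the two-rate model to each simulated dataset then gives the null distribution of rate ratios against which the observed ratios are compared.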
We anticipated three alternative results. First, if ML estimates of asymmetrical rates of change were unbiased and consistent, the observed and simulated rate ratios should not differ significantly and should be strongly correlated, and both observed and simulated rate ratios should differ significantly from a null distribution of equal rates (rate ratio = 1). Second, if ML estimates were systematically biased, but reflected underlying asymmetrical rates of change, the observed and simulated rate ratios should differ and be weakly correlated, and both should differ from the null distribution. Finally, if ML estimates were systematically biased and unrelated to the underlying rates, the observed and simulated rate ratios should differ and be uncorrelated, and the former should not differ from the null distribution. In the last scenario, ML estimates of rate ratios would be taken as misleading, because significant asymmetrical results could originate from data generated under a model of equal rates of change.
Table 2. Maximum likelihood (ML) reconstructions of ancestral states of selected nodes in trees from the 560 taxon dataset and subsets (409 taxon, 195 taxon, 147 taxon) under different coding schemes (A–D); numbers in parentheses indicate average probabilities of alternative ancestral states
MP analyses yielded unambiguous reconstructions of ancestral states in most cases. The leaf was simple in selected ancestors (angiosperm, eudicot, rosid, asterid) under codings B–D. The state in the ancestral rosid was uncertain in some of the trees under coding A. Reconstructions under ML were uncertain for all but the asterid ancestor. Under two-state coding, the reconstructions of the angiosperm, eudicot and rosid ancestors were uncertain (P = 0.588–0.907), whereas the asterid ancestor consistently had the classic simple leaf (P = 0.99). Under four-state coding, combined probabilities were used to infer ancestral states. The ancestral angiosperm leaf most likely was simple and unlobed or lobed (coding A, P = 0.803 and coding B, P = 0.881), or unlobed, lobed or dissected (coding C, P = 0.831). Simple unlobed or lobed leaves were likely to have characterized the ancestral angiosperm under coding D (P = 0.912). The combination of results under different codings suggests that the ancestral angiosperm had simple unlobed or lobed leaves (Fig. 2).
The eudicot and rosid ancestors within the 560 taxon tree could have had any of the leaf forms, as inferred from low support values for most of the reconstructions. By contrast, the asterid ancestor was unequivocally simple under all two-state codings and was highly likely to have been simple (unlobed/lobed) under scheme D (P = 0.998), and unlikely to have had compound leaves. Starting with simple leaves in the ancestral angiosperm, there may have been a change to lobed/complex leaves in the ancestral eudicot, and reversals back to the unlobed state in the ancestral asterid and, depending on the analysis, in the ancestral rosid (Fig. 2).
Stasis and change (MP) among leaf forms (Table 3)
Table 3. Average number of instances of stasis (no state change between ancestral and descendant nodes) or change under different coding schemes across the angiosperms (560 dataset)
Estimated numbers of stasis (0→0, 1→1, 2→2, 3→3) and change calculated using ‘Chart’ function, MacClade v4.08: minimum and maximum number of unambiguous events of stasis and change among leaf forms reconstructed under maximum parsimony (MP); averaged across 99 trees. Coding schemes as in Table 2.
MP reconstructions revealed that stasis far exceeded change in the case of simple leaves, however defined (on average 973 : 62). Stasis was also higher than change in compound (84 : 10), dissected (9 : 1) and lobed (6 : 3) leaves (coding D). The number of changes from simple states (36 gains) exceeded reversals (six losses); however, when expressed as proportions of nodes, the ratio of losses to gains ranged from 1.1 to 2.4. In other words, there were proportionately more losses than gains of complex leaves across the tree. Although the evolutionary change from simple to complex leaf development occurred several times during the phylogenetic history of angiosperms (e.g. the ancestral Sapindales going from simple unlobed to compound), many other nodes appear to have retained the ancestral simple form (e.g. the ancestral Malpighiales), whereas others have reverted to the simple state (e.g. Cneorum).
Table 4. Variation in leaf form (columns 1–8: coding scheme A) and ratios of maximum likelihood (ML) estimates of instantaneous rates of evolution in leaf form (losses, q10; gains, q01; coding schemes A–C) with results of likelihood ratio tests
Significant differences in rates *, P < 0.05. Marginally significant differences **, P < 0.10 under likelihood ratio test.
Where dissected and compound leaves are mutually exclusive, fewer coding schemes are applied: two (A and B or A and C, e.g. Lepidium, Papaveraceae, Apiales) or one (C for Fabaceae) (no lobed leaves—Bauhinia, the only exception, was coded as simple).
(a) Rate parameters estimated under unrestricted models in the 560 taxon dataset (highest estimate within each coding scheme in bold, all inferred to be significantly higher than rate parameters of other changes): (A) q01 = 6.53, q10 = 29.76; (B) q01 = 3.89, q10 = 27.74; (C) q01 = 3.96, q10 = 37.72.
Table 5. Inequality of rates of evolution in leaf form inferred from likelihood ratio (LR) tests, conducted for the angiosperm-wide (560 taxon) dataset under coding D
The first column presents the coding scheme being tested and the following three columns present inferences from LR tests. The likelihoods of the data under models that restrict change in particular ways (columns I–III, see footnotes) are compared with the likelihood under the unrestricted model. Rejection of restricted models at α = 0.01, combined with estimated rate parameters*, leads to the inferences presented.
Restricted models tested: (I) all rates equal: q01 = q10 = q12 = q21 = q23 = q32 = q02 = q20 = q03 = q30 = q13 = q31; (II) rates between paired states equal: q01 = q10, q12 = q21, q20 = q02 and q03 = q30, tested together, one test, nine rate parameters; (III) rates between paired states equal, tested separately, four tests, 11 parameters each.
Coding scheme: four-state (D); 0 = u (unlobed), 1 = l (lobed), 2 = d (dissected), 3 = c (compound)
(I) All rates not equal
(II) Rates from unlobed not equal to reversals (reversals greater)
(III) q01 = q10; q02 = q20; q03 = q30; q12 ≠ q21
Across all analyses, simple leaves (however defined) were more likely to evolve from other forms than to change. Analyses of two-state coded data (A, B, C) for the angiosperm-wide dataset showed a significantly higher likelihood (P << 0.001) for the unrestricted model (unequal rates of gains and losses of complex leaves) over the restricted model (i.e. equal rates of gains and losses). For all trees, rates of losses of complex leaves were greater than rates of gains. This implies a strong bias towards simple leaves across angiosperms caused by either retention or reversals, or both.
Four-state coding (D) revealed the complicated nature of this pattern (Table 5). The inequality of rates overall was strongly supported, as was the inequality of rates between paired states tested together (Table 5). This inequality in rates may be attributed to higher rates of change to the unlobed state and lower rates of transitions out of the unlobed state, as may be seen in coding D. Estimated rate parameters for changes from lobed to unlobed leaves were high (q10 = 215) and from lobed to compound leaves and back were moderate (q13 = 58, q31 = 25). Changes to unlobed leaves from either dissected or compound leaves, between dissected and compound leaves, or from unlobed to dissected leaves were low or negligible (rate parameter estimates of 0–15) compared with changes from lobed leaves (estimates of 24–215).
Phylogenetic patterns in angiosperm clades (Table 4, Figs S2–S8)
The six angiosperm clades displayed varied patterns, mostly opposite to the pattern detected in the angiosperm 560 taxon dataset, that is, with rates of gain of complex leaves greater than rates of reversals to simple leaves; however, most of these rate differences were not significant.
Among the clades analysed, only the results from Solanum were similar to the pattern across angiosperms: higher rates of reversals to simple leaves relative to gains of complex leaves. The coding scheme for this dataset affected the outcome. Rates of loss were always greater than rates of gain, but this difference was nonsignificant under coding B, and marginally significant under codings A and C (P < 0.055). The high proportion of polymorphisms (0.24) in this dataset may have contributed to the marginal significance of the differences in the rates observed. In analyses of a dataset coded with no polymorphisms, this pattern (rates of loss greater than rates of gain) for Solanum was significant across codings (not shown). However, polymorphism contributes significantly to estimates of rate ratios by affecting the frequency of character states (below), and its effect cannot be ignored.
In Apiales, Sapindales, Fabaceae, Papaveraceae and Lepidium, the trend was the opposite of angiosperms in general and Solanum, with higher rates of transitions from simple to complex than the reverse (ratios ranging from 0.095 to 0.731). In Lepidium, this trend was generally significant for the internal transcribed spacer (ITS), but not chloroplast DNA (cpDNA), trees. Among ITS trees, 81% of the tests under coding A and 93% of the tests under coding B were significant; in cpDNA trees, only 26–29% of the tests were significant. Despite these differences, possibly caused by differences in the topologies of cpDNA and ITS trees (Bowman et al., 1999; Mummenhoff et al., 2001, 2004), similar trends in leaf form evolution were detected.
Thus, most of the trends in different angiosperm clades (rates of gain greater than rates of loss), except for Solanum, were opposite to that of the angiosperms overall (rates of loss greater than rates of gain); all trends were inconsistently and marginally significant.
We tested the possibility that bias in choosing the clades of angiosperms might have led to conflicting results at different taxonomic levels by analysing three subsamples from the three-gene angiosperm phylogeny. All subsamples (409, 195 and 147 taxon data) showed significant, differentially unequal rate ratios with greater rates of losses of complex leaves than gains (Table 4).
Involvement of lobed leaves in transitions that occur at high rates (Tables 3, 5)
ML estimates of rate parameters under coding D show that transitions towards or from lobed leaves occur at relatively high rates, suggesting that lobed leaves are labile in evolution. Thus, most leaf forms evolve into unlobed leaves at very high rates, but not the reverse; dissected leaves tend to change to or from compound leaves and compound leaves tend to change to or from lobed leaves. The high rate of reversion to simple leaves under all two-state codings may be explained by the underlying high rates of transition to unlobed and lobed leaves from compound leaves and a moderate level of transition between compound and lobed leaves. Subsamples of the angiosperm dataset (eudicots, rosids, asterids), analysed separately (see the Discussion section), showed similar patterns as seen in all angiosperms.
To sum up, MP and ML analyses in combination show that: (1) stasis is the overriding pattern for all forms; (2) the numbers of changes do not directly translate into rates; (3) unlobed leaves tend to evolve from lobed or compound forms; (4) lobed leaves are most changeable and tend to evolve from compound forms and to unlobed or compound forms; (5) dissected leaves tend not to change; when they do, they switch to compound leaves and back; and (6) compound leaves tend to switch back and forth from lobed forms. Similar results were obtained under additional codings (Notes S1, Tables S3–S5).
Relative frequency and distribution of character states (Fig. 4)
The number of simple leaves in the angiosperm dataset is high (428–490 operational taxonomic units depending on the coding scheme used) compared with compound (64), dissected (10) and lobed (19) leaves. The asymmetry of rates (rate ratios ≠ 1) could be the consequence of the predominance of simple leaves. Our analyses show that, despite sharing the same underlying phylogenies and differing only in the identity of the species having particular character states, for codings A and B, the rate ratios (1→0 : 0→1) derived from shuffled data were > 1 (as in the original data); and the rate ratios derived from observed and shuffled data were not significantly correlated (Pearson’s product-moment correlation for coding A = 0.04, t97 = 0.3919, P = 0.348; Pearson’s product-moment correlation for coding B = 0.09, t97 = 0.9117, P = 0.1821; Fig. 4). Thus, the asymmetry of rates relates to the relative frequencies of character states generally, but particular patterns of asymmetry depend on the observed phylogenetic distribution of character states. We used the Mann–Whitney U-test to determine whether the medians of rate ratios differed in the observed and shuffled data, and found significant differences (U for coding A = 9108, P < 2.2 × 10⁻¹⁶; U for coding B = 8335, P < 2.2 × 10⁻¹⁶; Fig. 4). We used a nonparametric test because distributions of the rate ratios resulting from the analysis of observed and shuffled data were not normal (Shapiro–Wilk normality test; Shapiro & Wilk, 1965; P ≤ 0.02144). Rates in the observed data were more asymmetric than in the shuffled data, suggesting that the extent of asymmetry was dependent on the distribution of character states. Our results were robust to polymorphisms in the dataset.
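The link between a Pearson correlation and its t statistic can be sketched in Python. For n paired values, t = r × √(df / (1 − r²)) with df = n − 2, so 99 pairs give the 97 degrees of freedom reported above. The function names are illustrative, and the input r = 0.04 is the rounded value quoted in the text, so the result only approximates the reported statistic.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """Return (t, df) for testing a correlation r estimated from n pairs (requires |r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df

# Rounded correlation for coding A from the text; n = 99 pairs is inferred from t97.
t_stat, df = correlation_t(0.04, 99)
```

The value obtained is about 0.39, close to the reported 0.3919; the small difference is expected because the correlation is quoted to two decimal places.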
Simulations to evaluate the effect of variably resolved polymorphisms revealed lower mean rate ratios and much larger spread in results from 1000 random binary resolutions of polymorphisms than from original estimates (U for coding A = 42702.5, P = 0.02405; U for coding B = 21176.5, P < 2.2 × 10⁻¹⁶; Fig. 4). Mean rate ratios from the observed data were always significantly higher than the null equal-rate rate ratios (one-sided U for coding A ≥ 7710, P ≤ 5.5 × 10⁻¹²; U for coding B ≥ 9207, P < 2.2 × 10⁻¹⁶). In sum, although extensively sampling polymorphism throughout the angiosperm phylogeny resulted in lower rate ratios (less asymmetry), these were still significantly higher than the rate ratios resulting from a null distribution of equal rates of change in both directions.
Rate ratios in observed and simulated data (Fig. 5)
The rate ratios estimated from data simulated using rate ratios estimated from the observed data as input parameters were significantly correlated with the rate ratios estimated directly from the data (Pearson’s product-moment correlation for coding A = 0.34, t98 = 3.5637, one-sided P = 0.0003; Pearson’s product-moment correlation for coding B = 0.22, t98 = 2.2944, one-sided P = 0.01195), but correlations were low, suggesting that input parameters only partially explain the processes that underlie the observations. Simulated data produced significantly higher rate ratios and a broader spread than observed data (Shapiro–Wilk normality test for observed and simulated rate ratios under coding A, P ≤ 0.036; one-sided U for coding A = 4162, P = 0.02036; Shapiro–Wilk normality test for observed and simulated rate ratios under coding B, P ≥ 0.06234; one-sided paired t-test of means for coding B, t99 = −5.0531, P = 9.952 × 10⁻⁷; Fig. 5). The null distribution was strongly non-normal (Shapiro–Wilk normality test, P < 2.2 × 10⁻¹⁶; Fig. 5), and so we used nonparametric tests to determine whether the observed rate ratios were significantly higher than modelled under the null hypothesis of equal rates of change in both directions. Both coding methods had significantly higher rate ratios than the modelled null (one-sided U for coding A = 7566, P = 1.823 × 10⁻¹⁰; one-sided U for coding B = 9126, P < 2.2 × 10⁻¹⁶).
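The Mann–Whitney U statistic used in these comparisons can be computed directly by counting pairs, as in the following Python sketch. The one-sided P value uses a plain normal approximation without continuity or tie corrections, so it will differ slightly from the output of statistical packages; the function names and the small samples are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """U for sample x: the number of (x_i, y_j) pairs with x_i > y_j, ties counting 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def one_sided_p(u, n_x, n_y):
    """Normal-approximation P value for the alternative that x tends to exceed y."""
    mean = n_x * n_y / 2.0
    sd = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical rate ratios: observed data against an equal-rate null.
observed = [3.0, 4.0, 5.0]
null = [1.0, 2.0, 3.0]
u_observed = mann_whitney_u(observed, null)
p_observed = one_sided_p(u_observed, len(observed), len(null))
```

U ranges from 0 to the product of the two sample sizes, so values near that maximum indicate that almost every value in the first sample exceeds almost every value in the second.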
The ancestral angiosperm was inferred to have simple (unlobed or unlobed–lobed), but not complex, leaves under MP and ML. This finding is consistent with the fossil record and previous phylogenetic studies (Hickey & Doyle, 1977; Cronquist, 1988; Doyle & Endress, 2000; Bharathan et al., 2002; Doyle, 2007). This does not rule out the possibility that the simple leaf in angiosperms was the result of a ‘reduction’ from complex leaves (see Sinha, 1997). ML estimates of ancestral states included some probability that the ancestral angiosperm had dissected leaves (coding C and the additional codings). This unexpected outcome could result from the assumption of constant rates of change across the phylogeny and from the uncertainty associated with reconstructions under high rates of evolution (Schluter et al., 1997; Cunningham et al., 1998; Ekman et al., 2008; see the section entitled ‘Data and analysis-related factors’).
Our analyses of leaf form led to three major new findings: (1) stasis of leaf form predominates; (2) simple leaves revert from complex leaves at a statistically significantly higher rate than the reverse, suggesting that simple leaves are retained as a result of a low rate of change, a higher rate of reversals from complex leaves, or both; and (3) simple lobed leaves are involved in all transitions with high rates and have a strong tendency to become unlobed, suggesting that lobed forms represent a labile intermediate step in leaf evolution.
Stasis, character constraints and environmental factors
Stasis, as detected in this study, could be a result of a lack of change, very slow rates of change, or both. Lack of change may be a result of either ‘character constraints’ (vs ‘organismal constraints’) or stabilizing selection, or interactions between the two (Charlesworth et al., 1982; Estes & Arnold, 2007; Futuyma, 2010). Character constraints could be caused by several factors, two of which, the loss or absence of the appropriate genetic basis and the rarity of appropriate mutations, may not apply to leaf evolution. Minimally, the same genetic pathway (KNOX), with frequent mutations that turn the pathway on and off, may be involved in multiple evolutionary origins of complex leaves across angiosperms (Bharathan et al., 2002; Hay & Tsiantis, 2006). A third factor, genetic correlations, might constrain the evolution of complex leaves; for instance, KNOX proteins also regulate meristematic growth in various plant organs (reviewed in Hay & Tsiantis, 2010), and so mutations in KNOX genes may be constrained as a result of pleiotropic effects. Such a constraint cannot be absolute, but could slow the appearance of new phenotypes.
Leaf physiognomic studies, used to estimate palaeoclimate, suggest that a high proportion of leaves with dissected margins in a community is related to low mean annual temperature and low precipitation; conversely, low dissection levels are related to warmer climates (Wolfe, 1995; Wiemann et al., 1998; Royer et al., 2005; Peppe et al., 2011). Our study includes taxa representing all plant forms (herbs, shrubs, trees, lianas), and not just woody dicotyledonous plants, as is usual in leaf physiognomic studies (but see Peppe et al., 2011); moreover, the definitions of dissection do not exactly overlap, and so our results cannot be related directly to these studies. Nevertheless, it is possible that simple leaves during the early days of angiosperm evolution may have been maintained in prevailing warm climates, and that complexity may be related to locally decreasing temperatures (and precipitation). Further work using dated phylogenies and quantitative coding could allow a clearer perspective on these alternatives.
Gains, losses and developmental factors
The higher rate of reversals to simple leaves could indicate that it is developmentally easier for simple leaves to evolve from complex leaves than the opposite. Comparative studies suggest that complex mature leaves arise only from ‘complex’ (minutely toothed) primordia with KNOX expression, whereas simple mature leaves can arise from either complex primordia with KNOX expression or smooth primordia lacking KNOX expression, leading to three leaf trajectories: complex–complex, complex–simple and simple–simple (Bharathan et al., 2002). However, there is no developmental genetic reason to believe that mutations from complex to simple primordia (loss of KNOX expression) or from complex to either simple leaf trajectory are ‘easier’ than mutations from either type of simple leaf to a complex leaf. Regardless of the details of the underlying mechanism, its generality or ease of change, angiosperms appear to have one way to initiate complex leaves, but at least two ways to initiate simple leaves. This could lead to simple leaves evolving more readily; theory predicts that, when there are more developmental paths leading to one state, the state with more paths will be visited more frequently in evolution, all else being equal (Wagner & Stadler, 2003). By virtue of having more developmental paths leading towards it, the simple leaf may exert an ‘intrinsic pull’. A combination of genetic constraints and intrinsic pull as a result of the asymmetry of developmental paths may be responsible for the high proportion of simple leaves in angiosperms and the unequal rates of change.
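The link between unequal rates and the prevalence of simple leaves can be illustrated with a two-state continuous-time Markov model of the general kind used for ML estimation of gain and loss rates. This is a minimal sketch under stated assumptions: the rate values are hypothetical (they are not estimates from this study), rates are held constant through time, and the function names are illustrative only.

```python
import math

def transition_probs(q_gain, q_loss, t):
    """Transition probabilities of a two-state Markov chain after time t.

    State 0 = simple, state 1 = complex.
    q_gain: rate simple -> complex; q_loss: rate complex -> simple.
    Returns ((P00, P01), (P10, P11)).
    """
    total = q_gain + q_loss
    decay = math.exp(-total * t)
    p01 = (q_gain / total) * (1.0 - decay)
    p10 = (q_loss / total) * (1.0 - decay)
    return ((1.0 - p01, p01), (p10, 1.0 - p10))

def equilibrium_simple(q_gain, q_loss):
    """Long-run expected frequency of the simple state."""
    return q_loss / (q_gain + q_loss)

# Hypothetical rates: losses three times as fast as gains.
Q_GAIN, Q_LOSS = 0.1, 0.3

print(equilibrium_simple(Q_GAIN, Q_LOSS))
for t in (1.0, 10.0, 100.0):
    (p00, p01), (p10, p11) = transition_probs(Q_GAIN, Q_LOSS, t)
    # Probability of being simple, starting from simple (p00) or complex (p10).
    print(t, round(p00, 3), round(p10, 3))
```

With a loss rate three times the gain rate, the expected long-run frequency of simple leaves in this model is 0.3/(0.1 + 0.3) = 0.75, regardless of the ancestral state.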
We show that evolutionary transitions among certain leaf forms happen more readily than among others, suggesting developmental hypotheses for further study. Simple lobed leaves appear to be a labile, intermediate stage in evolution, being involved in transitions that occur at high or moderate rates, especially from compound leaves. These patterns suggest that lobed leaves are likely to be initiated as complex primordia (with KNOX expression), as observed, for example, in Arabidopsis and Myriophyllum (Piazza et al., 2010; Bourque & Lacroix, 2011). Furthermore, in taxa in which lobed and unlobed leaves start as complex primordia, processes that allow them to develop into simple leaves are likely to be initiated later in lobed leaves than in unlobed leaves (e.g. Barkoulas et al., 2008). Comparative studies, for example, on palmately lobed and pinnately compound leaves in rosid taxa (Doyle, 2007) should reveal such differential developmental processes. Evolutionary transitions between lobed or compound leaves and dissected leaves are less likely to occur, suggesting that the latter form a distinct class. It is possible that leaves, leaflets, lobes and serrations are distinct developmental genetic entities despite the fact that their development uses common genetic components (Efroni et al., 2010; but see Kaplan, 2001; Blein et al., 2008). Similarly, investigations might add ‘dissection’ to this list, mindful that the distinction between ‘lobed’ and ‘dissected’ may be arbitrary in some cases in this study.
Data- and analysis-related factors
The slow rate of change in simple leaves could be a result of their higher frequency in the dataset. Under MP, the frequency of the rare state provides an upper limit to the number of times that state can be inferred to evolve. This tends to increase the cost of gains of the derived state and, therefore, to favour greater relative numbers of losses of the rare state and gain of the common state (Collins et al., 1994; Ree & Donoghue, 1998). Our finding that ML rate ratios were asymmetrical in both observed and shuffled character state data suggests that the unequal transition towards simple leaves may be partly a result of the high frequency of simple leaves in the dataset. However, this is not a complete explanation, because there was no correlation between the rate ratios obtained from observed and shuffled character states on the same topology, and the medians of rate ratios were higher in the observed than in shuffled data. Therefore, the particular distribution of character states in angiosperms is the basis for an asymmetry of rates that is significantly greater than might be expected from the high frequency of simple leaves in the data. Furthermore, the preponderance of simple leaves in the dataset does not result from artefacts in sampling. Using the numbers of species in sampled genera, assumed to be monophyletic (Mabberley, 1997), we estimated the number of species represented in the angiosperm dataset with leaves that were simple unlobed (18 857), lobed (3485, including polymorphic species) and complex (6568). Even taking into account the uneven numbers of species (1–900 per genus), the genera sampled were overwhelmingly simple leaved. The underlying causes for such a skewed character state distribution remain unidentified. In an analogous situation, compositional bias in molecular sequences, the relative roles of neutral mutational pressure and selection continue to be contentious (e.g. Wernegreen & Funk, 2004; Stoltzfus, 2006; Hildebrand et al., 2010). 
The question of how morphological character states attain skewed distributions requires deeper investigation. We conclude that our results are robust to the effects of the relative frequencies of the forms and uncertainties from taxon sampling and the alternative resolutions of polymorphisms.
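The species counts quoted above translate into proportions as follows. This is a small arithmetic sketch that uses only the three counts given in the text; the dictionary keys are illustrative labels.

```python
# Species counts quoted in the text for the sampled genera.
species_by_leaf_form = {
    "simple unlobed": 18857,
    "lobed (incl. polymorphic)": 3485,
    "complex": 6568,
}

total = sum(species_by_leaf_form.values())
proportions = {form: n / total for form, n in species_by_leaf_form.items()}

for form, share in proportions.items():
    print(f"{form}: {share:.1%}")
```

Of the 28 910 species represented, roughly 65% have simple unlobed leaves, 12% lobed leaves and 23% complex leaves.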
Our simulations showed that the rate ratios obtained in the ML analyses were significantly different from those modelled under the null hypothesis of equal rates of gains and losses, supporting our conclusion that angiosperms are most likely to retain, and revert to, simple leaves. We also showed that ML estimates of rates were systematically biased and explained the observations only partially, because the recovered rate ratios were higher and more variable than the input rate ratios, and only weakly correlated with them. This systematic bias is expected to affect ancestral state reconstructions. The high rates of reversals assumed to hold across the angiosperm phylogeny could drive unexpected results, in which there is some probability that the ancestral angiosperm had dissected or compound leaves. Because observed and simulated rate ratios were correlated and both were significantly greater than the null rate ratios, we conclude that angiosperms have an overall bias towards simple leaves as a result of stasis and reversals. Nested angiosperm clades (here, Sapindales, Apiales, Papaveraceae, Fabaceae, Lepidium) may depart from this general pattern found across angiosperms.
Differences between patterns of evolution in angiosperms and nested clades
Our analyses uncovered an apparent paradox: angiosperms as a whole showed a strong tendency to retain or re-evolve simple leaves, but five of the six angiosperm clades studied showed inequalities in both directions that were either marginally significant or not significant. Given this variation among clades, one might expect angiosperms overall to show no significant inequalities in rates and to reflect only weak, clade-specific patterns. However, this was not the case, and this conflict between angiosperm-wide and taxonomically restricted analyses raises the possibility of biases in taxon sampling. Our estimates show that the 560-taxon sample reflects leaf form frequency in angiosperms in toto. Subsamples of angiosperms (409-, 195- and 147-taxon datasets) showed evolutionary patterns similar to the 560-taxon dataset, suggesting that our results reveal an expectation for angiosperms that can be used as a null model when examining the results from these and other clades.
In the present study, Solanum showed rates that were consistent with the angiosperm-wide pattern (in particular, the asterid clade), whereas Lepidium showed marginally significant unequal rates that were opposite to the angiosperm-wide pattern (in particular, the rosid clade). A bias towards simple leaves in Solanum may be a result of the combination of factors postulated to operate across angiosperms, but more sampling in this large genus is required to test this hypothesis. Selected members of the Solanaceae showed both types of simple leaf trajectories (primordium–mature: simple–simple in Nicotiana tabacum (Nishimura et al., 1999) and complex–simple in eight species of Solanum (N. Sinha, unpublished)).
The evolutionary pattern in Lepidium (gains more than losses) points to factors that promote the origin and retention of complex leaves. This may be the case in the polyploid Australian/New Zealand clade with a complicated ancestry (Dierschke et al., 2009). To greatly simplify, one subgroup (‘Californian origin’) typically has complex leaves and occurs in arid/semi-arid regions, whereas the other (‘African/Californian origin’) typically has simple leaves and occurs in mesic regions (K. Mummenhoff, unpublished). This distribution pattern is consistent with emerging physiognomic patterns in herbs (Peppe et al., 2011), suggesting the importance of environmental factors. Three leaf developmental trajectories are known in Lepidium, as seen across angiosperms: simple–simple (L. africanum), complex–simple (L. oleraceum) and complex–complex (L. hyssopifolium). However, in Lepidium, any intrinsic developmental pull may be overcome by selective forces, a hypothesis that remains to be tested. Investigation of other, well-understood clades may reveal similar patterns (e.g. Pelargonium; Jones et al., 2009).
The choice of the six clades studied was dictated by the variability in leaf form and the availability of well-supported phylogenies. As such, they do not represent random samples (with respect to leaf form) of taxa within angiosperms; therefore, no combination of their evolutionary patterns could possibly represent the evolutionary pattern for all angiosperms. Sampling needs to be expanded to more variable and more densely sampled clades to test the generality of the angiosperm-wide pattern and to explore further the effect of character state frequencies. In addition, quantitative coding of leaf form could resolve some arbitrariness involved in assigning discrete states. In the meantime, a reasonable working hypothesis is that the angiosperm-wide pattern detected – a high frequency of simple leaves, lower rates of gain of complex leaves and higher rates of loss of complex leaves – reflects the integrated result of myriad factors acting at multiple levels – genetic developmental, organismal-physiological, phylogenetic and environmental – and may represent a null expectation when investigating the evolution of leaf form within angiosperm groups.
We thank Stephen Downie, Sara Hoot and Chris Quinn for molecular datasets; Randy Evans, Toby Pennington and Krzysztof Spalik for leaf morphological data; and Beatrice Grabowski, Jessica Gurevitch, James Doyle, Andreas Franzke, Ramona Walls and the phylogenetics discussion group at Stony Brook for discussions. We acknowledge funding from the National Science Foundation (NSF) (L.M.D. (DEB-0206336), L.B. (DEB-0316614), R.G. (DEB-0129376), M.L., N.S. and M.F.W. (DEB-0542958)), German Research Foundation (K.M.) and Fundação para a Ciência e Tecnologia (A.L. (SFRH/BPD/41391/2007)). We thank Kevin Boyce, James Doyle, Beatrice Grabowski, Susanne Renner, Mark Rausher and six anonymous reviewers for helpful critical comments on previous versions of the manuscript. We appreciate the exceptionally thoughtful reviews of James Doyle and one anonymous reviewer.