• Luis M. Valente,

    1. Division of Biology and Natural and Environment Research Council Centre for Population Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire, SL5 7PY, United Kingdom
    2. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
    3. Real Jardín Botánico de Madrid, CSIC, Plaza Murillo 2, 28014 Madrid, Spain
    • 4

      These authors contributed equally.

  • Gail Reeves,

    1. Division of Biology and Natural and Environment Research Council Centre for Population Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire, SL5 7PY, United Kingdom
    2. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
    3. South African National Biodiversity Institute, Kirstenbosch, Private Bag X7, Claremont, 7735, Cape Town, South Africa
    • 4

      These authors contributed equally.

  • Jan Schnitzler,

    1. Division of Biology and Natural and Environment Research Council Centre for Population Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire, SL5 7PY, United Kingdom
    2. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
  • Ilana Pizer Mason,

    1. Division of Biology and Natural and Environment Research Council Centre for Population Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire, SL5 7PY, United Kingdom
    2. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
    3. Department of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel-Aviv 69978, Israel
  • Michael F. Fay,

    1. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
  • Tony G. Rebelo,

    1. South African National Biodiversity Institute, Kirstenbosch, Private Bag X7, Claremont, 7735, Cape Town, South Africa
  • Mark W. Chase,

    1. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
  • Timothy G. Barraclough

    1. Division of Biology and Natural and Environment Research Council Centre for Population Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire, SL5 7PY, United Kingdom
    2. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom
    3. E-mail: t.barraclough@imperial.ac.uk


The Cape region of South Africa is a hotspot of flowering plant biodiversity. However, the reasons why levels of diversity and endemism are so high remain obscure. Here, we reconstructed phylogenetic relationships among species in the genus Protea, which has its center of species richness and endemism in the Cape, but also extends through tropical Africa as far as Eritrea and Angola. Contrary to previous views, the Cape is identified as the ancestral area for the radiation of the extant lineages: most species in subtropical and tropical Africa are derived from a single invasion of that region. Moreover, diversification rates have been similar within and outside the Cape region. Migration out of the Cape has opened up vast areas, but those lineages have not diversified as extensively at fine spatial scales as lineages in the Cape. Therefore, higher net rates of diversification do not explain the high diversity and endemism of Protea in the Cape. Instead, understanding why the Cape is so diverse requires an explanation for how Cape species are able to diverge and persist at such small spatial scales.

Explaining why some taxa and geographical regions contain more species than others is an important goal of evolutionary biology. Phylogenetic approaches have been increasingly used to explore the timing and rates of diversification, in terms of the net accumulation of species through time (Barraclough and Vogler 2002; Rabosky 2006; Ricklefs 2007). Incorporating earlier ideas from island biogeography and ecological studies of diversity patterns (MacArthur and Wilson 1967), it is now clear that the potential for clades to diversify depends on the area of the geographical region they inhabit (Losos and Schluter 2000; Ricklefs 2003; Davies et al. 2004; Phillimore et al. 2006). These findings shift the focus of evolutionary diversity studies to understanding why some geographical regions and taxa contain more species than expected based on their area, while still taking the time available for diversification into account.

One of the best-studied biodiversity hotspots is the Cape region of southern Africa (Cowling 1992; Linder 2003). Southern Africa as a whole contains 20,400 indigenous flowering plant species, of which 80% are endemic (Goldblatt and Manning 2000). Even more remarkable is that over 40% of these species are concentrated within the Cape region of South Africa, which covers less than 4% of the surface area. The Cape is characterized by a Mediterranean-type climate of winter rainfall, contrasting with the summer-rainfall climate of subtropical and tropical southern Africa. Recent work has confirmed that the Cape represents an outlier from general trends between species richness and environmental parameters (Davies et al. 2005; Kreft and Jetz 2007).

Such high levels of diversity and endemism have provoked intense interest in the origins of the Cape flora (Levyns 1952, 1964; Linder et al. 1992; Rourke 1998; Galley and Linder 2006). One hypothesis has been that Cape diversity resulted from recent and rapid diversification (Levyns 1952, 1964; Linder 2003; Latimer et al. 2005; but see Etienne et al. 2006). Recently, several studies have reconstructed phylogenetic relationships within endemic clades from the Cape to test the hypothesis of recent diversification (Bakker et al. 1999; Richardson et al. 2001b; Goldblatt et al. 2002; Forest et al. 2007; reviewed in Linder and Hardy 2004; Linder 2005; Hawkins 2006; Verboom et al. 2008). Contrary to the hypothesis of recent origin and rapid speciation, Cape clades display a wide range of ages: some clades did originate recently, but many others result from prolonged diversification that began before the onset of winter-rainfall conditions (Linder 2008; Verboom et al. 2008). Therefore, high diversity in the region might reflect high levels of species persistence and sustained diversification rather than recent, rapid diversification (Linder 2003; Barraclough 2006), perhaps explained by the relative climatic stability combined with continuing geomorphological dynamism (Dynesius and Jansson 2000; Cowling et al. 2009).

One limitation of this work has been the lack of formal comparisons between related clades within and outside the Cape region (Barraclough 2006). A recent study by Sauquet et al. (2009a) found a strong positive correlation between diversification rates and the proportion of Cape species in genera of Proteaceae. However, the study lacked information on the phylogenetic positioning of non-Cape representatives within each genus, and, therefore, it could not confidently conclude that higher-than-average diversification rates in some clades were due to elevated rates in the Cape region. In particular, for understanding the concentration of species in the Cape, comparisons of related lineages between the Cape and neighboring regions of southern Africa would be especially informative (van der Niet and Johnson 2009). In an investigation of the biogeography of four Cape clades that extend into the Afromontane region, Galley et al. (2007) discussed levels of in situ speciation outside the Cape, which were judged to be low. However, diversification rates in Cape and non-Cape lineages were not compared, due to uncertainty over the number of immigration events versus in situ speciation for these clades.

Here, we reconstruct the phylogeny of the genus Protea (Proteaceae). The family Proteaceae, including Protea, is an important component of the fynbos vegetation of the Cape region, and its ecology and distribution have been studied extensively (Cowling et al. 1992; Rebelo 2001). For example, the “Protea Atlas Project” (http://protea.worldonline.co.za/) has recorded the distribution of all members of South African Proteaceae over a 12-year period, including ecological data on habitats and pollination syndromes (Rebelo 2001). However, evolutionary relationships within genera remain obscure: there are no pollen or macro-fossil data in southern Africa for the extant genera, and only Leucadendron has been subjected to a comprehensive, DNA sequence-based study (Barker et al. 2004). Critically for our aims, only two Leucadendron species are found outside the Cape region (Rebelo 2001).

Protea is an excellent clade for exploring the causes of high diversity in the Cape. Although its center of diversity and endemism is the Cape region (69 of the 112 species recognized by Rourke 1980 are Cape endemics), the genus also extends through tropical Africa north to Eritrea and west to Angola (Fig. 1). This distribution allows comparisons of diversification between the Cape region and the rest of Africa. For conciseness, we refer to the species from outside the Cape region as the non-Cape species. Previous work has shown the overall diversification rate of Protea to be significantly higher than the background diversification rate for the rest of Proteaceae (Sauquet et al. 2009a). Here, we investigate whether Protea has indeed diversified more rapidly in the Cape than in the rest of Africa.

Figure 1.

Map of the number of Protea species within 1 degree by 1 degree grid cells across Africa. The Cape floristic region is indicated by hashed shading. Data are from the Protea Atlas (http://protea.worldonline.co.za/).

Our first aim was to reconstruct changes in distribution onto the phylogenetic tree to evaluate the history of migration between the Cape region and the rest of Africa. Earlier authors speculated that Protea originated in tropical or subtropical regions based on its sister relationship with the tropical genus Faurea and the presumed simpler and more uniform morphology of tropical species (Levyns 1964; Rourke 1980, 1998), which would fit with the idea of rapid and recent radiation within the Cape. However, our analyses demonstrate the opposite pattern: the non-Cape species are nested within a wider radiation of Cape lineages and all but two of them belong to a single clade. Therefore, most extant lineages outside the Cape originated by in situ diversification from a single ancestor that arrived there from the Cape.

Second, we tested whether Cape lineages have differed in their diversification rates from non-Cape lineages, using recent methods that allow simultaneous estimation of speciation and extinction rates and migration rates between the two regions (Maddison et al. 2007). The analyses show that the non-Cape clade has, if anything, diversified at a faster rate than the Cape lineages. For Protea, migration out of the Cape region triggered diversification by opening up a larger area in which to diversify, despite the loss of whatever traits or environmental features allowed sustained diversification at fine spatial scales in the Cape.

Materials and Methods


No previous phylogenetic study of Protea has been attempted and its classification is complicated by separate treatments of South African and tropical species. The most recent treatment of South African species was that by Rourke (1980). Of the 82 recognized species included in his revision, 69 are endemic to the Cape Region, 10 are found only in summer-rainfall regions of South Africa, and three (Protea gaguedi, P. caffra, and P. welwitschii) extend into tropical Africa, north of the Limpopo. Protea subvestita was the only species believed to exist both in the Cape and summer-rainfall regions of South Africa. However, Cape populations previously assigned to P. subvestita are now known to constitute a hybrid between P. punctata and P. mundii (R. Prunier, unpubl. data) and so we treat P. subvestita as a non-Cape species. Tropical species were classified by Beard (1963, 1993; Table 1) into five sections. Conflict between treatments by Rourke (1980) and Beard (1963, 1993) is largely due to morphological variability within the tropical species, which makes their circumscription difficult, in contrast with the distinct species of the western Cape (Rourke 1998). The effects of alpha-taxonomy on comparisons of diversification rates in the two regions will be returned to in the Discussion. In the absence of a recent subgeneric taxonomic scheme for Protea, we use the informal groupings defined by Rebelo (2001; based on Rourke 1980 and unpubl. data) as a basis for evaluating the phylogenetic trees presented here (Table S1). Two of the species of Rourke (1980) are actually named as subspecies, and we have treated them as such, that is, our species list contains 110 species of which 70 are restricted to the Cape (including P. namaquana, found just to the north of the Cape, which was not described at the time of Rourke 1980). Several putative species have been excluded. 
A list of Protea species, their historical taxonomic treatments, geographical distribution, and nomenclature is provided in Table S1.

Table 1. Summary of the performance of different partitions. PP, posterior probability from Bayesian analysis; BS, bootstrap support from parsimony analysis. A. P. subvestita groups with P. roupelliae (spoonbract, PP 0.94). B. P. angustata groups with P. pendula (penduline) and P. canaliculata (rose, PP 1.0). C. P. sulphurea (penduline) falls within the non-Cape clade and P. enervis falls outside it (several nodes distant, with PP between 0.54 and 0.68). D. Paraphyletic: P. nana and P. scolymocephala (rose) fall within the clade (PP 0.9). – indicates that the grouping is not contradicted by any nodes with PP >0.8.
Partition                               Plastid  ncpGS  ITS   AFLP  All DNA sequences  All data
Number of sites                         2529     841    829   138   4199               4337
Number of variable sites                329      245    316   136   890                1026
Number of parsimony informative sites   211      79     235   113   525                638
% nodes PP >0.5                         32.1     33.8   56.8  36.6  57.0               80.2
% nodes BS >50                          23.8     28.6   32.4  16.9  31.4               41.9

PP support for groupings (letters indicate grouping is contradicted with PP >0.8, see footnotes; entries are listed left to right and cells with no entry are omitted)
  White:           A, 0.99, 0.97, 0.95
  Non-Cape:        0.94, C, 0.52, 0.54, 1
  Western-ground:  B, 1, 0.53, D, 0.79


We sampled 87 species, which includes all 70 Cape species and 17 of the 40 species from outside the Cape. Voucher information and GenBank accession records are provided in Table S2. We were unable to collect more species outside the Cape due to logistical constraints but, for reasons explained below, we believe that our sample is sufficient for the comparisons made here when combined with methods to compensate for under-sampling. Taxa from three genera belonging to subfamily Proteoideae (Johnson and Briggs 1975; Weston and Barker 2006) were chosen as outgroups: five species of Faurea, one species of Leucadendron, and three of Serruria. Faurea is the sister clade of Protea, a relationship that is well supported on both morphological and molecular grounds (Rourke 1998; Sauquet et al. 2009a). Total genomic DNA was extracted from 0.2 to 1.0 g of silica dried leaf material using a modified 2X CTAB method (Doyle and Doyle 1987) with purification by cesium-chloride/ethidium-bromide density gradient (1.55 g/mL). Ethidium bromide was removed with butanol, and the purified total DNAs were dialyzed in 1X TE buffer and stored at −80°C. All DNA extracts were further purified and concentrated using QIAquick silica columns (Qiagen Inc., Valencia, CA) according to the manufacturer's protocol for cleaning polymerase chain reaction (PCR) products.


We generated two kinds of DNA markers. First, we sequenced DNA from six noncoding regions in the plastid and nuclear genomes. Plastid regions were trnL intron, trnL-trnF intergenic spacer (Taberlet et al. 1991), rps16 intron (Oxelman et al. 1997), and atpB-rbcL intergenic spacer (Savolainen et al. 1994). Nuclear regions were the ITS ribosomal region (Sun et al. 1994; Álvarez and Wendel 2003; Chase et al. 2003) and a portion of the region encoding the plastid-expressed isozyme of the glutamine synthetase gene (ncpGS, Emshwiller and Doyle 1999). Second, anticipating low levels of sequence variation among species, we generated 138 amplified fragment length polymorphism markers (AFLP, Vos et al. 1995). AFLP markers sample widely distributed restriction endonuclease sites across the nuclear genome. Polymorphism is apparent as the presence or absence of bands. AFLPs are used mostly for population- and species-boundary studies (Mueller and Wolfenbarger 1999), but also have been employed to resolve species-level relationships in recent, rapid radiations (Hodkinson et al. 2000; Beardsley et al. 2003; Richardson et al. 2003; Sullivan et al. 2004). Details of primers and PCR amplification conditions are provided in the Supporting Information. Sequencing was performed on an Applied Biosystems 3730 DNA Analyser (Applied Biosystems, Warrington, Cheshire, UK). Sequences were aligned manually in MacClade 4.08 (Maddison and Maddison 2000). For all plastid regions and the ncpGS region, sequence-length variation among species was low. Gaps were coded as missing data. For ITS, a region of 66 base pairs was excluded from all analyses due to extreme length variation. Interpretation of AFLP fragments was carried out using Genescan (version 2.02) and Genotyper (version 1.1) analysis software (Applied Biosystems) and fragments were scored as present/absent.


The four noncoding plastid regions were combined into a single dataset. Because of their uniparental mode of inheritance and noncoding nature, these regions are expected to produce congruent results (confirmed by inspection of the individual gene trees, Figs. S1–S3 and Supporting Information). Therefore, our data matrices comprised: (1) the plastid dataset for 85 Protea and nine outgroup species, (2) the ncpGS dataset for 77 Protea and five outgroup species, (3) the ITS dataset for 75 Protea and five outgroup species, (4) the plastid and nuclear datasets combined for 87 Protea and nine outgroup species, (5) the AFLP dataset for 72 Protea species, and (6) the AFLP, plastid, and nuclear datasets combined for 87 Protea and nine outgroup species. Data for taxa absent from any of the separate partitions were coded as missing.

Bayesian analyses were performed in MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003). Analyses were repeated in turn for each of the six datasets listed above. Model selection for each partition (ITS, plastid, and ncpGS) was based on the Akaike information criterion (AIC) scores for substitution models evaluated using MrModeltest (version 2.3; Nylander 2004). The general time reversible (GTR) model with gamma-distributed rate variation among sites and a proportion of invariant sites was chosen for the plastid and ITS regions, whereas a simpler two-parameter substitution model with gamma-distributed rate variation and no invariant sites was chosen for ncpGS. The AFLP data were analyzed using the restriction-site (binary) model. The combined analyses (4) and (6) fitted a separate substitution model to each partition but assumed a single tree topology and set of branch lengths. The Markov chain Monte Carlo (MCMC) algorithm was run with two independent runs, each with four chains, and initiated with a random tree. Analyses were run for 10 million generations, sampling the Markov chains every 100 generations. Samples from the first 5 million (ITS, ncpGS, AFLP) or 6 million (plastid, all DNA sequences, all data) generations were discarded based on stabilization of the standard deviation of split frequencies between the two independent runs. Bayesian posterior probabilities were estimated as the proportion of trees sampled after burn-in that contained each of the observed clades. For comparison, conservative estimates of the internal support under parsimony analysis were obtained from 1000 bootstrap replicates using a heuristic search with 10 replicates of random taxon addition, with TBR (tree bisection reconnection) swapping and one tree held at each step. Parsimony analyses were run in PAUP* version 4.0b10 (Swofford 2001).
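The posterior probability calculation described above (the proportion of post-burn-in trees containing each clade) can be illustrated with a minimal Python sketch. It is not how MrBayes computes support internally; the representation of each sampled tree as a set of clades, the function name `clade_posteriors`, and the toy sample are all assumptions made for illustration.

```python
from collections import Counter

def clade_posteriors(trees, burnin):
    """Return {clade: proportion of post-burn-in trees that contain it}.

    `trees` is a list of sampled trees, each given as a set of clades,
    where a clade is a frozenset of tip names (a hypothetical encoding).
    """
    kept = trees[burnin:]
    counts = Counter()
    for clades in kept:
        counts.update(set(clades))  # count each clade at most once per tree
    return {clade: n / len(kept) for clade, n in counts.items()}

# Toy MCMC sample of six trees over tips A-D (illustrative only).
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
trees = [{ac}, {ac}, {ab, cd}, {ab, cd}, {ab}, {ab, cd}]

# Discard the first two samples as burn-in, then summarize the rest.
support = clade_posteriors(trees, burnin=2)
for clade, pp in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), round(pp, 2))
```

In this toy sample the clade seen only during burn-in receives no support, which is the purpose of discarding early generations.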


Divergence times of Protea were estimated using a relaxed-clock Bayesian MCMC approach as implemented in BEAST (version 1.4.8, Drummond et al. 2006; Drummond and Rambaut 2007). For this analysis, only dataset (4), the combined DNA sequence data without AFLPs, was used. The consensus tree from the Bayesian analysis of dataset (6) was used as the starting tree. A speciation model following a Yule process was selected as the tree prior, with an uncorrelated lognormal (UCLN) model for the rate variation among branches. The root node age was constrained to a normal distribution with a mean of 28.4 million years (My) (SD = 2) based on the comprehensive fossil calibration of Proteaceae by Sauquet et al. (2009b). Twenty-five independent runs of 5 million generations, sampling every 2000 generations, were performed. The adequacy of sampling was assessed using the effective sample size (ESS) diagnostic with Tracer (version 1.4, Rambaut and Drummond 2007, http://beast.bio.ed.ac.uk/Tracer). Because the starting tree was near optimal, each of the 25 runs converged immediately. After removing the first 50,000 generations as burn-in, the maximum clade credibility tree was built using TreeAnnotator (version 1.4.8, Drummond and Rambaut 2007). Analyses were repeated with an alternative tree prior, a birth–death process, to check for the influence of the prior on the estimated dates.


We coded species distributions as a single binary character, 0 for the Cape and 1 for outside the Cape. We did not classify into more subdivided regions (cf. Galley et al. 2007) because our analyses require a large sample of species in each region to compare diversification rates. Markov models implemented in BayesTraits (Pagel et al. 2004) were used to reconstruct ancestral distributions onto the dated trees to determine whether the Cape or the subtropical and tropical region represents a more likely ancestral area for the genus and to infer the relative rate of migration events in each direction. We first performed maximum likelihood optimization of a two-parameter model (q01 = emigration from the Cape, q10 = immigration into the Cape) on 50,000 sampled trees from BEAST. Finding that parameter estimates were roughly exponentially distributed, we then used an exponential prior with mean equal to the observed mean from the maximum likelihood results to implement a reversible-jump MCMC Bayesian analysis with 5 million iterations. Reversible jump allows all simpler nested models to be visited during MCMC, and the proportion of time they are visited is proportional to their posterior probability, thus incorporating model selection into the MCMC procedure. As demonstrated by Maddison (2006), interpretation of the relative rate of change between two trait values (in this case region) might be confounded by an effect of the same trait on speciation and/or extinction rates. This possibility is explored in the next section.
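To make the two-parameter model concrete, the sketch below evaluates the standard closed-form transition probabilities of a two-state continuous-time Markov chain with rates q01 and q10 along a branch of duration t. It is an illustration of the model class, not the BayesTraits implementation; the function name `transition_probs` and the rate and branch-length values are hypothetical.

```python
import math

def transition_probs(q01, q10, t):
    """Transition matrix P(t) for a two-state Markov model.

    State 0 = Cape, state 1 = outside the Cape. P[i][j] is the probability
    that a lineage in state i is in state j after time t.
    """
    total_rate = q01 + q10
    if total_rate == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no change is possible
    decay = math.exp(-total_rate * t)
    p01 = q01 / total_rate * (1.0 - decay)
    p10 = q10 / total_rate * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

# Hypothetical rates (events per lineage per My) and branch length (My).
P_uni = transition_probs(q01=0.05, q10=0.0, t=10.0)     # migration out of the Cape only
P_equal = transition_probs(q01=0.05, q10=0.05, t=10.0)  # equal rates in both directions

print(P_uni)
print(P_equal)
```

Under the unidirectional model (q10 = 0) a lineage outside the Cape can never return, so the second row of the matrix is [0, 1] for any branch length.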


Log-lineage-through-time plots were used to visualize the temporal dynamics of diversification in the whole genus and for subtrees of the species restricted to each region in turn. The dynamics of diversification were compared using BiSSE likelihood methods implemented in the program Mesquite (Maddison and Maddison 2009). The model estimates six parameters: rates of speciation in each region (λ0 and λ1), rates of extinction in each region (μ0 and μ1), and rates of change between the two states, here equivalent to migration rates (q01 and q10 = migration rates from the Cape to the rest of Africa and from the rest of Africa to the Cape, respectively). To estimate the significance of observed differences in λ and μ between the two character states and perform model simplification, we used log likelihood ratio tests to compare constrained and unconstrained models. Under the null model, twice the log likelihood ratio should be distributed as chi-square with degrees of freedom equal to the difference in the number of parameters of the models. The comparison of diversification rates assumes that all extant species have been sampled. Therefore, missing species were added using an unpublished Perl script by James Cotton (Day et al. 2008). The script adds species at random points along branch lengths descended from the ancestral node of the narrowest clade in which the species are believed to belong based on current taxonomy. This conforms to a constant speciation rate model, that is, branching events are equally likely at any point along lineages. Random taxon addition was performed separately on each of a random subsample of 1000 trees from the BEAST output to account for uncertainty in the topology and branching times. The BiSSE model was optimized onto each of the resulting trees. 
The subsample of 1000 trees was shown to represent adequately the entire sample from the BEAST output by comparing posterior probabilities and confidence limits of node ages with the full sample (Figs. S9 and S10). Note that the present version of the BiSSE model in Mesquite does not output information on the likelihood of inferred ancestral states: this is why we also performed the BayesTraits analyses.
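The likelihood ratio test used to compare constrained and unconstrained models can be sketched as follows. This is a generic standard-library illustration and not the Mesquite code; the log-likelihood values are hypothetical, and the chi-square tail probability is computed with a recurrence for the regularized incomplete gamma function that is valid for positive integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # exact tail for df = 1
    while 2.0 * a < df:                          # step df up by 2 each pass
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_full, lnl_constrained, df):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_full - lnl_constrained)
    return stat, chi2_sf(stat, df)

# Hypothetical example: six-parameter model versus a model constraining
# the two speciation rates to be equal (one fewer parameter, so df = 1).
stat, p_value = likelihood_ratio_test(lnl_full=-250.0, lnl_constrained=-251.0, df=1)
print(stat, round(p_value, 4))
```

With these made-up likelihoods the constraint costs only one log-likelihood unit, so the simpler model would not be rejected at the 5% level.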



Phylogenetic reconstruction recovered several clades in agreement with the current informal subgeneric taxonomy of Protea (Fig. 2, Table 1). Moreover, all species belonging to the subgeneric groupings that lack Cape representatives, namely the grassland, mountain, savanna, and red proteas, were resolved as a single clade (which we call the non-Cape clade) in the ncpGS, AFLP, and combined analysis. Only two species that are not found in the Cape, P. roupelliae and P. subvestita, fell outside the non-Cape clade. These two species are the only non-Cape species that were previously classified in sections with Cape species (Rebelo 2001).

Figure 2.

Consensus of 80,000 sampled Bayesian trees for the combined analysis of all DNA sequence and AFLP data, showing all groupings with posterior probability above 0.5. Bayesian posterior probabilities are shown above the branches leading to each node; parsimony bootstrap percentages >50% are shown below the equivalent branches. No nodes with bootstrap >50% in the parsimony analysis contradicted those shown here. Bars indicate informal sections or combinations thereof according to the classification of Rebelo (2001), based on the treatment of Rourke (1980). Bars are provided as a guide for navigating the taxonomy rather than implying monophyletic or strongly supported clades: solid lines around the bars indicate a section that is monophyletic in our tree; dashed lines indicate sections that are not monophyletic. For polyphyletic sections, letters after species names show which species belong to which section as defined in the label for the broader grouping within which they fall. The black circle indicates the ancestral branch of the non-Cape clade. Consensus branch lengths and a scale bar are shown in units of changes per site.

Most of the differences between trees recovered for different partitions reflected low variation (i.e., sampling error) rather than strongly supported conflict (Figs. S4–S8). Combining the datasets improved resolution and the recovery of previously described and morphologically based subgeneric groupings (Table 1). For example, adding the AFLP data to the combined sequence data increased the percentage of nodes with posterior probabilities >0.5 from 57% to 80% and led to recovery of two additional informal subgroupings: the white and western ground proteas. Four cases of hard conflict between partitions were apparent (judged by topological differences associated with posterior probability >0.8, Table 1). However, in each case, at least two partitions confirmed the monophyly of the morphologically recognized grouping recovered in the combined analysis and so we feel confident that the combined analysis is converging on reliable groupings, rather than being drawn toward the results of a single partition.

Two conflicts are potentially relevant for the biogeographic comparison. First, P. subvestita (a white protea) and P. roupelliae were recovered as sister species in the plastid analysis, whereas P. roupelliae falls just outside the white proteas in other partitions. Because these two species are found outside the Cape, but closely related to Cape species, this potentially affects inferences of number of dispersal and speciation events. Second, in the ITS analysis, P. sulphurea groups with the non-Cape clade rather than with the penduline proteas as expected, and P. enervis groups with the rodent proteas rather than with the non-Cape clade as expected. Because these involve switching between distantly related groups found in distant geographical regions, the most likely explanation is an undiagnosed database or phylogenetic error, rather than hybridization or incomplete lineage sorting. However, no obvious mistake in data assembly was found, and the deletion of these sequences did not affect relationships in the combined analysis, so we have retained the sequences in the matrix. Matrices and trees in nexus format are available from TreeBase (accession number S2415).


The ancestral region for Protea is reconstructed as the Cape with high probability: the median probability that the crown node of Protea has state 0 is 1.00 with a minimum of 0.979 across the MCMC results from the reversible-jump analysis. We refer here to the statistical reconstruction of trait values on the tree; whether these can be taken to indicate likely ancestral areas will be returned to in the Discussion. On average, migration rates are reconstructed to have been greater from the Cape to the rest of Africa (q01, Fig. 3A) than in the opposite direction (q10, Fig. 3B). However, although the best-supported model is of migration only from the Cape to the rest of Africa (q01 > 0, q10 = 0), this has only marginally higher support than a model of equal migration rates (posterior probabilities = 0.55 and 0.45, respectively). The alternative model, that q01 and q10 are both positive but differ in value, requires optimization of an additional parameter and was not supported (posterior probability = 0.0057). Maximum likelihood optimization across the BEAST trees yielded similar results to the Bayesian approach but with stronger evidence for unidirectional migration. Using AIC (minus twice the log likelihood plus twice the number of parameters, with lower values preferred), the model with q10 = 0 was preferred in 84.9% of trees, the model constraining q10 = q01 in 14.9% of trees, and the unconstrained model in 0.2% of trees.
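The per-tree AIC comparison can be sketched as below, using the conventional form AIC = 2k − 2 ln L, under which the lowest score is preferred. The model labels, log likelihoods, and the four-tree sample are invented for illustration and are not values from this study.

```python
from collections import Counter

def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * log_lik

def preferred_model(fits):
    """fits maps model name -> (log likelihood, number of parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical maximum likelihood fits for four sampled trees.
per_tree_fits = [
    {"q10=0": (-20.0, 1), "q10=q01": (-21.5, 1), "unconstrained": (-19.8, 2)},
    {"q10=0": (-22.0, 1), "q10=q01": (-21.0, 1), "unconstrained": (-20.9, 2)},
    {"q10=0": (-19.0, 1), "q10=q01": (-20.0, 1), "unconstrained": (-18.9, 2)},
    {"q10=0": (-18.0, 1), "q10=q01": (-18.5, 1), "unconstrained": (-17.9, 2)},
]

# Tally which model wins on each tree and convert to proportions.
tally = Counter(preferred_model(fits) for fits in per_tree_fits)
share = {model: n / len(per_tree_fits) for model, n in tally.items()}
print(share)
```

The unconstrained model always has the highest likelihood here, yet it is never preferred because its small gain does not offset the penalty for the extra parameter.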

Figure 3.

Frequency histogram of estimated migration rates between (A) the Cape and the rest of Africa, q01, and (B) the rest of Africa and the Cape, q10. The units are migration events per lineage per million years. Means of the distributions are indicated by dashed lines.


The median age of the crown node of Protea across trees sampled by the BEAST analysis was 17.7 million years ago (Mya) with 95% confidence limits of 11.2–27.2 Mya (the Early to Mid-Miocene, consensus tree shown in Fig. S9). The average age of the crown node of the non-Cape clade was 14.9 Mya (C.I. 9.3–24.0 Mya). The only possible internal calibration relates to endemic species on coastal flats that were probably submerged before 2.6 Mya (Linder 2003; Cowling et al. 2009). The best candidates are P. obtusifolia (found on limestone) and P. susannae (found on calcareous and neutral sands), which have long been viewed as a case of ecological speciation on two Quaternary soil formations (Mustart and Cowling 1993). We did not use this in our calibration because of uncertainty over the date of emergence in relation to sea-level changes and uplift of southern Africa (Cowling et al. 2009). However, 60.7% of our dates for the split between P. susannae and P. obtusifolia are <2.6 Mya (median 2.31, 0.78–5.2), which provides broad corroboration except that our confidence limits extend to older dates than predicted by the coastal flat scenario.

The apparent rate of diversification slows down toward the present (Fig. 4A, solid lines): all sampled trees yielded negative Pybus and Harvey (2000) gamma values, and 99.8% of them were significant (two-tailed test). One possible cause of an apparent decrease in rate is incomplete sampling of species. The 23 recognized species that were missing from our sample all belong to the subgeneric groupings within the non-Cape clade, but the groupings themselves were not monophyletic. Therefore, we added 23 tips within the non-Cape clade at random points along branches descended from its crown node for each BEAST tree in turn. There remains a trend for a slow-down toward the present (Fig. 4A, dashed lines: all sampled trees yielded negative gamma, 85.4% of them below the 95% critical value). Comparison of plots between the Cape and the non-Cape lineages reveals that the radiation of the non-Cape clade began after the Cape radiation and occurred at a slightly faster rate (Fig. 4B, Cape = solid lines, non-Cape = dashed lines). Note that the shape of the curve for the non-Cape clade will be affected by how we added missing species; our approach was conservative in that it assumes a constant branching rate model rather than introducing major departures in shape from constant exponential growth. Using a birth–death prior in BEAST instead of the Yule prior had little effect on conclusions; there remained a slow-down in rate toward the present, and 81.2% of trees displayed a significantly negative gamma.
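
For readers unfamiliar with the gamma statistic, the following is a minimal sketch of its calculation from the internode intervals of an ultrametric tree, following the formula of Pybus and Harvey (2000). The function name and the example intervals are hypothetical; extracting the intervals from a dated tree is assumed to have been done elsewhere.

```python
import math


def gamma_statistic(internode_intervals):
    """Pybus and Harvey (2000) gamma statistic.

    internode_intervals holds g_2, g_3, ..., g_n, ordered from root to
    present: element i is the time during which the reconstructed tree
    has i + 2 lineages.
    """
    g = list(internode_intervals)
    n = len(g) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma requires at least three tips")
    weighted = [(i + 2) * gk for i, gk in enumerate(g)]  # k * g_k
    total = sum(weighted)  # T = sum over k = 2..n of k * g_k
    running = 0.0
    cumulative = 0.0
    for w in weighted[:-1]:  # outer sum over i = 2 .. n-1
        running += w         # inner sum over k = 2 .. i
        cumulative += running
    numerator = cumulative / (n - 2) - total / 2.0
    denominator = total * math.sqrt(1.0 / (12.0 * (n - 2)))
    return numerator / denominator


# Hypothetical intervals: long recent branches give a negative gamma.
slowdown_example = gamma_statistic([1.0, 1.0, 5.0])
```

Under a constant-rate pure-birth process gamma is approximately standard normal, so negative values indicate that branching events are concentrated nearer the root than expected, which is the pattern reported above.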

Figure 4.

Log lineage through time plots of (A) the entire genus with (dashed lines) and without (solid lines) missing species added to the non-Cape clade and (B) for just Cape lineages (solid lines) and for just non-Cape lineages (dashed lines). Black lines indicate the curve of median ages for each diversification event and gray lines indicate the 95% confidence limits. (B) shows two curves for the non-Cape lineages: the higher black dashed line is the plot including missing species, the lower black dashed line is the plot without missing species added. Confidence limits are shown for both as gray dashed lines.


Optimization of the BiSSE model onto BEAST trees yielded results that were consistent whether missing species were included or not; we describe the results including missing species. The absolute scale of estimated parameters varied according to the dating of the root node of the genus, that is, values tend to be positively correlated across trees (Fig. 5). We focus on comparing estimates between the two regions rather than on absolute numbers.

Figure 5.

Plot of parameter estimates from the BiSSE model for (A) the Cape and (B) the non-Cape lineages. Parameters are the rates of speciation (λ0 and λ1) and rates of extinction (μ0 and μ1) in the Cape and non-Cape lineages respectively, and the migration rates from the Cape to the rest of Africa, q01, and from the rest of Africa to the Cape, q10. Units are in events per lineage per million years.

Per lineage speciation rates, λ, and extinction rates, μ, were higher in the non-Cape lineages than in the Cape (Figs. 6A,B). The median speciation rate in the Cape, λ0, was 0.17 species My⁻¹ (95% C.I. 0.12–0.27) and the median value of λ1/λ0 was 2.0 (95% C.I. 1.3–3.3). The median extinction rate in the Cape, μ0, was zero to within three decimal places (95% C.I. 1.4 × 10⁻⁷ to 1.0 × 10⁻⁴). Outside the Cape, two distinct solutions were obtained: either extinction rate was estimated as zero, or it took a wide range of values up to an upper 95% confidence limit of being 73.8% of the non-Cape speciation rate. Estimates for the migration rate from the rest of Africa into the Cape (q10) covary with those for extinction rates outside the Cape: either extinction rate is high and emigration rate is low, or extinction rate is low and emigration rate is high, indicated by the two distinct lines of points in Figures 5B, 6C. The per lineage net diversification rate also falls into two distinct solutions (Fig. 6D). In 65.8% of trees, it is higher outside the Cape, taking a median value of twice the net diversification rate in the Cape. In 34.2% of trees, namely those in which the non-Cape extinction rate is notably greater than zero, the net diversification rate in the non-Cape lineages is between 79% and 93% of the value in the Cape (95% C.I.).
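
The way the two solutions lead to different net diversification rates can be illustrated with a short sketch. Net diversification is taken here as speciation minus extinction; the Cape speciation rate is the median quoted above, whereas the non-Cape speciation rate of 0.34 is an assumed illustrative value, and the function name is hypothetical.

```python
def net_diversification(speciation, extinction):
    """Per-lineage net diversification rate (events per lineage per My)."""
    return speciation - extinction


# Cape: median speciation rate from the text, extinction effectively zero.
lambda_cape = 0.17
r_cape = net_diversification(lambda_cape, 0.0)

# Non-Cape: ASSUMED speciation rate, for illustration only.
lambda_non_cape = 0.34

# Solution 1: extinction estimated as zero outside the Cape.
r_non_cape_no_extinction = net_diversification(lambda_non_cape, 0.0)

# Solution 2: extinction at the upper limit quoted in the text,
# 73.8% of the non-Cape speciation rate.
r_non_cape_high_extinction = net_diversification(
    lambda_non_cape, 0.738 * lambda_non_cape
)
```

With these inputs, a higher speciation rate outside the Cape translates into a higher net rate only when extinction is negligible; with substantial extinction the non-Cape net rate can fall below that of the Cape.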

Figure 6.

Histograms of relative parameter estimates between the Cape (0) and non-Cape (1) lineages from optimization of the BiSSE model onto a random subsample of 1000 output trees from BEAST. (A) The difference in speciation rates, (B) the difference in extinction rates, (C) the difference in the logarithm of migration rates, and (D) the difference in per lineage net diversification rates. Units are events per lineage per million years. Differences are shown rather than absolute estimates in each region because of covariation of estimates resulting from variation in the absolute scale of branch lengths across the trees (Fig. 5).

We performed further analyses to identify the basis of the two alternative solutions described above. We checked whether our sample of trees contains distinct islands of trees by constructing separate majority rule consensus trees for those trees associated with the two solutions. No major differences were apparent (Fig. S11). In particular, there was no tendency for alternative placements of P. roupelliae and P. subvestita, reflecting differences among the separate partitions described under section “Phylogenetic Relationships,” to be associated with the two solutions. Instead, exploration of the likelihood surface with respect to q10 and μ1 for trees representing both solutions shows that a broad range of values are almost equally likely, and the two alternative solutions represent small peaks in this surface (Fig. S12). Therefore, the BiSSE analysis is jumping between two solutions with near equal ability to explain the data.

Despite apparently large differences between the regions when fitting the full BiSSE model, there is further uncertainty in parameter estimates due to the stochastic nature of the underlying processes. The fully parameterized BiSSE model was preferred in only one of 1000 trees compared to a minimum feasible model of equal speciation rates in both regions, no extinction in either region and equal migration rates between regions (AIC maximum model minus AIC minimum model = −8.0; C.I. −10.2 to −6.0). Under the minimum model, the net diversification rate was estimated at 0.19–0.58 species My⁻¹ (95% C.I.).


Our results provide the first phylogenetic analysis of relationships among Protea species. Levels of variation were low, especially in plastid regions, but combined analysis greatly improved resolution, yielding a phylogenetic hypothesis in broad agreement with recent ideas about their taxonomy. For example, the snow proteas and shale proteas were recovered as monophyletic in every partition; the rodent proteas were recovered in the plastid and ncpGS analyses; and the white and the western ground proteas were recovered in the ITS and AFLP analyses. Morphological characters distinguishing these groupings are described in Rebelo (2001). Several of the exceptions correspond to stated uncertainty in taxonomy. For example, the rose and penduline proteas were already thought to share a close relationship, and P. pendula is believed to hybridize with members of the rose proteas (Rebelo 2001 and pers. obs.). Similarly, apart from bearded involucral bracts and nonopening flowerheads, the bearded proteas share all the morphological features of the spoon-bract proteas (Rebelo 2001). The implications of the tree for Protea taxonomy will be discussed elsewhere.

Our main finding is that all sections that contain only species from outside the Cape comprise a single clade nested within the wider radiation of Cape lineages. This finding coincides with morphological characteristics shared by these sections of long life span, unadorned involucral bracts, nearly actinomorphic flowers, undifferentiated pollen presenters, fruits that are shed, and a lignotuberous, epicormically resprouting tree-habit (Rebelo 2001). The Cape shaving-brush proteas were believed to be Cape representatives of grassland proteas (section Leiocephalae, Rourke 1980) based on shared morphological features (P. nitida is the only Cape species to resprout from epicormic buds), but these species group more closely with other Cape taxa in all our analyses. Although it is possible that the non-Cape species we were unable to sample could belong elsewhere in the tree, we believe this is unlikely; our sample included around 50% of recognized species from outside the Cape and representatives of all recognized non-Cape subgeneric groupings. Also, our sample of non-Cape species was biased toward those in southern Africa, and we would expect any close relatives of Cape species to be found in neighboring regions (cf. Galley et al. 2007). The two other species with distributions outside the Cape, P. roupelliae and P. subvestita, were confirmed to be related to Cape clades. Contrary to previous taxonomy, however, P. roupelliae was recovered as closely related to the white proteas, which include P. subvestita.

The Bayesian analysis of species distributions confirmed the Cape to be the most likely ancestral area. Dispersal between the two regions has been rare: at most three separate events. This stands in contrast to other Cape clades, in which exchange between the Cape and especially the Drakensberg mountains of eastern South Africa is relatively common (Galley et al. 2007). The rarity of dispersal in Protea cannot just be explained by assuming lower dispersal capabilities, as both within the Cape and outside some lineages have colonized broad geographic areas. Instead, it implies that species adapted to one environment are unlikely to colonize the other. The non-Cape clade comprises long-lived trees that resprout after fire and have “simple” flowers that are nearly actinomorphic with undifferentiated pollen presenters, structures used for placement of pollen onto pollinators (Rebelo 2001). The Cape region includes a few species with similar traits, for example P. nitida, but most species are serotinous (plants are killed by fire but recruit by seed released from fire-proof cones) and have more differentiated flower morphology associated with a range of pollination syndromes (birds, rodents, and insects, Collins and Rebelo 1987). Therefore, although there are some exceptions, broad differences in ecological and reproductive strategies might limit migrations between biomes.

Although migration in both directions cannot be ruled out, presumably because of the low number of distributional shifts in either direction, there is marginally greater support for migration only from the Cape to the rest of Africa rather than in both directions. Together, these results contradict the traditional view that the genus originated in tropical regions and has only more recently invaded the Cape. An alternative explanation for our findings would be if there were repeated invasion of lineages from the rest of Africa into the Cape, but that a subsequent extinction event killed off all but one of the non-Cape lineages present at that time. In the absence of any alternative evidence for this, we favor the simpler interpretation. It remains possible that the original ancestor invaded the Cape from elsewhere (e.g., the sister genus Faurea has a distribution centered in tropical Africa and Madagascar), but the extant lineages outside the Cape appear to result from invasion from the Cape rather than vice versa. Several other Cape taxa display similar patterns (Phylica, Richardson et al. 2001a, b; Disa, Irideae, the Pentaschistis clade, and Restionaceae, Linder 1994; Galley and Linder 2006; Galley et al. 2007), whereas in others, such as Pelargonium, the Cape taxa comprise a derived clade within a more widely distributed set of species (Bakker et al. 1999).

Dating the radiation of Protea is limited by the lack of internal calibration points. We used the date of the split between Protea and Faurea from Sauquet et al. (2009b), which in turn is derived from fossil dates applied to the wider Proteaceae tree. The 95% confidence intervals of the age of the crown node of Protea are broad, but 11.2–27.2 Mya encompasses the assumed date for the formation of the circum-Antarctic Benguela current and the start of the shift to cooler and drier climates (Linder 2003; Cowling et al. 2009). The results establish the radiation of Protea as being of medium age compared to other Cape clades (Linder 2005; Verboom et al. 2008). In addition, the estimated net diversification rate for the genus (0.19–0.58 species My⁻¹) is moderate when compared to other groups. It is higher than the average rate in angiosperms (Magallon and Sanderson 2001) but considerably lower than that of recent Cape radiations such as Aizoaceae (Klak et al. 2004). The log-lineage-through-time plots provide no indication that diversification rates in the Cape accelerated at any period, contrary to the hypothesized effects of the steep increase in aridity and seasonality, accompanied by the onset of modern fire regimes since 3–5 Mya (Linder 2003). Instead, our analyses agree with recent studies that propose that the fynbos biome has experienced a relatively stable environment during the Late Tertiary and Quaternary (Verboom et al. 2008), allowing the accumulation of lineages at a relatively constant pace (Linder 2008). The remarkably even tempo at which extant lineages have accumulated matches findings in Restionaceae, although restios began diversifying much earlier than Protea (Linder and Hardy 2004). The timing of migration events into the rest of southern Africa also falls within the broad range of dates obtained by Galley et al. (2007) for four monocot clades (dates fell within 26 Mya: Disa, Irideae, the Pentaschistis clade, and Restionaceae).

Comparison of diversification rates between the Cape and non-Cape lineages using the maximum BiSSE model was affected by switching between two alternative solutions depending on minor alterations in the tree topology and branch lengths among Bayesian samples of dated trees. Either the model inferred a high rate of extinction in non-Cape lineages and low emigration rate or a low extinction rate and high emigration rate. In line with discussion by Maddison (2006), the appearance of two non-Cape species isolated from the main non-Cape clade is consistent with either high extinction rate of non-Cape species or an excess of migration events into the Cape, an interpretation confirmed by exploration of the likelihood surface. However, the number of shifts between the two regions was too few to permit strong inferences from the maximum model; simplifying to the minimum model, all we can say with confidence is that diversification rates outside the Cape have been similar to those in the Cape; or, if anything, slightly higher. Despite uncertainty in the details, our results show that diversification rates of Protea were no higher in the Cape than elsewhere. Previous studies of other clades have inferred that diversification rates in the Cape were not especially rapid, but without statistically comparing diversification rates of the same clade in the Cape and neighboring regions.

One caveat is that our analyses placed missing species at random into the non-Cape clade, assuming a constant speciation rate model. If divergence times for missing species were concentrated more toward the present than in our simulations, for example, this might increase estimates of extinction rates outside the Cape (Nee et al. 1994). However, the estimate of net diversification rate is likely to be fairly robust to this uncertainty, because the same number of species for approximately similar total branching times will be present no matter where missing species are added. Nonetheless, uncertainty in the location of missing species may well explain the wide confidence limits in extinction rate estimates outside the Cape.

Although the timing of diversification has been similar in both the regions, the spatial extent and pattern of diversification are strikingly different. The area of occupancy of Protea outside the Cape is 17 times greater than within the Cape region (Fig. 1), but the density of species is far lower. Many species outside the Cape have exceptionally broad ranges (Fig. 7). There are some possible cases of speciation on a narrow scale, such as the five species restricted to the Barberton-Transvaal escarpment (the mountain proteas P. comptonii, P. curvata, P. laetans, P. rubropilosa, and the grassland protea P. parvula), which comprise a clade with weak support in our tree together with the more widespread grassland protea P. simplex. However, the extent of diversification within equivalent-sized areas is greatly reduced, despite the existence of areas of contiguous habitat suitable for Protea species of equivalent or greater size than the Cape. For example, based on the quarter-degree scale Protea Atlas data, P. subvestita has an occurrence of 113,500 km² in the Natal Drakensberg region, which is greater than the total area of the Cape floristic region (90,000 km², Linder 2005). Without a more complete sample of species outside the Cape in our phylogenetic tree, we cannot compare the spatial pattern of diversification quantitatively between the two regions, but it is clear that controlling for geographical area would yield a far greater net rate of diversification in the Cape than in the rest of Africa. The key to understanding the Cape biodiversity hotspot, therefore, lies in understanding how so many species could have originated and persisted in such a small area, not in the speed of their diversification.

Figure 7.

Frequency distribution of geographic range sizes of Protea species from the Cape region (white) and the rest of Africa (black). Range sizes were estimated conservatively as the number of one-degree grid squares occupied (data at a finer spatial scale are only available inside South Africa). The range size distribution differs significantly between the two regions (Kolmogorov-Smirnov test, D = 0.26, P < 0.05). The total area occupied by Protea in each region is 24 cells within the Cape and 380 cells in the rest of Africa.
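
The test statistic D reported in the caption above is the largest absolute difference between the two empirical cumulative distributions of range size. A minimal sketch of that calculation follows; the function name and the example range sizes (numbers of grid cells) are hypothetical and are not the Protea data.

```python
import bisect


def ks_two_sample_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the maximum absolute difference
    between the empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Proportion of each sample with values <= x.
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical range sizes in one-degree grid cells (illustrative only).
region_one_ranges = [1, 1, 2, 5]
region_two_ranges = [1, 3, 8, 12]
d_example = ks_two_sample_d(region_one_ranges, region_two_ranges)
```

Only D is computed here; the associated P value depends on the two sample sizes and would normally be obtained from a statistics library.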

Our comparisons assume that we have an accurate representation of species present in both regions. Monophyly of non-Cape sections facilitated addition of missing species to our analyses. Clearly, however, it would be desirable to sample the missing species and determine their relationships, which might narrow down the range of estimates obtained by the random placement of species. A more fundamental limitation, in common with most other broad-scale diversity studies, is that we relied on the existing alpha-taxonomy of the genus for our list of species. One possible explanation for the recent slow-down in lineage branching is that young species have not been recognized as such by traditional taxonomy. As mentioned above, no recent revision has considered the genus across its entire range. The Cape representatives have been better studied than the non-Cape ones in recent times, but there are putative new Cape species awaiting investigation, and several species, such as P. cynaroides, exhibit morphological variation among populations (Rebelo 2001). The non-Cape species are less morphologically distinct than Cape species and exhibit morphological variation within populations over large regions, rather than between geographically isolated populations (Chisumpa and Brummit 1987). However, whether this reflects further divergence in those species or simply phenotypic plasticity is not known: DNA studies at the population level would be needed to confirm or refute the current taxonomy. Another complication is that hybridization is known to occur between some species. We found no strong evidence consistent with hybrid speciation (i.e., contradictory results between unlinked markers for species living in close enough proximity to interbreed), but more variable markers such as microsatellites or further AFLPs and population samples would be needed to confirm this. 
Overall, we do not believe that changes in alpha-taxonomy could be of sufficient magnitude to change our conclusions and, in any case, the well-known differences in species density between the Cape and the rest of Africa are based on traditional taxonomic species; our results show that this difference is not due to the rate of accumulation of those units.

To conclude, our results provide strong evidence that the radiation of extant Protea lineages began in the Cape region and that the present-day diversity in this region arose through the prolonged accumulation of species at a moderate rate, rather than by recent, rapid radiation. The radiation of Protea outside the Cape resulted from a single colonization event and subsequent expansion. Diversification has occurred at similar net rates in both regions. However, the spatial scale over which species are distributed is vastly different. Some feature of the Cape permits many unrelated plant taxa to speciate and especially to persist at fine spatial scales. Persistence is hard to study in the Cape flora because of the lack of detailed paleoecological data, but analyses of species abundance distributions combined with direct estimates of speciation rates, population history, and dispersal from genetic data could be used to evaluate alternative explanations (Latimer et al. 2005; Barraclough 2006; Etienne et al. 2006). Outside the Cape, species have not diversified to such an extent within small areas, but the geographical area over which suitable habitat is present is much larger. Future work sampling in the rest of Africa could establish how they have expanded through that region. In the Cape, studies at the population and species levels are needed to uncover the causes of fine-scale divergence and persistence.

Associate Editor: J. Pannell


This research was supported by an NERC studentship to GR, a Royal Society University Research Fellowship to TGB, a European Commission Marie Curie EST Fellowship (“HOTSPOTS”) to JS and LMV, the Royal Botanic Gardens, Kew, and the NERC Centre for Population Biology at Silwood Park. The authors would like to thank J. Joseph, M. Powell, R. Cowan, F. Forest, and L. Csiba at RBG Kew for technical help, three anonymous reviewers for comments, and J. Lawton and R. Cowling for their enthusiastic support of this project.