Parsimony analysis of endemicity (PAE) revisited


  • Juan J. Morrone

    1. Museo de Zoología ‘Alfonso L. Herrera’, Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), Mexico D.F., Mexico

Correspondence: Juan J. Morrone, Museo de Zoología ‘Alfonso L. Herrera’, Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), Apartado Postal 70-399, 04510 Mexico D.F., Mexico.



Several methods have been proposed for identifying and classifying areas of endemism. Parsimony analysis of endemicity (PAE) is the most widely used. It constructs cladograms based on the cladistic analysis of presence–absence data matrices of species and supraspecific taxa. Several authors have criticized PAE, usually because they have misunderstood its theoretical basis. A summary of the procedure is presented here, along with a discussion of the interpretation of PAE cladograms. Some critics deny any place in evolutionary biogeography for non-phylogenetic approaches, but I believe evolutionary biogeography is a pluralistic discipline in which PAE has a place despite lacking a strictly phylogenetic perspective, and can therefore be applied as the first step in an analysis. Other authors criticize the use of PAE as a cladistic biogeographical method, although their arguments may be circular because they take the results of biogeographical analyses based on phylogenetic hypotheses as the reference against which PAE is judged. Finally, the use of PAE for identifying areas of endemism has been criticized because an optimality criterion is used a posteriori to select areas of endemism found by what have been considered less appropriate means, and endemicity analysis (EA) has proved to be more efficient than PAE for identifying areas of endemism. Over the last few decades PAE has been used extensively to identify areas of endemism and to determine their relationships, playing a relevant role in evolutionary biogeography.


The term endemism refers to the restriction of a taxon to a particular geographical area; such a taxon is said to be endemic to that area (de Candolle, 1820, pp. 359–422). Endemism is one of the most significant features of geographical distributions, because species are rarely cosmopolitan and most species and supraspecific taxa are confined to restricted regions, at a variety of spatial scales, from continents to islands and mountain tops. Additionally, organisms are endemic at different taxonomic levels, and the size of an area depends on the rank of the taxon: genera generally have larger areas than species, and families larger areas than genera. Ranks are not comparable across groups, however: the distribution of a plant species may correspond to that of an insect genus. It has been suggested that endemism is a consequence of both historical and ecological factors: historical events explain how taxa became confined to their present ranges, with vicariance caused by tectonics being the most common explanation, whereas ecological explanations deal with the present limits of endemic taxa, commonly in terms of abiotic and biotic factors (Morrone, 2008).

Areas of non-random distributional congruence among different taxa are called areas of endemism (Morrone, 1994a). These areas are hierarchically arranged, with smaller areas of endemism nested within larger ones (Morrone, 2009; Crother & Murray, 2013, 2014), although some degree of overlap is evident, particularly in transition zones. This is what we expect to observe when evolutionary biogeographical processes produce historically structured biotic assemblages (Cracraft, 1994). The identification of areas of endemism is fundamental for biogeographical regionalization (Rosen & Smith, 1988; Escalante, 2009), i.e. the hierarchical arrangement of areas of endemism within a system of realms, regions, dominions, provinces and districts (Ebach et al., 2008). Several methods have been proposed to identify and classify areas of endemism (Morrone, 2007, 2009), parsimony analysis of endemicity (PAE) (Rosen, 1988a,b) being the most widely used. Several authors have criticized PAE (Humphries, 1989; Humphries & Parenti, 1999; Enghoff, 2000; Szumik et al., 2002; Brooks & van Veller, 2003; Santos, 2005; Nihei, 2006; Santos & Amorim, 2007; Garzón-Orduña et al., 2008; Peterson, 2008; Carine et al., 2009; Casagranda et al., 2012; Donato & Miranda-Esquivel, 2012), while others have defended it (Escalante & Morrone, 2003; Nihei, 2006; Morrone, 2009; Echeverry & Morrone, 2010; Escalante, 2011; Crother & Murray, 2013). The objective of this paper is to analyse PAE in order to clarify its theoretical basis and address some of its criticisms.

What is PAE?

PAE constructs cladograms based on the cladistic analysis of presence–absence data matrices of species and supraspecific taxa (Morrone, 2009). PAE cladograms allow the identification of biotic components, an ‘umbrella’ term that is used here for what are known as historical biotas (Salthe, 1985), biogeographical assemblages (Rosen, 1988a), taxonomic assemblages (Rosen, 1992) and species assemblages (Cracraft, 1994), and their hypothetical relationships. These biotic components are represented graphically as areas of endemism or generalized tracks (Morrone, 2009).

PAE was formulated originally by Rosen (1984, 1985) and fully developed by Rosen (1988a,b) and Rosen & Smith (1988). Several authors have contributed to its theoretical development (Craw, 1988; Cracraft, 1991; Myers, 1991; Morrone, 1994a, 2009; Luna-Vega et al., 2000; Trejo-Torres & Ackerman, 2001, 2002; Cecca, 2002; García-Barros et al., 2002; Porzecanski & Cracraft, 2005; Ribichich, 2005; Nihei, 2006; Echeverry & Morrone, 2010; Crother & Murray, 2013, 2014). Some of these authors have proposed modifications of PAE that they considered to deserve new names, such as parsimony analysis of shared presences (Rosen & Smith, 1988), parsimony analysis of distributions (Trejo-Torres & Ackerman, 2001), parsimony analysis of species assemblages (Trejo-Torres & Ackerman, 2002), cladistic analysis of distributions and endemism (Porzecanski & Cracraft, 2005) and parsimony analysis of community assemblages (Ribichich, 2005).

Over the years, several authors have summarized the procedure followed in a PAE (Rosen, 1988a, 1992; Craw, 1989; Morrone, 1994a, 2004, 2009; Posadas & Miranda-Esquivel, 1999; Crisci et al., 2000, 2003; Cecca, 2002; Espinosa Organista et al., 2002; Escalante & Morrone, 2003; Contreras-Medina, 2006; Lomolino et al., 2006; Morrone & Escalante, 2009; Echeverry & Morrone, 2010; de Carvalho, 2011; Camardelli & Napoli, 2012; Crother & Murray, 2013). A general procedure that incorporates these variations is presented below (Fig. 1).

Figure 1.

Flowchart showing the steps of parsimony analysis of endemicity (PAE). (a) Localities analysed; (b) pre-defined areas of endemism or areas defined by physiographical criteria; (c) grid cells; (d) locality records; (e) individual tracks; (f) modelled distributions; (g) phylogenetic information from supraspecific taxa; (h) data matrix; (i) cladogram obtained; (j) areas of endemism as groups of grid cells; (k) areas of endemism as coarse maps; (l) generalized tracks.

  1. Choose a set of biogeographical units across the study area, for example localities (Fig. 1a), pre-defined areas of endemism or areas defined by physiographical criteria (Fig. 1b), or grid cells (Fig. 1c).
  2. Determine the geographical distribution of the taxa being analysed, by simply recording their localities (Fig. 1d), constructing individual tracks (Fig. 1e) or modelling their distributions (Fig. 1f). If available, consider adding phylogenetic information from supraspecific taxa (Fig. 1g).
  3. Construct an r × c matrix (Fig. 1h), where r (rows) represents the biogeographical units analysed and c (columns) represents the species and/or supraspecific taxa. Each entry is coded 1 if the taxon is present in the unit and 0 if it is absent. A hypothetical unit coded as all zeros is added to the matrix in order to root the resulting cladogram(s).
  4. Analyse the matrix with a parsimony algorithm. If more than one most parsimonious cladogram (Fig. 1i) is found, calculate a strict consensus cladogram.
  5. Identify biotic components in the resulting cladogram as the monophyletic groups of units supported by at least two taxa (synapomorphies). Additionally, if a historical interpretation is being applied, infer specific biogeographical processes from the taxa optimized onto the cladogram: synapomorphies as vicariance events, parallelisms as dispersal events, and reversals as extinction events.
  6. Represent the biotic components identified in the previous step on a map as areas of endemism [groups of grid cells (Fig. 1j), coarse maps (Fig. 1k)] or generalized tracks (Fig. 1l).
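Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in any published PAE: distributions are given as a hypothetical mapping from taxa to occupied units, all rooted topologies are enumerated exhaustively (feasible only for a handful of units; published analyses rely on heuristic searches in dedicated parsimony software), each taxon is scored with Fitch parsimony, and a taxon is counted as supporting a clade only when it occurs in exactly the units of that clade, which simplifies the optimization of synapomorphies. Names such as `pae`, `build_matrix` and `ROOT` are hypothetical.

```python
ROOT = "ROOT"  # hypothetical unit coded as all zeros, used to root the cladograms


def build_matrix(records):
    """Step 3. records: taxon -> units where it occurs; returns unit -> {taxon: 0/1}."""
    taxa = sorted(records)
    units = sorted({u for occ in records.values() for u in occ})
    matrix = {u: {t: int(u in records[t]) for t in taxa} for u in units}
    matrix[ROOT] = {t: 0 for t in taxa}
    return matrix


def insert_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to each branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_rooted_trees(labels):
    """Yield every rooted binary tree (nested tuples) on `labels`."""
    if not labels:
        return
    if len(labels) == 1:
        yield labels[0]
        return
    for tree in all_rooted_trees(labels[1:]):
        yield from insert_everywhere(tree, labels[0])


def fitch(tree, column):
    """Fitch state set and steps for one taxon; column maps unit -> 0, 1 or '?'."""
    if not isinstance(tree, tuple):
        value = column[tree]
        # '?' (a doubtful record entered by hand) is compatible with both states
        return ({0, 1} if value == "?" else {value}), 0
    left, left_steps = fitch(tree[0], column)
    right, right_steps = fitch(tree[1], column)
    if left & right:
        return left & right, left_steps + right_steps
    return left | right, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of steps over all taxa (equal weights)."""
    return sum(fitch(tree, {u: row[t] for u, row in matrix.items()})[1]
               for t in matrix[ROOT])


def leaf_set(node):
    if not isinstance(node, tuple):
        return frozenset([node])
    return leaf_set(node[0]) | leaf_set(node[1])


def clades(node):
    """Leaf sets of all internal nodes at or below `node`."""
    if not isinstance(node, tuple):
        return set()
    return {leaf_set(node)} | clades(node[0]) | clades(node[1])


def pae(records, min_support=2):
    matrix = build_matrix(records)
    areas = [u for u in matrix if u != ROOT]
    # Step 4: score every ingroup topology, each rooted on the all-zero unit
    scored = [((ROOT, t), tree_length((ROOT, t), matrix)) for t in all_rooted_trees(areas)]
    best = min(length for _, length in scored)
    mp_trees = [tree for tree, length in scored if length == best]
    # Strict consensus: ingroup clades present in every most parsimonious cladogram
    consensus = set.intersection(*(clades(tree[1]) for tree in mp_trees))
    # Step 5 (simplified): a taxon supports a clade if it occurs in exactly its units
    components = {}
    for clade in consensus:
        support = sorted(t for t, occ in records.items() if set(occ) == clade)
        if len(support) >= min_support:
            components[clade] = support
    return best, mp_trees, components


# Toy data (hypothetical): taxon -> units where it occurs
records = {"sp1": ["A", "B"], "sp2": ["A", "B"], "sp3": ["C", "D"],
           "sp4": ["C", "D"], "sp5": ["A", "B", "C"], "sp6": ["D"]}
best, mp_trees, components = pae(records)
```

With these toy records the search finds a single most parsimonious cladogram of length 7 and two biotic components: units A + B (supported by sp1 and sp2) and C + D (supported by sp3 and sp4); sp5 is homoplastic and sp6, present in a single unit, is uninformative.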

Biogeographical units

PAEs have been undertaken using a variety of biogeographical units (see Appendix S1 in Supporting Information). Rosen (1988a,b) originally used localities as the unit. Localities have been used frequently in palaeobiogeographical analyses, as well as in studies dealing with Recent taxa (e.g. Ron, 2000; Anstey et al., 2003; Ribichich, 2005; Navarro et al., 2007; Ramírez-Arriaga & Martínez-Hernández, 2007; Gates et al., 2010; Aguirre et al., 2011). Craw (1988) and Cracraft (1991) used pre-defined areas of endemism as the biogeographical unit, and these have since been used in several analyses (e.g. Glasby & Álvarez, 1999; De Grave, 2001; Goldani et al., 2002; Katinas et al., 2004; Espinosa et al., 2006; Fattorini, 2007; Albert & Carvalho, 2011). A few analyses have used politically defined areas (Cué-Bär et al., 2006; Nelson, 2008; Ribeiro & Eterovic, 2011), areas defined by physiographical criteria (Aguilar-Aguilar et al., 2003, 2005; Espinosa et al., 2006; Huidobro et al., 2006) or even arbitrary operational units (van Soest, 1994). Morrone (1994a) proposed the use of grid cells when the objective of the analysis was to identify areas of endemism, and they have become a frequently used unit (e.g. García-Barros, 2003; Rojas-Soto et al., 2003; Vergara et al., 2006; Navarro-Sigüenza et al., 2007; Herrera-Paniagua et al., 2008; Meng et al., 2008; Löwenberg-Neto & de Carvalho, 2009; Ramírez-Barahona et al., 2009; DaSilva & Pinto-da-Rocha, 2011). In a few analyses, latitudinal or elevational transects have been used (Morrone et al., 1997; Trejo-Torres & Ackerman, 2002; García-Trejo & Navarro, 2004; Mihoč et al., 2006; Moreno et al., 2006; Espinosa-Pérez et al., 2009).

The most appropriate biogeographical unit depends on the objective of the analysis. When the objective is to identify areas of endemism, grid cells are the most suitable unit. When the objective is to determine relationships between areas, pre-defined areas of endemism should be used.


In the first analyses, genera were used as the data (columns) in the data matrices (Rosen, 1988a,b; Rosen & Smith, 1988; Rosen & Turnšek, 1989). Species then became the most common unit, although occasionally genera and other supraspecific taxa have been used (e.g. Fortey & Cocks, 1992; Davis et al., 2002; Silva & Gallo, 2007; McCoy & Anstey, 2010). Craw (1989), Cracraft (1991) and Myers (1991) independently suggested combining species and supraspecific taxa in order to incorporate some phylogenetic (and thus historical) information, and several authors have followed this approach (e.g. Morrone, 1994b, 1998; De Grave, 2001; Luna-Vega et al., 2001; Morrone & Márquez, 2001; Escalante et al., 2003; McInnes & Pugh, 2007; Santos et al., 2007; Sánchez-González et al., 2008; Zamora-Manzur et al., 2011). Porzecanski & Cracraft (2005) formalized this procedure, naming it cladistic analysis of distributions and endemism (CADE). In spite of their differences, PAE and CADE can be regarded as variants of the same general method (Morrone, 2009; Parenti & Ebach, 2009).
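Combining species and supraspecific taxa amounts to adding one column per higher taxon to the matrix. A minimal sketch, assuming (as a simplification) that a supraspecific taxon is coded present wherever at least one of its member species occurs; the taxon names and the `membership` mapping are hypothetical.

```python
def add_supraspecific(records, membership):
    """Add one column per supraspecific taxon, present wherever any member occurs.

    records: species -> units where it occurs
    membership: supraspecific taxon -> its member species (hypothetical input format)
    """
    combined = {taxon: sorted(set(units)) for taxon, units in records.items()}
    for higher, members in membership.items():
        combined[higher] = sorted({u for sp in members for u in records[sp]})
    return combined


# Hypothetical example: four species, each restricted to a single unit
species = {"Genus1 alpha": ["A"], "Genus1 beta": ["B"],
           "Genus2 gamma": ["C"], "Genus2 delta": ["D"]}
genera = {"Genus1": ["Genus1 alpha", "Genus1 beta"],
          "Genus2": ["Genus2 gamma", "Genus2 delta"]}
combined = add_supraspecific(species, genera)
```

In this example no species is shared between units, so the species columns alone group nothing, whereas the two genus columns link A with B and C with D.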

Locality data may be geographically fragmented and as a result relatively uninformative, whereas coarse maps built from polygons enclosing the marginal locality records may significantly over-predict distributions. Between these extremes, some authors have built individual tracks before compiling the data matrix, so that an entry is coded as 1 when an individual track occupies or crosses a given biogeographical unit (e.g. Andrés Hernández et al., 2006; Espinosa-Pérez et al., 2009; Echeverry & Morrone, 2010, 2013). Other authors have used ecological niche models projected onto geographical space to estimate potential distributions and so overcome the shortcomings of poorly sampled localities (Espadas Manrique et al., 2003; Rojas-Soto et al., 2003; Rovito et al., 2004; Escalante et al., 2007a,b,c; Navarro-Sigüenza et al., 2007; Gutiérrez-Velázquez et al., 2013).
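In panbiogeography an individual track is commonly drawn as a minimum spanning tree connecting the localities of a taxon. The sketch below assumes planar coordinates with Euclidean distances (a simplification that ignores the Earth's curvature) and square grid cells; it builds the tree with Prim's algorithm and lists the cells the track occupies or crosses. Crossing is approximated by sampling points along each segment, so a cell clipped only at a corner may be missed. All function names are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on planar (x, y) localities; returns edges as index pairs."""
    if not points:
        return []
    in_tree, edges = {0}, []
    while len(in_tree) < len(points):
        # shortest edge joining the growing tree to a locality not yet in it
        i, j = min(((i, j) for i in in_tree for j in range(len(points)) if j not in in_tree),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        in_tree.add(j)
        edges.append((i, j))
    return edges


def cell_of(x, y, cell_size):
    return math.floor(x / cell_size), math.floor(y / cell_size)


def cells_crossed(points, cell_size=1.0, samples=100):
    """Grid cells occupied or crossed by the track (approximated by point sampling)."""
    cells = {cell_of(x, y, cell_size) for x, y in points}
    for i, j in minimum_spanning_tree(points):
        (x0, y0), (x1, y1) = points[i], points[j]
        for k in range(samples + 1):
            f = k / samples
            cells.add(cell_of(x0 + f * (x1 - x0), y0 + f * (y1 - y0), cell_size))
    return cells


# Two hypothetical localities two cells apart: the track also crosses the middle cell
track_cells = cells_crossed([(0.5, 0.5), (2.5, 0.5)])
```

A taxon would then be coded 1 in every cell returned by `cells_crossed`, including cells where it has not been recorded but which its track crosses.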

The most appropriate data depend on the information available. When there is enough information, locality data are optimal; when data are scarce, coarse maps or ecological niche models are adequate. Individual tracks should be used when working within a panbiogeographical framework.


When compiling the matrix, data are conventionally coded using a binary system, with 0 for absence and 1 for presence in each biogeographical unit. Craw (1988) used additive multistate characters when coding information on supraspecific taxa. Some authors have suggested using ‘?’ for doubtful taxonomic records (Smith, 1992; Posadas & Miranda-Esquivel, 1999; Echeverry & Morrone, 2010).

Rooting the cladogram(s) with a hypothetical unit coded as all zeros implies an ‘ancestral’ condition in which all the taxa are absent (Geraads, 1998). Under a historical interpretation, Bisconti et al. (2001) considered that this implicitly excludes dispersal a priori. Rosen & Smith (1988) and Crother & Murray (2013) suggested that it was possible to work with an unrooted cladogram. Cano & Gurrea (2003) and Ribichich (2005) used an area coded as all ones, which implies grouping areas according to shared absences and assumes a biotic impoverishment through time, starting from a cosmopolitan biota (Rosen & Smith, 1988; Cecca, 2002). Vázquez-Miranda et al. (2007) rooted the cladograms with a real area.

PAEs are usually performed using equal weights. Linder (2001) suggested weighting species inversely to their distribution areas, in order to minimize homoplasy caused by widespread taxa. Analyses with implied weights (Goloboff, 1993) have been undertaken for similar reasons (Luna-Vega et al., 2000; García-Barros, 2003; Escalante et al., 2007b; Aguirre et al., 2011; Ribeiro & Eterovic, 2011). To maximize the ratio of reversals to parallelisms, Smith (1992) used accelerated transformation (ACCTRAN) optimization, which places character changes as close to the root of the cladogram as possible, so that secondary losses of a taxon are favoured over independent gains; this procedure was also used by Geraads (1998) and Escalante et al. (2007b). Alternatively, Dollo optimization was applied by Rosen & Smith (1988), Glasby & Álvarez (1999), Unmack (2001) and Fattorini & Fowles (2005). Enghoff (2000) suggested coding characters as irreversible to avoid having clades supported by reversals.
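Some of the variants mentioned above can be made concrete with a short sketch; the functions are illustrations, not the code of any particular program, and their names are hypothetical. The first computes the implied-weights fit of Goloboff (1993), in which each character contributes k/(k + h), h being its extra steps and k a user-chosen concavity constant (3.0 here is only an illustrative default); the cladogram with the highest total fit is preferred. The other two count steps for one taxon on a rooted cladogram (nested tuples, rooted with a unit where the taxon is absent): under Dollo optimization a taxon may appear only once and every further change is a loss, whereas under irreversible coding losses are forbidden and every separate presence needs its own gain.

```python
def implied_weight_fit(extra_steps, k=3.0):
    """Goloboff (1993) fit of one character: k / (k + extra steps); k is user-chosen."""
    return k / (k + extra_steps)


def leaf_set(node):
    if not isinstance(node, tuple):
        return {node}
    return leaf_set(node[0]) | leaf_set(node[1])


def dollo_steps(tree, present):
    """Dollo: one gain at the smallest clade containing all presences,
    then one loss for each maximal subclade lacking the taxon."""
    if not present:
        return 0
    node = tree
    while isinstance(node, tuple):
        inner = [child for child in node if present <= leaf_set(child)]
        if not inner:
            break
        node = inner[0]

    def losses(n):
        if not leaf_set(n) & present:
            return 1
        if not isinstance(n, tuple):
            return 0
        return losses(n[0]) + losses(n[1])

    return 1 + losses(node)


def irreversible_steps(tree, present):
    """Irreversible 0 -> 1 coding: no losses, so each maximal clade where the
    taxon is present throughout requires its own gain."""
    if leaf_set(tree) <= present:
        return 1
    if not isinstance(tree, tuple):
        return 0
    return irreversible_steps(tree[0], present) + irreversible_steps(tree[1], present)


tree = ("ROOT", (("A", "B"), ("C", "D")))  # hypothetical cladogram, all-zero root
dollo = dollo_steps(tree, {"A", "C"})
irreversible = irreversible_steps(tree, {"A", "C"})
fit_x = sum(implied_weight_fit(h) for h in [0, 0, 2])
fit_y = sum(implied_weight_fit(h) for h in [1, 1, 0])
```

For a taxon present in A and C only, Dollo optimization requires three steps (one gain and two losses, i.e. reversals), irreversible coding requires two (two independent gains, i.e. parallelisms), and unconstrained Fitch optimization also counts two. In the implied-weights example, two cladograms with the same total of two extra steps receive different fits (2.6 and 2.5), because concentrating homoplasy in one character is penalized less.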

Once the parsimony analysis has identified the most parsimonious cladogram(s), the taxa supporting the clades can be removed and the analysis repeated until no taxa (synapomorphies) support any clade. This procedure is known as parsimony analysis of endemicity with progressive character elimination (PAE-PCE), and was proposed independently by Luna-Vega et al. (2000) and García-Barros et al. (2002). It has been used by a few authors (García-Barros, 2003; Huidobro et al., 2006; Vergara et al., 2006; Corona et al., 2007; Martínez-Aquino et al., 2007; Zamora-Manzur et al., 2011). Echeverry & Morrone (2010, 2013) used it for panbiogeographical analyses, where the clades obtained in successive analyses identify additional generalized tracks; this makes it possible to find areas where different generalized tracks overlap, which are considered to be nodes or composite areas.
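The iterative logic of PAE-PCE can be sketched by wrapping a single PAE round in a loop. The assumptions are the same as for any toy implementation: the parsimony search is exhaustive and therefore limited to a few units, and a taxon is treated as a synapomorphy of a clade only when it occurs in exactly the units of that clade, a simplification of the optimization-based criterion used in practice. All names are hypothetical.

```python
from functools import reduce

ROOT = "ROOT"  # all-zero hypothetical unit rooting every round


def trees(labels):
    """All rooted binary trees (nested tuples) on `labels`."""
    if not labels:
        return
    if len(labels) == 1:
        yield labels[0]
        return

    def attach(tree, leaf):  # attach `leaf` to every branch of `tree`
        yield (tree, leaf)
        if isinstance(tree, tuple):
            for sub in attach(tree[0], leaf):
                yield (sub, tree[1])
            for sub in attach(tree[1], leaf):
                yield (tree[0], sub)

    for tree in trees(labels[1:]):
        yield from attach(tree, labels[0])


def fitch_steps(tree, present):
    """Fitch steps for one taxon; `present` is the set of units where it occurs."""
    def walk(node):
        if not isinstance(node, tuple):
            return {int(node in present)}, 0
        (a, x), (b, y) = walk(node[0]), walk(node[1])
        return (a & b, x + y) if a & b else (a | b, x + y + 1)
    return walk(tree)[1]


def leaf_set(node):
    if not isinstance(node, tuple):
        return frozenset([node])
    return leaf_set(node[0]) | leaf_set(node[1])


def clades(node):
    if not isinstance(node, tuple):
        return set()
    return {leaf_set(node)} | clades(node[0]) | clades(node[1])


def one_round(records, areas, min_support=2):
    """One PAE: most parsimonious trees, strict consensus, clades with enough support."""
    scored = [(t, sum(fitch_steps((ROOT, t), set(occ)) for occ in records.values()))
              for t in trees(areas)]
    best = min(length for _, length in scored)
    consensus = reduce(set.intersection, (clades(t) for t, length in scored if length == best))
    components = {}
    for clade in consensus:
        support = sorted(tx for tx, occ in records.items() if set(occ) == clade)
        if len(support) >= min_support:
            components[clade] = support
    return components


def pae_pce(records, areas):
    """Repeat PAE, deleting the taxa that supported clades, until no clade is supported."""
    remaining, rounds = dict(records), []
    while remaining:
        components = one_round(remaining, areas)
        if not components:
            break
        rounds.append(components)
        for support in components.values():
            for taxon in support:
                del remaining[taxon]
    return rounds


records = {"sp1": ["A", "B"], "sp2": ["A", "B"], "sp3": ["C", "D"],
           "sp4": ["C", "D"], "sp5": ["B", "C"], "sp6": ["B", "C"]}
rounds = pae_pce(records, ["A", "B", "C", "D"])
```

With these toy records the first round recovers A + B and C + D; once their four supporting taxa are removed, the second round recovers B + C, a component overlapping both earlier ones, which in a panbiogeographical reading would correspond to an additional generalized track.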

The requirement that monophyletic groups in the PAE cladogram be supported by at least two taxa has been contested by Crother & Murray (2011, 2013, 2014). These authors consider it possible to identify an area of endemism when no species is unique to the area but there is instead a unique combination of species. They based this idea on two assumptions: that areas of endemism are philosophically individuals (as opposed to classes) and that they are hierarchically arranged. Hovenkamp (2014) argued that Crother & Murray's (2011, 2013) interpretation may lead to an enormous number of areas of endemism and that the hierarchical nature of such areas is a methodological artefact of PAE. Furthermore, Hovenkamp (2014) challenged the notion that nestedness is a general property of areas of endemism. However, I agree with Crother & Murray's (2014) assertion that Hovenkamp's stance is inconsistent with global biotic distributions.

PAEs work implicitly under the ‘total evidence’ approach. Some authors, however, have explored partitioning the data matrix into separate sets (e.g. Cracraft, 1991; Myers, 1991; Morrone, 1998; Ron, 2000; García-Trejo & Navarro, 2004; Porzecanski & Cracraft, 2005; Fattorini, 2009a,b; Watanabe, 2012). Separate analyses based on time series of data from a succession of geological intervals or stratigraphical horizons have allowed dynamic interpretations (Rosen & Smith, 1988; Smith & Xu, 1988; Rosen & Turnšek, 1989; Fortey & Cocks, 1992; Geraads, 1998; Aguirre et al., 2011), although another possibility for dynamic PAE would be to partition the data according to the estimated ages of the taxa using molecular clocks (N. Gámez, UNAM, pers. comm.). Roig-Juñent et al. (2002) pruned grid cells that were found to be conflicting in a preliminary analysis. Gutiérrez-Velázquez et al. (2013) applied a null model of co-occurrence to the species distributional data in order to filter out species that showed no significant co-occurrence.
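The exact null model applied by Gutiérrez-Velázquez et al. (2013) is not described here, so the following sketch swaps in one common possibility for illustration: if two ranges are placed at random among n units, the number of units they share follows a hypergeometric distribution, and a small upper-tail probability indicates more co-occurrence than expected by chance. The pairwise filter, the 0.05 threshold, the absence of a multiple-comparison correction and all function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cooccurrence_p_value(n_units, range1, range2, shared):
    """P(two ranges of sizes range1 and range2, placed at random among n_units,
    share at least `shared` units): upper tail of the hypergeometric distribution."""
    tail = sum(comb(range1, x) * comb(n_units - range1, range2 - x)
               for x in range(shared, min(range1, range2) + 1))
    return Fraction(tail, comb(n_units, range2))


def filter_species(records, n_units, alpha=0.05):
    """Keep species that co-occur significantly with at least one other species.
    Pairwise tests without correction for multiple comparisons (an assumption)."""
    keep = set()
    for a, b in combinations(sorted(records), 2):
        ra, rb = set(records[a]), set(records[b])
        if cooccurrence_p_value(n_units, len(ra), len(rb), len(ra & rb)) <= alpha:
            keep.update((a, b))
    return keep


# Hypothetical data: species -> occupied grid cells, out of 20 cells in total
records = {"sp1": [1, 2, 3], "sp2": [1, 2, 3], "sp3": [10]}
kept = filter_species(records, n_units=20)
```

Here sp1 and sp2 share all three of their cells, which has probability 1/1140 under random placement, so both are kept; sp3 co-occurs with neither and is filtered out.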

Interpretation of PAE cladograms

There are different ways of interpreting the patterns identified by PAEs. Rosen (1988a) considered there to be two basic alternatives when interpreting PAE cladograms: static and dynamic. A static interpretation (descriptive PAE; sensu Escalante, 2011) is based on a reconstruction of geographical and geological features on a single geological horizon, without including any phylogenetic information. A dynamic interpretation attempts to relate the results to a reconstruction of geological events, based on two or more cladograms obtained at different time periods (Cecca, 2002; Nihei, 2006; Cecca et al., 2011), or by adding phylogenetic information concerning supraspecific taxa. Rosen (1988a) suggested three situations in which a static PAE may be interpreted tentatively as dynamic: when there is a particular pattern among several groups of organisms as a result of major geological events; when a single cladogram can be viewed as a hypothesis about the history of the areas and reconciled with independent geological evidence; and when cladograms are being explained using palaeoecological data.

Patterns detected by static or dynamic PAE may be interpreted historically or ecologically (Rosen, 1988a). If the area rooted with all zeros is interpreted as an area lacking favourable conditions for the taxa to survive (ecological interpretation), area relationships will indicate ecological affinities. If it is interpreted as a geologically ancient area, where none of the taxa has yet evolved or arrived by dispersal (historical interpretation), area relationships will indicate vicariance events or biotic interchanges (Morrone, 2009). Most authors adopt a historical interpretation (historical PAE; sensu Escalante, 2011), usually from a vicariance viewpoint. Trejo-Torres & Ackerman (2002) and Ribichich (2005) adopted an ecological interpretation. In spite of recognizing these alternatives, Rosen (1988a, p. 462) warned that ‘[i]t is therefore not yet possible to distinguish how far a single PAE pattern is ecological or historical’, although he suggested that when a particular pattern recurs across different taxa a historical explanation is more likely than an ecological one. Cracraft (1991, p. 222) suggested that ‘ecology versus history’ is a false dichotomy, because they may be inseparable: temporal and spatial variation in abiotic factors also influences patterns of vicariance.

Patterns detected by PAE are occasionally compared with those obtained from cladistic biogeographical analyses (see below). The concept of biogeographical homology (Morrone, 2001, 2004) provides a rationale for contrasting the interpretations of PAE and cladistic biogeography. Primary biogeographical homology (Fig. 2) results from identifying areas of endemism or generalized tracks, and represents a hypothesis on a common biotic history based on distributional congruence. Secondary biogeographical homology (Fig. 2) refers to the cladistic biogeographical test of the previously recognized homology (phylogenetic congruence). According to this distinction, PAE is aimed at recognizing primary biogeographical homology, whereas cladistic biogeography is aimed at secondary biogeographical homology. Both represent the first steps of an evolutionary biogeographical analysis (Morrone, 2007, 2009).

Figure 2.

Relationship between parsimony analysis of endemicity (PAE) (primary biogeographical homology) and cladistic biogeography (secondary biogeographical homology).

Critiques of PAE

Humphries (1989) and Humphries & Parenti (1999) consider PAE to be an invalid cladistic biogeographical method, because it does not take into account the phylogenetic relationships of the taxa analysed. According to Humphries (1989, p. 102) ‘A more recent and worrying development has been the attempt to rid biogeography of biological relationship and simply consider distributions based on existing taxonomy’. Humphries & Parenti (1999, p. 41) stated that ‘[PAE] is not an historical method’. Parenti & Ebach (2009) regard PAE as a ‘pseudo-cladistic analysis’ (p. 137) that produces an ‘area phenogram’ (p. 139). Santos (2005) and Santos & Amorim (2007) have reiterated these arguments, stating that ‘the reconstruction of the history of use of space by species strictly depends on phylogeny’ (Santos, 2005, p. 1284), ‘whatever the methods employed, it does not make sense to perform historical biogeographical analysis without this focus’ (Santos, 2005, p. 1284), and that ‘PAE is a method that intends to identify areas of endemism and the relationships among them but, as a non-phylogenetic reconstruction procedure, it is unable to distinguish whether an area is historically more closely related to another with regard to a third one’ (Santos & Amorim, 2007, p. 70). Garzón-Orduña et al. (2008, p. 904) consider historical biogeography ‘to be a discipline that rests on the theoretical principles and application of phylogenetic systematics, and thus relies on the use of phylogenies to produce historical reconstructions of area relationships’. In my view, all these authors deny any place for non-phylogenetic approaches in evolutionary biogeography.
In contrast, other authors consider evolutionary biogeography to be a pluralistic discipline, where PAE has a place in spite of lacking a strict phylogenetic basis, and where it may be applied as the first step in the analysis when identifying areas of endemism or generalized tracks (Morrone & Crisci, 1995; Crisci et al., 2000, 2003; Nihei, 2006; Riddle & Hafner, 2006; Morrone, 2007, 2009).

Rather than raising theoretical objections of this kind, Brooks & van Veller (2003) criticized the use of PAE as a cladistic biogeographical method after applying it in a series of case studies. They found that PAE identifies historically meaningful area relationships when species become distributed over those areas through a particular combination of vicariance and non-response to vicariance events, or when species' distributions result from a particular combination of extinction events affecting widespread species. Under three circumstances the area relationships obtained were uninformative or incorrect: when phylogenetic relationships among the species within the clades were not used; when there was a shared absence of a given species; and when there were shared episodes of post-speciation dispersal. Brooks & van Veller (2003) concluded that PAE is the least defensible and least desirable of all cladistic biogeographical methods. Garzón-Orduña et al. (2008) also tested the efficiency of PAE for recovering historical relationships in previously published cladistic biogeographical analyses, and compared it with Brooks parsimony analysis (BPA) and an event-based method. They found that PAE and BPA tend to provide similar results but that, compared with the event-based method, their performance was poor, the number of ‘historical nodes’ recovered using PAE being negatively correlated with the dispersal/vicariance ratio. Garzón-Orduña et al. (2008) concluded that PAE is unable to recover historical patterns and therefore does not fit into the current paradigm of historical biogeography, although they acknowledged that it may be a useful tool for identifying areas of endemism.
Unfortunately, Garzón-Orduña et al.'s (2008) comparison is a petitio principii, i.e. an example of circular reasoning, as they stated that ‘since we consider biogeographical methods using a phylogenetic hypothesis to be more likely to recover historical statements about area relationships, we used the topology obtained from the historical methods as the reference topology’ (p. 905). Escalante (2011) also considered this comparison inadequate, because PAE and cladistic biogeography have different objectives. Donato & Miranda-Esquivel (2012) dismissed Escalante's criticism, considering that PAE and cladistic biogeography do share the same objectives.

Enghoff (2000) suggested that PAE can be seen as an extreme ‘assumption 0’ approach, because only widespread taxa provide evidence of area relationships. Thus, although not strictly a cladistic biogeographical method, PAE behaves like an incomplete implementation of BPA (Morrone & Márquez, 2001; Ebach et al., 2003). CADE also implements BPA incompletely, because only some clades are incorporated to provide additional data. The relationship between the information used in PAE, CADE and BPA is shown in Fig. 3. Despite using only a subset of the information analysed in cladistic biogeographical analyses, PAE can produce similar results (Morrone et al., 1997; Ron, 2000; Morrone & Escalante, 2002).

Figure 3.

Relationship between the data used by parsimony analysis of endemicity (PAE), cladistic analysis of distributions and endemism (CADE) and Brooks parsimony analysis (BPA).

Szumik et al. (2002) criticized the use of PAE for identifying areas of endemism because an explicit optimality criterion is used a posteriori to select areas of endemism found by what they consider to be less appropriate means. They concluded that ‘parsimony is indeed an appropriate criterion for phylogenetic reconstruction, but it cannot be adapted to a field with completely different goals and premises’ (Szumik et al., 2002, p. 808). Casagranda et al.'s (2012) comparative analysis found that PAE's poor performance is more evident when there are overlapping and disjunct distributions. The method proposed by Szumik et al. (2002), which implements an optimality criterion based on considering only the distributional data that are relevant to endemism, has been shown to be more efficient than PAE for identifying areas of endemism (Carine et al., 2009; Escalante et al., 2009; Casagranda et al., 2012).

Nihei (2006) reviewed how PAE has been applied by previous authors, and discussed its history and theoretical basis. Nihei considered that most of the criticisms have dealt with its methodology rather than with its theory, and that these criticisms have usually resulted from confusion between the dynamic and static approaches. Nihei (2006) concluded that a single PAE cladogram (static PAE) is not reliable for evaluating area relationships, whereas a comparison of PAE cladograms from different geological layers (dynamic PAE) could identify reliable area relationships. Nihei (2006) warned biogeographers to be aware of the limitations of both dynamic and static PAE, to evaluate new variations of PAE, and to test dynamic PAE experimentally as a tool for inferring area relationships.

Peterson (2008) considered that, although PAE has become a popular analytical approach, it has serious drawbacks that preclude correct inferences of biogeographical history, namely: (1) rooting cladograms with a hypothetical unit coded as all zeros; (2) non-endemism is required for insight; (3) PAE may group areas based on shared absences; and (4) PAE is not applicable to artificially delimited areas. The first point is incorrect, because PAE cladograms can also be rooted with a unit coded as all ones or with a real area, or an unrooted analysis can be performed. The second point is based on a misunderstanding of the concept of endemism and of the fact that endemism can occur at different hierarchical levels; allegedly ‘non-endemic’ taxa are in fact endemic to larger areas of endemism (for example, a taxon inhabiting areas A and B is not endemic to either of them but is endemic to AB). The third point is correct, but in the most parsimonious cladograms shared absences are interpreted as extinction events; to deny this possibility a priori would imply a model in which extinctions are unlikely. The fourth point refers to the fact that some authors have analysed countries or other units that do not constitute natural areas, but I think the suitability of such units depends on the objective of the analysis. Peterson (2008, p. 542) concluded that ‘PAE falls short owing in largest part to its absolute focus on vicariance. Dispersal also exists, and is a major structuring force in biogeographical processes, like it or not. Otherwise, species’ ranges would only subdivide further and further through time, and biological diversification would only produce more and more micro-scale endemism’ and ‘PAE denies these mechanisms in its reconstructions, and its reconstructions are thereby unreliable and quite suspect’.
I do not consider this to be a valid criticism, because vicariance is not the only possible cause of the patterns identified; geodispersal (Lieberman, 2000) or even ecological factors are also possible explanations.

How can PAE be justified?

The existence of areas of endemism and their hierarchical organization is strong evidence that biotic components are historically structured (Cracraft, 1994; Crother & Murray, 2013, 2014). Thus identifying areas of endemism and discovering how they are interrelated are basic tasks of evolutionary biogeography (Morrone, 2009). PAE plays a dual role, because it can be used for both objectives. For the specific objective of identifying areas of endemism, several alternatives are available, e.g. phenetic clustering (Linder, 2001; Moline & Linder, 2006), endemicity analysis (Szumik et al., 2002; Szumik & Goloboff, 2004), nested areas of endemism analysis (Deo & DeSalle, 2006), sympatry networks (Dos Santos et al., 2008) and network analysis (Torres-Miranda et al., 2013). Several published studies compare the most commonly used of these, namely phenetic clustering, PAE and endemicity analysis (Linder, 2001; Trejo-Torres & Ackerman, 2002; Szumik & Goloboff, 2004; Moline & Linder, 2006; Díaz Gómez, 2007; Casazza et al., 2008; Carine et al., 2009; Casazza & Minuto, 2009; Escalante et al., 2009; Casagranda et al., 2012). These comparisons basically show that the methods differ in the number of areas of endemism identified and in the number of taxa supporting them. PAE appears to be intermediate, with phenetic clustering being the least effective and endemicity analysis being the most robust approach. For identifying generalized tracks, three software packages are available: Trazos2004 (Rojas Parra, 2007), Croizat (Cavalcanti, 2009) and MartiTracks (Echeverría-Londoño & Miranda-Esquivel, 2011). Unfortunately, these software packages treat any degree of overlap between parts of two or more individual tracks as a generalized track (for a critique of MartiTracks see Ferrari et al., 2013).
This is a substantial modification of the concept of a generalized track, which results from the significant superposition of two or more individual tracks, not parts of them (Zunino & Zullini, 1995; Crisci et al., 2000, 2003; Morrone, 2009). As concluded by Casagranda et al. (2009), when criticizing a software package aimed at identifying areas of endemism (Dos Santos et al., 2008), endemism should be based on distributional congruence, not merely overlap. At the moment, implementation of PAE-PCE (Echeverry & Morrone, 2010) seems to be the best approach for identifying generalized tracks (Ferrari et al., 2013).
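The difference between accepting any overlap and requiring significant superposition can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each individual track is encoded as a set of segments joining hypothetical localities, and it uses the Jaccard index with an arbitrary threshold of 0.5 as a stand-in for "significant superposition". Neither the encoding, the threshold nor the function names come from Trazos2004, Croizat, MartiTracks or PAE-PCE (the latter works through parsimony analysis instead); the sketch only contrasts the two grouping rules.

```python
from itertools import combinations

# Hypothetical individual tracks, each encoded as a set of segments joining
# locality labels (each segment written once, in a fixed order). Real tracks
# are drawn on maps; this encoding is an assumption made for illustration.
TRACKS = {
    "taxon1": {"a-b", "b-c", "c-d"},
    "taxon2": {"a-b", "b-c", "c-d", "d-e"},
    "taxon3": {"c-d", "x-y", "y-z", "z-w"},
}


def jaccard(s, t):
    """Shared segments divided by all segments of the two tracks."""
    return len(s & t) / len(s | t)


def overlapping_pairs(tracks):
    """'Any overlap' rule: pair tracks that share at least one segment."""
    return {
        (p, q)
        for p, q in combinations(sorted(tracks), 2)
        if tracks[p] & tracks[q]
    }


def congruent_pairs(tracks, threshold=0.5):
    """Congruence rule: pair tracks that largely superimpose.

    The Jaccard index and the default threshold are arbitrary choices
    standing in for 'significant superposition'.
    """
    return {
        (p, q)
        for p, q in combinations(sorted(tracks), 2)
        if jaccard(tracks[p], tracks[q]) >= threshold
    }


if __name__ == "__main__":
    print("any overlap:", sorted(overlapping_pairs(TRACKS)))
    print("congruence: ", sorted(congruent_pairs(TRACKS)))
```

Under the any-overlap rule, the single shared segment c-d links all three tracks into one group, whereas the congruence rule groups only taxon1 and taxon2, whose tracks largely coincide.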

Concerning the relationships between areas of endemism, PAE competes with cladistic biogeography, although several authors have explicitly recognized that PAE is not a cladistic biogeographical method (e.g. Rosen, 1988a, 1992; Sfenthourakis & Giokas, 1998; Seeling et al., 2004; Morrone, 2005, 2007, 2009). Rosen (1988a, p. 457) considered that both generate historical hypotheses of area relationships, although ‘PAE is still experimental and the theoretical basis for historical inference has yet to be developed satisfactorily’. Some authors accept the use of PAE when phylogenetic information is lacking (e.g. Lieberman, 2000; Yeates et al., 2002; Fattorini & Fowles, 2005; Michaux & Leschen, 2005; Wiley & Lieberman, 2011; Ávila et al., 2012). It has been suggested that PAE can constitute a preliminary step in a biogeographical analysis, whose results are then tested with a cladistic biogeographical analysis (Cracraft, 1991, 1994; Morrone & Crisci, 1995; Crisci et al., 2000, 2003; Crisci, 2001; Morrone, 2001, 2004, 2009; Morrone & Escalante, 2002; Porzecanski & Cracraft, 2005; Contreras-Medina, 2006; Riddle & Hafner, 2006).

PAE is neutral regarding the biogeographical processes involved. Both history and ecology play a role in determining biotic components (Cracraft, 1991; Szumik et al., 2002; Morrone, 2009). PAE is a pattern-orientated method (Cecca, 2002; Echeverry & Morrone, 2010), so demanding that it explain processes (e.g. Garzón-Orduña et al., 2008) or interpreting it in terms of a particular process (e.g. Peterson, 2008) is misplaced. In fact, PAE may be considered similar to phenetic clustering, although PAE groups areas only by shared presences, whereas phenetic clustering uses overall similarity (Rosen & Smith, 1988; Waggoner, 1999; Fattorini & Fowles, 2005; Porzecanski & Cracraft, 2005; Gates et al., 2010; Wiley & Lieberman, 2011).
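To make the contrast between shared presences and overall similarity concrete, the following minimal Python sketch scores pairs of areas in a small, hypothetical presence–absence matrix in two ways: by counting taxa present in both areas, and by the simple matching coefficient, used here as one example of an overall-similarity index that also counts shared absences (phenetic studies may use other indices). The sketch does not perform a parsimony analysis; it only shows why two areas that share no taxa can still appear similar under an overall-similarity measure.

```python
from itertools import combinations

# Hypothetical matrix: rows are areas, columns are taxa (1 = present, 0 = absent).
MATRIX = {
    "A": [1, 1, 0, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 1, 0, 0, 0],
    "D": [0, 0, 0, 1, 0, 0],
}


def shared_presences(x, y):
    """Count taxa present in both areas (shared presences only)."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 1)


def simple_matching(x, y):
    """Proportion of taxa in the same state in both areas, shared absences included."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


if __name__ == "__main__":
    # Compare every pair of areas under both notions of similarity.
    for p, q in combinations(MATRIX, 2):
        x, y = MATRIX[p], MATRIX[q]
        print(
            f"{p}-{q}: shared presences = {shared_presences(x, y)}, "
            f"simple matching = {simple_matching(x, y):.2f}"
        )
```

Areas C and D share no taxa, yet their simple matching similarity is 4/6 because they agree on four absences; such shared absences carry no grouping weight when only shared presences count as evidence, as the text describes for PAE.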

Over the last few decades, PAE has been used extensively to identify areas of endemism and to determine their relationships, and in general it has played an important role in evolutionary biogeography. Although some authors may lament its use, many practising biogeographers have found PAE useful. Whether it will continue to be used or be replaced by more appropriate methods cannot be predicted. I hope that it will help address new issues in both evolutionary and ecological biogeography.


I thank Dolores Casagranda, Fabrizio Cecca, Malte Ebach, Amparo Echeverry, Tania Escalante, Oscar Flores-Villela, Niza Gámez, Isolda Luna-Vega, Sergio Roig-Juñent, Brian Rosen, Luis Sánchez-González, Claudia Szumik and two anonymous referees for providing useful comments that helped improve the manuscript.


Juan J. Morrone is Professor of Biogeography, Systematics and Comparative Biology at the Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM). His main interests are in evolutionary biogeography and phylogenetic systematics.