## Introduction

After being relegated to a simple ‘footnote acknowledgment’ (Lynch, 1989) in biogeographical papers for a long time, dispersal is again receiving increased attention as a fundamental process explaining the distribution of organisms (de Queiroz, 2005; McGlone, 2005; Riddle, 2005; Cowie & Holland, 2006). This trend can be observed in the large number of phylogeny-oriented articles published since 2004 in this journal and in *Systematic Biology* with the word ‘dispersal’ in their abstract (54 and 48, respectively; more than one per month). The change is so noticeable that one may speak of a paradigm shift in historical biogeography from the vicariance approach, in which distributions were mainly explained as the result of geological isolating events (Nelson & Platnick, 1981), to one where dispersal takes a more prominent, or even primary role, in explaining current distribution patterns (de Queiroz, 2005). This shift in perspective (‘counter-revolution’, de Queiroz, 2005) has mainly been brought about by the popularization of molecular systematics and the possibility of estimating divergence times using molecular clocks. Many plant and animal lineages, whose distributions were originally explained by vicariance, appear to be too young to have been affected by the postulated geological events, suggesting their current distribution patterns are the result of dispersal (e.g. Baum *et al.*, 1998; Waters *et al.*, 2000; Cooper *et al.*, 2001; Arensburger *et al.*, 2004; Renner, 2004). Even the biotic patterns of regions such as the Southern Hemisphere, traditionally considered as the prime example of the vicariance scenario, appear to have been shaped in large part through trans-oceanic dispersal (Winkworth *et al.*, 2002; Vences *et al.*, 2003; Sanmartín & Ronquist, 2004; Sanmartín *et al.*, 2007).

This shift in perspective in empirical studies, however, has not been accompanied by a concomitant shift in theoretical and methodological approaches to biogeographical analysis. Cladistic biogeography (Humphries & Parenti, 1999) considered dispersal as a rare and chance phenomenon, incapable of explaining general, shared distribution patterns across groups. Biogeographical analysis was based on finding a general pattern of area relationships among the groups analysed, which was then interpreted as evidence of a common sequence of vicariance events dividing an ancestral biota. If dispersal was incorporated into biogeographical reconstructions (as in ‘phylogenetic biogeography’, Van Veller *et al.*, 2003), it was usually in the form of *ad hoc* hypotheses invoked to account for departure from a strict vicariance model (Brooks & McLennan, 2001; Brooks *et al.*, 2001), or from a combination of vicariance and geodispersal (i.e. range expansion in response to the disappearance of a dispersal barrier) represented by the backbone of the area cladogram (Wojcicki & Brooks, 2005). These methods have in common that they are designed to find patterns of area relationships without explicitly making any assumptions about the underlying evolutionary processes; information about the biogeographical processes that have generated the patterns is ignored when constructing the general area cladogram (Ebach *et al.*, 2003) or inferred *a posteriori* when comparing the area cladogram to the patterns of individual groups (Wojcicki & Brooks, 2005).

In recent years, new methods of biogeographical inference have been developed that allow integration of all relevant biogeographical processes (i.e. dispersal, extinction, vicariance, and duplication) directly into the analysis through the use of explicit process models – the event-based approach. Each process is assigned a cost inversely related to its likelihood, and the analysis consists of finding the minimum-cost, most parsimonious explanation for the observed distribution pattern (Page, 1995, 2003; Ronquist, 1995, 1997, 1998, 2003; Page & Charleston, 1998; Sanmartín & Ronquist, 2002; Sanmartín, 2006). Event-based reconstructions specify both the ancestral distributions and the events responsible, thus making it easier to compare alternative evolutionary/biogeographical scenarios. Probably the most important contribution of event-based methods to analytical historical biogeography was the ability to detect patterns of ‘concerted’ dispersal. This refers to repeated, *directional* dispersal resulting from common constraints, such as prevailing winds and ocean currents, and generating shared distributional patterns across multiple organism groups (Sanmartín & Ronquist, 2004; Sanmartín *et al.*, 2007). For the first time, both vicariance and dispersal hypotheses were amenable to analytical testing (Sanmartín *et al.*, 2007).
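
The cost-minimization idea described above can be illustrated with a minimal Python sketch. The event costs and event counts below are hypothetical placeholders chosen for illustration, not values taken from the cited studies, and the sketch only compares two reconstructions supplied by hand rather than searching over all possible ones as an event-based program would.

```python
# Hypothetical costs per event type (assumed for illustration only).
EVENT_COSTS = {"vicariance": 0, "duplication": 0, "extinction": 1, "dispersal": 2}


def scenario_cost(event_counts, costs=EVENT_COSTS):
    """Total cost of a reconstruction: sum over event types of count * cost."""
    return sum(costs[event] * count for event, count in event_counts.items())


# Two hand-made candidate reconstructions for the same (imaginary) phylogeny.
scenarios = {
    "mostly_vicariance": {"vicariance": 3, "duplication": 1, "extinction": 2, "dispersal": 0},
    "mostly_dispersal": {"vicariance": 1, "duplication": 1, "extinction": 0, "dispersal": 2},
}

# The preferred explanation is the one with the lowest total cost.
best = min(scenarios, key=lambda name: scenario_cost(scenarios[name]))
print(best, {name: scenario_cost(counts) for name, counts in scenarios.items()})
```

Changing the assumed cost of dispersal relative to the other events can reverse which scenario is preferred, which is why the choice of costs matters so much in this framework.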

Although event-based methods represent an important advance over traditional, cladistic–vicariance approaches, they still have several limitations. Like most phylogeny-based inference methods currently used in biogeography, they are founded on the parsimony principle. This means that the cost of the events cannot be estimated directly from the data but must be fixed beforehand using *ad hoc* procedures such as permutation-based significance tests (Ronquist, 2003; Sanmartín & Ronquist, 2004). For example, parsimony-based tree fitting under the four-event model (Ronquist, 2003) requires vicariance to have a lower cost than dispersal in order to distinguish phylogenetically constrained distribution patterns from random patterns, but the exact optimal cost ratio is difficult to determine – see Ronquist (2003) and Sanmartín *et al.* (2007) for more details. Another problem, due to the use of the parsimony or cost-minimization principle itself, is that the number of dispersal events is typically underestimated in event-based reconstructions (Sanmartín & Ronquist, 2004). Dating of divergence times and phylogenetic uncertainty (i.e. the fact that the tree is not known without error) are other important factors that are difficult, if not impossible, to incorporate within the parsimony context.

Statistical approaches to biogeographical analysis, which model dispersal as a discrete-state stochastic process, have been proposed in recent years. For example, Huelsenbeck *et al.* (2000) developed a Bayesian tree fitting method that models host switching (dispersal) as a stochastic process capable of disrupting the topological congruence between the organism phylogeny and the host (area) cladogram, i.e. in the absence of dispersal the two cladograms are identical (Huelsenbeck *et al.*, 2000). Similarly, Ree *et al.* (2005) suggested a likelihood alternative to dispersal–vicariance analysis that models dispersal and extinction as stochastic anagenetic events occurring along internodes, while vicariance and duplication are treated as cladogenetic events responsible for the inheritance of biogeographical ranges. The advantage of these methods over event-based biogeography is that they allow biogeographical parameters of interest, such as dispersal rates, to be estimated directly from the data without the inherent bias of the parsimony approach.

### A dispersal-based biogeography?

All of the mixed dispersal–vicariance methods described above, however, still assume vicariance as the primary explanation for shared distribution patterns, either by requiring a lower cost for vicariance than for dispersal (Ronquist, 2003), by assuming biogeographical congruence as the default background pattern requiring no stochastic modelling (Huelsenbeck *et al.*, 2000), or by considering dispersal as a stochastic process with no direct role in cladogenesis (Ree *et al.*, 2005). This situation contrasts with the increasing interest among biogeographers in the development of methods of biogeographical inference that give dispersal primacy over vicariance in explaining general biogeographical patterns (McDowall, 2004; de Queiroz, 2005; Riddle, 2005; Cowie & Holland, 2006).

Probably no other scenario is more appropriate for this dispersal-based biogeography than oceanic islands. Oceanic archipelagos of volcanic origin such as hotspot or arc archipelagos arose directly from magma rising up through the ocean and never had any geological connection to a continental landmass. Hence, their current biodiversity and observed biogeographical patterns are fundamentally the product of over-water dispersal (Cowie & Holland, 2006). Examples of this type of archipelago are the Hawaiian Islands in the Pacific Ocean, the Mascarene Archipelago in the Indian Ocean, and the Atlantic Canary Islands (see below). Dispersal is typically considered the key process generating biological diversity in islands (Emerson, 2002; Lomolino *et al.*, 2005; Cowie & Holland, 2006), even though vicariance is sometimes invoked to explain within-island (Juan *et al.*, 2000) or even inter-island speciation (e.g. Pleistocene fluctuating sea levels among the central Hawaiian Islands; Cowie & Holland, 2006). Moreover, dispersal in island systems seems to be capable of producing non-stochastic, highly concordant distribution patterns such as those expected from vicariance. For example, the predominant mode of colonization in hotspot archipelagos such as the Hawaiian Islands is the ‘stepping-stone model’, in which the pattern of island colonization follows the sequence of island emergence, with geologically younger islands more recently colonized than older ones. Most Hawaiian groups are apparently descendants of a single colonization event that follows this pattern of stepwise dispersal (Funk & Wagner, 1995). Mixed dispersal–vicariance methods such as those discussed above would not be appropriate for reconstructing this type of highly congruent, non-stochastic dispersal pattern because these methods do not associate dispersal with cladogenesis (Ree *et al.*, 2005) or because they give vicariance the primary role in explaining shared distribution patterns (Ronquist, 2003).

### Inferring dispersal: ancestral-state inference methods

Current research on island evolutionary biogeography focuses mainly on reconstructing patterns of island colonization in individual (Goodson *et al.*, 2006) or multiple (Emerson, 2002; Carine *et al.*, 2004) groups. The most common approach is to use some type of ‘ancestral character state’ inference method to reconstruct the number and sequence of dispersal/colonization events (Funk & Wagner, 1995; Nepokroeff *et al.*, 2003). Parsimony mapping of the ‘biogeographical character’ onto a phylogenetic tree – independently derived from morphological or molecular data – is by far the most popular approach (Funk & Wagner, 1995, and references therein; Moore *et al.*, 2002; Allan *et al.*, 2004; Goodson *et al.*, 2006). Often, intricate colonization pathways are directly deduced from the topology of the tree, leading to conclusions that are difficult to justify (see Emerson, 2002, for examples). A more sophisticated approach is to use phylogenetic programs, such as PAUP or MacClade, to optimize ancestral areas onto the internal nodes of the tree. This analysis consists of finding the reconstruction with the minimum number of character state changes (i.e. dispersal events) required to explain the distributions of the terminal taxa on the phylogenetic tree (Fig. 1). Different parsimony criteria can be used to implement alternative dispersal models: for example, Fitch Parsimony is appropriate for an unconstrained-dispersal model in which all transitions (dispersals) have the same cost (Fig. 1a; e.g. Moore *et al.*, 2002; Goodson *et al.*, 2006), while Wagner Parsimony may be used for a stepping-stone, sequential model in which dispersal is primarily from one island to an adjacent one along the island chain (Fig. 1b). Testing models against each other, however, is more problematic since the Fitch unconstrained model will always be at least as parsimonious as – require no more steps than – the Wagner ordered model (Fig. 1a,b).

A more serious drawback of the parsimony approach is its disregard of two important sources of error (Ronquist, 2004): the uncertainty associated with estimating ancestral states on a given tree – only minimum-change reconstructions are evaluated even when alternative reconstructions could be almost as likely (e.g. Goodson *et al.*, 2006) – and the error in the phylogenetic estimate, since ancestral states are usually reconstructed on a single best tree assuming the phylogeny is known without error. This last source of error could be easily incorporated into the analysis if instead of a single input tree we use a set of weighted trees expressing our confidence in the different clades in the tree (Ronquist, 2003). For example, Huelsenbeck & Imennov (2002) inferred the geographic distribution for the most recent common ancestor of hominids by integrating over trees drawn from the posterior distribution of a Bayesian Markov Chain Monte Carlo (MCMC) analysis. They weighted each tree according to its posterior probability and then used Fitch optimization to reconstruct ancestral states on each of the trees. Their approach has the advantage that it incorporates topological uncertainty into the reconstruction, but it ignores the error associated with reconstructing the evolution of a character on a given phylogenetic tree, which may be a more critical source of uncertainty. This is because information about ancestral states of a particular character comes only from the character itself, whereas the information about phylogenetic relationships is typically based on large sets of characters (Ronquist, 2004).
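
To make the contrast between the unordered (Fitch) and ordered (Wagner) parsimony criteria concrete, the following Python sketch counts the minimum number of steps under each using a generic Sankoff-style dynamic programme. It is an illustration under stated assumptions, not the algorithm implemented in any particular program: the four-taxon tree, the island indices (0 to 3, numbered along a hypothetical island chain) and the function names are all invented for the example.

```python
import math


def sankoff(tree, n_states, step_cost):
    """Minimum total cost of state changes needed to explain the tip states.

    `tree` is either an int (a tip, giving its island index) or a tuple of
    subtrees. `step_cost(s, t)` is the cost of a change from state s to t.
    """
    def costs(node):
        if isinstance(node, int):  # tip: only the observed island is allowed
            return [0 if s == node else math.inf for s in range(n_states)]
        child_costs = [costs(child) for child in node]
        # For each state s at this node, add the cheapest option in each child.
        return [
            sum(min(step_cost(s, t) + c[t] for t in range(n_states)) for c in child_costs)
            for s in range(n_states)
        ]

    return min(costs(tree))


def fitch_cost(s, t):
    """Unordered states: every dispersal between different islands costs 1."""
    return 0 if s == t else 1


def wagner_cost(s, t):
    """Ordered states: cost equals the number of islands stepped along the chain."""
    return abs(s - t)


# Hypothetical tree ((A, C), (B, D)) with taxa on islands 0, 2, 1 and 3.
tree = ((0, 2), (1, 3))
fitch_steps = sankoff(tree, 4, fitch_cost)
wagner_steps = sankoff(tree, 4, wagner_cost)
print(fitch_steps, wagner_steps)  # 3 4
```

Because every Wagner step cost is at least as large as the corresponding Fitch cost, the Fitch count can never exceed the Wagner count on the same tree, so raw step counts cannot be used to choose between the two dispersal models.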

Parametric statistical approaches such as maximum likelihood (ML) (Pagel, 1994, 1999; Schultz *et al.*, 1996; Schluter *et al.*, 1997; Mooers & Schluter, 1999) offer the advantage over parsimony methods that they use an explicit stochastic model of evolution and branch length information to estimate the probability of change between ancestral states along a given branch – that is, they can account for the fact that changes are more likely along long branches than along shorter ones. Given a tree topology, branch lengths and the distribution of each species, the maximum likelihood approach finds the values of the biogeographical parameters that maximize the probability of observing the data. Since all alternative reconstructions are evaluated in estimating the relative probabilities of ancestral states, ML analyses do incorporate uncertainty in ancestral state reconstruction (Pagel, 1994, 1999). However, ancestral state changes are typically reconstructed over a fixed tree topology with fixed branch lengths (Nepokroeff *et al.*, 2003; Outlaw *et al.*, 2003), in which case the phylogenetic uncertainty is ignored. These methods also typically ignore the error associated with estimating the parameters in the substitution model(s).
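
The likelihood calculation outlined above can be sketched in a few lines of Python. The sketch assumes the simplest possible model, in which dispersal between any two of *k* areas occurs at the same rate, and it uses Felsenstein's pruning algorithm with a uniform prior on the root state and a coarse grid search in place of a proper optimizer. The tree, branch lengths, area codes and grid are hypothetical and serve only to show the mechanics.

```python
import math


def transition_prob(i, j, t, rate, k):
    """P(area j at the end of a branch of length t | area i at its start),
    for an equal-rates model with k areas and per-transition rate `rate`."""
    decay = math.exp(-k * rate * t)
    if i == j:
        return 1.0 / k + decay * (k - 1) / k
    return 1.0 / k - decay / k


def partial_likelihoods(node, rate, k):
    """Pruning step: likelihood of the tips below `node` for each state at `node`."""
    if isinstance(node, int):  # tip: observed area index
        return [1.0 if s == node else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, length in node:  # internal node: tuple of (subtree, branch length)
        below = partial_likelihoods(child, rate, k)
        for s in range(k):
            result[s] *= sum(
                transition_prob(s, t, length, rate, k) * below[t] for t in range(k)
            )
    return result


def log_likelihood(tree, rate, k):
    """Log-likelihood of the tip distributions, assuming a uniform root prior."""
    return math.log(sum(partial_likelihoods(tree, rate, k)) / k)


K = 3  # number of areas (hypothetical)
TREE = (  # hypothetical four-taxon tree with branch lengths
    (((0, 0.1), (0, 0.1)), 0.4),
    (((1, 0.2), (2, 0.2)), 0.3),
)

# Coarse grid search for the dispersal rate that maximizes the likelihood.
grid = [0.05 * i for i in range(1, 201)]
ml_rate = max(grid, key=lambda r: log_likelihood(TREE, r, K))

# Relative support for each ancestral area at the root, given the estimated rate.
root = partial_likelihoods(TREE, ml_rate, K)
root_probs = [x / sum(root) for x in root]

# A change is more probable along a long branch than along a short one.
short = transition_prob(0, 1, 0.1, ml_rate, K)
long_ = transition_prob(0, 1, 1.0, ml_rate, K)
print(ml_rate, root_probs, short, long_)
```

Note that the rate and the root probabilities are conditional on this one tree and its branch lengths, which is exactly the limitation discussed above.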

Bayesian inference makes it relatively easy to incorporate both sources of uncertainty through random sampling from the posterior probability distribution of the phylogeny and other model parameters using the MCMC technique (see below). Unlike maximum likelihood, Bayesian analysis treats a model parameter as a random variable, whose posterior probability distribution we want to estimate (Holder & Lewis, 2003). This is done by integrating (marginalizing) over all possible values for the other parameters in the model, including the tree topology. Thus, an important property of Bayesian analysis is that inferences on model parameter estimates (i.e. the marginal posterior probabilities) are independent of the underlying phylogeny (Pagel *et al.*, 2004). The posterior probability distribution on a model parameter is typically summarized in the form of the mean and 95% credibility interval (the middle 95%) of the MCMC samples of that parameter.
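
As an illustration of the summary described above, the short Python sketch below computes the mean and the middle 95% of a set of sampled parameter values. The samples are not the output of a real MCMC run: they are drawn from an arbitrary gamma distribution purely as a stand-in, and the function name is hypothetical.

```python
import random
import statistics


def summarize_posterior(samples):
    """Return (mean, lower, upper), where lower and upper bound the middle 95%."""
    cuts = statistics.quantiles(samples, n=40)  # cut points every 2.5%
    return statistics.fmean(samples), cuts[0], cuts[-1]


# Stand-in for MCMC samples of a dispersal rate (assumed, not real output).
rng = random.Random(1)
fake_rate_samples = [rng.gammavariate(2.0, 0.5) for _ in range(5000)]

mean, lower, upper = summarize_posterior(fake_rate_samples)
print(f"mean={mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

Because each sampled value in a real analysis would come from a different draw of the tree and the other parameters, a summary of this kind already averages over phylogenetic uncertainty.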

Bayesian approaches to ancestral state reconstruction have been proposed in recent years (Huelsenbeck & Bollback, 2001; Pagel *et al.*, 2004) and even applied to biogeographical inference (Olsson *et al.*, 2006). These approaches model dispersal as a Markov chain stochastic process involving the transition between two or more discrete states with different rates or probabilities (Pagel *et al.*, 2004). However, when applied to biogeography, these approaches are usually based on the simplest model of dispersal (equal probabilities for all transitions) and have only been applied to single, individual groups (e.g. Olsson *et al.*, 2006). Here, we extend this approach to more complex dispersal models and to biogeographical analysis across many groups of organisms evolving on different trees, resulting in a general methodology for statistical analysis of island biogeography based on phylogenies and distributional data.