Much effort has gone into understanding the spatial structure of populations, from the collaboration between Wright and Dobzhansky during the Evolutionary Synthesis (Lewontin et al. 1981; Provine 1986), through to the many thousands of present-day studies that infer population structure from genetic variation. Knowing the history of populations is of interest in itself, and indirect estimates of dispersal and density may aid conservation. However, much work has been motivated by interest in the interaction between population structure and selection. In Wright's (1932) Shifting Balance Theory, selection acts more effectively in a structured population, whereas current efforts to detect selection and to map disease alleles depend on understanding the null model, of neutral variation in a structured population (e.g., Nielsen et al. 2007; Coop et al. 2009).

Yet, despite this long-standing interest, models of population structure are almost entirely limited to arrays of demes of constant size, connected either with no spatial structure (the island model; Wright 1931) or in a one- or two-dimensional lattice (the stepping-stone model; Malécot 1948; Kimura and Weiss 1964). The evolution of both allele frequencies and genealogies is well understood for these kinds of model (e.g., Charlesworth et al. 2003; Wakeley 2008). However, as we explain below, they fail to account for key features of genetic variation in nature. Moreover, there are fundamental difficulties in applying even the classical models to populations that live in continuous two-dimensional habitats (Felsenstein 1975; Nagylaki 1978; Barton et al. 2002). The lack of adequate theoretical models has forced phylogeographic analysis of long-term population history to be almost entirely qualitative (Knowles and Maddison 2002; Hey and Machado 2003; Avise 2004).

Of course, we could model an arbitrary population structure by specifying the population size and rates of migration over time. However, we hardly ever have this detailed information, and cannot hope to estimate it all from genetic data. Thus, we must try to condense the plethora of parameters that describe detailed and realistic models into a few effective values describing the key features of the population. For example, in a panmictic population, the short-term rate of random drift is summarized by the effective population size, *N*_{e}, which depends on sex ratio, variance in fitness, and so on. In a spatially continuous habitat, the deterministic effects of gene flow are accurately approximated as a diffusion process with rate σ^{2}, the variance in distance between parent and offspring along some axis: a single parameter that describes the distribution of distances moved by genes. However, it is difficult to extend this spatial diffusion approximation to include random drift. In one dimension, the approximation is well-behaved, and predicts smooth changes in neutral allele frequency, correlated over distances of σ/√(2μ), where μ is the mutation rate (Malécot 1948; Nagylaki 1978). In two dimensions, however, drift produces significant local fluctuations that cannot be described by diffusion. In a stepping-stone model, with fixed deme sizes, a diffusion approximation is accurate over all but very small scales (Malécot 1948; Kimura and Weiss 1964; Barton and Wilson 1995; Barton et al. 2002). However, fluctuations in population size in space and time, which are inevitable in a continuous population, may substantially alter the effective diffusion rate and the effective density. Indeed, it is difficult to find any analytically tractable model of a two-dimensional population with continuous space that could serve as a null model from which we could define “effective” parameters. As Felsenstein (1975) showed, one cannot assume both a uniform density and independent movements: fitness has to decrease with local density if the population is to be stable, which necessarily leads to correlations between the movements and reproduction of nearby individuals. We incorporate these in a way that is tractable both forwards and backwards in time, giving a model that can be used to define effective parameters that describe a wider range of more realistic schemes.

A further difficulty with using the diffusion approximation to describe gene flow in a spatial continuum is that it is not clear how to follow the ancestry of lineages, traced backwards in time. The coalescent process describes this ancestral process in a panmictic population, and yields simple formulae for the distribution of coalescence times, and hence of allele frequencies. Moreover, it allows rapid simulation of just the sampled lineages, rather than of the entire population. The coalescent extends naturally to a population that is clustered into demes of given size (Hudson 1990; Cox and Durrett 2002; Zähle et al. 2005), but it is not known how to extend it to a two-dimensional spatial continuum, or to model fluctuations in population size. It would be natural to assume that lineages move in a random walk, with some probability of coalescence when they are sufficiently close; the common ancestor would be at a random location, centered on the midpoint between the coalescing lineages. Indeed, Wright (1943) proposed essentially this model for isolation by distance, long before the coalescent process was defined mathematically. Unfortunately, this simple model is inadequate. First, as noted above, regulation of population density implies some correlation in the movements of nearby lineages. Second, this naive model is inconsistent. If a large genealogy is followed in two dimensions, then the distribution of two lineages chosen from within it is not the same as the distribution of two lineages, modeled alone (Fig. 1B). This is because coalescence with an unsampled lineage causes a jump to the common ancestor, which would not be seen if we only followed one of the lineages involved. Thus, including more lineages in the sample changes the pattern along any one lineage by inducing more jumps in it. Yet, an algorithm that describes the ancestry of a sample is essential if we are to simulate evolution in populations of a realistic size. For example, a population that lives on a range ∼10^{3}σ across will contain ∼10^{6} neighborhoods, which can only be simulated with a reasonable speed if there are very few individuals per neighborhood. Individual-based simulations of the whole population are therefore constrained to unreasonably small neighborhood sizes or range sizes.
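To make the naive random-walk description above concrete, the following minimal Python sketch follows a single pair of lineages backwards in time. It illustrates only the simple scheme that the text argues is inadequate, not the model developed in this paper. The function name and the parameters `radius` (how close lineages must be to coalesce), `p_coal` (the per-generation coalescence probability when close), and the Gaussian form of both the steps and the ancestor's displacement are all assumptions introduced for illustration.

```python
import math
import random


def naive_pair_coalescence(separation, sigma=1.0, radius=1.0, p_coal=0.1,
                           max_generations=10_000, rng=None):
    """Follow two lineages backwards in time under the naive model.

    Each lineage takes an independent Gaussian step with variance sigma**2
    along each axis per generation. When the lineages are within `radius`
    of each other they coalesce with probability `p_coal`.

    Returns (generations, ancestor_position) on coalescence, or None if
    the lineages have not coalesced within `max_generations`.
    All parameter names and defaults are hypothetical.
    """
    rng = rng or random.Random()
    a = [0.0, 0.0]
    b = [float(separation), 0.0]
    for t in range(1, max_generations + 1):
        # Independent movement: this is the assumption that conflicts
        # with density regulation in a real continuous population.
        for lineage in (a, b):
            lineage[0] += rng.gauss(0.0, sigma)
            lineage[1] += rng.gauss(0.0, sigma)
        if math.dist(a, b) <= radius and rng.random() < p_coal:
            # Common ancestor at a random location centered on the midpoint.
            mid_x = (a[0] + b[0]) / 2.0
            mid_y = (a[1] + b[1]) / 2.0
            ancestor = (rng.gauss(mid_x, sigma), rng.gauss(mid_y, sigma))
            return t, ancestor
    return None


if __name__ == "__main__":
    result = naive_pair_coalescence(5.0, rng=random.Random(42))
    print(result)
```

Because the pair is modeled alone, the sketch cannot reproduce the extra jumps that a lineage would experience through coalescence with unsampled lineages in a larger genealogy, which is the inconsistency described above.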

Classical models of diffusive gene flow fail in (at least) three ways. First, they cannot explain patterns of allele frequency or genealogical relationship over large spatial scales, simply because diffusion is too slow to move genes far enough. A gene will typically diffuse a distance σ√*t* in *t* generations. Because many species may occupy ranges at least 10^{3}σ across, diffusion across the range will take ∼10^{6} generations, and yet we know that, at least at high latitudes, ranges have only been stable since the last Ice Age, for ∼10^{4} years at most. To put this another way, neutral variation, in a balance between gene flow, mutation and drift, is predicted to fluctuate over scales of σ/√(2μ), where μ is the mutation rate. Thus, if mutation is ∼10^{−6} per gene per generation, patterns cannot spread over scales of more than ∼10^{3}σ, even if the population remained stable for that long. (Note that long-range movement of individuals will not greatly alter this argument. Provided that the standard deviation of the distribution of individual movements is small relative to the species' range, the diffusion approximation will still hold over large enough scales. Even in the extreme case in which a small fraction *m* move over distances comparable with the species' range, this will simply act as an additional damping force, which pulls fluctuations toward the overall mean, and reduces the spatial scale to σ/√(2(μ + *m*)); Wright 1943; Malécot 1948).
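The orders of magnitude in this argument can be checked with a short Python sketch. It assumes the standard forms for diffusive gene flow: a typical displacement of σ√t along one axis after t generations, and a correlation scale of σ/√(2(μ + m)); the factor of 2 and the function names are assumptions of this sketch, and only the order of magnitude matters for the argument.

```python
import math


def diffusion_distance(sigma, t):
    """Typical distance diffused along one axis in t generations."""
    return sigma * math.sqrt(t)


def generations_to_cross(range_width, sigma):
    """Generations needed for diffusion to span a range of given width."""
    return (range_width / sigma) ** 2


def correlation_scale(sigma, mu, m=0.0):
    """Spatial scale of neutral fluctuations under mutation rate mu,
    with an optional fraction m of long-range migrants (assumed form)."""
    return sigma / math.sqrt(2.0 * (mu + m))


sigma = 1.0  # measure all distances in units of sigma

# A range 10^3 sigma across takes ~10^6 generations to cross by diffusion.
print(generations_to_cross(1e3 * sigma, sigma))

# With mu ~ 10^-6 the correlation scale is roughly 707 sigma, i.e. ~10^3 sigma.
print(correlation_scale(sigma, 1e-6))

# Rare long-range migrants (here m = 10^-4, an arbitrary value) shrink the scale.
print(correlation_scale(sigma, 1e-6, m=1e-4))
```

The first value, ∼10^{6} generations, is far longer than the ∼10^{4} years for which high-latitude ranges have been stable, which is the mismatch the text describes.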

Second, we often observe patterns over very large spatial scales, and indeed, these are the necessary material for phylogeographic analysis. For example, Barbujani and Sokal (1990) identify zones in Europe at which allele frequencies in human populations change at multiple loci; these correspond to language boundaries, and reflect large-scale movements of whole populations. Yet, in the classical models, genes diffuse independently of each other, and so patterns at different genetic loci will also be independent. Thus, if genes really did diffuse independently of each other, genealogies at different loci would also be independent, and could not be used to infer a common population history in any simple way. Conversely, correlations across loci, in allele frequencies or in genealogy, indicate demographic events that affect the population as a whole (Barton and Wilson 1995; Eldon and Wakeley 2008). If all the individuals in an area descend from a few high-fitness founders, then any alleles carried by those founders will tend to spread over the same area. Such correlations across loci are not captured by classical models of individual reproduction.

Finally, genetic diversity in abundant species is far lower than expected from census numbers (Lewontin 1974; Gillespie 1991): genetic diversity does increase with population size, but far more slowly than predicted by the simple neutral theory (Nevo et al. 1984; Lynch and Conery 2003). For example, the effective population size in humans is estimated to be ∼10^{4} or less (Takahata 1993; Tenesa et al. 2007), and ∼10^{6} in *Drosophila melanogaster* (Li et al. 1999), both of which are far lower than census numbers. This discrepancy may be explained by multiple selective sweeps, such that diversity is limited primarily by the extent of hitchhiking (Maynard Smith and Haigh 1974; Gillespie 2000). Alternatively, diversity can be limited by recurrent population bottlenecks, involving part or all of the species' range; the long-term effective size then depends on the number of founders involved in each event, rather than the average census size (Nei 1987; Whitlock and Barton 1997; Eldon and Wakeley 2006, 2009). Regardless of whether diversity is limited by selective sweeps that affect only part of the genome, or genome-wide demographic effects, the classical model of local diffusion and steady density cannot apply.

We consider a family of models that addresses these difficulties; these were outlined by Etheridge (2008), and are analyzed rigorously by Barton et al. (2010). These models focus on the dynamics of large-scale extinction/recolonization events in a spatial continuum. The effects of recurrent extinction and colonization are well understood in the island model (Slatkin 1977; Wade and McCauley 1988; Whitlock and McCauley 1990; Pannell and Charlesworth 1999). Relatively little is known, however, about how such fluctuations affect populations evolving in continuous space (Barton and Wilson 1995).

We begin with a model in which individual reproduction depends on extinction/recolonization events over a range of scales. We then describe a backward process that gives the ancestry of a sample drawn from such a population; unlike the standard coalescent, this allows simultaneous mergers among more than two lineages. This backwards process is an approximation to the forwards model and corresponds exactly in the limit of high population density (the exact backwards process for an arbitrary population density can be written down, but is rather cumbersome). Even though density is high, this limiting process still allows coalescence, because reproduction is correlated across appreciable areas. We introduce recombination into the model, and show that when reproduction in drastic extinction/recolonization events is correlated over large areas, it leads to correlations between patterns at different genetic loci. This distinguishes our model from those that allow only independent reproduction and local movement. We give an integral equation for the probability of identity, which leads to the distribution of coalescence times, and the long-term rate of decay of genetic variability across the whole population.

Although our basic model is idealized, it is open to a variety of extensions, and it captures key features of two-dimensional populations: multiple mergers of ancestral lineages, large-scale structure, diversity much lower than expected from census numbers, and correlations across loci.