Understanding speciation, the evolution of barriers to gene flow between taxa, is central to evolutionary biology (Dobzhansky 1937; Mayr 1942; Coyne and Orr 2004). Evolutionary geneticists are particularly interested in the genetic basis of speciation: how many and which genes are involved, what types of changes to these genes contribute to reproductive isolation, and what population genetic processes led to the fixation of different alleles at these genes (Orr et al. 2004). Historically, the study of the genetic basis of speciation has been hampered by the fact that alleles causing reproductive isolation are not particularly amenable to genetic analysis.
Despite this formidable obstacle, evolutionary geneticists have recently made progress in identifying these barriers to gene flow using two types of approaches. The first is a series of clever genetic mapping experiments designed to pinpoint genomic regions, and in some cases individual genes, causing reproductive isolation (Wittbrodt et al. 1989; True et al. 1996; Ting et al. 1998; Barbash et al. 2003; Presgraves 2003; Presgraves et al. 2003; Tao et al. 2003; Sawamura et al. 2004; Moehring et al. 2006; Turner et al. 2005). Presgraves (2003) estimated that between Drosophila melanogaster and its closest known relative, D. simulans, intrinsic hybrid inviability alone involves incompatible alleles at almost 200 genes. Hybrid male sterility factors have also been shown to disproportionately accumulate on the X chromosome (True et al. 1996; Tao et al. 2003), as predicted if the alleles causing them are partly recessive and positively selected (Charlesworth et al. 1987). In addition, several individual genes causing reproductive isolation have been shown to be targets of recurrent adaptive amino acid substitution (Ting et al. 1998; Barbash et al. 2003; Presgraves et al. 2003). Interestingly, these latter two observations suggest a link, if only indirect, between adaptive evolution and the evolution of reproductive isolation.
A second approach is based on statistical analysis of hybrid zones between parapatric, incompletely isolated species. The principle of this approach is that hybrid zones, in which hybrids are less fit than parental populations, represent a balance between selection against unfit hybrids, which promotes species divergence, and gene flow through dispersal, which prevents it (Slatkin 1973; Endler 1977; Mallet and Barton 1989; Harrison 1990; Barton 2001). The degree to which a genomic region can introgress across a hybrid zone can be related to the strength of selection against it in hybrids relative to dispersal (Barton 2001). In this way, genomic regions causing incompatibilities in hybrids can be mapped as those that have higher levels of differentiation between species (Hagen and Scriber 1989; Rieseberg et al. 1999; Payseur and Nachman 2005; Grahame et al. 2006). Though indirect, this approach is applicable to a wide range of species and presumably has the potential to map a broader range of factors contributing to reproductive isolation. Similar to more direct genetic mapping approaches, statistical analyses of hybrid zones suggest that a large number of loci contribute to reproductive isolation (Barton and Gale 1993). For example, Rieseberg et al. (1999) showed that of 26 genome segments showing significantly reduced introgression across a sunflower hybrid zone, 16 were associated with pollen sterility. These results demonstrate the utility of hybrid zones in elucidating the genetic architecture of reproductive barriers between species.
Although both approaches have proven useful, they suffer from caveats that limit how informative they are about the genetics of speciation. The first is that hybrid zones are probably not stable over long periods of time. Thus, there may be a large historical component to patterns of differentiation between species, complicating estimates of the strength of selection across a cline. The second is that genes currently contributing to reproductive isolation may not have been involved in the initial speciation process (Coyne and Orr 2004). Although the evolution of reproductive isolation is a gradual process, the number of loci required to confer almost complete reproductive isolation between species may be small, whereas the number of loci that currently contribute to isolation can be very large. For example, Presgraves (2003) estimated that about 200 genes contribute to hybrid inviability alone in D. melanogaster/D. simulans hybrids. This implies that the total number of loci contributing to reproductive isolation between species (including hybrid sterility, ecological differences, and premating isolation, among others) is likely to be much larger. Orr (1995) showed that a rapid accumulation of incompatibility factors (the “snowball effect”) is expected after reproductive isolation is complete between two populations. These theoretical considerations imply that a randomly chosen gene that is currently involved in reproductive isolation between completely isolated species is unlikely to have participated in the speciation process itself. The problem thus becomes distinguishing true “speciation” genes from genic incompatibilities that secondarily strengthen reproductive isolation.
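The snowball argument can be illustrated with a short calculation. The sketch below assumes the standard pairwise counting argument for Dobzhansky-Muller incompatibilities: if each pair of diverged substitutions is incompatible independently with a small probability p, then the expected number of incompatibilities after K substitutions is p K(K - 1)/2. The function name and the value of p are hypothetical and chosen only for illustration; they are not estimates from any study cited here.

```python
from math import comb


def expected_incompatibilities(k: int, p: float) -> float:
    """Expected number of pairwise incompatibilities after k substitutions.

    Assumes every one of the C(k, 2) pairs of diverged sites is
    incompatible independently with probability p (an illustrative value).
    """
    return p * comb(k, 2)


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-pair incompatibility probability
    for k in (100, 200, 400):
        # Doubling k roughly quadruples the expected count.
        print(k, expected_incompatibilities(k, p))
```

Because the count grows roughly with the square of divergence, most incompatibilities present between long-isolated species are expected, under these assumptions, to have arisen late.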
A third, complementary approach is based on coalescent theory. A coalescent approach considers the genealogical properties of samples from parental populations, and can be used to infer population genetic parameters under explicit models of speciation. The simplest of these is an allopatric speciation model with no gene flow (Hudson et al. 1987). In the presence of recombination, migration and selection can produce greater heterogeneity in divergence patterns across the genome than expected under a purely allopatric speciation model (Hudson et al. 1987; Palopoli et al. 1996; Wakeley and Hey 1997; Wang et al. 1997; Wu 2001; Machado et al. 2002; Hey and Nielsen 2004; Hey 2005; Bachtrog et al. 2006; Bull et al. 2006). In particular, those parts of the genome that move more freely between species (neutral markers) are expected to diverge more slowly than regions tightly linked to a gene causing reproductive isolation.
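The expectation under the strictly allopatric model can be made concrete with a minimal simulation. The sketch below assumes a simple isolation model: two lineages, one sampled from each species, cannot coalesce more recently than the split time T (in generations) and then coalesce in a diploid ancestral population of constant effective size N_A, so the expected coalescence time is T + 2N_A and the expected per-site divergence is 2μ(T + 2N_A). The function names and parameter values are hypothetical, and the sketch ignores the differences in effective size among autosomal, Z-linked, and mitochondrial loci.

```python
import random


def between_species_tmrca(t_split, n_anc, rng):
    """Coalescence time (generations) for one lineage from each species.

    No coalescence is possible before the split; in the ancestor the
    waiting time is exponential with mean 2 * n_anc generations.
    """
    return t_split + rng.expovariate(1.0 / (2.0 * n_anc))


def expected_divergence(t_split, n_anc, mu):
    """Expected per-site divergence under the isolation model."""
    return 2.0 * mu * (t_split + 2.0 * n_anc)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    t_split, n_anc, mu = 1.0e6, 1.0e5, 1.0e-9  # illustrative values only
    draws = [between_species_tmrca(t_split, n_anc, rng) for _ in range(20000)]
    print("simulated mean TMRCA:", sum(draws) / len(draws))
    print("expected mean TMRCA: ", t_split + 2.0 * n_anc)
    print("expected divergence: ", expected_divergence(t_split, n_anc, mu))
```

Under this model all loci share the same T, so variation among loci arises only from the ancestral coalescent and from mutation. A locus whose effective split time is older, as expected near a gene causing reproductive isolation, would show greater divergence than this variation alone predicts.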
By partitioning the genome based on estimated population genetic parameters, such as ancestral and current population size, divergence time, and migration rate, we can begin to ask which regions of the genome began to diverge first and/or have the lowest historical migration rates. These regions are more likely to be tightly linked to genes initially causing reproductive isolation than are neutral parts of the genome or parts of the genome that only recently became associated with reproductive isolation. In addition, we can use estimated population genetic parameters to test explicit models of speciation. Although allopatry is believed to be the dominant mechanism of speciation (Mayr 1942), discordant gene genealogies and divergence time estimates among closely related Drosophila species (i.e., D. pseudoobscura and relatives; Wang et al. 1997; Machado et al. 2002) and between humans and chimpanzees (Osada and Wu 2005) have led to the rejection of models of strict allopatric speciation.
Studies of several Lepidopteran species, using various approaches, have revealed differential patterns of introgression at multiple loci among ecologically distinct strains or closely related species (Lushai et al. 2003; Emelianov et al. 2004; Prowell et al. 2004; Dopman et al. 2005; Bull et al. 2006; Kronforst et al. 2006). For example, a comparison of three Z-chromosome markers in strains of European corn borer reveals that haplotypes are not shared at one of the markers (Dopman et al. 2005). This marker is tightly linked to a factor that differentially affects postdiapause developmental time and may contribute to reproductive isolation between strains.
Here we develop a new coalescent-based approach and apply it to parental populations of two hybridizing, parapatric species of Lepidoptera. Papilio glaucus and P. canadensis are partially reproductively isolated swallowtail butterfly species that form hybrids in a narrow hybrid zone. These species are differentiated by diapause regulation, female-limited mimicry, host-plant preferences, morphological characters, and at least two loci contributing to hybrid inviability (Hagen and Scriber 1989; Hagen et al. 1991; Scriber et al. 1991). Previous surveys of allozymes and mitochondrial DNA (mtDNA) haplotypes revealed a remarkable pattern of differentiation between these two species (Hagen and Scriber 1989; Hagen 1990; Sperling 1993; Bossart and Scriber 1995). In particular, of 21 autosomal allozymes surveyed, most were polymorphic but showed little differentiation between species, suggesting high levels of gene flow (Hagen and Scriber 1989). In contrast, mtDNA and three allozymes, including two on the Z chromosome (the Lepidopteran analog of the X in the XY male/XX female system), exhibit strong patterns of differentiation between species, consistent with selection against these markers in hybrids.
The two Z-linked allozymes (Pgd and Ldh) are only loosely linked to each other and show distinct patterns of differentiation across the hybrid zone (Hagen 1990; Fig. 1). These patterns strongly suggest that the genomes of these species are mosaics of regions that experience different selection pressures in hybrids, and that these regions may therefore also show heterogeneous patterns of differentiation.
We examine patterns of divergence between samples of the parental species for five distinct regions of the Z chromosome and the mtDNA (COI/COII). One of the Z-linked regions, Ldh, is an allozyme locus that shows particularly strong differentiation between species in transects through the hybrid zone and is thus a candidate for tight linkage to a gene causing reproductive isolation. MtDNA haplotypes also show strong differentiation in transects through the hybrid zone, which may be a consequence of the expected linkage of mtDNA to the W chromosome in Lepidoptera (Andolfatto et al. 2003). This is interesting because female-limited mimicry in P. glaucus (a trait that distinguishes the species) is partly determined by a W-linked locus (Clarke and Sheppard 1962; Scriber et al. 1996). Here we implement a novel approximate Bayesian approach to estimating speciation time that extends previous approaches (Hudson et al. 1987; Wakeley and Hey 1997; Bachtrog et al. 2006). We use these divergence time estimates to test the strictly allopatric model of speciation, which predicts that each genomic region began to diverge at the same time. Under a model of continuing migration, selection against hybrids, and recombination, we may expect to reject the purely allopatric model. We also relax the strictly allopatric model and estimate locus-specific divergence times. In particular, we expect that our candidate regions, Ldh and the mtDNA, should yield deeper divergence time estimates than randomly selected markers. We test this prediction and discuss the implications of our results for mapping speciation genes.
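To give a sense of how an approximate Bayesian estimate of divergence time can work in principle, the sketch below implements generic rejection sampling. It is not the method developed in this study. It assumes a shared split time T across loci (the strictly allopatric model), a uniform prior on T, a known ancestral effective size and per-locus mutation rate, and the mean number of between-species differences across loci as the summary statistic. All function names, parameter values, and the pseudo-observed data (simulated from a known T) are hypothetical.

```python
import random
from statistics import mean


def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_differences(t_split, n_anc, mu_locus, rng):
    """Differences between one sequence per species at a single locus."""
    tmrca = t_split + rng.expovariate(1.0 / (2.0 * n_anc))
    return poisson(2.0 * mu_locus * tmrca, rng)


def abc_rejection(observed, n_anc, mu_locus, prior_max, n_draws, tol, rng):
    """Keep prior draws of T whose simulated mean is within tol of the data."""
    target = mean(observed)
    accepted = []
    for _ in range(n_draws):
        t = rng.uniform(0.0, prior_max)  # assumed uniform prior on T
        sim = [simulate_differences(t, n_anc, mu_locus, rng) for _ in observed]
        if abs(mean(sim) - target) <= tol:
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    rng = random.Random(42)
    # Pseudo-observed data simulated at a known split time (not real data).
    pseudo_observed = [simulate_differences(1.0e6, 1.0e5, 5.0e-6, rng)
                       for _ in range(6)]
    posterior = abc_rejection(pseudo_observed, 1.0e5, 5.0e-6,
                              5.0e6, 3000, 1.0, rng)
    if posterior:
        print("accepted draws:", len(posterior))
        print("posterior mean T:", mean(posterior))
```

Running the same routine on one locus at a time gives locus-specific estimates of T, which is the sense in which the strictly allopatric model can be relaxed in this simplified setting.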