The modes of speciation are classified according to the level of migration between the diverging populations (Mayr 1942). Under allopatric speciation, diverging populations are completely isolated and there is no migration. At the other extreme lies sympatric speciation, under which migration is so high that mating is random even in the presence of spatial discontinuities. Parapatric speciation represents an intermediate scenario in which migration is insufficient to ensure global random mating. Scenarios of speciation with migration encompass both sympatric and parapatric speciation.

The proposition that speciation can take place in the presence of gene flow is one of the most controversial ideas in evolutionary biology. This is because gene flow between populations is expected to rapidly break down linkage disequilibrium, preventing the formation of genetically distinct groups (Mayr 1963). However, longstanding theoretical results (reviewed by Gavrilets 2003) and empirical evidence based on genetic clines (Endler 1977) suggest that parapatric speciation is common. Additionally, recent theoretical work shows that sympatric speciation is possible (Gavrilets 2003; Fitzpatrick *et al.* 2009), and some recent empirical examples seem to confirm this theoretical prediction (Nosil 2008). Thus, the debate is now centred on the question of how frequent sympatric speciation is (Bolnick & Fitzpatrick 2007). Unfortunately, providing a clear answer to this question still represents a great challenge.

The difficulty in resolving this dispute resides in the fact that, with the exception of some particular cases such as chromosomal speciation, all the information available to us reduces to a snapshot (or several closely spaced snapshots) of a very long evolutionary process. Thus, most studies of speciation are limited to making inferences about different scenarios of gene flow from observed patterns of genetic variation. This is extremely difficult because several scenarios can lead to very similar patterns. Shared polymorphism between two species can be explained by at least four different mechanisms (see Fig. 1): (i) very recent allopatric speciation, (ii) parapatric speciation, (iii) sympatric speciation and (iv) allopatric speciation followed by secondary contact. It is important to note that if the question were restricted to distinguishing among scenarios (i)–(iii), then it would suffice to fit the data to a model of isolation with migration (IM; Nielsen & Wakeley 2001) so as to obtain estimates of migration rates between species. If the estimated migration rates have their highest posterior probability at zero, then one can reject the model of speciation with migration (Hey 2006). If this is not the case, then it could in principle be possible to distinguish between parapatric and sympatric speciation, as migration rate estimates should be lower in the former. One could, for example, calculate the probability that migration rates are larger than a threshold value that would indicate panmixia. This strategy is easy to apply thanks to the availability of large multilocus data sets and the programs IMa (Hey & Nielsen 2007) and IMa2 (Hey 2010b), both of which implement the IM model. However, the problem with this approach is that we cannot exclude the possibility of secondary contact after allopatric speciation. This is, for example, the hybrid zone scenario that is frequently invoked to explain the existence of genetic clines (Barton & Hewitt 1985).

Thus, simply estimating migration rates is not enough; we also need to estimate the time at which migration events took place. If the posterior distribution of migration timing is broad and a large proportion of its probability mass occurs near or prior to the inferred divergence time, we can infer that speciation occurred in the presence of gene flow (Fig. 1b, c). The opposite pattern (a narrow posterior distribution with a large proportion of its probability mass near the present) would be consistent with secondary contact after allopatric speciation (Fig. 1d).
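
The two summaries discussed above (the probability that migration exceeds a panmixia threshold, and the share of migration-timing probability mass near or before divergence) can be approximated from posterior samples. The following is only a minimal sketch: the function names, the threshold, the cutoff and all numbers are made up for illustration and are not part of IMa or IMa2.

```python
from typing import Sequence


def prob_exceeds(samples: Sequence[float], threshold: float) -> float:
    """Fraction of posterior samples strictly above a threshold."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(1 for s in samples if s > threshold) / len(samples)


def mass_at_or_before(times: Sequence[float], cutoff: float) -> float:
    """Fraction of sampled migration times at least as old as `cutoff`.

    Times are measured backwards from the present, so larger means older.
    """
    if not times:
        raise ValueError("need at least one sampled migration time")
    return sum(1 for t in times if t >= cutoff) / len(times)


# Made-up values for illustration only (hypothetical scaled units).
migration_rates = [0.0, 0.2, 0.5, 1.5, 3.0]
migration_times = [0.1, 0.2, 0.8, 0.9, 1.0]
divergence_time = 1.0

# Hypothetical panmixia threshold of 1.0 on the scaled migration rate.
p_high_migration = prob_exceeds(migration_rates, threshold=1.0)
# "Near or prior to divergence" taken here, arbitrarily, as the oldest 20%.
p_old_migration = mass_at_or_before(migration_times, cutoff=0.8 * divergence_time)
```

Both the threshold and the definition of "near divergence" are choices the analyst has to justify; the sketch only shows that, once they are fixed, the summaries are simple proportions of the sampled values.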

There are no statistical genetics methods specifically aimed at estimating gene flow timing. However, coalescent-based, computer-intensive methods that explicitly follow the changing genealogy of a sample during the course of the analysis allow inferences to be made about quantities other than the parameters that are explicitly included in the underlying probabilistic model (Won & Hey 2005). In the case of IMa and IMa2, the parameters for a scenario involving two species are the sizes of the ancestral and descendant populations scaled by the overall neutral mutation rate, the scaled migration rates between populations and the scaled time at which the ancestral population gave rise to the two descendant populations. The estimation of these parameters involves two steps. First, genealogies are generated by Markov chain Monte Carlo (MCMC) conditional on the data, and then these genealogies are used to construct an estimate of the posterior density function of the parameters. The estimates of the number and timing of migration events are obtained during the first step and do not have a direct connection to the parameters estimated during the second step (Hey 2010a). The quantities estimated by IMa and IMa2 differ: the former estimates the posterior distribution of the mean migration time, while the latter estimates the distribution of actual migration times (i.e. without averaging within genealogies). This distinction is important because using the mean time underestimates the true variance of migration time (Hey 2010a; Niemiller *et al.* 2010) and can lead one to conclude erroneously that migration was restricted to a short period of time.
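
The effect of averaging within genealogies can be seen with a toy calculation. This is an illustrative sketch under made-up data, not a reproduction of what IMa or IMa2 compute internally: each inner list stands for the migration times recorded on one sampled genealogy.

```python
from statistics import mean, pvariance

# Hypothetical migration times (scaled, backwards in time),
# one inner list per sampled genealogy.
genealogies = [
    [0.1, 0.9],
    [0.2, 0.6],
    [0.5, 0.7],
]

# Summary in the spirit of "mean migration time": one value per genealogy.
per_genealogy_means = [mean(g) for g in genealogies]

# Summary in the spirit of "actual migration times": every event counts.
pooled_times = [t for g in genealogies for t in g]

var_of_means = pvariance(per_genealogy_means)
var_of_times = pvariance(pooled_times)
```

With these numbers the variance of the per-genealogy means is roughly a tenth of the variance of the pooled times, even though both summaries describe the same events. The means hide the spread of events within each genealogy.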

The IM model assumes a constant level of migration from initial divergence to the present; thus, scenarios where migration varies through time violate this assumption and can bias parameter estimates. Becquet & Przeworski (2009) investigated this problem and showed that IMa and their own statistical method MIMAR (Becquet & Przeworski 2007) provide biased estimates of the explicit parameters of the IM model (gene flow rates, divergence time and effective sizes). Both methods can fail to detect migration when it happens early in the divergence process, wrongly pointing to an allopatric speciation scenario. Becquet & Przeworski also showed that secondary contact scenarios are indistinguishable from parapatric speciation, but this was based on an arbitrary criterion stipulating that there was evidence for an IM scenario when *M* = 4Nm ≥ 0.1. This criterion is certainly much less satisfactory than one based on estimates of gene flow timing. The study of Strasburg and Rieseberg (2011) represents a more thorough evaluation of the ability of IM approaches to discriminate between speciation scenarios because it focuses on gene flow timing.

In their simulation study, Strasburg and Rieseberg considered a scenario with two species that diverged one million generations ago and had the same effective size as the ancestral population (one million). They explored a wide range of gene flow scenarios that included the IM model as well as five other models with nonconstant migration; three represent allopatric speciation followed by secondary contact of varying duration, and two represent allopatric speciation with short bursts of gene flow before the present. A method that can accurately distinguish between these patterns of gene flow timing should lead to posterior distributions that differ in width depending on the model being considered. However, this width is also influenced by the amount of information contained in the genetic samples so they also explored the effects of sample sizes, number of loci and length of sequences on posterior distributions of gene flow timing.
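
The contrast between constant and nonconstant migration can be made concrete by describing each scenario as a migration rate that is switched on only during certain time windows. The sketch below is a hypothetical representation: the class name, the migration rate and the window boundaries are placeholders and are not the values used in the simulation study.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MigrationSchedule:
    """Migration rate equal to `rate` inside the listed windows, 0 elsewhere.

    Windows are (recent, old) pairs in generations before the present.
    """
    rate: float
    windows: Tuple[Tuple[float, float], ...]

    def rate_at(self, generations_ago: float) -> float:
        for recent, old in self.windows:
            if recent <= generations_ago < old:
                return self.rate
        return 0.0


T_DIV = 1_000_000  # divergence time in generations, as in the simulated scenario
RATE = 1e-6        # hypothetical per-generation migration rate

# Window boundaries are placeholders chosen only to illustrate the contrast.
constant_im = MigrationSchedule(RATE, ((0, T_DIV),))            # IM model
secondary_contact = MigrationSchedule(RATE, ((0, 100_000),))     # recent contact
ancient_burst = MigrationSchedule(RATE, ((400_000, 450_000),))   # short burst
```

Under this representation the IM model is the special case with a single window spanning the whole divergence period, which is the assumption the other schedules violate.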

In most of the gene flow-sampling scenarios, the 90% highest probability density interval (HPDI, a Bayesian equivalent of a confidence interval) was wider than the actual interval of simulated gene flow; it included the actual period of gene flow, but it also included the real time since divergence. The largest decrease in variance was achieved by increasing the sequence length. Increasing the number of loci led to a smaller decrease, while increasing sample size did not have much of an effect on the variance of the estimates. A rather discouraging result is that with long sequences, most of the probability mass of the gene flow timing posterior distribution is near the real divergence time even for the scenario with a very recent secondary contact, a pattern expected for scenarios with gene flow during divergence. In practice, the real divergence time is unknown, so Strasburg and Rieseberg also investigated the proportion of gene flow timing posterior probability density occurring near or prior to the inferred divergence time. Here again, the 90% HPDIs included this estimate, giving support for divergence with gene flow regardless of the real scenario being considered.
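
For readers unfamiliar with the HPDI, one common way to approximate it from posterior samples is to take the shortest interval that contains the required share of the sorted samples. The function below is a generic sketch of that idea (its name is hypothetical and it is not taken from IMa2); it assumes a unimodal posterior, since a multimodal one may call for a union of intervals.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.90) -> Tuple[float, float]:
    """Shortest interval containing at least `mass` of the samples."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of samples the interval must cover; the small offset guards
    # against floating-point round-up in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Made-up samples: nine clustered values and one outlier.
interval = hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], mass=0.90)
```

In this toy case the interval runs from 0 to 8 and excludes the outlier. An interval of this kind that spans both the simulated period of gene flow and the divergence time cannot discriminate between the scenarios.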

Correctly interpreting what the large variance in gene flow timing means requires a good understanding of the MCMC approach used by IM methods. As mentioned above, the genealogies are generated directly from their marginal posterior probability distribution and do not have a direct connection to the parameters of the IM model. For any given locus, a large number of genealogies are proposed, and some of them are accepted based on the Metropolis–Hastings update criterion (see Hey & Nielsen 2007). Given the very high variance of the coalescent process, the accepted genealogies can be very different. Thus, the variance in the migration timing estimates obtained from IMa2 for any given locus has two components, one resulting from variation within a genealogy and another resulting from variation among accepted genealogies. A large contribution of the first component would indicate that migration events occurred all along the divergence period. A large contribution of the second component can indicate that the genetic data are consistent with genealogies where migration events occur before the speciation process started [allopatric divergence from a structured ancestral population; cf. scenario C in Becquet & Przeworski (2009)] but also with genealogies where migration events occur well after divergence (secondary contact) and with genealogies that are intermediate between these two extremes (sympatric or parapatric speciation). It would be informative to compare the within- and among-genealogy variances. If the former is equal to or larger than the latter, then we could infer that the speciation with migration scenario is a good approximation to reality. However, the within-genealogy variance is difficult to estimate because very few migration events are included in any given genealogy (e.g. Won & Hey 2005; Nadachowska & Babik 2009), so this is not a viable strategy.
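
The two variance components can be written down explicitly using the standard decomposition of a pooled variance into within-group and among-group parts. The sketch below is an illustration under made-up data, with a hypothetical function name, and is not an output offered by IMa2. Each genealogy is weighted by the number of migration events it contains.

```python
from typing import Sequence, Tuple


def variance_components(genealogies: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (within, among) components of the pooled variance of migration times.

    Genealogies are weighted by their number of migration events, so
    within + among equals the population variance of all pooled times.
    """
    groups = [g for g in genealogies if len(g) > 0]
    total = sum(len(g) for g in groups)
    if total == 0:
        raise ValueError("no migration events to summarise")
    grand_mean = sum(sum(g) for g in groups) / total
    within = 0.0
    among = 0.0
    for g in groups:
        m = sum(g) / len(g)
        within += sum((t - m) ** 2 for t in g)
        among += len(g) * (m - grand_mean) ** 2
    return within / total, among / total


# Made-up example: two genealogies with a single event, one with two, one with none.
within, among = variance_components([[0.2], [0.9], [0.4, 0.6], []])
```

Genealogies with a single migration event contribute nothing to the within component, and those with none contribute nothing at all. With so few events per genealogy, the within-genealogy variance is therefore poorly estimated, which is the practical obstacle noted above.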

The failure of IMa and IMa2 to obtain reliable estimates of migration timing under models that do not fit the assumptions of the IM model is not surprising and does not disqualify their use to estimate the explicit parameters of this model under a range of evolutionary scenarios (Strasburg & Rieseberg 2008; Becquet & Przeworski 2009). However, we may question their use to estimate quantities derived directly from the genealogies. As Felsenstein (1988) noted in his seminal study, the uncertainty of a genealogy estimate can be large and difficult to quantify. Thus, he proposed to treat the genealogy as a nuisance variable and to estimate evolutionary and population genetics parameters by averaging over all possible genealogies. In other words, the objective is to remove the genealogies from the population genetics model, and this is the strategy followed by all coalescent-based methods, including IM approaches. One could argue that removing the genealogy also means excluding from the estimation process the quantities that can be derived directly from it. This assertion may seem rather extreme, but at the very least we need to be cautious when using such quantities to make inferences about evolutionary processes.

Overall, Strasburg and Rieseberg's results indicate that inferences about gene flow timing based on the IM model are not reliable and should not be used to distinguish among speciation modes. This is not good news, as there are no other criteria that could be used to identify unequivocal examples of sympatric speciation. The four criteria proposed by Coyne & Orr (2004) have been criticized by Bolnick & Fitzpatrick (2007), and the several recently cited examples of sympatric speciation are debatable (Fitzpatrick *et al.* 2009). This may seem a rather discouraging situation, but it is important to note that the controversy about sympatric speciation has led to great progress towards a better understanding of the processes that constrain or promote divergence.

Simulation studies are unappealing because they are extremely laborious, difficult to present because of the very large number of results, and subject to the criticism that most of the time they uncover problems without proposing a viable solution to overcome them. However, by uncovering problems, they can help avoid misuse of available methods and also motivate the development of new and more appropriate approaches. It seems to me that this is precisely the very important contribution of Strasburg and Rieseberg's analysis to the field of speciation studies.