Species delimitation and geography

Despite the importance of the geographical arrangement of populations for the inference of species boundaries, only a few approaches that integrate spatial information into species delimitation have been developed so far. Persistent differentiation of sympatric groups of individuals is the best criterion for species status. Species delimitation becomes more prone to error if allopatric metapopulations are considered, because it is often difficult to assess whether observed differences between allopatric metapopulations would be sufficient to prevent the fusion of these metapopulations upon contact. We propose a novel approach for testing the hypothesis that the multilocus genetic distances between individuals or populations belonging to two different candidate species are not larger than expected based on their geographical distances and the relationship of genetic and geographical distances within the candidate species. A rejection of this null hypothesis is an argument for classifying the two studied candidate species as distinct species. Case studies show that the proposed tests are suitable for distinguishing between intra- and interspecific differentiation. The regression approach proposed here is more appropriate for testing species hypotheses with regard to isolation by distance than (partial) Mantel tests. Our tests assume a linear relationship between genetic and (transformed) geographical distances. This assumption can be compromised by high genetic variability within populations, as found in a case study with microsatellite markers.

The geographical relationships between groups of individuals may be highly informative for the inference of species boundaries.
If differentiated groups of individuals occur at the same locality, discontinuities in the distributions of their character states (other than polymorphisms or sexual dimorphism) demonstrate that these groups should be classified as distinct species. The criterion of persistent differentiation of sympatric groups, at least with regard to specific characteristics, can be found in several species concepts such as the genotypic cluster definition of Mallet (1995), the genic species concept of Wu (2001) and the differential fitness species concept of Hausdorf (2011). Species delimitation becomes more prone to error if allopatric metapopulations are considered, because it is often difficult to assess whether observed differences between allopatric metapopulations would be sufficient to prevent the fusion of these metapopulations upon contact.
Despite the importance of geography for the inference of species boundaries and despite geographical data of the sampled individuals almost always being available, only a few approaches that integrate spatial information into species delimitation have been developed so far. These approaches can be classified into a priori methods that incorporate the geographical data into the protocol for delimiting candidate species, and a posteriori approaches that use geographical data to assess whether candidate species delimited with other approaches should be considered distinct species given the degree of differentiation of the candidate species and their geographical relationships.
Two a priori methods for considering geographical information directly in the species delimitation process have been proposed. Guillot et al. (2012) proposed a statistical model that can analyse genetic and phenotypic data and can incorporate geographical data in such a way that the clusters to be delimited tend to occupy only one or a few separate areas. Edwards and Knowles (2014) suggested a clustering approach based on a combination of non-metric multidimensional scaling ordinations of distance matrices derived from geographical, genetic, morphological and ecological data. The implicit assumption of this approach is that populations that are farther apart are more likely to evolve into separate species because of decreasing gene flow and/or increasingly different environmental conditions with increasing geographical distance.
A posteriori approaches assess whether the observed relationships between geographical and genetic or morphological distances between candidate species determined with other approaches are compatible with the expectation based on the variation of genetic distances with increasing geographical distances within the candidate species. Such tests require a model that describes the relationships between geographical and genetic or morphological distances within species. The simplest model that describes this relationship is the "isolation by distance" (IBD) model introduced by Wright (1943).
Four studies have suggested different a posteriori approaches. Medrano, López-Perea, and Herrera (2014) used partial Mantel tests to assess whether a variable indicating the classification can explain a significant part of the variance in the genetic distances between populations in addition to the variance explained by geography. Gratton et al. (2016) formulated two operational criteria for recognizing "good" species, namely (a) a pattern of within-cluster IBD, and (b) a lack of dependence of the genetic differentiation between pairs of individuals belonging to different clusters on their geographical distance. They compared the correlation between genetic and geographical distances within and between candidate species using Mantel tests. Spriggs et al. (2019) […]
[…] (Frantz, Cellina, Krier, Schley, & Burke, 2009; Guillot & Rousset, 2013). Thus, here we develop a new regression-based protocol for testing whether the genetic distances between individuals or populations belonging to two different candidate species can be explained by their geographical distances given the variation of genetic distances with geography within the candidate species, or whether they indicate that the candidate species should be classified as distinct species. We discuss the underlying assumptions and methodological difficulties of this approach and compare it with previous a posteriori approaches for assessing the status of candidate species using geographical information.
[…] (Edwards, Soltis, & Soltis, 2008), AFLP data of trumpet daffodils (Narcissus; Amaryllidaceae) from the southern Iberian Peninsula (Medrano et al., 2014), and haplotype data of RAD loci (Gratton et al., 2015) of brassy ringlets (Erebia; Lepidoptera: Nymphalidae) from the Alps (Gratton et al., 2016). These data sets are described in detail in File S1.

| Outline of the IBD tests
We intend to test whether the genetic (or morphological) distances between units belonging to two candidate species delimited with other methods can be explained by IBD (i.e., by the increase in genetic distances with geographical distances observed within the candidate species). We derive the expected relationship of genetic distances and geographical distances from regressions of genetic distances of units belonging to the same candidate species against log-transformed geographical distances. The null hypothesis is that the genetic distances between units belonging to different candidate species are not larger than expected based on the within-group regressions. A rejection of the null hypothesis is an argument for classifying two candidate species as distinct species.
Either individuals or populations may be used as units. Because the number of individuals is always equal to or larger than the number of populations, tests at the level of individuals have more power.
However, inference based on individuals assumes that the units are independent samples. This assumption can be violated by relatedness between individuals, which is especially close within populations. Thus, tests at the level of populations may be less affected by violations of this assumption than tests at the level of individuals.

| Modelling setup
Assume that we have observations of $n$ units $I_1, \dots, I_n$. These are characterized by a geographical distance measure $d_{ij} = d(I_i, I_j)$ and a genetic distance measure based on multilocus data $d^*_{ij} = d^*(I_i, I_j)$ for $i, j = 1, \dots, n$, both of which fulfil the standard axioms of a dissimilarity (non-negativity, symmetry, $d(I, I) = 0$ for all objects $I$; we do not require the triangle inequality to hold). Furthermore, for $i = 1, \dots, n$ we have group indicators $c_i \in \{1, 2\}$ indicating whether $I_i$ belongs to candidate species 1 or 2. Let $n_1$ and $n_2$ be the numbers of units belonging to candidate species 1 and 2, respectively.
We use a linear regression approach, but we allow the distances to be transformed by known monotonic transformations $f$ and $f^*$, i.e., $f_{ij} = f(d_{ij})$ and $f^*_{ij} = f^*(d^*_{ij})$. This allows for nonlinear relationships. Following Slatkin (1993) and Rousset (1997), we log-transform geographical distances. One issue is that geographical distances of zero occur, whereas $\log(0)$ is undefined. Thus, distances need to be transformed as $f_{ij} = \log(d_{ij} + c)$ with a constant $c > 0$. The choice of $c$ will have an impact on the regression. $c$ should depend on the value range of the $d_{ij}$, because its impact is relative to that range. Here we choose $c$ to be the 0.25-quantile of the geographical distances. This makes the transformation invariant to the measurement units of the distances up to an additive constant, which affects only the intercept of the regression.
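A minimal sketch of this transformation, assuming the pairwise geographical distances are supplied as a flat list (one entry per pair $i < j$) and that the 0.25-quantile is obtained by linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`); the quantile definition used in the original implementation may differ, and the function name `log_transform_geo` is hypothetical.

```python
import math
import statistics


def log_transform_geo(d_pairs):
    """Return (f, c) with f_ij = log(d_ij + c) and c the 0.25-quantile of d_pairs.

    d_pairs: geographical distances, one per unordered pair of units (i < j).
    """
    # First quartile cut point, interpolated linearly between order statistics.
    c = statistics.quantiles(d_pairs, n=4, method="inclusive")[0]
    if c <= 0:
        # Too many zero distances: log(0 + c) would be undefined.
        raise ValueError("0.25-quantile of the distances is not positive")
    return [math.log(d + c) for d in d_pairs], c


# Changing the unit (here m -> km) only adds a constant to every f_ij,
# so the regression slope is unaffected.
f_m, c_m = log_transform_geo([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
f_km, c_km = log_transform_geo([0.0, 1.0, 2.0, 3.0, 4.0])
```

Here every difference between `f_m` and `f_km` equals $\log(1000)$, which illustrates why choosing $c$ relative to the data makes the slope independent of the distance unit.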
Within a candidate species, we assume that the following regression relationship holds: $f^*_{ij} = a + b f_{ij} + e_{ij}$. Here $i$ and $j$ are from a set of indices assumed to belong to the same candidate species. In the following, we assume that all units with $c_i = k$, $k = 1, 2$, belong to the same candidate species, characterized by regression parameters $a_k$ (intercept) and $b_k$ (slope). For statistical inference, we assume that the units are independent samples (see Outline of the IBD tests). However, we do not assume anything further regarding the distribution of the $e_{ij}$, and particularly not that they are independent, which for different distances involving the same unit would not make sense. This means that the standard distribution theory of linear regression cannot be applied.
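The within-group regression can be sketched with ordinary least squares over all pairs $i < j$ with $c_i = c_j = k$. This is a minimal pure-Python illustration, assuming the transformed distances are held in symmetric $n \times n$ nested lists `f` (for $f_{ij}$) and `g` (for $f^*_{ij}$) and that group labels are coded 1 and 2; the helper names `ols` and `within_group_fit` are hypothetical. It returns point estimates only, because the usual regression standard errors do not apply here.

```python
def ols(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def within_group_fit(f, g, groups, k):
    """Estimate (a_k, b_k) from the distances among units of candidate species k."""
    n = len(groups)
    # Each unordered pair is used once; the diagonal (i == j) is excluded.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if groups[i] == k and groups[j] == k]
    return ols([f[i][j] for i, j in pairs], [g[i][j] for i, j in pairs])
```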
If all units belong to the same species and the regressions within the two candidate species are equal, we have $a = a_1 = a_2$ and $b = b_1 = b_2$. A difference in the regressions might indicate that the two candidate species considered are different species with different dispersal abilities. However, the relationship between the genetic and the geographical distances does not depend only on species-specific characteristics such as dispersal ability, but also on other factors such as the terrain or history. For example, genetic distances may increase faster with geographical distance in a mountainous region than in a plain because the mountains inhibit dispersal. Furthermore, the genetic distances within a candidate species may be larger in long-standing populations in a refuge area than in populations in an area that has been colonized only recently.
Thus, two candidate species may belong to the same species despite $a_1 \neq a_2$ and/or $b_1 \neq b_2$. There will not always be a simple regression […] On the other hand, having $a_1 = a_2$ and $b_1 = b_2$ for the within-group distances in both candidate species does not necessarily imply that these candidate species belong to the same species, because this does not say that the between-group genetic distances are low enough to be explained by geographical distance alone. For this to be the case, the regression resulting from the within-group distances will need to fit the between-group distances as well.

| IBD tests
In the following, we describe three tests that investigate the equality of two different regressions of genetic versus geographical distances within and/or between two candidate species. The first test compares the regressions within the two candidate species. It does not test whether the overall pattern is compatible with IBD (see above), but indicates whether the second or the third test is appropriate given the structure of the data. These alternative tests were devised to test the hypothesis that the genetic distances between units belonging to two different candidate species can be explained by IBD.
The null hypothesis of the first test, $H_{01}$, is that the relationship between genetic and geographical distances within each candidate species can be modelled by a single regression for both candidate species: $a^* = a_1 = a_2$ and $b^* = b_1 = b_2$ (i.e., the regression coefficients are called $a^*$ and $b^*$ assuming that the regressions based on within-group distances are equal). This case is illustrated, for example, in Figure 1a.
If $H_{01}$ is not rejected, the hypothesis that the genetic distances between units belonging to two different candidate species can be explained by IBD can be investigated by checking whether the joint within-group regression also fits the distances between the two candidate species (green crosses in Figure 1), i.e., whether for all $i, j = 1, \dots, n$: $H_{02}: f^*_{ij} = a + b f_{ij} + e_{ij}$. Here $a = a_1 = a_2$ and $b = b_1 = b_2$. This will be tested by comparing a regression fitted on all within-group distances (dotted green line in Figure 1a) with another regression fitted on all distances (solid green line in Figure 1a). If $H_{02}$ is true, these should be equal (i.e., $a = a^*$ and $b = b^*$). If $H_{02}$ is not rejected, the data provide no evidence for the specific distinctness of the candidate species.
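A minimal sketch of the point estimate behind the $H_{02}$ comparison: one OLS line is fitted to all within-group pairs ($\hat a^*, \hat b^*$), another to all pairs ($\hat a, \hat b$), and the two are compared at the centre of the between-group transformed geographical distances. The sketch assumes that this centre is the arithmetic mean of the between-group $f_{ij}$, that transformed distances are stored in symmetric nested lists `f` ($f_{ij}$) and `g` ($f^*_{ij}$), and that groups are coded 1 and 2; `t2_statistic` is a hypothetical name, and only the statistic (not its jackknife variance) is computed.

```python
def ols(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def t2_statistic(f, g, groups):
    """(a + b * f_between) - (a_star + b_star * f_between); large values speak against H02."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    within = [p for p in pairs if groups[p[0]] == groups[p[1]]]
    between = [p for p in pairs if groups[p[0]] != groups[p[1]]]

    def fit(ps):
        return ols([f[i][j] for i, j in ps], [g[i][j] for i, j in ps])

    a_star, b_star = fit(within)  # joint within-group regression
    a, b = fit(pairs)             # regression on all distances
    f_between = sum(f[i][j] for i, j in between) / len(between)
    return (a + b * f_between) - (a_star + b_star * f_between)
```

If all distances lie on one line the statistic is zero; inflating only the between-group genetic distances makes it positive, which matches the one-sided alternative.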
If $H_{01}$ is rejected (i.e., $a_1 \neq a_2$ and/or $b_1 \neq b_2$; for example, the dotted black and red lines in Figure 1b), it would be invalid to fit a regression to all the within-group distances together. In this case, we compare the between-group distances with what is predicted from each of the two regressions within the candidate species, defined by the parameters $(a_1, b_1)$ and $(a_2, b_2)$. The null hypothesis that the genetic distances between units belonging to two different candidate species can be explained by IBD ($H_{03}$) is operationalized in this case as follows. We define another regression, $f^*_{ij} = a^*_k + b^*_k f_{ij} + e_{ij}$, for $i, j$ with $c_i = k$, whereas $c_j$ may be either 1 or 2 (solid black and red lines, respectively, in Figure 1b). These regressions are based on the distances within candidate species $k$ together with the distances between the candidate species, but without the distances within the respective other candidate species. Let $f_{\text{between}}$ be the centre of the between-group transformed geographical distances, i.e., $f_{\text{between}} = \frac{1}{n_1 n_2} \sum_{c_i = 1,\, c_j = 2} f_{ij}$. If the genetic distances between the candidate species are too large to be compatible with the regression on the distances within candidate species $k$, putting the within-group and between-group distances together will result in a regression (solid black and red line, respectively, in Figure 1b) that fits a higher value at the centre of the between-group distances (blue lines in Figure 1) than the regression based on the within-group distances alone (dotted black and red line, respectively, in Figure 1b), that is, $a^*_k + b^*_k f_{\text{between}} > a_k + b_k f_{\text{between}}$. If this is the case for both candidate species, $H_{03}$ is rejected, indicating that the two candidate species probably represent distinct species.
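The $H_{03}$ comparison for one candidate species $k$ can be sketched in the same way: the line fitted to the within-$k$ distances alone is compared, at $f_{\text{between}}$, with the line fitted to the within-$k$ plus between-group distances. As before, this is a minimal sketch assuming symmetric nested lists `f` and `g`, groups coded 1 and 2, and $f_{\text{between}}$ taken as the arithmetic mean of the between-group $f_{ij}$; `t3_statistic` is a hypothetical name.

```python
def ols(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def t3_statistic(f, g, groups, k):
    """(a*_k + b*_k f_between) - (a_k + b_k f_between) for candidate species k."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    within_k = [p for p in pairs if groups[p[0]] == k and groups[p[1]] == k]
    between = [p for p in pairs if groups[p[0]] != groups[p[1]]]

    def fit(ps):
        return ols([f[i][j] for i, j in ps], [g[i][j] for i, j in ps])

    a_k, b_k = fit(within_k)                      # within species k only
    a_star_k, b_star_k = fit(within_k + between)  # within k plus between-group
    f_between = sum(f[i][j] for i, j in between) / len(between)
    return (a_star_k + b_star_k * f_between) - (a_k + b_k * f_between)
```

$H_{03}$ would be rejected only if the statistic is significantly positive for both $k = 1$ and $k = 2$.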
If this is the case for only one of the candidate species, the reasons have to be investigated.
$H_{01}$ will be tested against the alternative that $a_1 - a_2 \neq 0$ or $b_1 - b_2 \neq 0$. $H_{02}$ will be tested against $a + b f_{\text{between}} > a^* + b^* f_{\text{between}}$ (one-sided alternative), because it should only be rejected if the genetic distances between candidate species are larger on average than what would be expected from the regression on within-group distances only. This holds for $H_{03}$ as well, namely, it is tested against the one-sided alternative $a^*_k + b^*_k f_{\text{between}} > a_k + b_k f_{\text{between}}$ for both $k = 1, 2$. We will use ordinary least squares regression to fit all the models and obtain parameter estimators $\hat a, \hat a^*, \ldots$ […]
Figure 1. Relationships between genetic and log-transformed geographical distances in pairs of individuals or populations of two candidate species. Black circles and red triangles: distances between individuals or populations belonging to the first and second candidate species, respectively; green crosses: distances between individuals or populations belonging to different candidate species; black and red dotted lines: regression lines fitted within the first or second candidate species, respectively; blue lines: centres of the between-group geographical distances. (a) The relationship between genetic and geographical distances within each candidate species can be modelled by a single regression for both candidate species ($H_{01}$ not rejected). Green dotted line: regression line fitted on the within-group distances only (i.e., the black circles and red triangles taken together); green solid line: regression line fitted on all distances together. (b) The relationship between genetic and geographical distances within each candidate species cannot be modelled by a single regression for both candidate species ($H_{01}$ rejected). Black and red solid lines: regression lines fitted to the distances within the first or second candidate species, respectively, together with the distances between the candidate species.
[…] species 1 and 2, respectively) are compared. Here we test intercepts and slopes separately, and both need to be equal for the regression lines to be the same. The test statistics are $T_{1a} = \hat a_1 - \hat a_2$ and $T_{1b} = \hat b_1 - \hat b_2$. The variation of each of the two regression lines can be assessed independently, and the variation of $T_{1a}$ and $T_{1b}$ can be derived from it. This test can be applied generally to compare two regressions of distances in two different independent groups.
When testing $H_{02}$ and $H_{03}$, the regression lines that are compared are based partly on the same units, and we are interested in assessing differences at the centre of the between-group distances $f_{\text{between}}$ rather than running separate tests for intercept and slope. The test statistic for $H_{02}$ is $T_2 = (\hat a + \hat b f_{\text{between}}) - (\hat a^* + \hat b^* f_{\text{between}})$. There are two test statistics for $H_{03}$, testing separately for the two candidate species: $T_{3k} = (\hat a^*_k + \hat b^*_k f_{\text{between}}) - (\hat a_k + \hat b_k f_{\text{between}})$, $k = 1, 2$. For testing, we have to estimate the expected variation of the test statistics under $H_0$. We cannot use standard linear regression theory here because of the lack of distributional assumptions and particularly the lack of independence of the $e_{ij}$. One possibility for assessing variability would be a nonparametric bootstrap (sampling $n$ units with replacement from the empirically observed units). A nonparametric bootstrap keeps the sample size constant by sampling identical objects several times. This is problematic here, because it leads to a number of pairs of (identical) units sampled with both geographical and genetic distance zero. This can have a strong effect on the regression estimation and is unrealistic unless the measurement of distances is imprecise and there are already many such "both distances zero" cases in the data.
Because of these issues, we apply a different nonparametric statistical resampling principle, the jackknife (Quenouille, 1949). A simple nonparametric jackknife test was proposed by Tukey (1958). The general idea is to define "pseudovalues" for the parameter estimators. If a parameter $\theta$ is estimated from $n$ independent and identically distributed observations $X_1, \dots, X_n$ by an estimator $\hat\theta_n$, then for $i = 1, \dots, n$ pseudovalues $\theta^*_i = n\hat\theta_n - (n - 1)\hat\theta_{n-1;i}$ are computed, where $\hat\theta_{n-1;i}$ is the estimator of $\theta$ computed with observation $X_i$ omitted (in the context of distance data, this means that all distances involving unit $I_i$ are omitted). The pseudosample $\theta^*_1, \dots, \theta^*_n$ can then be used to run a standard $t$ test of the hypothesized value for $\theta$ (i.e., its mean is compared with the expected value under $H_0$, which here is zero).
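Tukey's jackknife can be sketched generically: the estimator is any function of the set of retained units, so omitting unit $I_i$ automatically drops every distance involving $I_i$. This is a minimal sketch assuming a plain one-sample $t$ statistic with $n - 1$ degrees of freedom (the situation described for $T_2$); the function names are hypothetical, and a $p$ value would still have to be read from a $t$ distribution with a statistics library.

```python
import math
import statistics


def jackknife_pseudovalues(n, estimator):
    """Pseudovalues theta*_i = n*theta_n - (n - 1)*theta_{n-1;i}, i = 0..n-1.

    estimator: function taking a list of retained unit indices and returning
    a scalar estimate (e.g. T2 computed from distances among those units only).
    """
    units = list(range(n))
    theta_n = estimator(units)
    return [n * theta_n - (n - 1) * estimator(units[:i] + units[i + 1:])
            for i in range(n)]


def jackknife_t(pseudo, null_value=0.0):
    """One-sample t statistic of the pseudovalues against null_value; df = n - 1."""
    n = len(pseudo)
    se = statistics.stdev(pseudo) / math.sqrt(n)
    return (statistics.fmean(pseudo) - null_value) / se, n - 1


# Sanity check: for the sample mean, the pseudovalues reproduce the observations.
x = [1.0, 2.0, 4.0, 7.0]
pseudo = jackknife_pseudovalues(len(x), lambda idx: statistics.fmean(x[i] for i in idx))
t, df = jackknife_t(pseudo)
```

For the one-sided alternatives used here, only large positive values of `t` count as evidence against the null hypothesis.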
For details about when this works see Miller (1974); the specific reasons given by Miller why such a procedure may not work, namely if involved estimators are not smooth enough in the observations, do not apply in our setup. See also Efron (1979) for more theoretical exploration.
The test statistic $T_2$ (taking the role of $\theta$) allows a direct application of this principle. Some modification is required for $T_{1a}$, $T_{1b}$, $T_{31}$ and $T_{32}$, because for these test statistics the role of observations differs between the two candidate species, and the variances of the $\theta^*_i$ may differ between candidate species.
T 1a and T 1b are differences between parameter estimators from two different independent groups, and this is an analogous situation to Welch's (1947) two-sample t test allowing for different variances.
The pseudovalues $\theta^*_i$ can be used separately depending on whether $c_i$ is 1 or 2; the two within-group variances can then be combined and the test run in the same way as Welch's $t$ test.
In $T_{31}$ and $T_{32}$, the two regression lines to be compared are not independent. The difference $\hat a^*_k + \hat b^*_k f_{\text{between}} - (\hat a_k + \hat b_k f_{\text{between}})$ needs to be evaluated omitting one unit $I_i$ at a time to compute $\theta^*_i$. Again, the variance of the $\theta^*_i$ may differ depending on whether $c_i = 1$ or 2. This is because the units $I_i$ with $c_i = k$ are used for both regressions, whereas the units from the other candidate species are used only for the regression that includes between-group distances. The variance of the mean $\frac{1}{n}\sum_{i=1}^{n} \theta^*_i$ can be estimated as $\frac{n_1 s_1^2 + n_2 s_2^2}{n^2}$, where $s_j^2$ is the sample variance of the $\theta^*_i$ for which $c_i = j$. This can be used in a $t$ test, with degrees of freedom approximated by the Welch-Satterthwaite equation (Welch, 1947), as in Welch's $t$ test.
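The group-wise variance estimate can be sketched as follows, assuming the pseudovalues for $T_{31}$ or $T_{32}$ have already been computed with a leave-one-unit-out loop and that each candidate species contributes at least two units. The degrees of freedom use the standard Welch-Satterthwaite approximation applied to the two variance components $n_j s_j^2 / n^2$; this is our reading of the procedure rather than a formula stated in the text, and `groupwise_jackknife_t` is a hypothetical name.

```python
import math
import statistics


def groupwise_jackknife_t(pseudo, groups, null_value=0.0):
    """t statistic and Welch-Satterthwaite df for the mean of jackknife pseudovalues
    whose variance may differ between candidate species (groups coded 1 and 2).

    Var(mean) is estimated as (n1*s1^2 + n2*s2^2) / n^2, where s_j^2 is the sample
    variance of the pseudovalues of units with c_i = j.
    """
    n = len(pseudo)
    components = []  # (variance component n_j*s_j^2/n^2, its df n_j - 1)
    for k in (1, 2):
        vals = [p for p, c in zip(pseudo, groups) if c == k]
        components.append((len(vals) * statistics.variance(vals) / n ** 2, len(vals) - 1))
    var_mean = sum(v for v, _ in components)
    t = (statistics.fmean(pseudo) - null_value) / math.sqrt(var_mean)
    df = var_mean ** 2 / sum(v ** 2 / d for v, d in components)
    return t, df


# Hypothetical pseudovalues for three units of each candidate species.
t, df = groupwise_jackknife_t([1.0, 2.0, 3.0, 5.0, 7.0, 9.0], [1, 1, 1, 2, 2, 2])
```

The resulting `t` would then be compared with the upper tail of a $t$ distribution with `df` degrees of freedom, in line with the one-sided alternative.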
The shortest distances between two points on the surface of a sphere, measured along the surface of the sphere (great-circle distances), can be calculated from geographical coordinates using the function "coord2dist" of prabclus. The genetic distances between individuals, namely Jaccard distances for AFLP data and shared allele distances (Bowcock et al., 1994) and ê (Watts et al., 2007) for microsatellite data, haplotype data and single nucleotide polymorphisms (SNPs), can be calculated using the function "alleledist" of prabclus. For testing IBD between populations, we implemented the chord distance (Cavalli-Sforza & Edwards, 1967), $F_{ST}/(1 - F_{ST})$ (Weir & Cockerham, 1984), $\Phi_{PT}$ (Peakall, Smouse, & Huff, 1995) and three variants of the shared allele distance (Bowcock et al., 1994).
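For readers working outside R, the two simplest distances mentioned above can be sketched in Python. This is not the prabclus implementation: the great-circle distance uses the haversine formula with an assumed spherical Earth of radius 6371 km, and the Jaccard distance treats AFLP profiles as 0/1 band-presence vectors and returns 0 when neither profile has any band (an arbitrary convention chosen here).

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing h slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def jaccard_distance(a, b):
    """Jaccard distance between two 0/1 band-presence profiles of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two profiles without any bands have distance 0.
    return 1.0 - both / either if either else 0.0
```

Bands absent in both individuals are ignored by the Jaccard distance, which is the usual reason for preferring it for dominant markers such as AFLPs.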

| IBD tests of the case studies
The results of the IBD tests of the case studies are described in detail in File S1.

| Regression based IBD tests for assessing species status
Whereas the continued co-occurrence of differentiated groups without fusion can be considered proof of their species status, it is more difficult to assess the status of allopatric metapopulations. For example, approaches such as the multispecies coalescent model as implemented in BPP (Yang & Rannala, 2014) tend to overestimate the number of true species […]
[…] approach did not take the dependence between distances into account (see "IBD tests" above). Furthermore, their approach implicitly assumes that the regression slope between geographical and genetic distances is the same within both candidate species and between the candidate species; this assumption is violated in many cases (e.g., see Table S1 and Figure S1). We assess whether the genetic distances between individuals or populations belonging to different candidate species are not larger than expected based on their geographical […]
[…] (Figure S1d-f). Thus, the between-group distances can hardly be higher than predicted by the within-group regressions. A failure to reject the null hypothesis that the between-group distances can be explained by IBD should not be interpreted as evidence for the conspecificity of the examined groups if the within-group distances are already close to the maximum. Because of their high variability, microsatellites generally result in higher distances than AFLP or SNP data. Thus, and also because usually fewer loci are scored using microsatellites than with AFLP or SNPs, microsatellite data are less suitable for species delimitation than markers that result in distances that are less quickly "saturated" with increasing geographical distances and represent a larger portion of the genome.
In […] (Table S1). Thus, the data provide no evidence for their specific distinctness, and these two taxa might better be considered conspecific, as suggested by Wunderlin (1998), and classified as subspecies.
With the other investigated data sets, IBD tests based on distances between populations proved to be problematic. It is clear that the sample size is smaller when populations instead of individuals are used as units. Several of the studied taxonomic problems could not be tested with IBD tests based on distances between populations because not enough populations were sampled to perform the tests. However, this is not only a problem of sampling. In some cases, locally endemic species comprise fewer populations than would be necessary for an IBD test at the population level. Even if enough populations for performing the tests were sampled, the results often remained inconclusive. A meta-analysis of intraspecific IBD analyses indicated that more than nine populations were needed to achieve more than 50% probability of significant IBD, more than 17 populations were needed to achieve 75% probability of significant IBD, and more than 24 populations were required to achieve 90% probability of significant IBD (Jenkins et al., 2010). Such high numbers of populations per species are rarely sampled across a group of species for systematic studies. One reason for the large numbers of populations that are necessary for demonstrating IBD and for the inconclusive results of our tests is a large scatter of the genetic distances depending on the geographical distances. In the data sets we re-analysed, this is probably at least partly caused by insufficient sampling within populations so that the distances between the populations cannot be accurately estimated, resulting in unreliable estimates of the regression coefficients. The standard sampling for taxonomic studies that often deal with rare and/or geographically restricted species is usually not adequate for IBD tests at the population level.
In addition, IBD analyses based on populations also have more general problems. The distance measures between populations not only reflect the differentiation between populations but may also be affected by the variability within populations. The latter is not necessarily related to the geographical distances between populations. We used several statistics for quantifying the differentiation between populations (chord distance, $\Phi_{PT}$, $F_{ST}/(1 - F_{ST})$, and three variants of the shared allele distance) in IBD tests, and they yielded mostly similar results. Chord and shared allele distances can also be calculated if a population is represented by only one individual, but several individuals of both populations are necessary for the calculation of $F_{ST}/(1 - F_{ST})$ and $\Phi_{PT}$. Thus, more information is lost if the latter statistics are used.
Our tests indicate whether the differentiation between two candidate species can be explained by IBD. This is not necessarily a test for species status. As already mentioned, the population structure of a pair of species might also be compatible with IBD (e.g., if the two species originated from a widespread ancestral species that was structured by IBD across its range). An overlap of the ranges of the two species might nevertheless demonstrate their species status.
The sympatry criterion (i.e., the continued co-occurrence of two differentiated groups without an erosion of their differentiation) is always the strongest proof of their species status. However, for allopatric candidate species additional criteria are necessary. In the case of parapatric taxa, the amount of admixture, the width of a hybrid zone and the abruptness of the changes across a hybrid zone may provide arguments for the classification. Apart from crossing experiments, IBD tests are the only tests that provide an argument for the classification of strictly allopatric candidate species without contact zones. Another criterion, which has not been implemented in a formal test so far, might be whether the differentiation of an allopatric pair of candidate species reaches the degree found in closely related sympatric species. However, differential adaptation to different environments may involve different genetic changes that may or may not be associated with morphological changes. Thus, the "degree of differentiation" is difficult to measure and even more difficult to test, even between closely related taxa. Speciation is usually a gradual evolutionary process; thus, the decision about the point in this process at which two differentiating groups should be classified as species will remain arbitrary to some degree. The IBD tests are a tool to make this decision slightly more objective.
A geographical expansion of two candidate species (e.g., from refuges) that brings their distribution areas closer together may result in more pairs of individuals or populations of the two candidate species with large genetic distances at small geographical distances. This would increase the likelihood that the two candidate species are considered distinct species. However, as geographical distances are log-transformed, the species must approach each other considerably before this affects the distribution of distances and the IBD tests. If candidate species approach each other geographically, the probability increases that individuals or propagules are exchanged at least from time to time. If the candidate species are not reproductively isolated, this will lead to gene flow and a decrease in genetic distances between the candidate species. Conversely, if we do not observe gene flow and a decrease in genetic distances between the candidate species, this will support their classification as distinct species.
Human-induced translocations, such as restocking of fish species, can disturb the natural pattern and decrease the informational value of the relationship between geographical and genetic distances. Thus, populations resulting from such translocations should not be used for IBD analyses.

| Comparison of regression-based IBD tests with approaches for assessing species status using Mantel tests
In contrast to the regression procedure proposed here, most previous approaches to assess whether the differentiation between candidate species can be explained by IBD (Gratton et al., 2016; Medrano et al., 2014) were based on permutation-based Mantel or partial Mantel tests (Mantel, 1967; Smouse, Long, & Sokal, 1986).

Medrano et al. (2014) used permutation-based simple and partial Mantel tests "to determine the proportion of total variance of genetic distances between populations that could be attributed to long-term historical divergence or more recent and local isolation-by-distance processes." Decisive for the argumentation of Medrano et al. (2014) is whether a partial Mantel test indicates that a significant proportion of the genetic variation can be explained by the tested grouping after statistically accounting for the effect of the geographical distance matrix. Although Medrano et al. (2014) did not explicitly define a null hypothesis, what is tested by the partial Mantel test may be equivalent to our null hypothesis. However, we believe that it is more appropriate to frame this as a regression rather than a correlation problem because of the causal asymmetry between geography and genetics. Another possible permutation approach would be to fit the regression models presented here and to permute the group memberships of the individuals, which under $H_0$ should not change the regression parameters. Both of these approaches suffer from the same problem. In many cases, most or all of the within-group geographical distances are small and the between-group geographical distances are large. Permuting the group labels (which implicitly also occurs in the partial Mantel test) means that some distances that were originally between-group distances become within-group distances and vice versa. This systematically changes the distribution of geographical distances within groups, which in turn can have a strong effect on regression (and partial correlation) estimation, because regression estimates are less variable if there is more variation in the x (explanatory) variable while the variation in the y variable is unchanged. Therefore, such an approach is not appropriate to assess the expected variation for a real pattern in which within-group distances tend to be small.
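The effect of permuting group labels on the within-group geographical distances can be illustrated with a toy one-dimensional example (hypothetical coordinates, not data from the case studies): two candidate species occupy separate ranges, and a single relabelling already turns many large between-group distances into within-group distances.

```python
import statistics


def mean_within_distance(pos, labels):
    """Mean 1-D geographical distance over all pairs i < j that share a label."""
    n = len(pos)
    return statistics.fmean(abs(pos[i] - pos[j])
                            for i in range(n) for j in range(i + 1, n)
                            if labels[i] == labels[j])


pos = [0, 1, 2, 3, 100, 101, 102, 103]  # two spatially separated candidate species
observed = [1, 1, 1, 1, 2, 2, 2, 2]     # true grouping
permuted = [1, 2, 1, 2, 1, 2, 1, 2]     # one possible relabelling

print(mean_within_distance(pos, observed))  # small: about 1.67
print(mean_within_distance(pos, permuted))  # large: about 67.3
```

Under such relabellings the within-group regressions are fitted to a much wider range of geographical distances than in the observed data, so the permutation distribution understates how variable the estimates are for the real pattern.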
Similar problems with Mantel and partial Mantel tests have been reported by Frantz et al. (2009) and Guillot and Rousset (2013), who concluded that partial Mantel tests are not statistically valid.
Gratton et al. (2016) specified the following operational criteria for classifying clusters as species: "(1) a pattern of within-clusters IBD …, and (2) genetic differentiation between pairs of individuals belonging to different clusters shows no clear dependence on their geographical distance (i.e., individuals sampled in, or near to, contact zones do not tend to be genetically intermediate)." Criterion (1) is not suitable for testing species status because IBD is not a general property of species (Jenkins et al., 2010). After speciation, interspecific distances may still be correlated with geographical distances if the two species originated from a widespread ancestral species that was structured by IBD across its range (see above). Introgression might also contribute to the maintenance of this pattern. Thus, criterion (2) need not be fulfilled by pairs of recently diverged species. Consequently, neither a lack of correlation between genetic and geographical distances within clusters nor a significant correlation of between-group genetic distances with geographical distances can be interpreted as an argument for lumping candidate species.
The Mantel tests applied by Gratton et al. (2016) are not suitable for testing specifically whether "individuals sampled in, or near to, contact zones do not tend to be genetically intermediate" (Gratton et al., 2016). The Mantel test assesses the correlation between genetic and geographical distances across the entire range occupied by the analysed individuals, not specifically the genetic distances of individuals from contact zones. Moreover, Gratton et al. (2016) did not apply their criteria consistently. Their second criterion for distinct species, the lack of a correlation between the genetic differentiation of pairs of individuals belonging to different clusters and their geographical distances, was not fulfilled for Erebia tyndarus and E. nivalis; that is, they found a significant correlation of the interspecific genetic distances with geographical distances. Nevertheless, they classified the two taxa as good species.
We agree with this decision because the ranges of the two species broadly overlap, they form clearly separated clusters in the principal components analysis (Gratton et al., 2016: fig. 2a) and a structure analysis indicated little admixture between co-occurring populations of the two species (Gratton et al., 2016: fig. 3). We consider the continued co-occurrence of two taxa without fusion as decisive evidence for their species status. Our test rejected the null hypothesis that the genetic distances between individuals belonging to two different candidate species are not larger than expected based on their geographical distances, both for E. tyndarus and E. nivalis and for all other pairs of the four species of the E. tyndarus complex (Figure S1l-n; Table S1). The interspecific genetic distances are larger than expected based on the relationship between the intraspecific genetic distances and the geographical distances of the sampled specimens. The positive correlation of the genetic distances between E. tyndarus and E. nivalis specimens with their (least-cost path) geographical distances is not relevant for testing the hypothesis that the magnitude of the interspecific distances can be explained by IBD as expected from the intraspecific distances. In fact, the genetic distances between E. tyndarus and E. nivalis specimens are significantly larger than expected from the intraspecific IBD regression pattern (Figure S1l; Table S1). Thus, Mantel tests of the interspecific genetic distances, as applied by Gratton et al. (2016), do not provide relevant evidence for or against the species status of the considered taxa.
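The contrast between "interspecific distances are correlated with geography" and "interspecific distances exceed the intraspecific IBD expectation" can be made concrete with a deliberately simplified sketch. It is not the exact test statistic used in this study. It fits an ordinary least-squares line of genetic distance on log(1 + geographical distance) to the within-group pairs of one reference group and reports the mean residual of the between-group pairs. It performs no significance assessment, which would have to account for the non-independence of pairwise distances. The function name `ibd_excess`, the log(1 + d) transformation and all simulated values are assumptions made only for illustration.

```python
import numpy as np

def ibd_excess(gen, geo, labels, ref, other):
    """Mean residual of between-group genetic distances relative to an IBD
    regression fitted to the within-group pairs of the reference group.

    gen, geo : (n, n) symmetric matrices of genetic and geographical distances
    labels   : length-n array of candidate-species labels
    """
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)
    li, lj = labels[iu[0]], labels[iu[1]]
    x = np.log1p(geo[iu])                  # assumed transformation: log(1 + distance)
    y = gen[iu]
    within = (li == ref) & (lj == ref)
    between = ((li == ref) & (lj == other)) | ((li == other) & (lj == ref))
    slope, intercept = np.polyfit(x[within], y[within], 1)   # OLS line
    resid = y[between] - (intercept + slope * x[between])
    return float(resid.mean())

# Two hypothetical allopatric groups; genetic distance follows the same IBD
# relationship everywhere, plus an optional extra between-group offset.
rng = np.random.default_rng(7)
n = 40
coords = np.concatenate([rng.uniform(0, 100, n // 2), rng.uniform(150, 250, n // 2)])
labels = np.repeat(["A", "B"], n // 2)
geo = np.abs(coords[:, None] - coords[None, :])

def simulate(offset):
    noise = rng.normal(0, 0.02, size=geo.shape)
    gen = 0.1 * np.log1p(geo) + (noise + noise.T) / 2
    gen[labels[:, None] != labels[None, :]] += offset   # extra between-group divergence
    np.fill_diagonal(gen, 0.0)
    return gen

excess = {offset: ibd_excess(simulate(offset), geo, labels, "A", "B") for offset in (0.0, 0.3)}
print(excess)
```

In both simulated scenarios, the between-group distances are positively correlated with geographical distance, so a Mantel test of the interspecific distances alone would not distinguish them. The mean residual is close to zero without the offset and close to the offset with it.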
Whereas our approach always tests whether the differentiation between two candidate species can be explained by IBD, Medrano et al. (2014) and Gratton et al. (2016) in some cases included three groups in a single Mantel or partial Mantel test. The outcome of a test with more than two groups is difficult to interpret because the differentiation between two of the tested groups might be explained by IBD, whereas the third group could be more strongly differentiated. For example, Gratton et al. (2016) reported that a Mantel test showed a significant correlation between genetic and geographical distances within E. cassioides in the wide sense. They concluded that the three clusters identified by k-means clustering and structure are conspecific. Our pair-wise regressions showed that the null hypothesis that the genetic distances between individuals belonging to two different candidate species are not larger than expected based on their geographical distances cannot be rejected for the Western Alps + Pyrenees + Northern Apennines versus the Central + Southern Apennines cluster (Figure S1p; Table S1). Thus, the data provide no evidence for the specific distinctness of these two subgroups. The structure analysis that showed that the ge- […] Table S1). Although the differentiation between these groups is smaller than that between the other species of the E. tyndarus complex, these results suggest that the populations from the Orobian and Eastern Alps can be classified as a distinct species (although this should be corroborated with genetic data from additional samples). This conclusion is also supported by the structure analysis of the RAD data, which showed little admixture between the cluster from the Orobian and Eastern Alps and the two other clusters (Gratton et al., 2016: fig. 3a). In particular, the single specimen from the Orobian Alps showed no admixture with the geographically close populations from the Western Alps. Lattes et al. (1994) had already recognized the distinction between western and eastern subgroups of E. cassioides based on allozyme data. Thus, the IBD tests and the structure analysis of the RAD data, together with the allozyme data, support the separation of the western populations as E. arvernensis (see Descimon & Mallet, 2009) from the eastern E. cassioides.
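The advantage of pair-wise comparisons over a single test with three groups can be sketched with three hypothetical clusters along a transect, of which only cluster C carries extra divergence beyond IBD. The sketch below is an illustration under stated assumptions, not the procedure used in this study. It fits an OLS line of genetic distance on log(1 + geographical distance) to the pooled within-group pairs of the two clusters being compared and reports the mean residual of their between-group pairs, without any significance test. The function name `pair_excess`, the transformation and all simulated values are hypothetical.

```python
import itertools
import numpy as np

def pair_excess(gen, geo, labels, a, b):
    """Mean residual of a-b genetic distances from an IBD line fitted to the
    pooled within-group pairs of groups a and b."""
    iu = np.triu_indices(len(labels), k=1)
    li, lj = labels[iu[0]], labels[iu[1]]
    x, y = np.log1p(geo[iu]), gen[iu]        # assumed log(1 + distance) transform
    within = (li == lj) & np.isin(li, [a, b])
    between = ((li == a) & (lj == b)) | ((li == b) & (lj == a))
    slope, intercept = np.polyfit(x[within], y[within], 1)
    return float((y[between] - (intercept + slope * x[between])).mean())

# Three hypothetical clusters of 15 individuals along a 1-D transect (km).
rng = np.random.default_rng(3)
coords = np.concatenate([rng.uniform(0, 100, 15),
                         rng.uniform(150, 250, 15),
                         rng.uniform(300, 400, 15)])
labels = np.repeat(["A", "B", "C"], 15)
geo = np.abs(coords[:, None] - coords[None, :])
noise = rng.normal(0, 0.02, size=geo.shape)
gen = 0.1 * np.log1p(geo) + (noise + noise.T) / 2
is_c = labels == "C"
gen[is_c[:, None] != is_c[None, :]] += 0.3   # extra divergence only between C and the rest
np.fill_diagonal(gen, 0.0)

results = {(a, b): pair_excess(gen, geo, labels, a, b)
           for a, b in itertools.combinations(["A", "B", "C"], 2)}
for pair, value in results.items():
    print(pair, round(value, 3))
```

In this simulation, only the comparisons involving C show a clear excess over the IBD expectation, whereas the A versus B comparison is consistent with IBD. A single correlation test across all three clusters would not show which pair is responsible for the overall pattern.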
Concerning Narcissus from the Baetic Ranges, our analysis confirmed that the two major groups distinguished by Medrano et al. (2014), the blue group including N. bujei and the green group including N. longispathus and N. nevadensis, form two distinct species complexes. This was not surprising because these groups have overlapping ranges and are not sister groups in nuclear ITS and organellar phylogenies (Marques, Fuertes Aguilar, Martins-Louçao, Moharrek, & Nieto Feliner, 2017; Rønsted, Savolainen, Mølgaard, & Jäger, 2008). Using partial Mantel tests, Medrano et al. (2014) found that, for the green group, the classification into three subgroups remained a significant predictor of genetic distance after statistically accounting for the effect of geographical distance, whereas for the blue group the classification into three subgroups explained only a small portion of the genetic variation after accounting for geographical distance. From these results, they concluded that the subdivision within the green group could be explained by long-term historical processes rather than by microevolutionary processes resulting from IBD, whereas IBD is the most parsimonious explanation for the genetic differentiation within the blue group. Our regression analyses (Figures S1g-k and S2d-h) revealed that the relationships between genetic and geographical distances within and between the subgroups are more complicated. Our tests for H₀₁ show that the regressions differ between all subgroups (Table S1). In these cases, the general problems of Mantel tests (see above) are aggravated by combining subgroups (three in each test) with different relationships between genetic and geographical distances in a single correlation test.
Such a test cannot provide evidence for the distinctness of individual subgroups and may even be misleading. Our tests showed that the pair-wise differentiation of the subgroups within the green group, as well as that of blue_N and blue_C, can be explained by IBD when the relationship between genetic and geographical distances within one of the subgroups is considered, but not from the perspective of the other subgroup (Figure S1h-k; Table S1). Thus, it might be prefera-

ACKNOWLEDGMENTS
We are grateful to Christy Edwards and Mónica Medrano for providing data and two reviewers for thoughtful comments. This study was funded by the priority programme "Taxon-Omics" of the Deutsche Forschungsgemeinschaft (HA 2763/6-1).