Range‐constrained co‐occurrence simulation reveals little niche partitioning among rock‐dwelling Montenegrina land snails (Gastropoda: Clausiliidae)

Abstract Aim Taxon co‐occurrence analysis is commonly used in ecology, but it has not been applied to range‐wide distribution data of partly allopatric taxa because existing methods cannot differentiate between distribution‐related effects and taxon interactions. Our first aim was to develop a taxon co‐occurrence analysis method that is also capable of taking into account the effect of species ranges and can handle faunistic records from museum databases or biodiversity inventories. Our second aim was to test the independence of taxon co‐occurrences of rock‐dwelling gastropods at different taxonomic levels, with a special focus on the Clausiliidae subfamily Alopiinae, and in particular the genus Montenegrina. Location Balkan Peninsula in south‐eastern Europe (46N–36N, 13.5E–28E). Methods We introduced a taxon‐specific metric that characterizes the occurrence probability at a given location. This probability was calculated as a distance‐weighted mean of the taxon's presence and absence records at all sites. We applied corrections to account for the biases introduced by varying sampling intensity in our dataset. Then we used probabilistic null‐models to simulate taxon distributions under the null hypothesis of no taxon interactions and calculated pairwise and cumulated co‐occurrences. Independence of taxon occurrences was tested by comparing observed co‐occurrences to simulated values. Results We observed significantly fewer co‐occurrences among species and intra‐generic lineages of Montenegrina than expected under the assumption of no taxon interaction. Main conclusions Fewer than expected co‐occurrences among species and intra‐generic clades indicate that species divergence preceded niche partitioning. This suggests a primary role of non‐adaptive processes in the speciation of rock‐dwelling gastropods. The method can account for the effects of distributional constraints in range‐wide datasets, making it suitable for testing ecological, biogeographical, or evolutionary hypotheses where interactions of partly allopatric taxa are in question.


| INTRODUCTION
Speciation is often classified as adaptive or non-adaptive. It is strictly adaptive when the evolution of a new species is triggered by adaptation to a new niche. This predominantly in situ process is characterized by descendant sister species remaining in sympatry and showing apparent differences in their habitat preferences. Radiations in ancient lakes are prime examples of such "strong" adaptations (Sch€ on & Martens, 2004). The majority of speciation events is allopatric (Coyne & Orr, 2004;Turelli, Barton, & Coyne, 2001), but an increasing number of studies suggests that sympatric speciation, when ranges of sister taxa overlap, is probably less exceptional than previously thought (Bolnick & Fitzpatrick, 2007;Li et al., 2016). In allopatry, sister taxa possess non-overlapping distribution areas, and their genetic differentiation follows geographical partitioning of the ancestral species' original range. Divergence is initiated by non-adaptive processes (e.g. drift) and, even if descendants adapt to distinct habitats, that is a consequence, rather than cause of the speciation (Rundell & Price, 2009). Therefore this process is often termed as "weak" adaptation (Knox, 2004). Some authors claim that allopatric speciation is driven mostly or entirely by nonadaptive factors (Gittenberger, 1991(Gittenberger, , 2004Wilke, Benke, Br€ andle, Albrecht, & Bichain, 2010), but most speciation events can be best explained by a combination of adaptive and non-adaptive forces (Dieckmann, Metz, Doebeli, & Tautz, 2004;Olson & Arroyo-Santos, 2009). Therefore, rather than polarizing the question whether speciation is adaptive or non-adaptive, we instead ask to what extent it is adaptive or non-adaptive.
Niche differentiation can be the cause or consequence of speciation: it precedes speciation in "strong" (adaptive) cases, but not necessarily in "weak" (non-adaptive) cases. We can assume that the less adaptive the speciation process is, the slower is the niche partitioning over time. Therefore, a comparative study of phylogenetic vs. niche divergence can provide indirect evidence for the role and relative significance of the adaptive and non-adaptive mechanisms in a taxon's evolution (Losos, 2008;Losos & Mahler, 2010). However, there are some shortfalls in implementing this seemingly simple idea in practice. Although there are methods for quantifying niche differences of species coexisting within the same community (Godoy, Kraft, & Levine, 2014), they are not applicable to allopatric species.
In allopatry, habitat descriptors and other environmental factors might be used to describe niches (McCormack, Zellmer, & Knowles, 2010). But even if habitat and climate are well-characterized, they provide little clue about the niche itself because similar habitat preferences of sister taxa cannot be seen as insurmountable evidence for their niche overlap, nor can slight differences in habitat preferences be taken as proof of niche differences (Olson & Arroyo-Santos, 2015;Sober on, 2007).
When Gittenberger (1991Gittenberger ( , 2004 considered the rock-dwelling gastropods (e.g. Albinaria) as ideal examples of non-adaptive radiations, he not only argued that their habitats are similar, but also claimed that there are fewer than expected known cases where more than one Albinaria species co-occur. A practical way to test this hypothesis could be obtaining information indirectly about niche segregation by studying co-occurrence patterns (Pianka, 2011). However, up to now Gittenberger's field experience-based assumption remained untested.
This prompted us to investigate co-occurrence patterns of rockdwelling gastropods and to test the hypothesis that observed co-occurrences of rock-dwelling gastropod congeners are less frequent in nature than expected under random distribution (Gittenberger, 1991(Gittenberger, , 2004. In accordance with the competitive exclusion principle (Hardin, 1960), we started with the assumption that frequent co-existence of two species is an indirect indication that their niches are not identical, otherwise one of them would have excluded the other. On the same basis, no or fewer than expected co-occurrence of two sympatric species indicates overlap of their niches. We compared not only pairs of species but also those of higher taxa at different stages of taxonomic/phylogenetic relatedness (Godoy et al., 2014). Our goal with this was to identify which was the likely phylogenetic level in their divergence at which niche segregation happened and, hence, to provide indirect information on the significance of adaptation in the process of speciation. As a model system, we chose gastropods native to rocky habitats in the Balkan Peninsula, and primarily the species-rich door snail genus Montenegrina Boettger, 1877. For the phylogenetic perspective we tested co-occurrences at different levels of taxonomic relatedness: at the genus level within Montenegrina (divided into morphologybased species, as well as intra-generic clades of the mitochondrial tree), at the subfamily level within the door snail subfamily Alopiinae, at the family level within door snails, at the subclass level within pulmonate gastropods, and at the class level between pulmonate and caenogastropod snails. We also present a methodological framework that is capable of simulating range-wide occurrence patterns of multiple species with partially overlapping ranges and data obtained by spatially varying survey effort.

| The study system
The main part of the analyses below the genus level was carried out with members of the obligate rock-dwelling door snail genus Montenegrina. Feh er and Szekeres (2016) distinguished 29 species with 106 subspecies based mainly on shell morphology. Montenegrina belongs to the same subfamily as Albinaria and has similar habitat preferences, but its smaller range allows more comprehensive sampling. The distribution area in the north-western part of the Balkan Peninsula includes approximately 400 known localities (Figure 1). Other land snail taxa were also included in the analyses (see alignments questionably aligned positions were eliminated with GBLOCKS 0.91b (Castresana, 2000), applying all "less stringent" block selection parameters. The lengths of 16S and 12S alignments after trimming were 755 and 659 bp, respectively. Thereafter, the three alignments were concatenated. trees were sampled every 100 generation. The first 20% of trees were discarded as burnin and a 50% majority rule consensus tree was calculated from the remaining trees. Maximum likelihood (ML) analysis was performed by GARLI 2.0 (Zwickl, 2006). We selected the tree with the best ML score after 20 independent runs with random  as "potentially suitable habitats." In order to keep habitat-related factors (habitat suitability) fixed, and thus allow focusing on rangerelated issues, only these 1649 sites were used in further analyses.
Their uneven geographical distribution is due partly to the uneven distribution of limestone areas, and partly to the uneven sampling activity within the study area ( Figure 1a).
Distribution records were arranged into presence-absence data matrices (Y matrices), where rows are sites, columns are taxa, and a Y it matrix element takes 1 or 0, depending on whether taxon t was detected at the site i. We made three dif- Taxa of the Y1-Y3 matrices form 1081, 1830 and 2628 taxon pairs respectively. For some of the further analyses these taxon pairs were categorized by the taxonomic/phylogenetic relatedness of their members, particularly whether they are related at the class, subclass, family, subfamily, genus or the intra-generic clade level ( Figure S1.1).

| Definition of taxon ranges
We defined taxon ranges as a continuous spatial utilization distribution based on occurrence data for each species, rather than as (binary) range maps. Assuming a presence-absence dataset of T taxa at I sites, we introduced a taxon-and site-specific measure denoted as OP it (occurrence probability of taxon t at site i). We used presenceabsence status at each site and a spatial weight matrix (W) to calculate OP. Spatial weights determined the extent that two sites contribute to each other's OP metrics, depending on their pairwise geographical distances, and the number of W values belonging to each site is equal to the number of sites involved in the study (= the number of rows in the Y matrix). The spatial weight between any two sites (i, j) was defined as a logistic function: Where d ij is the geographical distance between the two sites (in km), d 0 defines the distance where the weight is 0.5 and k determines the steepness of the distance decay function (see Figure S2.2). In order to get rid of d ii = 0 values, and thus errors by taking the logarithm of zero, 0.1 km was added to all pairwise distances. Considering the geographical distances between our study sites, as well as the applied k and d 0 values, this modification had no significant impact on the results, as indicated by sensitivity analyses that we performed. Thus, W ij can take a value between 0 and 1 and, unlike in conventional spatial weight matrices (e.g. Murayama, 2012), W ii was defined as 1.
Based on the W ij values defined above, the OP metric of a given t taxon at site i is given as F I G U R E 2 Phylogenetic tree of Montenegrina, based on Bayesian inference analysis of mitochondrial (COI, 16S rRNA and 12S rRNA) genes. Clade assignments correspond to those of Tables S1.2 and S1.3. The tree was rooted using Vallatia vallata (Mousson, 1859) as an outgroup (not illustrated) [Colour figure can be viewed at wileyonlinelibrary.com] where the binary Y jt value defines whether t taxon was present at site j. Thus, in practice, OP it is the sum of W values of the presence sites, divided by the sum of W values of all sites. OP values were calculated for all sites for all taxa, therefore the OP matrix has the same dimensions as the raw dataset.
Though, to some extent, this formula takes all sites into account, the proper selection of the two constant parameters (k and d 0 ) serve the purpose of avoiding too large an influence of the sites on their own probability values (undersmoothing), and also preventing a significant contribution by sites at biogeographically irrelevant distances (oversmoothing). Such smoothing is used to incorporate expert knowledge or range maps into species distribution modelling (see Merow, Wilson, & Jetz, 2017) where the level of smoothing is often empirically calibrated. During our exploration of the Balkans we have found that the probability of finding a certain rock-dwelling gastropod species at a newly explored site depends on the distance from the nearest sites where this species is known to occur. Based on our field experience we have come to suspect that the distance decay of the probability of jump dispersal events can be described by a logistic, rather than a linear function, and the transition between the biogeographically relevant and irrelevant distances might be somewhere between 10 and 100 km. Accordingly, we tried to set d 0 and k parameters so that neighbouring sites nearer than 10 km should make a nearly complete (W i % 1.0) contribution to each other's OP calculations, whereas those farther than 100 km apart contribute nearly zero.
To assess the sensitivity of the results to the smoothing parame-

| Range-constrained co-occurrence simulation
Presence-absence records were simulated by geographic range-constrained random selections (simulated presence-absence tables are also denoted as Ψ matrices). The constraints were defined by matri- Next, we randomized the order of its elements (n 0 ), and then multiplied by the raw OP matrix. The product was finally rescaled so that the total sum of the resulting matrix is equal to the total number of presence records in the observed Y data table, but the elements are proportional to the products of the OP it values of the corresponding site (site i) and the number of taxa found on a randomly selected site ("hard" corrected model). Third, in order to avoid eventual zero values in the rescaled OP 0 matrix, we modified the correction vector by adding 1 to each of its elements before multiplying that with the raw OP matrix ("soft" corrected model) ( Figure S2.3). The "soft" correction stands in between the "uncorrected" and "hard" corrected algorithms in terms of matching the marginal distribution of the input data matrix.
Based on the rescaled OP 0 matrices, occurrence data tables (denoted as u Ψ, h Ψ and s Ψ for the uncorrected, the "hard" corrected and the "soft" corrected simulations) were simulated based on unequal probabilities of selection (i.e. a doubling of an OP 0 it value provides twice the chance for a given taxon in a given site to be selected in each selection round). During the simulation, random factors acted independently in each round, but under the same rangeconstraints that were defined by the site-and taxon-specific occurrence probability values. Simulated co-occurrences (l) were calculated from Ψ matrices as the number of sites where both taxa were present. These steps of the distribution simulation and co-ocurrence calculation were repeated 1000 times and minimum, maximum and mean values were calculated for each of the simulated pairwise cooccurrences. It is important to note that for the "hard" and "soft" corrected models each simulation round started from a newly randomized n 0 vector.
An R script was used to calculate OP, OP 0 and corrected OP 0 matrices to make randomizations, to simulate Ψ tables and to calculate co-occurrences. We used the "nullmodel" and "oecosimu" functions in the "vegan" R extension package (Oksanen et al., 2016) to perform the Monte Carlo simulations for our null-model analysis.
The R script with a worked example is available at the Zenodo Digital Repository (https://doi.org/10.5281/zenodo.1124944). For a two-sided null hypothesis testing we used the range of the simulated values (minimum to maximum). Although both more than expected and fewer than expected co-occurrences were recorded, we were mainly interested in the latter, as from the point of our research question the distinction between dissociation and random distribution was of primary importance.
For a better visualization and better comparison of the results, the observed pairwise co-occurrence counts (m) were rescaled relative to the range of the simulated co-occurrence values (l) Hence, this rescaled value (m 0 ) is equal to À1, 0 and +1 when the observed co-occurrence count is equal to the lowest (l min ), the mean ( l), or the highest (l max ) simulated counts respectively. If the observed count is below or above the simulated range, m 0 takes a value lower than À1, or higher than +1.
Simulated pairwise co-occurrence values of rare and/or largely non-overlapping taxa are generally very low. In such cases the ranges of simulated values usually include zero, which makes it impossible to assess the lack of observed co-occurrences, that is the distinction between "expected" and "fewer-than-expected" zero cooccurrence values. This was the case with the subdivided Montenegrina data: the more clades we split them into, the more pairwise unassessable zero co-occurrence counts were obtained. To surmount this, we cumulated pairwise co-occurrence counts: the observed counts were summed up by the groups as outlined above (Table S1.1), and the same was done with the simulated pairwise counts after each simulation round. Means and ranges (minimummaximum) were calculated from these cumulated counts and compared to the group sums of the observed co-occurrences. This kind of calculation helps overcoming the problem caused by the pairwise unassessable zero values, but should be interpreted carefully because pairs of widespead taxa in many presence records may considerably influence, and eventually distort the results.

| Effect of model settings and correction modes
The Y1 matrix, in which Montenegrina records were merged (47 taxa 9 1649 sites, Table S1.4) contained 7033 presence records altogether. Instead of a symmetric shape, with a mode near the 4.3 mean value, the frequency distribution of the taxa per site values was strikingly right-skewed and platykurtic (the mode was at 1).
Hence, the number of sites with zero to two taxa, as well as those with more than six taxa, was higher than expected under a nearly symmetrical distribution (simulated by the "uncorrected" model) assumption (Figure 3).
The total number of observed co-occurrences was 20,031. Simulations under the "hard" correction, which were based on a taxon per site frequency distribution, yielded almost the same total number of co-occurrences (20,219). Uncorrected models, depending on how the d 0 and the k parameters were set, simulated 14,627-14,800 total co-occurrences, whereas models using "soft" correction simulated total co-occurrences in between those of the "hard" and the "uncorrected" models (18,186).
Each of the studied taxa occurred together with 2.2-12.8 other taxa (Table S1.1). Those taxa requiring more sampling effort (namely the members of Ferussaciidae, Argnidae, Cochlicopidae, Pupillidae, Valloniidae, Punctidae, Euconulidae, which comprise mainly smallsized and/or locally rare species that often require special collecting techniques) co-occured with more than nine other taxa on average.
At the other end of the spectrum there were larger-sized and locally frequent taxa (including most of the alopiinine genera), which are easy to collect and, therefore, are more likely to be found at cursorily sampled sites (Table S1.1).
Out  Table 1). The highest values were simulated under the "hard correction," and the lowest ones under the "uncorrected" way of modeling. The model setting that led to the lowest co-occurrence counts was the distance decay with steepest slope (k = 5 and d 0 = 30 km), assigning the lowest W values to distances higher than 70 km (Table 1). The fewest outliers were found when the k = 5 and d 0 = 30 km settings were combined with the "hard" correction. This combination of model settings resulted in the lowest higher-than-expected and, at the same time, the highest lower-thanexpected values ( Figure S3.4, Table 1).
Montenegrina did not co-occur with the families Bradybaenidae, Cochlicellidae and Euconulidae and three alopiinine genera: Alopia, Carinigera and Dilataria (Table S3.7). All these observed zero cooccurrences were, however, within the simulated ranges. Montenegrina co-occurred at least once with the other 40 taxa in our dataset, and the vast majority of these observed non-zero co-occurrence values was also within the simulated pairwise ranges, regardless of the correction types and model settings applied. Three of the nine models simulated more co-occurrences than observed with the genus Herilla. Two of the models simulated fewer co-occurrences than observed with the family Helicidae, and only one of the models with the families Enidae and Hygromiidae (Table S3.7).

| Correlation between co-occurrences and taxonomic relatedness
We categorized pairwise co-occurrence counts by taxonomic relatedness. The most extreme outliers were at the class level, where we found only higher-than-expected outliers. By contrast, at and below the subfamily level there were no higher-than-expected outliers at all. In general, we found that the closer related the taxon pair members were, the lower the observed co-occurrence counts were relative to the simulated ones on average ( Figure 4).
Cumulated data showed a similar picture. For the 29 pulmonate families the cumulative observed co-occurrence value was almost within the ranges simulated by the models applying "hard" correction, and well above those simulated by the "soft" and the "uncorrected" models. For the 17 alopiinine genera the cumulative observed co-occurrence values fell within the ranges simulated by the three "uncorrected" models, and slightly below those that were simulated by the six corrected models. Regardless of the model settings, correction types and the way of division (phylogeny-or morphology-based), total observed co-occurrences among Montenegrina subgroups were far fewer than expected ( Figure 5, Table 2).

| DISCUSSION
We evaluated co-occurrence patterns of rock-dwelling land snails in a phylogenetic perspective and we found strong support to the assumed dominance of non-adaptive factors in their speciation (Gittenberger, 1991(Gittenberger, , 2004. In order to demonstrate this we developed RaCoCOS (range-constrained co-occurrence simulation), a method comprising a probabilistic framework to define taxon ranges, a cooccurrence simulation null model that accounts for taxon ranges, and a method to assess and correct for biases in opportunistically collected biotic data.

| A need for range-constrained co-occurrence
analysis Elton (1946) suggested that, based on the competitive exclusion principle, co-occurrence patterns of two or more taxa can provide information about their niches. However, it has rarely been applied to gastropods (an exception is Dillon, 1987), and never to obligate  (Table S1.4). Simulations were done with k = 3 and d 0 = 30 km smoothing parameters with "hard" (triangles), "soft" (dots) or no (squares) model correction. This is a tool for quick visual assessment of the bias in the raw data. The striking difference between the frequencies of observed taxa per site counts and those simulated without correction indicates some bias, presumably caused by uneven sampling [Colour figure can be viewed at wileyonlinelibrary.com] T A B L E 1 Summary of co-occurrence simulations for 47 Balkan land snail taxa (Y1 data matrix, Table S1.4). We applied nine different combinations of model corrections and smoothing parameter settings and ran 1000 simulations using each. Ranges (minimum-maximum) of the simulated co-occurrence counts for each of the 1081 taxon pairs were compared to the observed values. On this basis, taxon pairs were categorized into four groups: observed value is less than the simulated range/observed zero value falls within the simulated range/observed non-zero value falls within the simulated range/observed value is higher than the simulated range. Detailed outcome of the simulations using the "hard" correction with k = 5 and d 0 = 30 parameter settings is shown in Figure S3.4  (Gotelli, 2000). In the cases of dispersal-limited taxa a severe limitation of traditional null-model techniques is their inability to account for the spatially autocorrelated nature of the species distributions. As a consequence, these methods cannot be applied directly to a study system like ours, comprising partially allopatric species (Stone, Dyan, & Simberloff, 1996). To circumvent this, we introduced RaCoCOS which simulates co-occurrences under the constraint of the geographic ranges of the taxa.

| Definition of geographic ranges
In contrast to range definition methods that sharply outline areas, for instance by drawing convex hulls around the known occurrence records (Connor, Collins, & Simberloff, 2013), in this study, we F I G U R E 5 Pairwise co-occurence counts cumulated for groups of Balkan land snail taxon pairs. The phylogenetic relatedness-based division is the same as that of Figure S1.1 and Figure 4, but two of them, due to low numbers of elements, where left out. Dots indicate observed values, maximum-mean-minimum lines indicate simulated values. Vertical axes indicate the number of co-occurrences. Detailed results of the nine different model settings are given in Table 2. Here, only the two most extreme simulated ranges are illustrated; namely those of the "hard" correction with k = 3 and d 0 = 30 km (right) and the "uncorrected" model with k = 5 and d 0 = 30 km parameter settings (left) defined taxon ranges through a spatial distribution kernel based on the probability of occurrence, which is the function of a spatial weight matrix and the known presences and absences of the taxon.
This approach can take into account the frequency (number of occurrences relative to the range size) and uneven density of taxa within their ranges, as well as whether two taxa are regionally allopatric or widely interspersed within the range overlap. Its theoretical background is the assumption that current ranges and distribution patterns are the results of various, partly deterministic factors like ancestral locations ("areas of origin" or "areas of refuge"), dispersal limitations (time and spatial dependence of colonization events), and the probabilities of local extinction events. When we found a high proportion of presence records of a given taxon within an area, it was interpreted as an indication that all potentially suitable sites in that area had a high probability of being colonized. Hence, we assigned high occurrence probability values for that taxon to each site in that area. By contrast, a high proportion of absence records (no or just sporadical presences) of a taxon in a given area meant that any site had a low probability of being colonized by the taxon, therefore the sites belonging there received low probability values (see Figure 1b,c for an example).
Mathematically the distance decay of spatial weights can be defined in different ways (Murayama, 2012). Here we chose a logistic function and defined its parameters empirically. To assess the sensitivity of the results to these settings we used three pairs of k  (Table S3.7).
The distances considered relevant in terms of dispersal and colonizing ability can vary considerably between different animal groups, and it is conceivable that taxon-specific parameterization of the distance decay would increase the effectiveness of our method. Finding a way to objectively define weight distances was beyond the scope of this study, but this might be a future advancement. A promising T A B L E 2 Cumulative pairwise co-occurence counts of Balkan land snail taxa grouped by the phylogenetic/taxonomic relatedness of taxon pair members. These groups are 27 Montenegrina morphospecies, Montenegrina subclades within the main clades, Montenegrina subclades between the main clades, 17 genera in the subfamily Alopiinae and 29 families of the class Pulmonata. Simulations of pairwise co-occurrences were made with nine different combinations of model corrections and settings. Values of taxon pairs in the same category were summed by each simulation round. Ranges (minimum-maximum), as well as mean values of the cumulated counts were calculated. The two most extreme ranges, namely those simulated by the "hard" correction with k = 3 and d 0 = 30 km and by the uncorrected model with k = 5 and d 0 = 30 km parameter settings are illustrated in Figure 5 Observed values approach could be combining spatial filtering methods developed for data sampled at non-regular grids (Wagner & Dray, 2015) with extensions to presence-absence and presence only datasets.
Nevertheless, the sensitivity analyses indicated that the model was far less sensitive to changes in these parameters than to the application of differing correction types (Tables 1-2, S3.7).
Once the d 0 and k parameters are selected, further points to consider are sampling density and the size of the study area. If, compared to the selected d 0 and k parameters, the study area is sampled at too low a density, the OP values of each site will be determined mainly (sometimes exclusively) by their own presence-absence statuses. Thus, after fitting model parameters to the study system, the spatial representation of the samples should be in accordance with the selected model parameters. At peripheral sites of the delimited study area the exclusion of sites in neighbouring outside regions might be a further source of bias in the OP calculation. To circumvent this, either complete territorial units (e.g. an entire island) should be selected or, when not possible, the study area should contain a reasonably wide peripheral "buffer zone."

| Quality assessment of biotic data
Even with standardized sampling methods locally rare or smallersized species, as well as those preferring cryptic microhabitats, are more likely to be overlooked than others and thus are underrepresented in biotic datasets (S olymos, 2007). Furthermore, if the sampling effort was not equal the more effort-demanding taxa may also show aggregation at better explored sites. If we use biotic data collected by different methods for different original purposes, which is usually the case for datasets harvested from various sources, we might reasonably expect the dataset to be biased to some degree. In order to correct this, and/or be able to interpret the analysis outputs properly, it is essential to assess the quality of the datasets.
Considering similar habitats and equal sampling effort, one would expect a nearly symmetric density distribution of taxon per site numbers, and also small differences in the average numbers of other taxa with which a given taxon co-occurs. If in a dataset the density distribution of taxa per site values considerably deviates from this assumption and certain taxa differ substantially in the average numbers of taxa they co-occur with, it might be the indication of biased data. Our field experience suggests that our dataset's deviation from the symmetry assumption must be due to uneven sampling, rather than to differences in habitat suitability. In other words, in most cases low taxon richness at a site is primarily due to cursory sampling. The fact that those taxa demanding most sampling effort were found to be associated with the higher taxon richness strongly supports this assumption. At the same time, we might also suppose that some taxa, specifically those requiring least sampling effort (e.g. members of the Alopiinae, upon which our study focused) are more reliably represented in this dataset than others.
The OP value of a given taxon at a given site is calculated only on the basis of its presence at the neighbouring sites, regardless of the presence or absence of other taxa. Therefore, the OP values, as we calculated them, are not capable of reflecting differences in the sampling efforts. If the simulation of taxon distributions is based only on the rescaled OP matrix (as in the "uncorrected" models), the sum of the elements in the simulated Ψ matrices will approximate the total number of the observed presence data (the sum of the Y matrix), and the frequency distribution of the simulated matrix will be symmetrical with a mode similar to the mean. The more uneven the sampling that produced the raw presence-absence data, the more right-skewed would be the frequency distribution of the observed taxa per site values (rows sum of the Y matrix). As the number of co-occurences at a site is equal to (n 2 -n)/2, if n is the number of presence records, it is easy to foresee that higher n implicates exponentially higher co-occurrence values. When the distribution of n value density gets more right-skewed and platykurtic, that is the number of sites of around average n values decrease and those above and below it increase, the total number of co-occurrences will also increase.
In view of the above, the fact that the sum of observed cooccurrences was far larger than that we simulated under uncorrected model assumption was primarily due to the deviation of observed data from null distribution, and thus reflected data quality rather than real taxon interactions.
Such bias can be best corrected if the density of taxon per site values of the simulated Ψ matrices approximate those of the observed data. Correction methods that define null hypotheses taking number of taxa per site proportional to observed species-richness are in general use (e.g. SIM5 type model in Gotelli, 2000). Our assumption, however, was that the biased distribution of speciesrichness per site in our dataset is mainly due to uneven sampling and not to differential habitat suitability. Instead of directly using the number of taxa per site values for correction, we introduced an additional step, namely the randomization of the number of taxa per site vector, before each simulation round ( Figure S2.3). The correction with this vector ensured that the distribution of simulated speciesrichness per site values approximated those of the observed ones (and, therefore, the sum of total co-occurrences did the same). But, due to the randomization step before each simulation round, the uneven sampling effort can be simulated under the assumption of equal habitat suitability. As the "hard" correction is based on such taxa per site density distribution as that of the input Y matrix (Figure 3), the total sum of co-occurrences simulated under this model correction is close to the observed value.
The "hard" correction outlined above excludes as many sites from each simulation round as have zero presences in the input dataset. It is conceivable that one might encounter Y matrices that contain a certain number of sites with zero presence values (either because no taxa were sampled at such sites, or because the sampled ones are excluded from the analysis). The fewer the taxa that comprise an input Y matrix and the higher the number of sites without presence records, the stronger the constraint imposed by the "hard" correction method. The "soft" correction was introduced to relax this constraint by not allowing the exclusion of any of the sites from any of the simulation rounds. Due to the taxa per site density distribution on which the simulation was based, the resulting total sum of co-occurrences was between those of the "hard" and the "uncorrected" models.
The "hard" method corrects according to the average bias of the raw data. As demonstrated above, the representation of different taxa may be differently biased in such opportunistically collected biotic datasets. Accordingly, the "hard" correction method might "overcorrect" for better represented taxa than for others, for example for Montenegrina and for the other obligate rock-dwelling alopiinine genera in this study. Such "overcorrection" might result in false dissociations (Type I error), or might mask existing dissociations (Type II error). As it is usually impossible to precisely define which taxa deviate, and to what extent, from the average bias in a biotic dataset, it is expedient to make simulations with different models and to take them all into account when interpreting the results.

| Co-occurrence in a phylogenetic perspective
When we compared co-occurrence patterns at different phases of phylogenetic divergence a somewhat similar concept was followed to that of the "age-range correlation," where the geographic range overlaps are placed into a phylogenetic perspective (Fitzpatrick & Turelli, 2006). At the family level we assumed no competitive exclusion (Diamond & Gilpin, 1982), and thus we used family level records as "negative control" in our study, expecting neither associations nor dissociations in co-occurrences. As expected, random co-occurrence patterns were indicated by corrected models, but under uncorrected model assumptions these appeared to include associations. As it was demonstrated above that raw data are biased, we suspect that any associations at the family level under the uncorrected model assumptions are due to this, rather than to any real interactions between the families. It was more difficult interpreting the patterns found at the genus level among alopiinine door snails. Here corrected models infer some dissociations. If we consider the simulations of these models more realistic, we might assume that even at the genus level there is some degree of competition. An alternative explanation can be that the less biased representation of these taxa in the raw presence-absence datasets leads to the overestimation of their expected co-occurrences.
Observed co-occurrences among Montenegrina species and subgeneric clades are far fewer than expected under the assumption of random distribution, regardless of the model settings or correction types we applied. Below the genus level the differences between the observed and simulated values are so clear and obvious that it can be interpreted with little doubt as a sign of competitive exclusion and, indirectly, as an indication that the appearance of subgeneric clades and species did not result in considerable niche partitioning. Compared to the phylogenetic divergence, the rate of niche differentiation appears to have lagged behind. This can be viewed as strong support for the hypothesis that the speciation of rock-dwelling gastropods, at least those belonging to the alopiinine door snails, was driven primarily by non-adaptive processes (Gittenberger, 1991).
Nevertheless, there are at least 10 known cases when distinct species of Montenegrina co-occur (Feh er & Szekeres, 2016: Table 1). Nine of these were included in the raw presence-absence dataset of this study. Although this number is far lower than expected from any of the simulations, it is worth considering how and why these species can co-occur. There are two likely explanations. One is that even if most of the Montenegrina species have highly similar niche preference, some of them might already diverged in this respect. Then these niche-diverged species would account for the few co-occurrences. This might explain two of the observed cases at the shore of Lake Ohrid, where M. stankovici (Urba nski, 1960) co-occurs with M. dofleini pinteri Nordsieck, 1974 and M. perstriata ochridensis (Wagner, 1925). Montenegrina stankovici prefers a microhabitat different from those of its congeners, namely exclusively inhabiting rocks in the immediate vicinity of the lake surface. At these two co-occurrence sites an apparent spatial segregation can be observed on a fine scale as the con-  Table 1). These gorges harbouring descendants of more than one Montenegrina species function as natural filters for washed-away individuals. Although in the drainage area these species occur allopatrically, incidental or recurrent colonization can result in their coexistence at these gorges. If this assumption is correct, such sites can be viewed as "natural experiments" that offer ideal model systems for studying the spatial and temporal dynamics of competitive exclusion.

| Prospects
With the use of community null-modelling techniques we studied co-occurrence patterns in range-wide distribution data of gastropods from an evolutionary point of view. In order to draw meaningful conclusions we assembled a large distribution dataset of various gastropod taxa of similar habitats. By using geographical distribution kernels the introduced method proved capable of accounting for range differences in allopatrically distributed taxa. It includes a correction method to balance biased distribution data that arise from differential sampling. This is particulary useful if we harvest presence-absence data from museum databases or public biodiversity repositories. Although it was developed for rock-dwelling land snails, this null-modelling framework can be adopted with little alteration for the analysis of large-scale distribution records of other taxonomic groups. Hence, our approach can provide a blueprint for studies addressing a variety of biogeographical, phylogeographical, evolutionary, or community ecological problems in relation to allopatry.