Estimating immigration in neutral communities: theoretical and practical insights into the sampling properties


  • François Munoz,

    1. UM2 – UMR AMAP (botAnique et bioinforMatique de l’Architecture des Plantes), Boulevard de la Lironde, TA A-51/PS2, F-34398 Montpellier Cedex 5, France
  • Pierre Couteron

    1. IRD – UMR AMAP (botAnique et bioinforMatique de l’Architecture des Plantes), Boulevard de la Lironde, TA A-51/PS2, F-34398 Montpellier Cedex 5, France

Correspondence author. E-mail:


1. Widening applications of neutral models of communities necessitates mastering the process of inferring parameters from species composition data. In a previous paper, we introduced the novel conditional GST(k) statistic based on community composition. We showed that it is a reliable basis for assessing migrant fluxes into local communities under a generalized version of the spatially implicit neutral model of SP Hubbell, which can accommodate non-neutral patterns at scales broader than the communities.

2. We provide here new insights into the sampling properties of the GST(k) statistic and of the derived immigration number, I(k). The analytical formulas for bias and variance are useful for assessing estimation accuracy and for investigating the variation of I(k) across communities.

3. Immigration estimation is asymptotically unbiased as sample size increases. We confirm the validity of our analytical results on the basis of simulated neutral communities.

4. We also underline the potential of using I(k) as a descriptive index of community isolation, without reference to any model of community dynamics.

5. We further propose a practical application of the bias and variance analysis for defining sampling designs for immigration quantification by efficiently balancing the number and size of community samples.


Since the seminal work of Hubbell (2001), the neutral theory of biodiversity has attracted much interest by providing a nontrivial null hypothesis against which the nature of real-world biodiversity patterns may be tested and discussed (McGill 2003; Nee & Stone 2003; Volkov et al. 2003; Leigh 2007). Under the key assumption of ‘ecological equivalence’, namely that individuals have equal fitness whatever their species or habitat, Hubbell (2001) and followers (Etienne 2005) proposed a two-level spatially implicit neutral model (called 2L-SINM by Munoz, Couteron, & Ramesh 2008; see the synthesis in Beeravolu et al. 2009 and Fig. 1a, left). This model decouples the dynamics of the large biogeographical background (i.e. the metacommunity), in which the effect of infrequent speciation and extinction events is quantified via the biodiversity parameter θ (speciation-drift balance), from local community dynamics, which are controlled by the immigration parameter, I (migration-drift balance, Etienne & Olff 2004). The theory has found some support from the investigation of very rich tree communities in wet evergreen tropical forests, where alternative models were not satisfactory for explaining some observed diversity patterns, such as the overrepresentation of rare species in the species abundance distributions (SAD) (Hubbell 2001; Chave, Alonso, & Etienne 2006). From a theoretical standpoint, neutral models can help us grasp how the interaction between limited immigration and local ‘ecological drift’ (Hubbell 2001) may influence taxonomic composition. Munoz, Couteron, & Ramesh (2008) further proposed to relax the assumptions made by Hubbell (2001) at the upper level (i.e. the metacommunity) and to generalize the initial 2L-SINM into a 3L-SINM, by introducing an intermediate level, the regional pool of migrants, which may be shaped by non-neutral influences (Fig. 1a, right).
Under the 3L-SINM, the study of the local migration-drift equilibrium can be disconnected from non-neutral influences occurring at larger scales, via the estimation of the immigration parameter I.

Figure 1.

 Immigration theory and estimation in a network of communities. (a) Two versions of a spatially implicit neutral model are considered, including two levels (2L-SINM, conforming to Hubbell 2001; left) and three levels (3L-SINM, Munoz, Couteron, & Ramesh 2008; right). In the 2L-SINM, the migrants are randomly drawn from a large biogeographical source and the model is neutral at all scales, while in the 3L-SINM, the local community dynamics is neutral but the model allows for non-neutral sources of variation within the pool of immigrants (the third level). (b) summarizes the sampling and estimation issues addressed in the paper. All of the sampled communities k (e.g. forest plots) are assumed to be independent to allow the application of equation 1 as in Munoz, Couteron, & Ramesh (2008), and the immigration numbers are estimated from equation 2 (the hat notation represents estimated values).

Although inferring Hubbell’s parameters has motivated a great deal of research in recent years, leading to several estimation methods (Volkov et al. 2003; Etienne 2005, 2009; Munoz et al. 2007; Munoz, Couteron, & Ramesh 2008), the rapidly growing number of applications to real-world data sets has raised questions about how to assess the reliability of the inference, even in the strictly neutral case. Furthermore, published methods have frequently led to substantially different parameter estimates when applied to the same data sets (see for instance Latimer, Silander, & Cowling 2005; Etienne et al. 2006; Etienne 2009, Table 1). The lack of knowledge about confidence limits has discouraged wider and more systematic applications of neutral theory through meta-analyses of parameter values. It also hinders the development and application of more realistic models of community assemblages. Our aim is to provide a statistically informed perspective on the estimation of the immigration parameter under the 3L-SINM.

The immigration parameter I is central here. It represents the number of potential migrants competing with resident offspring at each mortality event within the community and, as such, is closely related to Hubbell’s (2001) migration probability, m = I/(I + N − 1), where N denotes the size of a local community. The neutral theory basically invokes ‘dispersal limitation’ from the metacommunity to the local community to define I and m (Hubbell 2001; Etienne 2005). But when these parameters are estimated from real-world data, they may also include the effects of many processes that together determine the relative isolation and differentiation of the local community from its biogeographical background. I and m could therefore be taken as integrative measures of community isolation and differentiation, even though the strictly neutral ‘dispersal limitation’ invoked by Hubbell is not the only process shaping their values. Estimates of these immigration parameters could thus be especially useful for quantifying how the fragmentation of threatened habitats (e.g. tropical forests) may influence diversity in isolated patches taken as distinct local communities. More generally, analyses comparing consistent estimates of I and m across biogeographical regions and biomes would be a valuable contribution to community ecology and conservation biology by providing a measure of apparent fragmentation, which could be related to the ecological, physiographical and historical peculiarities of the different regions. This research programme requires better knowledge of the sampling properties of I and m, which is the purpose of the present paper.

Neutral models used for inference of I and m initially focused on species abundances in a single community (e.g. Hubbell 2001; Etienne 2005; Latimer, Silander, & Cowling 2005), whereas more recent work (Etienne 2007, 2009; Munoz et al. 2007; Munoz, Couteron, & Ramesh 2008; Jabot & Chave 2009) has allowed neutral parameters to be estimated from species compositions in a set of spatially scattered sampled communities, providing estimates of I(k) and m(k) for each sample k. In this regard, Munoz, Couteron, & Ramesh (2008) proposed a conditional version of the classical GST statistic from population genetics (Nei 1973; Takahata & Nei 1984; Slatkin 1985), GST(k), and showed that I(k) can be inferred from GST(k) under the more general three-level spatially implicit neutral model (3L-SINM, Munoz, Couteron, & Ramesh 2008; Beeravolu et al. 2009). Although this GST(k)-based estimation of neutral immigration performed well on simulated neutral communities (Munoz, Couteron, & Ramesh 2008), there is still a lack of theoretical knowledge regarding the sampling properties of the method (as for all of the published alternative methods). We bring here new insights into the estimation of immigration based on the GST(k) statistic by (i) providing analytical formulas, independent of any community model, for the bias and variance of the similarity statistics underlying the computation of GST(k) and I(k) and (ii) analysing the sensitivity of the estimation process to variation in the target parameters and to the main features of the data, such as the number of samples and the sample sizes. These results yield practical insights for sampling design, allowing a favourable trade-off between the number and size of community samples to be chosen according to a desired level of estimation accuracy.

Methodological background

Let us consider Nc distinct ecological communities indexed by k. The probability that an individual in community k is from species i is denoted by pik, where i = 1…S, and pi is the probability that an individual in the lumped set of communities is from species i. A sample is drawn from each community and includes a collection of individuals identified at a consistent taxonomic level (e.g. species), such as tree census plots scattered in a tropical forest. In practice, we assume that the samples are far enough from one another to belong to distinct communities (see Munoz, Couteron, & Ramesh 2008). This allows the same index, 1 ≤ k ≤ Nc, to be attributed to both the source communities and the corresponding samples. Sample k is then made of Nk individuals belonging to S species, and it thereby includes Nik individuals of each species i. The lumped set of community samples further includes Ni individuals of each species i, summing to N individuals.

The classical statistic GST (Nei 1973) is, in ecological terms, the ratio of beta diversity to gamma diversity; it is defined from similarity statistics that relate directly to Simpson’s diversity indexes (Simpson 1949; Couteron & Pélissier 2004). GST is thereby a standardized measure of the variation in composition across samples. The conditional version, GST(k), was proposed by Munoz, Couteron, & Ramesh (2008) to measure this information relative to each local community sample. It integrates the local similarity Fintra(k), which is the probability of drawing two conspecific individuals conditional on drawing both in sample k, and the global similarity Fglobal(k), which is the probability of drawing two conspecific individuals conditional on drawing the first in sample k and the other in the overall data set. We also denote by Finter(k) the probability of drawing two conspecific individuals conditional on drawing the first in sample k and the other in a sample distinct from k. Finally, we consider the probability of simultaneously drawing n individuals that belong to the same species conditional on drawing them in sample k, denoted Fn(k). In particular, Fintra(k) = F2(k).
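As a minimal sketch (our own illustration, not code from the original study), Fn(k) can be estimated from the species counts Nik of sample k by counting ordered draws without replacement. Under the with-replacement multinomial sampling model adopted in this paper, this ratio of falling factorials is an unbiased estimator of the sum over species of pik^n. The function names `falling` and `f_n_hat` are hypothetical.

```python
from math import prod
from typing import Sequence


def falling(x: int, n: int) -> int:
    """Falling factorial x(x-1)...(x-n+1); equals 0 when 0 <= x < n."""
    return prod(x - j for j in range(n))


def f_n_hat(counts: Sequence[int], n: int) -> float:
    """Estimate F_n(k), the probability that n individuals drawn without
    replacement from sample k all belong to the same species.

    counts[i] is N_ik, the number of individuals of species i in sample k.
    """
    n_k = sum(counts)
    if n_k < n:
        raise ValueError("the sample must contain at least n individuals")
    return sum(falling(c, n) for c in counts) / falling(n_k, n)


# n = 2 gives the local similarity F_intra(k); n = 3 gives F_3(k).
sample_k = [5, 3, 1, 1]
print(f_n_hat(sample_k, 2), f_n_hat(sample_k, 3))
```

Using falling factorials rather than squared relative frequencies is what removes the small-sample bias of the naive plug-in estimate.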

The spatially implicit neutral models and the GST(k) statistic

A spatially implicit neutral model (SINM) represents the dynamics of local community composition, consisting of speciation, extinction and immigration events, without any explicit reference to the relative distances between communities. This family of models was introduced in neutral community ecology by Hubbell (2001). It is intrinsically hierarchical and aims to decouple biogeographical and ecological scales. Indeed, evolutionary, biogeographical and ecological processes together influence species diversity at different temporal and spatial scales, and hierarchical models can therefore be relevant and helpful for representing such nested influences (Beeravolu et al. 2009). In the two-level spatially implicit neutral model (2L-SINM, Fig. 1a left) of Hubbell (2001), species abundances in the large-scale metacommunity are driven by the balance between speciation and extinction events (speciation-drift equilibrium), under the control of the biodiversity parameter θ. At the scale of a local community, species abundances are shaped by a balance between immigration from the metacommunity and local extinction (migration-drift equilibrium). The immigration parameter I is the number of immigrants that compete with local offspring when an individual dies in the local community. This important parameter was first named the ‘fundamental dispersal number’ (Etienne & Alonso 2005), but the concept can be enlarged to include other sources of immigration limitation from the background region (Munoz, Couteron, & Ramesh 2008). Munoz et al. (2007) showed that the 2L-SINM of Hubbell can accommodate varying immigration conditions across local communities k, by allowing varying values of I(k), and several recent approaches have been proposed to estimate I(k) in the presence of this variation (Etienne 2007, 2009; Munoz, Couteron, & Ramesh 2008; Jabot & Chave 2009).

In the three-level spatially implicit neutral model (3L-SINM, Fig. 1a right) of Munoz, Couteron, & Ramesh (2008), the larger-scale source of immigrants is not necessarily considered to be a metacommunity at speciation-drift equilibrium, because the model allows for various processes to occur at an intermediate scale, thereby providing greater flexibility for representing real-world patterns and migration pathways (Fig. 1 in Munoz, Couteron, & Ramesh 2008). In this case, contrary to the 2L-SINM, there is no assumption about the SAD of the immigrants. In the context of the 3L-SINM, Munoz, Couteron, & Ramesh (2008) introduced the conditional statistic GST(k) = [Fintra(k) − Fglobal(k)]/[1 − Fglobal(k)] and showed that GST(k) directly relates to the immigration parameter, I(k), of the source community k, through the relationship:

$$I(k) = \left[1 - P(k/k)\right]\frac{1 - G_{ST}(k)}{G_{ST}(k)} \qquad \text{(eqn 1)}$$

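Here, P(k/k) is the probability of drawing an individual in sample k conditional on first drawing an individual in k, which is measured as P(k/k) = (Nk − 1)/(N − 1). Applications to simulated neutral communities showed that exact estimators of the similarities,

$$\hat F_{intra}(k) = \sum_{i=1}^{S}\frac{N_{ik}(N_{ik} - 1)}{N_k(N_k - 1)}, \qquad \hat F_{global}(k) = \sum_{i=1}^{S}\frac{N_{ik}(N_i - 1)}{N_k(N - 1)},$$
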
are to be used to get unbiased estimates of the immigration parameters on the basis of equation 1 (Munoz, Couteron, & Ramesh 2008):

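$$\hat I(k) = \left[1 - P(k/k)\right]\frac{1 - \hat G_{ST}(k)}{\hat G_{ST}(k)}, \qquad \hat G_{ST}(k) = \frac{\hat F_{intra}(k) - \hat F_{global}(k)}{1 - \hat F_{global}(k)} \qquad \text{(eqn 2)}$$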

The corresponding immigration rates (i.e. the migration probability of Hubbell 2001) can be estimated using $\hat m(k) = \hat I(k)/[\hat I(k) + N_k - 1]$.
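The estimation chain from a species-by-sample abundance table to GST(k), Î(k) and m̂(k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: we assume the exact estimators F̂intra(k) = Σi Nik(Nik − 1)/[Nk(Nk − 1)] and F̂global(k) = Σi Nik(Ni − 1)/[Nk(N − 1)], the conditional statistic ĜST(k) = [F̂intra(k) − F̂global(k)]/[1 − F̂global(k)], equation 1 in the form I(k) = [1 − P(k/k)][1 − GST(k)]/GST(k) with P(k/k) = (Nk − 1)/(N − 1), and m̂(k) = Î(k)/[Î(k) + Nk − 1]. The function name `immigration_estimates` is hypothetical.

```python
from typing import List, Sequence, Tuple


def immigration_estimates(table: Sequence[Sequence[int]]) -> List[Tuple[float, float, float]]:
    """Return (G_ST(k), I(k), m(k)) estimates for every sample k.

    table[k][i] is N_ik, the abundance of species i in sample k; every row
    must list the same species in the same order.
    """
    n_species = len(table[0])
    n_i = [sum(row[i] for row in table) for i in range(n_species)]  # N_i
    n_total = sum(n_i)                                               # N
    out = []
    for row in table:
        n_k = sum(row)
        f_intra = sum(c * (c - 1) for c in row) / (n_k * (n_k - 1))
        f_global = sum(c * (ni - 1) for c, ni in zip(row, n_i)) / (n_k * (n_total - 1))
        p_kk = (n_k - 1) / (n_total - 1)                              # P(k/k)
        g_st = (f_intra - f_global) / (1 - f_global)
        if g_st <= 0:
            # No detectable differentiation: I is unbounded and m tends to 1.
            out.append((g_st, float("inf"), 1.0))
            continue
        i_hat = (1 - p_kk) * (1 - g_st) / g_st
        m_hat = i_hat / (i_hat + n_k - 1)
        out.append((g_st, i_hat, m_hat))
    return out


# Two toy samples of four individuals each, over two species.
for g_st, i_hat, m_hat in immigration_estimates([[3, 1], [1, 3]]):
    print(f"G_ST = {g_st:.3f}, I = {i_hat:.2f}, m = {m_hat:.3f}")
```

Returning infinity when ĜST(k) ≤ 0 is a design choice of this sketch; with real data such samples would need separate treatment.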

Sampling properties

A key issue is to investigate in more detail the sampling properties of $\hat I(k)$ (equation 2) so as to gain insights into its bias and variance and to formally assess the degree of confidence in its estimation. The word ‘sampling’ here conceptually encompasses two distinct sources of variation.

First, the neutral theory of Hubbell (2001) allows for random fluctuations in species abundances owing to the finite sizes of the community (local drift) and of the metacommunity (global drift). Influxes of new species (speciation) in the metacommunity and of immigrants in local communities are necessary to avoid species fixation and monodominance, and to maintain species diversity. The stochastic nature of the replacement of dead individuals was shown to be strictly analogous to a sampling process (Etienne & Alonso 2005), so that Etienne (2005) and followers could derive exact sampling formulas for species abundances in local community samples k (Etienne 2005, 2007; Munoz, Couteron, & Ramesh 2008; Noble et al. 2010). The relationship between GST(k) and I(k) in equation 1 holds for any sample taken in a given community (Fig. 1b, part above the dashed line; see also Munoz, Couteron, & Ramesh 2008).

Second, although equation 1 is exact for any sample in community k, in practice it only provides an estimate of I(k), denoted $\hat I(k)$, which is subject to random fluctuations in the constitutive similarity statistics within GST(k), because the species frequencies in samples are themselves random variables. Thus, the I(k) notation embodies the sampling variation due to ecological drift in neutral theory (equation 1 and Fig. 1b, above the dashed line), while $\hat I(k)$ further includes the sampling variation due to the estimation error of the similarity statistics (equation 2 and Fig. 1b, below the dashed line). The latter source of sampling error is indicated by the ‘hat’ notation and is the focus of the present work. We investigate this sampling variation by applying a multinomial model of sample draws from the corresponding source community. This model requires assuming, as a first approximation, that the communities are large enough compared with the samples for samples to be drawn from the communities with replacement.

Analytical results

We investigated the sampling properties of $\hat I(k)$ and $\hat m(k)$, using a delta method for calculating approximate sampling means and variances (Davison 2003). We then provided further analytical results for $\hat I(k)$ in the particular case of the two-level version (2L-SINM). We used Mathematica® (Wolfram 2003) for Taylor series expansions and investigation of the limit cases.

Sampling variation in similarity statistics

We investigated the variation among many samples successively drawn in a given community to predict the error in estimating similarities from a single sample. As mentioned above, we assumed each community sample to be small enough compared with the source community that the individuals making up a sample can be considered drawn from the source community with replacement. Species abundances in a sample of Nk individuals then follow a multinomial distribution with parameters pik, the species probabilities in the source community. We therefore investigated the influence of multinomial variation on the sampling properties of the estimated similarity statistics. We used binary indicator functions (Cormen et al. 2001) of species identity to explore the variations of the similarities, as Lande (1996) did for investigating Simpson’s diversity.
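The multinomial sampling model can be checked numerically. The sketch below is our own illustration, with hypothetical function names and arbitrary species probabilities: it repeatedly draws samples of Nk individuals with replacement from a community with known pik and compares the average of the exact estimator of Fintra(k) with the true value Σi pik², alongside the naive plug-in Σi (Nik/Nk)², which on average overestimates it by (1 − Σi pik²)/Nk.

```python
import random
from typing import List, Sequence


def multinomial_counts(p: Sequence[float], n_k: int, rng: random.Random) -> List[int]:
    """Draw n_k individuals with replacement; return species counts N_ik."""
    counts = [0] * len(p)
    for i in rng.choices(range(len(p)), weights=p, k=n_k):
        counts[i] += 1
    return counts


def f_intra_exact(counts: Sequence[int]) -> float:
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def f_intra_plugin(counts: Sequence[int]) -> float:
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def compare_estimators(p: Sequence[float], n_k: int, n_rep: int = 20000, seed: int = 1):
    """Monte Carlo means of the exact and plug-in estimators of F_intra(k)."""
    rng = random.Random(seed)
    exact = plugin = 0.0
    for _ in range(n_rep):
        counts = multinomial_counts(p, n_k, rng)
        exact += f_intra_exact(counts)
        plugin += f_intra_plugin(counts)
    return exact / n_rep, plugin / n_rep


p_k = [0.5, 0.3, 0.2]                       # true F_intra(k) = 0.38
print(compare_estimators(p_k, n_k=10))      # plug-in mean sits near 0.38 + 0.62/10
```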

Specifically, for $\hat F_{intra}(k)$, we established (Appendix S1) that

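$$E[\hat F_{intra}(k)] = F_{intra}(k) \qquad \text{(eqn 3A)}$$

$$\mathrm{Var}[\hat F_{intra}(k)] = \frac{4(N_k - 2)\,[F_3(k) - F_{intra}(k)^2] + 2\,[F_{intra}(k) - F_{intra}(k)^2]}{N_k(N_k - 1)} \qquad \text{(eqn 3B)}$$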

as in Lande (1996).

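Appendix S1 further shows that the two other similarity statistics, $\hat F_{global}(k)$ and $\hat F_{inter}(k)$, are also unbiased:

$$E[\hat F_{global}(k)] = F_{global}(k) \qquad \text{(eqn 3C)}$$
$$E[\hat F_{inter}(k)] = F_{inter}(k) \qquad \text{(eqn 3D)}$$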

The sampling properties of the similarity statistics here hold irrespective of any underlying model of species dynamics.
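To evaluate the sampling variance of F̂intra(k) numerically, the sketch below uses the standard variance of a degree-2 U-statistic under multinomial sampling, Var[F̂intra(k)] = {4(Nk − 2)[F3(k) − Fintra(k)²] + 2[Fintra(k) − Fintra(k)²]}/[Nk(Nk − 1)], written in terms of the species probabilities pik. This is our own derivation, offered as an assumption about the content of equation 3B rather than a transcription of it, and it is checked against brute-force enumeration for two species. Function names are hypothetical. For large Nk the leading term is 4[F3(k) − Fintra(k)²]/Nk, so the variance decreases as 1/Nk.

```python
from math import comb
from typing import Sequence


def var_f_intra(p: Sequence[float], n_k: int) -> float:
    """Sampling variance of the exact estimator of F_intra(k) for a sample of
    n_k individuals drawn with replacement from species probabilities p."""
    f2 = sum(x ** 2 for x in p)   # F_intra(k)
    f3 = sum(x ** 3 for x in p)   # F_3(k)
    return (4 * (n_k - 2) * (f3 - f2 ** 2) + 2 * (f2 - f2 ** 2)) / (n_k * (n_k - 1))


def exact_var_two_species(q: float, n_k: int) -> float:
    """Brute-force variance for two species with probabilities q and 1 - q."""
    mean = second = 0.0
    for a in range(n_k + 1):
        weight = comb(n_k, a) * q ** a * (1 - q) ** (n_k - a)
        f = (a * (a - 1) + (n_k - a) * (n_k - a - 1)) / (n_k * (n_k - 1))
        mean += weight * f
        second += weight * f * f
    return second - mean ** 2


print(var_f_intra([0.7, 0.3], 6), exact_var_two_species(0.7, 6))
```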

Sampling error on $\hat I(k)$ and $\hat m(k)$

From this premise, we turned to the sampling error in the estimates $\hat I(k)$ and $\hat m(k)$, through their relationships with $\hat G_{ST}(k)$ and the constitutive similarity statistics $\hat F_{intra}(k)$ and $\hat F_{global}(k)$: $\hat I(k) = f(\hat F_{intra}(k), \hat F_{global}(k))$, with $f(x, y) = [1 - P(k/k)](1 - x)/(x - y)$ (from equation 2), and $\hat m(k) = g(\hat F_{intra}(k), \hat F_{global}(k))$, with $g(x, y) = f(x, y)/[f(x, y) + N_k - 1]$.

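We denote the derivatives $f_x = \partial f/\partial x$, $f_{xx} = \partial^2 f/\partial x^2$, $f_{xy} = \partial^2 f/\partial x\,\partial y$, and so on. Using a delta method based on Taylor series expansions (Davison 2003; see Appendix S2), we could obtain approximate expected values and sampling variances of $\hat I(k)$:

$$E[\hat I(k)] \approx f(x_0, y_0) + \tfrac{1}{2} f_{xx}\,\sigma_x^2 + f_{xy}\,\sigma_{xy} + \tfrac{1}{2} f_{yy}\,\sigma_y^2 \qquad \text{(eqn 4A)}$$
$$\mathrm{Var}[\hat I(k)] \approx f_x^2\,\sigma_x^2 + 2 f_x f_y\,\sigma_{xy} + f_y^2\,\sigma_y^2 \qquad \text{(eqn 4B)}$$

with $x_0 = F_{intra}(k)$, $y_0 = F_{global}(k)$, $\sigma_x^2 = \mathrm{Var}[\hat F_{intra}(k)]$, $\sigma_y^2 = \mathrm{Var}[\hat F_{global}(k)]$ and $\sigma_{xy} = \mathrm{Cov}[\hat F_{intra}(k), \hat F_{global}(k)]$, all derivatives being evaluated at $(x_0, y_0)$.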

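We assumed the sampling variance of $\hat F_{global}(k)$ and the covariance of $\hat F_{global}(k)$ and $\hat F_{intra}(k)$ to be negligible compared with the variance of $\hat F_{intra}(k)$. The assumption is fairly reasonable, insofar as $\hat F_{global}(k)$ depends on the sampling variation in the large overall data set (lumped samples), which, when there are many samples, is an order of magnitude smaller than the sampling variation in each community sample k. Thus, under the assumption of a network of many small community samples, equations 4A and 4B simplify into $E[\hat I(k)] \approx I(k) + \tfrac{1}{2} f_{xx}\,\mathrm{Var}[\hat F_{intra}(k)]$ and $\mathrm{Var}[\hat I(k)] \approx f_x^2\,\mathrm{Var}[\hat F_{intra}(k)]$, where $f_x$ and $f_{xx}$, the first and second derivatives of $\hat I(k)$ with respect to $\hat F_{intra}(k)$, are $f_x = -[1 - P(k/k)][1 - F_{global}(k)]/[F_{intra}(k) - F_{global}(k)]^2$ and $f_{xx} = 2[1 - P(k/k)][1 - F_{global}(k)]/[F_{intra}(k) - F_{global}(k)]^3$.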
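The simplified delta-method step can be sketched as follows, under the assumption (ours, consistent with equations 1 and 2) that Î(k) = f(F̂intra(k), F̂global(k)) with f(x, y) = [1 − P(k/k)](1 − x)/(x − y), and with F̂global(k) treated as fixed. The function names and input values are hypothetical; the analytic first derivative is compared with a central finite difference as a sanity check.

```python
def immigration(f_intra: float, f_global: float, p_kk: float) -> float:
    """I(k) = [1 - P(k/k)] (1 - F_intra) / (F_intra - F_global)."""
    return (1 - p_kk) * (1 - f_intra) / (f_intra - f_global)


def delta_bias_var(f_intra: float, f_global: float, p_kk: float, var_f_intra: float):
    """Second-order bias and first-order variance of the immigration estimate,
    ignoring the sampling variance of F_global(k) and its covariance."""
    d = f_intra - f_global
    f_x = -(1 - p_kk) * (1 - f_global) / d ** 2        # dI/dF_intra
    f_xx = 2 * (1 - p_kk) * (1 - f_global) / d ** 3    # d2I/dF_intra2
    bias = 0.5 * f_xx * var_f_intra
    variance = f_x ** 2 * var_f_intra
    return bias, variance


# Sanity check of f_x against a central finite difference.
h = 1e-6
fd = (immigration(0.3 + h, 0.1, 0.1) - immigration(0.3 - h, 0.1, 0.1)) / (2 * h)
print(fd, delta_bias_var(0.3, 0.1, 0.1, 1e-3))
```

Because the variance of F̂intra(k) enters both terms linearly, bias and variance shrink together as the sample grows, while the variance term dominates the MSE.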

Replacing inline image and inline image with the corresponding expected values of the similarity statistics (equations 4A and 4B) yielded the following relationship:


Using equation 1, this simplified into




By denoting inline image, we finally got

image(eqn 5A)

and the estimation of bias

image(eqn 5B)

The quantities inline image and T(k) are central here for characterizing the sampling error because they control both the bias and the variance in the estimation of $\hat I(k)$. Using a least-squares loss function, the best bias-variance trade-off is obtained by minimizing the mean square error

$$\mathrm{MSE}[\hat I(k)] = \mathrm{Bias}[\hat I(k)]^2 + \mathrm{Var}[\hat I(k)].$$


Recall that inline image to get inline image. Let us consider that P(k/k) = (Nk − 1)/(N − 1) primarily depends on the fixed number of sites sampled, Nc = N/Nk, so that P(k/k) ≈ 1/Nc. I(k) and the actual similarity statistics (without ‘hats’) are independent of Nk, while the sampling variance of $\hat F_{intra}(k)$ decreases as 1/Nk (equation 3B), so that both the estimation bias and the variance decrease as 1/Nk; hence,

$$\mathrm{Bias}[\hat I(k)] = O(1/N_k)$$

and $\mathrm{Var}[\hat I(k)] = O(1/N_k)$.

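As a consequence, MSE = O(1/Nk), and the sampling variance is the more constraining term for optimization. We further investigated the sampling properties of

$$\hat m(k) = \frac{\hat I(k)}{\hat I(k) + N_k - 1}$$

by using the delta method, so that

$$E[\hat m(k)] \approx m(k) + \frac{\partial m}{\partial I}\,\mathrm{Bias}[\hat I(k)] + \frac{1}{2}\frac{\partial^2 m}{\partial I^2}\,\mathrm{Var}[\hat I(k)]$$

and $\mathrm{Var}[\hat m(k)] \approx (\partial m/\partial I)^2\,\mathrm{Var}[\hat I(k)]$, with

$$\frac{\partial m}{\partial I} = \frac{N_k - 1}{[I(k) + N_k - 1]^2}, \qquad \frac{\partial^2 m}{\partial I^2} = -\frac{2(N_k - 1)}{[I(k) + N_k - 1]^3}.$$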
Therefore, $\mathrm{Bias}[\hat m(k)] = O(1/N_k^2)$ and $\mathrm{Var}[\hat m(k)] = O(1/N_k^3)$. In this case, MSE = O(1/Nk³), and the faster decrease in estimation error as community samples get larger may have been a reason for preferring to investigate m(k) in earlier applications of neutral theory (Hubbell 2001; Etienne 2005; Munoz et al. 2007).
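The propagation from Î(k) to m̂(k) is a one-dimensional delta method. The sketch below assumes m(k) = I(k)/[I(k) + Nk − 1] and takes the bias and variance of Î(k) as inputs; feeding it a bias and a variance both proportional to 1/Nk, as obtained above for Î(k), illustrates the O(1/Nk³) decay of Var[m̂(k)]. The function name and the input values are hypothetical.

```python
def m_bias_var(i_k: float, n_k: int, bias_i: float, var_i: float):
    """Delta-method bias and variance of m(k) = I(k) / (I(k) + N_k - 1)."""
    s = i_k + n_k - 1
    dm = (n_k - 1) / s ** 2             # dm/dI
    d2m = -2 * (n_k - 1) / s ** 3       # d2m/dI2
    bias = dm * bias_i + 0.5 * d2m * var_i
    variance = dm ** 2 * var_i
    return bias, variance


# Hypothetical inputs: bias and variance of the immigration estimate both ~ 1/N_k.
for n_k in (250, 500, 1000, 2000):
    b, v = m_bias_var(10.0, n_k, bias_i=50.0 / n_k, var_i=500.0 / n_k)
    print(n_k, b, v, v * n_k ** 3)      # v * N_k**3 levels off as N_k grows
```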

The sampling results here do not depend on any assumption about the nature of species assembly, as we have only analysed nonlinear functions of similarity statistics. At this stage, $\hat I(k)$ and $\hat m(k)$ can be used as heuristic indexes of community isolation from their common background (similarly to diversity indexes). But one may also want to interpret and discuss their nature as immigration parameters in the context of the 3L-SINM, where there is still no a priori expectation for Fintra(k), F3(k) and Fglobal(k). The derivations below are given for the particular case of the 2L-SINM, where we can further specify the nature of the migrant pool and its species abundance distribution to predict these values.

Sampling error for the two-level spatially implicit neutral model (2L-SINM)

Let us now consider the well-known case of the 2L-SINM of Hubbell (2001), which is a particular case of the 3L-SINM. We analytically derived the expected probabilities Fn(k) in a given community sample k as explicit functions of the parameters θ and I(k) (Appendix S3). The analytical formulas for the first values of n are

$$F_{intra}(k) = F_2(k) = \frac{I(k) + \theta + 1}{[I(k) + 1](\theta + 1)} \qquad \text{(eqn 6A)}$$

as presented by Etienne (2005),

image(eqn 6B)
image(eqn 6C)

where Γ is the Gamma function.
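The values Fn(k) under the 2L-SINM can also be evaluated numerically through a coalescent argument, which we give here as our own sketch rather than a transcription of equations 6A to 6C. The n sampled individuals descend from A distinct immigrant ancestors with the Ewens probability P(A = a) = |s(n, a)| I(k)^a / [I(k)(I(k) + 1)…(I(k) + n − 1)], where |s(n, a)| are unsigned Stirling numbers of the first kind, and a ancestors drawn from a metacommunity at speciation-drift equilibrium are all conspecific with probability (a − 1)!/[(1 + θ)(2 + θ)…(a − 1 + θ)] (Ewens 1972). For n = 2 this reproduces (I(k) + θ + 1)/{[I(k) + 1](θ + 1)}. Function names are hypothetical.

```python
from math import factorial, prod
from typing import List


def stirling1_row(n: int) -> List[int]:
    """Unsigned Stirling numbers of the first kind |s(n, a)| for a = 0..n."""
    row = [1]                                   # |s(0, 0)| = 1
    for m in range(n):                          # build row m + 1 from row m
        row = [(row[a - 1] if a >= 1 else 0) + (m * row[a] if a <= m else 0)
               for a in range(m + 2)]
    return row


def rising(x: float, n: int) -> float:
    """Rising factorial x (x + 1) ... (x + n - 1); equals 1 when n = 0."""
    return prod(x + j for j in range(n))


def f_n_2l_sinm(n: int, immigration: float, theta: float) -> float:
    """F_n(k): probability that n individuals of local community k are
    conspecific under the two-level neutral model (sketch)."""
    s = stirling1_row(n)
    total = 0.0
    for a in range(1, n + 1):
        p_ancestors = s[a] * immigration ** a / rising(immigration, n)
        p_same = factorial(a - 1) / rising(1 + theta, a - 1)
        total += p_ancestors * p_same
    return total


print(f_n_2l_sinm(2, 10, 50), 61 / 561)   # should agree with the closed form for F_intra(k)
```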

The three similarity statistics, Fintra(k), Fglobal(k) and Finter(k), defined above, are linked through the relationship Fglobal(k) = P(k/k)Fintra(k) + [1 − P(k/k)]Finter(k) (Munoz, Couteron, & Ramesh 2008). Here, we recall that Finter(k) is the probability that two individuals, one drawn in k and the other in another community, are conspecific. Individuals drawn from distinct communities are descendants of distinct immigrants from the common pool of migrants (Etienne & Olff 2004), and hence Finter(k) is the probability that two individuals are conspecific in the pool of migrants (Munoz, Couteron, & Ramesh 2008: equation 3). In the context of the 2L-SINM, the pool of migrants is directly the metacommunity (Hubbell 2001); hence, Finter(k) = 1/(1 + θ) (Ewens 1972) and then

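$$F_{global}(k) = P(k/k)\,\frac{I(k) + \theta + 1}{[I(k) + 1](\theta + 1)} + \frac{1 - P(k/k)}{1 + \theta} \qquad \text{(eqn 6D)}$$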

Under the 2L-SINM, the expected values of Fintra(k), F3(k) and Fglobal(k) are therefore functions of the two fundamental parameters I(k) and θ, and we can derive $\mathrm{Bias}[\hat I(k)]$ and $\mathrm{Var}[\hat I(k)]$ by calculating the exact expression of T(k) as a function of I(k) and θ:

image(eqn 7)

Trade-off between sample size and number of samples

The above results allow for the assessment of sampling designs and for delineating a domain of efficient parameter inference. Let us consider, for instance, forest plots in wet evergreen tropical forests, which are a classical field of application of neutral models. Different sampling strategies are possible, with sample sizes ranging, for illustration purposes, from 0.1-ha plots including Nk = 50 individuals on average to 1-ha plots including 500 individuals or more (Munoz, Couteron, & Ramesh 2008). The mean square error MSE = Bias² + Var can be used as an accuracy index measuring the performance of a given sampling design: the smaller the MSE, the better the estimation of immigration.

In an application of the 2L-SINM of Hubbell (2001), Fig. 2A shows contours of constant MSE when the sample size Nk (abscissa) and the number of samples Nc (ordinate) vary, in a case where I(k) = 10 and θ = 50, which are fairly realistic values for tropical forests in South India (Munoz et al. 2007; Munoz, Couteron, & Ramesh 2008). The log10 values of MSE are shown along with the contours. The figure shows that MSE cannot be improved beyond a certain limit for a given sample size, because increasing the number of samples leads to an asymptote in MSE. Specifically, for 0.1-ha forest plots of about Nk = 50 individuals, log10(MSE) cannot go below 1.09, while for 1-ha forest plots of about Nk = 500, it can go down to 0.016; the asymptotic difference in accuracy is thus a factor of about 12 on the absolute scale. Hence, one cannot expect good estimation from community samples that are too small, even if there are many of them. For illustration, consider a fixed overall sampling effort of N = 5000 individuals, which can be achieved with different sample sizes Nk and sample numbers Nc, for instance Nc = 50 samples of size Nk = 100 (Fig. 2A, point a) or Nc = 10 samples of size Nk = 500 (Fig. 2A, point b). In the former case, log10(MSE) = 0.8, while in the latter case, log10(MSE) = 0.1, so that MSE is improved by a factor of 4.6 with fewer, larger samples. Furthermore, the arrow in Fig. 2A indicates that 10 samples of 500 individuals are as accurate as 30 samples of 440 individuals, because both designs lie on the same isoline, log10(MSE) = 0.1. In the former case, only 5000 individuals are to be sampled, compared with 13 200 in the latter case. Therefore, it is better for estimation performance to rely on fewer, larger samples, as long as the samples remain small compared with the communities from which they are drawn.
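The design comparison above can be sketched in code. This is our own reconstruction and rests on several assumptions that should be checked against equations 5A, 5B and 7: Fintra(k) and F3(k) under the 2L-SINM from a coalescent argument (closed forms for n = 2 and 3), Finter(k) = 1/(1 + θ), Fglobal(k) = P(k/k)Fintra(k) + [1 − P(k/k)]Finter(k) with P(k/k) = (Nk − 1)/(NcNk − 1), the U-statistic variance of F̂intra(k), and the simplified delta-method bias and variance of Î(k) in which the variance of F̂global(k) is neglected. With I(k) = 10 and θ = 50 it gives log10(MSE) of about 0.76 and 0.10 for the two 5000-individual designs, close to the values quoted above. The function name `mse_design` is hypothetical.

```python
from math import log10


def mse_design(n_k: int, n_c: float, immigration: float = 10.0, theta: float = 50.0) -> float:
    """Approximate MSE of the immigration estimate for n_c samples of n_k
    individuals under the 2L-SINM (delta-method sketch)."""
    i, t = immigration, theta
    f2 = (i + t + 1) / ((i + 1) * (t + 1))                          # F_intra(k)
    f3 = (2 + 3 * i / (1 + t) + 2 * i ** 2 / ((1 + t) * (2 + t))) / ((i + 1) * (i + 2))
    f_inter = 1 / (1 + t)
    p_kk = (n_k - 1) / (n_c * n_k - 1)                               # P(k/k)
    f_global = p_kk * f2 + (1 - p_kk) * f_inter
    var_x = (4 * (n_k - 2) * (f3 - f2 ** 2) + 2 * (f2 - f2 ** 2)) / (n_k * (n_k - 1))
    d = f2 - f_global
    f_x = -(1 - p_kk) * (1 - f_global) / d ** 2
    f_xx = 2 * (1 - p_kk) * (1 - f_global) / d ** 3
    bias = 0.5 * f_xx * var_x
    return bias ** 2 + f_x ** 2 * var_x


# Two ways of sampling 5000 individuals in total.
for n_k, n_c in ((100, 50), (500, 10)):
    print(n_k, n_c, round(log10(mse_design(n_k, n_c)), 3))
```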

Figure 2.

 Trade-off between sample size (Nk) and number of samples (Nc) for estimating immigration, as measured using the MSE statistic of squared bias plus variance. In (A), isolines of constant MSE value are shown with 0.1 increments on log10 scale. (a) and (b) are cases where 5000 individuals are sampled overall using, respectively, 10 samples of 500 individuals and 50 samples of 100 individuals. The arrow shows that 10 samples of 500 individuals (5000 individuals overall) and 30 samples of 440 individuals (13 200 individuals overall) are at the same level of log10(MSE) = 0.1. In these cases, it is less expensive to use larger samples. (B) shows on the ordinate the value of Nc for which ∂MSE/∂Nc equals −0.05 (solid line), −0.01 (dashed line) and −0.005 (dotted line), for a range of local sample sizes on the abscissa. For these examples, the statistics were calculated in the context of the 2L-SINM of Hubbell (2001) with I(k) = 10 and θ = 50.

It is also possible to use the derivatives of MSE to investigate the gain in performance when increasing sampling effort. Let us fix, for instance, a qualitative limit at ∂MSE/∂Nc = −0.05, beyond which the gain from an additional sample is arbitrarily judged not worth the effort. Fig. 2B (solid line) shows the corresponding number of samples as a function of Nk. For instance, 24 samples of 50 individuals and seven samples of 500 individuals lie at the same level of ∂MSE/∂Nc, which illustrates again that the advantage of adding samples closely depends on sample size. One may use other isoclines of ∂MSE/∂Nc (e.g. in Fig. 2B, the isoclines at −0.01, dashed line, and at −0.005, dotted line) to reach a desired limit in estimation performance when increasing the number of samples. Information on sampling properties is therefore useful for establishing a sampling strategy prior to field work. The approach is analogous, in principle, to using saturation curves and exploratory statistics to fix the size of sampling plots for the assessment of diversity in a local community (Gimaret-Carpentier et al. 1998).
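Reading the criterion as a threshold on the derivative ∂MSE/∂Nc (our interpretation), the number of samples beyond which an additional sample improves MSE by less than a chosen amount can be located numerically. The sketch treats Nc as continuous in P(k/k) = (Nk − 1)/(NcNk − 1), uses a central finite difference, and relies on an assumed delta-method approximation of MSE under the 2L-SINM (U-statistic variance of F̂intra(k), Finter(k) = 1/(1 + θ), variance of F̂global(k) neglected). With I(k) = 10, θ = 50 and a threshold of −0.05, it returns about 24 samples for Nk = 50 and about 8 for Nk = 500 (the text quotes seven; the integer count depends on how the derivative is discretized). All names and the stopping rule are hypothetical.

```python
def mse(n_k: int, n_c: float, immigration: float = 10.0, theta: float = 50.0) -> float:
    """Approximate MSE of the immigration estimate (delta-method sketch, 2L-SINM)."""
    i, t = immigration, theta
    f2 = (i + t + 1) / ((i + 1) * (t + 1))
    f3 = (2 + 3 * i / (1 + t) + 2 * i ** 2 / ((1 + t) * (2 + t))) / ((i + 1) * (i + 2))
    p_kk = (n_k - 1) / (n_c * n_k - 1)
    f_global = p_kk * f2 + (1 - p_kk) / (1 + t)
    var_x = (4 * (n_k - 2) * (f3 - f2 ** 2) + 2 * (f2 - f2 ** 2)) / (n_k * (n_k - 1))
    d = f2 - f_global
    f_x = -(1 - p_kk) * (1 - f_global) / d ** 2
    f_xx = 2 * (1 - p_kk) * (1 - f_global) / d ** 3
    return (0.5 * f_xx * var_x) ** 2 + f_x ** 2 * var_x


def d_mse_d_nc(n_k: int, n_c: float, h: float = 1e-4, **params) -> float:
    """Central finite-difference derivative of MSE with respect to N_c."""
    return (mse(n_k, n_c + h, **params) - mse(n_k, n_c - h, **params)) / (2 * h)


def samples_needed(n_k: int, threshold: float = -0.05, max_nc: int = 10_000, **params) -> int:
    """Smallest N_c >= 2 at which dMSE/dN_c has risen above `threshold`,
    i.e. further samples improve MSE more slowly than the chosen rate."""
    for n_c in range(2, max_nc + 1):
        if d_mse_d_nc(n_k, n_c, **params) >= threshold:
            return n_c
    raise ValueError("threshold not reached; increase max_nc")


print(samples_needed(50), samples_needed(500))
```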

Simulation application

Simulated neutral community samples

We used the two-step simulation algorithm of Munoz et al. (2007, Appendix) in Matlab® (Mathworks 2004) to simulate neutral community samples complying with the 2L-SINM of Hubbell (2001). First, we used the sequential algorithm of Etienne (2005) to generate a pool of migrant ancestors (migrant pool) as a very large metacommunity sample. The pool was characterized by the biodiversity parameter θ, which controls the speciation-drift equilibrium. We repeated the procedure for two values of θ, 50 and 100, which span a range of biodiversity levels of tree communities in semi-evergreen and evergreen tropical forests (Hubbell 2001; Munoz et al. 2007). For each migrant pool representing a shared biogeographical background (metacommunity), we generated 50 local community samples k with constant local immigration numbers I(k) (see Munoz et al. 2007 for more details). We repeated the procedure to generate 50 independent migrant pools as replicates, for each of which we simulated 50 community samples, giving a total of 2500 community samples for each value of I(k). We varied I(k) from 10 to 200 in increments of 10, and we considered two sample sizes: Nk = 400 and 800 (wet evergreen tropical forest plots of about 1 ha usually fall within this range). We ensured that the migrant pools included a substantially larger number of individuals than the derived local community samples, namely 50Nk, and checked that the results remained consistent when pool sizes ranging from 25Nk to 1000Nk were used.
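The two-step simulation can be sketched as follows. This is a simplified illustration, not the Matlab implementation of Munoz et al. (2007): the migrant pool is generated with Hoppe's urn scheme, which yields a sample at speciation-drift equilibrium with parameter θ, and the local sample is generated sequentially, individual j (j = 0, 1, …) being a new immigrant with probability I(k)/(I(k) + j) and otherwise a copy of a random earlier individual. Immigrant ancestors are assumed to be drawn uniformly, with replacement, from the pool. Function names and the seed are hypothetical choices.

```python
import random
from collections import Counter
from typing import List


def migrant_pool(theta: float, size: int, rng: random.Random) -> List[int]:
    """Hoppe urn: individual j founds a new species with probability
    theta / (theta + j), otherwise copies a random earlier individual."""
    pool: List[int] = []
    n_species = 0
    for j in range(size):
        if rng.random() < theta / (theta + j):   # always true for j = 0
            pool.append(n_species)
            n_species += 1
        else:
            pool.append(rng.choice(pool))
    return pool


def local_sample(pool: List[int], immigration: float, n_k: int, rng: random.Random) -> Counter:
    """Individual j is a new immigrant with probability I / (I + j);
    otherwise it copies the species of a random earlier individual."""
    sample: List[int] = []
    for j in range(n_k):
        if rng.random() < immigration / (immigration + j):
            sample.append(rng.choice(pool))     # immigrant ancestor from the pool
        else:
            sample.append(rng.choice(sample))   # local descent
    return Counter(sample)


rng = random.Random(42)
n_k = 400
pool = migrant_pool(theta=50.0, size=50 * n_k, rng=rng)   # pool of 50 N_k individuals
samples = [local_sample(pool, immigration=10.0, n_k=n_k, rng=rng) for _ in range(50)]
print(len(set(pool)), [len(s) for s in samples[:5]])      # species in pool and in 5 samples
```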

Simulated vs. predicted sampling characteristics

For given values of I(k), Nk and θ, we calculated the mean and standard deviation of $\hat I(k)$ over the 50 simulated community samples associated with a migrant pool, and compared them with the values predicted by equations 5A, 5B and 7. To avoid spurious effects due to rare outliers, we calculated 95% trimmed statistics by excluding 2.5% of observations from each side. We repeated the procedure for each of the 50 replicate migrant pools and for each of the 20 values of I(k). Figure 3 shows the departures of observed bias and variance from our theoretical predictions. The results are presented for three combinations of parameters: θ = 50, Nk = 400 (a and b); θ = 50, Nk = 800 (c and d); and θ = 100, Nk = 400 (e and f).
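The trimmed summary statistics can be computed as below. How the 2.5% cut was rounded for 50 samples is not stated; this sketch assumes rounding down, which removes one observation from each tail when there are 50 values. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def trimmed_mean_sd(values: Sequence[float], prop: float = 0.025) -> Tuple[float, float]:
    """Mean and sample standard deviation after removing a proportion `prop`
    of the observations from each tail (count rounded down)."""
    xs = sorted(values)
    cut = int(prop * len(xs))
    kept = xs[cut:len(xs) - cut]
    return mean(kept), stdev(kept)


estimates = list(range(1, 40)) + [1000]     # 40 values with one outlier
print(trimmed_mean_sd(estimates))
```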

Figure 3.

 The differences, divided by I(k), between simulated and analytical values of the mean (ΔMean; a, c, e) and standard deviation (ΔSTD; b, d, f) of $\hat I(k)$, as a function of I(k) (abscissa). The results are provided for three simulated data sets that comply with the 2L-SINM of Hubbell (2001), with parameters θ = 50, Nk = 400 (a, b); θ = 50, Nk = 800 (c, d) and θ = 100, Nk = 400 (e, f). There are 50 replicate values for each theoretical I(k).

We conducted extensive simulation experiments (results not shown) to verify that the sampling properties remained consistent for varying sizes of the pool of migrants (from 25Nk to 1000Nk) and for varying numbers of community samples (from 50 to 250 samples) associated with a migrant pool. In spite of the approximations made, our analytical predictions of the sampling mean and variance of the immigration parameter were globally consistent with simulation results (Fig. 3). The relative difference between simulated and analytical means of $\hat I(k)$ (i.e. ΔMean/I(k)) fluctuated around 0 over the range of I(k) values, showing good agreement for the different configurations of θ and Nk (Fig. 3a,c,e). We nevertheless noted that the width of the scatter increased at values of I(k) above 75–100. On the other hand, despite a good overall fit, the sampling standard deviation was slightly underestimated at small I(k) (below about 75; Fig. 3b,d,f; ΔSTD/I(k) > 0). We found quite similar results for $\hat m(k)$ (results not shown).


Our central result is that the estimation of the immigration parameters I(k) and m(k) is asymptotically unbiased for sufficiently large samples. This is an important and original result for the novel conditional version GST(k) of GST. The unconditional FST and related statistics, such as GST, have been widely investigated and debated in population genetics, especially regarding how to obtain reliable estimates (Rottenstreich et al. 2007; Guillot 2010) and, subsequently, a correct inference of migration (Whitlock & McCauley 1999). A major advantage of using the conditional form GST(k) to estimate I(k) is that it avoids averaging immigration effects across all communities, a problem that Whitlock & McCauley (1999) pointed out for the unconditional FST-based approach.

Because GST(k) and I(k) are nonlinear functions of similarity statistics whose sampling variances are analytically established, we analysed their sampling accuracy using appropriate Taylor series expansions (the delta method), which avoids cumbersome formulas and computation (Appendix S2). The delta method assumes that the sampling statistics are dominated by the lower-order terms of the Taylor series (variances and covariances); we further neglected the variance of Fglobal(k), and the covariance between Fglobal(k) and Fintra(k), relative to the variance of Fintra(k). Simulations of community samples complying with Hubbell's model confirmed that the analytical results based on these approximations were sufficiently reliable for a range of realistic parameters (sample sizes and values of I(k) typical of surveys in wet evergreen tropical forests). Under these approximations, the variance of the estimate is mostly sensitive to the variation in local similarity, Fintra(k). This result may be robust to departures from Hubbell's model, but verifying it would require further simulation studies, which are beyond the scope of this paper.
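To make the delta-method step concrete, the sketch below computes a first-order variance and a second-order bias of Î(k) while treating Fglobal(k) as fixed, in the spirit of the approximation just described. The functional form is an assumption for illustration only: we take GST(k) = (Fintra(k) − Fglobal(k))/(1 − Fglobal(k)) and I(k) = (1 − GST(k))/GST(k), which are not restated in this section; the expressions actually used are those of Appendix S2. Both function names are hypothetical.

```python
def immigration_from_similarity(f_intra, f_global):
    """ASSUMED form (illustration only): I(k) = (1 - G_ST)/G_ST, G_ST from the two similarities."""
    g_st = (f_intra - f_global) / (1.0 - f_global)
    return (1.0 - g_st) / g_st


def delta_method_moments(f_intra, f_global, var_f_intra):
    """Approximate (bias, variance) of I-hat(k), treating F_global(k) as fixed.

    Under the assumed form, I = (1 - G)/(F - G) - 1 with F = F_intra(k), G = F_global(k),
    so only derivatives with respect to F_intra(k) enter.
    """
    d = f_intra - f_global
    dI_dF = -(1.0 - f_global) / d**2          # first derivative w.r.t. F_intra(k)
    d2I_dF2 = 2.0 * (1.0 - f_global) / d**3   # second derivative w.r.t. F_intra(k)
    bias = 0.5 * d2I_dF2 * var_f_intra        # second-order Taylor term
    variance = dI_dF**2 * var_f_intra         # first-order (delta-method) variance
    return bias, variance
```

Under this assumed form the approximate bias is positive whenever Fintra(k) > Fglobal(k) and is proportional to Var(Fintra(k)), so both moments hinge on the sampling variance of local similarity.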

More broadly, the bias and variance formulas of equations 4 and 5 are independent of any underlying model of community dynamics; in particular, their applicability is not restricted to neutral models. I(k) and m(k) can therefore be used as descriptive indices of community isolation, complementary to diversity statistics. Our results (equations 4 and 5) are also directly applicable to the D statistic proposed by Jost (2008) as an alternative to GST. In practice, the variation of I(k) values over a set of sampling sites can be compared with external environmental information (e.g. geology, climate) so as to detect possible environmental filtering of community composition. For this purpose, the analytical results on the sampling variance of I(k) will be useful for designing tests of departure from a regional mean. A further perspective is to investigate the distribution of Î(k) in greater detail, to provide further insights into its confidence limits.
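A test of departure from a regional mean could, for instance, take the form of a Wald-type z-score built from the analytical sampling variance of Î(k). The sketch below is hypothetical: it assumes that Î(k) is approximately normally distributed, which, as noted above, remains to be investigated, and it treats the regional mean as known without error. The optional `bias` argument lets an estimated bias be subtracted before testing.

```python
import math


def departure_z_test(i_hat, var_i_hat, regional_mean, bias=0.0):
    """Two-sided z-test of I-hat(k) against a regional mean.

    ASSUMES approximate normality of I-hat(k) and a regional mean known without error.
    """
    z = (i_hat - bias - regional_mean) / math.sqrt(var_i_hat)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

If the normal approximation turns out to be poor, the same structure could be kept with quantiles taken from the actual distribution of Î(k).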

Furthermore, the mean square error (MSE), which combines sampling bias and variance, offers a synthetic measure of estimation performance that is of practical interest for designing efficient sampling schemes (illustration in Fig. 2). In our application, estimation performance as measured by MSE reaches a plateau as the size and/or the number of community samples is increased. The isolines of MSE in Fig. 2A can be used to select the appropriate number and size of samples for a desired level of accuracy, and thus guide the design and evaluation of sampling schemes. This is analogous to designing sampling schemes that correctly estimate species alpha and beta diversities in communities (Gimaret-Carpentier et al. 1998).
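The MSE criterion can be turned into a simple design search: MSE = bias² + variance is evaluated over a grid of candidate designs, and the cheapest design meeting a target MSE is retained. In the sketch below, `bias_fn` and `var_fn` are hypothetical placeholders standing in for the analytical bias and variance formulas (not reproduced here), and the cost of a design is assumed to be the total number of individuals surveyed (number of samples × sample size); actual field costs may differ.

```python
def mse(bias, variance):
    """Mean square error of an estimator: squared bias plus variance."""
    return bias**2 + variance


def cheapest_design(bias_fn, var_fn, n_samples_grid, sample_size_grid, max_mse):
    """Return (n_samples, sample_size, mse) with the fewest individuals such that MSE <= max_mse.

    bias_fn(n, size) and var_fn(n, size) are HYPOTHETICAL placeholders for the
    analytical bias and variance of I-hat(k). Ties in total effort are broken by
    preferring fewer samples. Returns None if no design meets the target.
    """
    feasible = []
    for n in n_samples_grid:
        for size in sample_size_grid:
            err = mse(bias_fn(n, size), var_fn(n, size))
            if err <= max_mse:
                feasible.append((n * size, n, size, err))
    if not feasible:
        return None
    _, n, size, err = min(feasible)
    return n, size, err


# Toy placeholders (illustrative only, not derived from the paper): one variance
# term shrinks with sample size and one with the number of samples, so the MSE
# plateaus when only one of the two design dimensions is increased.
def toy_bias(n, size):
    return 0.0


def toy_var(n, size):
    return 100.0 / size + 10.0 / n


print(cheapest_design(toy_bias, toy_var, [10, 20, 50], [100, 200, 400], max_mse=2.0))
# → (10, 100, 2.0)
```

Replacing the toy functions with the analytical formulas would reproduce, in discrete form, the reading of MSE isolines described above.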

A further important message for community ecologists is that sampling issues are central to the analysis of community composition, because stochastic processes and estimation error are intertwined. One should therefore take into account both (i) the sampling nature of the neutral theory (Etienne & Alonso 2005) and, more generally, of the fundamental concept of composition drift (a process that is likely to be pervasive even when interacting with non-neutral processes); and (ii) the sampling error in parameter estimation, as in any inference process.

Finally, the strict dichotomy between neutral and non-neutral models is progressively fading: recent works have suggested that the scope of neutral models should be broadened and that their predictions of relative species abundances are robust (Allouche & Kadmon 2009; Noble et al. 2010). Several authors (Hubbell 2006; Zillio & Condit 2007) have argued that the assumption of ecological equivalence of individuals is realistic insofar as trade-offs between life-history traits in a group of trophically similar species (a guild) may produce comparable levels of fitness. Furthermore, the three-level spatially implicit neutral model of Munoz, Couteron, & Ramesh (2008) illustrates how non-neutral processes at the regional scale can be incorporated into the traditional neutral approach. Hence, this 3L-SINM framework may be applied to a variety of contexts involving community isolation, including anthropogenic fragmentation in tropical forests, and may become a useful approach for conservation issues (Pearse & Crandall 2004).


We warmly thank the editor and an anonymous reviewer for their valuable comments and suggestions.