Correspondence site: http://www.respond2articles.com/MEE/

# Estimating immigration in neutral communities: theoretical and practical insights into the sampling properties

Article first published online: 12 JUL 2011

DOI: 10.1111/j.2041-210X.2011.00133.x

© 2011 The Authors. Methods in Ecology and Evolution © 2011 British Ecological Society

#### How to Cite

Munoz, F. and Couteron, P. (2012), Estimating immigration in neutral communities: theoretical and practical insights into the sampling properties. Methods in Ecology and Evolution, 3: 152–161. doi: 10.1111/j.2041-210X.2011.00133.x

#### Publication History

- Issue published online: 1 FEB 2012
- Article first published online: 12 JUL 2011
- Received 20 October 2010; accepted 22 May 2011 Handling Editor: Robert Freckleton

### Keywords:

- bias;
- community isolation;
- *G*_{ST}(*k*) statistic;
- immigration;
- spatially implicit neutral model;
- unified neutral theory of biodiversity and biogeography;
- variance

### Summary

- Top of page
- Summary
- Introduction
- Methodological background
- Analytical results
- Simulation application
- Discussion
- Acknowledgements
- References
- Supporting Information

**1.** The widening application of neutral models of communities requires mastering the inference of parameters from species composition data. In a previous paper, we introduced the novel conditional *G*_{ST}(*k*) statistic based on community composition. We showed that it is a reliable basis for assessing migrant fluxes into local communities under a generalized version of the spatially implicit neutral model of S. P. Hubbell, which can accommodate non-neutral patterns at scales broader than the communities.

**2.** We provide here new insights into the sampling properties of the *G*_{ST}(*k*) statistic and of the derived immigration number, *I*(*k*). The analytical formulas for bias and variance are useful for assessing estimation accuracy and for investigating the variation of *I*(*k*) across communities.

**3.** Immigration estimation is asymptotically unbiased as sample size increases. We confirm the validity of our analytical results on the basis of simulated neutral communities.

**4.** We also underline the potential of using *I*(*k*) as a descriptive index of community isolation, without reference to any model of community dynamics.

**5.** We further propose a practical application of the bias and variance analysis for defining sampling designs for immigration quantification by efficiently balancing the number and size of community samples.

### Introduction

Since the seminal work of Hubbell (2001), the neutral theory of biodiversity has raised much interest by providing a nontrivial null hypothesis against which the nature of real-world biodiversity patterns may be tested and discussed (McGill 2003; Nee & Stone 2003; Volkov *et al.* 2003; Leigh 2007). Under the key assumption of ‘ecological equivalence’, namely that individuals have equal fitness whatever their species or habitat, Hubbell (2001) and followers (Etienne 2005) proposed a two-level spatially implicit neutral model (so-called 2L-SINM by Munoz, Couteron, & Ramesh 2008; see a synthesis in Beeravolu *et al.* 2009 and present Fig. 1a left) to decouple the dynamics of the large biogeographical background (i.e. the metacommunity), in which the effect of infrequent speciation and extinction events is quantified via the biodiversity parameter *θ* (speciation-drift balance), from local community dynamics, which is controlled by the immigration parameter, *I* (migration-drift balance, Etienne & Olff 2004). The theory has found some support from the investigation of very rich tree communities in wet evergreen tropical forests, where alternative models were not satisfactory for explaining some observed diversity patterns, such as the overrepresentation of rare species in the species abundance distributions (SAD) (Hubbell 2001; Chave, Alonso, & Etienne 2006). From a theoretical standpoint, neutral models can help us grasp how the interaction between limited immigration and local ‘ecological drift’ (Hubbell 2001) may influence taxonomic composition. Munoz, Couteron, & Ramesh (2008) have further proposed to relax the assumptions made by Hubbell (2001) at the upper level (i.e. the metacommunity) and to generalize the initial 2L-SINM into a 3L-SINM, by introducing an intermediate level, the regional pool of migrants, which may be shaped by non-neutral influences (Fig. 1a right). 
Under the 3L-SINM, the study of the local migration-drift equilibrium can be disconnected from non-neutral influences occurring at larger scales, via the estimation of the immigration parameter *I*.

Although inferring Hubbell’s parameters motivated a great deal of research in recent years leading to several estimation methods (Volkov *et al.* 2003; Etienne 2005, 2009; Munoz *et al.* 2007; Munoz, Couteron, & Ramesh 2008), the rapidly growing number of applications to real-world data sets has raised questions about how to assess the reliability of the inference, even in the strictly neutral case. Furthermore, published methods have frequently led to substantially differing parameter estimates when applied to the same data sets (see for instance Latimer, Silander, & Cowling 2005; Etienne *et al.* 2006; Etienne 2009, Table 1). The lack of knowledge on confidence limits has discouraged wider and more systematic applications of neutral theory through meta-analyses on parameter values. It is also hindering the development and application of more realistic models of community assemblages. Our aim is to provide such a statistically informed perspective on the estimation of the immigration parameter under the 3L-SINM.

The immigration parameter *I* is central here. It represents the number of potential migrants competing with resident offspring at each mortality event within the community and, as such, is closely related to Hubbell’s (2001) migration probability, *m* = *I*/(*I* + *N* − 1) (*N* denotes the size of a local community). The neutral theory basically invokes ‘dispersal limitation’ from the metacommunity to the local community to define *I* and *m* (Hubbell 2001; Etienne 2005). But when these parameters are estimated from real-world data, they may also include the effects of many processes that together determine the relative isolation and differentiation of the local community from its biogeographical background. *I* and *m* could therefore be taken as an integrative measure of community isolation and differentiation, even though the strictly neutral ‘dispersal limitation’ invoked by Hubbell is not the only process shaping their values. An estimation of those immigration parameters could therefore be especially useful for quantifying how the fragmentation of threatened habitats (e.g. tropical forests) may influence the diversity in isolated patches taken as different local communities. More generally, analyses comparing consistent estimates of *I* and *m* across biogeographical regions and biomes would be a valuable contribution to community ecology and conservation biology by providing a measure of apparent fragmentation, which could be related to the ecological, physiographical and historical peculiarities of the different regions. This research programme requires increasing our knowledge about the sampling properties of *I* and *m*, which is the purpose of the present paper.
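
The link between the immigration number and the migration probability can be illustrated with a short Python sketch. It assumes the usual relation *m* = *I*/(*I* + *N* − 1) between the two parameters; the function names are hypothetical and chosen only for this illustration.

```python
def migration_probability(immigration_number: float, community_size: int) -> float:
    """Convert an immigration number I into a migration probability m.

    Assumes the relation m = I / (I + N - 1), with N the community size.
    """
    return immigration_number / (immigration_number + community_size - 1)


def immigration_number(migration_prob: float, community_size: int) -> float:
    """Invert the same relation: I = m (N - 1) / (1 - m)."""
    if not 0.0 <= migration_prob < 1.0:
        raise ValueError("m must lie in [0, 1)")
    return migration_prob * (community_size - 1) / (1.0 - migration_prob)


# Example: I = 10 immigrants competing in a community of N = 500 individuals.
m_example = migration_probability(10.0, 500)
print(m_example)
```

For a fixed *I*, *m* shrinks as the community grows, which is one reason why *I* is the more convenient quantity to compare across samples of different sizes.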

Neutral models used for inference of *I* and *m* initially focused on species abundances in a single community (e.g. Hubbell 2001; Etienne 2005; Latimer, Silander, & Cowling 2005), whereas more recent work (Etienne 2007, 2009; Munoz *et al.* 2007; Munoz, Couteron, & Ramesh 2008; Jabot & Chave 2009) has allowed neutral parameters to be estimated from species compositions in a set of spatially scattered sampled communities, providing estimates of *I*(*k*) and *m*(*k*) for each sample *k*. In this regard, Munoz, Couteron, & Ramesh (2008) have proposed a conditional version of the classical *G*_{ST} statistic from population genetics (Nei 1973; Takahata & Nei 1984; Slatkin 1985), *G*_{ST}(*k*), and have shown that *I*(*k*) can be inferred from *G*_{ST}(*k*) under the more general three-level spatially implicit neutral model (3L-SINM, Munoz, Couteron, & Ramesh 2008; Beeravolu *et al.* 2009). Although this *G*_{ST}(*k*)-based estimation of neutral immigration performed well on the basis of simulated neutral communities (Munoz, Couteron, & Ramesh 2008), there is still a lack of theoretical knowledge regarding the sampling properties of the method (as for all of the published alternative methods). We bring here new insights into the estimation of immigration based on the *G*_{ST}(*k*) statistic, by (i) providing analytical formulas, independent of any community model, for bias and variance of the similarity statistics underlying the computation of *G*_{ST}(*k*) and *I*(*k*) and (ii) analysing the sensitivity of the estimation process to variation in the target parameters and to the main features of the data, such as the number of samples and the sample sizes. These results will yield practical insights for sampling design by allowing for the choice of a favourable trade-off in the number and size of community samples, according to a desired level of estimation accuracy.

### Methodological background

Let us consider *N*_{c} distinct ecological communities indexed by *k*. The probability that an individual in community *k* is from species *i* is denoted by *p*_{ik}, where *i* = 1…*S*, and *p*_{i} is the probability that an individual in the lumped set of communities is from species *i*. A sample is drawn from each community and includes a collection of individuals identified at a consistent taxonomic level (e.g. species), such as tree census plots scattered in a tropical forest. In practice, we assume that the samples are far enough away from one another so as to belong to distinct communities (see Munoz, Couteron, & Ramesh 2008). This allows attributing the same index, 1 ≤ *k* ≤ *N*_{c}, to both the source communities and the corresponding samples. Sample *k* is then made of *N*_{k} individuals belonging to *S* species, and it thereby includes *N*_{ik} individuals of each arbitrary species *i*. The lumped set of community samples further includes *N*_{i} individuals of each species *i* summing to *N* individuals.

The classical statistic *G*_{ST} (Nei 1973) is the ratio of beta diversity over gamma diversity, to use ecological concepts, which was defined from similarity statistics that directly relate to Simpson’s diversity indexes (Simpson 1949; Couteron & Pélissier 2004). *G*_{ST} is thereby a standardized measure of the variation in composition across samples. The conditional version, *G*_{ST}(*k*), was proposed by Munoz, Couteron, & Ramesh (2008) to measure this information relative to each local community sample. It integrates local similarity *F*_{intra}(*k*), which is the probability of drawing two conspecific individuals conditional on drawing them in sample *k*, and global similarity *F*_{global}(*k*), which is the probability of drawing two conspecific individuals conditional on drawing the first in sample *k* and the other in the overall data set. We also denote by *F*_{inter}(*k*) the probability of drawing two conspecific individuals conditional on drawing the first one in sample *k* and the other one in another sample distinct from *k*. We will finally consider the probability of simultaneously drawing *n* individuals that belong to the same species conditional on drawing them in sample *k*, denoted as *F*_{n}(*k*). Specifically, *F*_{intra}(*k*) = *F*_{2}(*k*).
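
To make the three similarity statistics concrete, here is a minimal Python sketch that computes them from a table of counts *N*_{ik}. It is only an illustration: the function name and the data layout are hypothetical, and the estimators are the without-replacement ("exact") versions implied by the definitions above, which we assume rather than take from the source.

```python
from typing import List, Tuple


def similarities(counts: List[List[int]], k: int) -> Tuple[float, float, float]:
    """Return (F_intra, F_global, F_inter) for sample k.

    counts[j][i] is the number of individuals of species i in sample j.
    Pairs of individuals are drawn without replacement.
    """
    n_species = len(counts[k])
    n_k = sum(counts[k])                       # size of sample k
    n_total = sum(sum(row) for row in counts)  # size of the lumped data set
    n_i = [sum(row[i] for row in counts) for i in range(n_species)]

    # Both individuals drawn in sample k.
    f_intra = sum(c * (c - 1) for c in counts[k]) / (n_k * (n_k - 1))
    # First individual in sample k, second anywhere in the data set.
    f_global = sum(counts[k][i] * (n_i[i] - 1)
                   for i in range(n_species)) / (n_k * (n_total - 1))
    # First individual in sample k, second in any other sample.
    f_inter = sum(counts[k][i] * (n_i[i] - counts[k][i])
                  for i in range(n_species)) / (n_k * (n_total - n_k))
    return f_intra, f_global, f_inter


# Toy data: two samples, two species.
toy = [[2, 2], [4, 0]]
print(similarities(toy, 0))
```

With these definitions, the global similarity is a weighted average of the local and between-sample similarities, the weight of the local term being (*N*_{k} − 1)/(*N* − 1).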

#### The spatially implicit neutral models and the *G*_{ST}(*k*) statistic

A spatially implicit neutral model (SINM) represents the dynamics of local community composition, consisting of speciation, extinction and immigration events, without any explicit reference to the relative distances between communities. This family of models was introduced in neutral community ecology by Hubbell (2001). It is intrinsically hierarchical and aims to decouple biogeographical and ecological scales. Indeed, evolutionary, biogeographical and ecological processes together influence species diversity at different temporal and spatial scales, and hierarchical models can therefore be relevant and helpful for representing such nested influences (Beeravolu *et al.* 2009). In the two-level spatially implicit neutral model (2L-SINM, Fig. 1a left) of Hubbell (2001), species abundances in the large-scale metacommunity are driven by the balance between speciation and extinction events (speciation-drift equilibrium), under the control of the biodiversity parameter *θ*. At the scale of a local community, species abundances are shaped by a balance between immigration from the metacommunity and local extinction (migration-drift equilibrium). The immigration parameter *I* is the number of immigrants that compete with local offspring when an individual dies in the local community. This important parameter was first named the ‘fundamental dispersal number’ (Etienne & Alonso 2005), but the concept can be enlarged to include other sources of immigration limitation from the background region (Munoz, Couteron, & Ramesh 2008). Munoz *et al.* (2007) showed that the 2L-SINM of Hubbell can accommodate varying immigration conditions across local communities *k*, by allowing varying values of *I*(*k*), and several recent approaches have been proposed to estimate *I*(*k*) in the presence of this variation (Etienne 2007, 2009; Munoz, Couteron, & Ramesh 2008; Jabot & Chave 2009).

In the three-level spatially implicit neutral model (3L-SINM, Fig. 1a right) of Munoz, Couteron, & Ramesh (2008), the larger-scale source of immigrants is not necessarily considered to be a metacommunity at speciation-drift equilibrium because the model allows for various processes to occur at an intermediate scale, thereby providing greater flexibility for representing real-world patterns and migration pathways (Fig. 1 in Munoz, Couteron, & Ramesh 2008). In this case, contrary to the 2L-SINM, there is no assumption about the SAD of the immigrants. In the context of the 3L-SINM, Munoz, Couteron, & Ramesh (2008) introduced the computation of *G*_{ST}(*k*) and showed that it directly relates to the immigration parameter, *I*(*k*), of the source community *k*, through the relationship:

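$$I(k) = \left(1 - P(k/k)\right)\,\frac{1 - G_{ST}(k)}{G_{ST}(k)}, \qquad G_{ST}(k) = \frac{F_{intra}(k) - F_{global}(k)}{1 - F_{global}(k)} \qquad \text{(eqn 1)}$$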

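Here, *P*(*k*/*k*) is the probability of drawing an individual in sample *k* conditional on first drawing an individual in *k*, which is measured as (*N*_{k} − 1)/(*N* − 1). Applications to simulated neutral communities showed that exact estimators of the similarities,

$$\hat{F}_{intra}(k) = \sum_{i=1}^{S} \frac{N_{ik}\,(N_{ik} - 1)}{N_k\,(N_k - 1)}$$

and

$$\hat{F}_{global}(k) = \sum_{i=1}^{S} \frac{N_{ik}\,(N_i - 1)}{N_k\,(N - 1)},$$

are to be used to get unbiased estimates of the immigration parameters on the basis of equation 1 (Munoz, Couteron, & Ramesh 2008):

$$\hat{I}(k) = \left(1 - P(k/k)\right)\,\frac{1 - \hat{G}_{ST}(k)}{\hat{G}_{ST}(k)}, \qquad \hat{G}_{ST}(k) = \frac{\hat{F}_{intra}(k) - \hat{F}_{global}(k)}{1 - \hat{F}_{global}(k)} \qquad \text{(eqn 2)}$$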

The corresponding immigration rates (i.e. the migration probability of Hubbell 2001) can be estimated using $\hat{m}(k) = \hat{I}(k) / (\hat{I}(k) + N_k - 1)$.
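
The estimation chain from similarities to immigration parameters can be sketched in a few lines of Python. The formulas used below are assumptions made for this illustration: *G*_{ST}(*k*) is taken in its standard form (*F*_{intra} − *F*_{global})/(1 − *F*_{global}), the immigration number as (1 − *P*(*k*/*k*))(1 − *G*_{ST}(*k*))/*G*_{ST}(*k*), and the migration probability as *I*/(*I* + *N*_{k} − 1). The function name is hypothetical.

```python
from typing import Tuple


def estimate_immigration(f_intra: float, f_global: float,
                         n_k: int, n_total: int) -> Tuple[float, float, float]:
    """Return (G_ST(k), I(k), m(k)) from the similarity statistics of sample k.

    Assumed relations (not taken verbatim from the source):
      P(k/k)  = (N_k - 1) / (N - 1)
      G_ST(k) = (F_intra - F_global) / (1 - F_global)
      I(k)    = (1 - P(k/k)) * (1 - G_ST(k)) / G_ST(k)
      m(k)    = I(k) / (I(k) + N_k - 1)
    """
    p_kk = (n_k - 1) / (n_total - 1)
    g_st = (f_intra - f_global) / (1.0 - f_global)
    if g_st <= 0.0:
        # No detectable differentiation: the ratio is undefined or negative.
        raise ValueError("G_ST(k) must be positive to estimate I(k)")
    i_hat = (1.0 - p_kk) * (1.0 - g_st) / g_st
    m_hat = i_hat / (i_hat + n_k - 1)
    return g_st, i_hat, m_hat


# Example with made-up similarity values for a sample of 100 among 1000 individuals.
print(estimate_immigration(0.10, 0.05, 100, 1000))
```

The guard on *G*_{ST}(*k*) matters in practice: when the local similarity barely exceeds the global one, the ratio explodes, which is where the sampling variance of the estimate is expected to be largest.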

#### Sampling properties

A key issue is to investigate in more detail the sampling properties of $\hat{I}(k)$ (equation 2) so as to gain insights about its bias and variance and to formally assess the degree of confidence in its estimation. The word ‘sampling’ here conceptually encompasses two distinct sources of variation.

First, the neutral theory of Hubbell (2001) allows for random fluctuations in species abundances owing to the finite sizes of the community (local drift) and of the metacommunity (global drift). Influxes of new species (speciation) in the metacommunity and of immigrants in local communities are necessary to avoid species fixation and monodominance, and to maintain species diversity. The stochastic nature of the replacement of dead individuals was shown to be strictly analogous to a sampling process (Etienne & Alonso 2005), so that Etienne (2005) and followers could derive exact sampling formulas for species abundances in local community samples *k* (Etienne 2005, 2007; Munoz, Couteron, & Ramesh 2008; Noble *et al.* 2010). The relationship between *G*_{ST}(*k*) and *I*(*k*) in equation 1 holds for any sample taken in a given community (Fig. 1b, part above the dashed line; see also Munoz, Couteron, & Ramesh 2008).

Second, although equation 1 is exact for any sample in community *k*, it only provides an estimate of *I*(*k*), denoted $\hat{I}(k)$, which is marked by random fluctuations in the constitutive similarity statistics within *G*_{ST}(*k*), because the species frequencies in samples are themselves random variables. Thus, the *I*(*k*) notation embodies the sampling variation because of ecological drift in neutral theory (equation 1 and Fig. 1b, above the dashed line), while $\hat{I}(k)$ further includes the sampling variation because of estimation error of similarity statistics (equation 2 and Fig. 1b, below the dashed line). The latter source of sampling error is characterized by the ‘hat’ notation and is the focus of the present work. We intend to gain insights into this sampling variation by applying a multinomial model of sample draws from the corresponding source community. This model requires assuming, as a first approximation, that the communities are large enough compared to the samples to allow the drawing of samples from the communities with replacement.

### Analytical results

We investigated the sampling properties of $\hat{I}(k)$ and $\hat{m}(k)$, using a delta method for calculating approximate sampling means and variances (Davison 2003). We then provided further analytical results for $\hat{I}(k)$ in the particular case of the two-level version (2L-SINM). We used Mathematica® (Wolfram 2003) for Taylor series expansions and investigation of the limit cases.

#### Sampling variation in similarity statistics

We investigated the variation among many samples successively drawn in a given community to predict the error in estimating similarities from a single sample. As mentioned above, we assumed each community sample to be small enough compared to the source community, so that the individuals making up a sample are drawn from the source community with replacement. Species abundances in a sample made of *N*_{k} individuals follow a multinomial distribution with parameters *p*_{ik}, the species probabilities in the reference community. We therefore investigated the influence of multinomial variation on the sampling properties of the similarity estimators. We used binary indicator functions (Cormen *et al.* 2001) of species identity to explore the variations of the similarities, as Lande (1996) did for investigating Simpson’s diversity.

Appendix S1 further shows that the two other similarity statistics, and , are also unbiased:

- (eqn 3C)

- (eqn 3D)

The sampling properties of the similarity statistics here hold irrespective of any underlying model of species dynamics.
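
The unbiasedness of a similarity estimator under multinomial sampling can be checked numerically. The Python sketch below is an illustration under our own assumptions: it uses the without-replacement estimator of the local similarity, Σ *N*_{ik}(*N*_{ik} − 1)/(*N*_{k}(*N*_{k} − 1)), a made-up three-species community, and hypothetical function names.

```python
import random
from typing import List


def f_intra_hat(sample_counts: List[int]) -> float:
    """Without-replacement estimator of the local similarity."""
    n = sum(sample_counts)
    return sum(c * (c - 1) for c in sample_counts) / (n * (n - 1))


def draw_multinomial(probs: List[float], size: int, rng: random.Random) -> List[int]:
    """Draw species counts for one sample of `size` individuals with replacement."""
    counts = [0] * len(probs)
    for species in rng.choices(range(len(probs)), weights=probs, k=size):
        counts[species] += 1
    return counts


def mean_estimate(probs: List[float], size: int, n_rep: int, seed: int = 0) -> float:
    """Average the estimator over n_rep independent multinomial samples."""
    rng = random.Random(seed)
    total = sum(f_intra_hat(draw_multinomial(probs, size, rng)) for _ in range(n_rep))
    return total / n_rep


community = [0.5, 0.3, 0.2]                  # hypothetical species probabilities
true_f_intra = sum(p * p for p in community)  # sum of squared probabilities
print(true_f_intra, mean_estimate(community, size=50, n_rep=2000))
```

The Monte Carlo mean should sit close to the true value even for small samples, whereas the spread of individual estimates shrinks only as the sample size grows.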

#### Sampling error on $\hat{I}(k)$ and $\hat{m}(k)$

From this premise, we turned to investigate the sampling error in the parameters $\hat{I}(k)$ and $\hat{m}(k)$ from their relationships with $\hat{G}_{ST}(k)$ and the constitutive similarity statistics $\hat{F}_{intra}(k)$ and $\hat{F}_{global}(k)$ (equation 2).

We denote the derivatives , , , and so on. Using a delta method based on Taylor series expansions (Davison 2003; see Appendix S2), we could obtain approximate expected values and sampling variances of :

- (eqn 4A)

- (eqn 4B)

with , , and .

We assumed the sampling variance of $\hat{F}_{global}(k)$ and the covariance of $\hat{F}_{global}(k)$ and $\hat{F}_{intra}(k)$ to be negligible compared with the variance of $\hat{F}_{intra}(k)$. The assumption is fairly reasonable, insofar as $\hat{F}_{global}(k)$ depends on the sampling variation in the large overall data set (lumped samples), which is an order of magnitude smaller than the sampling variation in each community sample *k* as long as there are many samples. Thus, under the assumption of a network of many small community samples, equations 4A and 4B simplify into expressions that depend only on the sampling variance of $\hat{F}_{intra}(k)$.
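
As a hedged illustration of this delta-method step, suppose (an assumption made for this sketch, not a transcription of the authors' equations 4 and 5) that the estimator is written as a function of the local similarity $x = \hat{F}_{intra}(k)$ with the global similarity $y = F_{global}(k)$ treated as fixed, and write $P = P(k/k)$. A second-order Taylor expansion around the true value of $x$ then gives:

```latex
% Assumed functional form of the estimator (our notation):
f(x, y) = (1 - P)\,\frac{1 - x}{x - y}

% First and second derivatives with respect to x:
\frac{\partial f}{\partial x} = -\,\frac{(1 - P)(1 - y)}{(x - y)^{2}},
\qquad
\frac{\partial^{2} f}{\partial x^{2}} = \frac{2\,(1 - P)(1 - y)}{(x - y)^{3}}

% Delta-method approximations when only Var(x) is retained:
\mathrm{E}\big[f(\hat{x}, y)\big] \approx f(x, y)
  + \frac{(1 - P)(1 - y)}{(x - y)^{3}}\,\mathrm{Var}(\hat{x}),
\qquad
\mathrm{Var}\big[f(\hat{x}, y)\big] \approx
  \frac{(1 - P)^{2}(1 - y)^{2}}{(x - y)^{4}}\,\mathrm{Var}(\hat{x})
```

Under this sketch, both the bias and the variance are proportional to the sampling variance of the local similarity, and both grow quickly when the local and global similarities are close to each other.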

Replacing and with the corresponding expected values of the similarity statistics (equations 4A and 4B) yielded the following relationship:

Using equation 1, this simplified into

Likewise,

By denoting , we finally got

- (eqn 5A)

and the estimation of bias

- (eqn 5B)

The quantities and *T*(*k*) are central here for characterizing the sampling error because they control both the bias and the variance in estimation of . Using a least-square loss function, the best bias-variance trade-off is obtained by minimizing the mean square error

Recall that to get . Let us consider that *P*(*k*/*k*) = (*N*_{k} − 1)/(*N *− 1) primarily depends on the fixed number of sites sampled, *N*_{c} = *N*/*N*_{k}, so that . *I*(*k*) and the actual similarity statistics (without ‘hats’) are independent from *N*_{k}, so that both the estimation bias and variance decrease as 1/*N*_{k}; hence,

and .

As a consequence, MSE = *O*(1/*N*_{k}), and the sampling variance is here more constraining for optimization. We further investigated the sampling properties of

by using the delta method, so that

and with

Therefore, and . In this case, MSE = *O*(1/*N*_{k}^{3}), and the faster decrease in estimation error when community samples get large may have been a reason for preferring investigating *m*(*k*) in earlier applications of neutral theory (Hubbell 2001; Etienne 2005; Munoz *et al.* 2007).

The sampling results here are not dependent on any assumption on the nature of species assembly, as we have only analysed nonlinear functions of similarity statistics. At this stage, $\hat{I}(k)$ and $\hat{m}(k)$ can be used as heuristic indexes of community isolation from their common background (similarly to using diversity indexes). But one may also want to further interpret and discuss their nature as immigration parameters in the context of the 3L-SINM, where there is still no a priori expectation for *F*_{intra}(*k*), *F*_{3}(*k*) and *F*_{global}(*k*). The derivations below are given for the particular case of the 2L-SINM, where we can further specify the nature of the migrant pool and the species abundance distributions therein to predict these values.

#### Sampling error for the two-level spatially implicit neutral model (2L-SINM)

Let us now consider the well-known case of the 2L-SINM of Hubbell (2001), which is a particular case of the 3L-SINM. We analytically derived the expected probabilities *F*_{n}(*k*) in a given community sample *k* as explicit functions of the parameters *θ* and *I*(*k*) (Appendix S3). The analytical formulas for the first values of *n* are

- (eqn 6A)

where Γ is the Gamma function.

The three similarity statistics, *F*_{intra}(*k*), *F*_{global}(*k*) and *F*_{inter}(*k*), defined above, are linked through the relationship $F_{global}(k) = P(k/k)\,F_{intra}(k) + (1 - P(k/k))\,F_{inter}(k)$ (Munoz, Couteron, & Ramesh 2008). Here, we recall that *F*_{inter}(*k*) is the probability that two individuals, one drawn in *k* and the other in another community, are conspecific. Individuals drawn from distinct communities are descendants of distinct immigrants from the common pool of migrants (Etienne & Olff 2004), and hence, *F*_{inter}(*k*) is the probability that two individuals are conspecific in the pool of migrants (Munoz, Couteron, & Ramesh 2008: equation 3). In the context of the 2L-SINM, the pool of migrants is directly the metacommunity (Hubbell 2001); hence, $F_{inter}(k) = 1/(1 + \theta)$ (Ewens 1972) and then

- (eqn 6D)
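
The way the two fundamental parameters determine the expected similarities can be explored with a small Python sketch. It rests on assumptions that we state explicitly: *F*_{inter} = 1/(1 + *θ*) in the metacommunity, *F*_{intra} = (1 + *I* *F*_{inter})/(1 + *I*) (two local individuals either share an immigrant ancestor, with probability 1/(1 + *I*), or descend from two distinct immigrants), and *F*_{global} as the *P*(*k*/*k*)-weighted average of the two. Function names are hypothetical.

```python
from typing import Tuple


def expected_similarities_2l(theta: float, immigration: float,
                             p_kk: float) -> Tuple[float, float, float]:
    """Expected (F_intra, F_global, F_inter) under the assumed 2L-SINM relations."""
    f_inter = 1.0 / (1.0 + theta)
    f_intra = (1.0 + immigration * f_inter) / (1.0 + immigration)
    f_global = p_kk * f_intra + (1.0 - p_kk) * f_inter
    return f_intra, f_global, f_inter


def immigration_from_similarities(f_intra: float, f_global: float, p_kk: float) -> float:
    """Recover I(k) through the assumed G_ST(k) relation."""
    g_st = (f_intra - f_global) / (1.0 - f_global)
    return (1.0 - p_kk) * (1.0 - g_st) / g_st


# Example: theta = 50, I = 10, and one sample among 50 of equal size (P(k/k) ~ 1/50).
f_intra, f_global, f_inter = expected_similarities_2l(50.0, 10.0, 0.02)
print(f_intra, f_global, f_inter)
print(immigration_from_similarities(f_intra, f_global, 0.02))
```

Feeding the expected similarities back into the *G*_{ST}(*k*) relation returns the immigration number that generated them, which is a convenient consistency check before working with noisy estimates.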

Under the 2L-SINM, the expected values of *F*_{intra}(*k*), *F*_{3}(*k*) and *F*_{global}(*k*) are therefore functions of the two fundamental parameters *I*(*k*) and *θ*, and we can derive the expected value and sampling variance of $\hat{I}(k)$ by calculating the exact expression of *T*(*k*) as a function of *I*(*k*) and *θ*:

- (eqn 7)

#### Trade-off between sample size and number of samples

The above results allow for the assessment of sampling designs and for delineating a domain of efficient parameter inference. Let us consider, for instance, forest plots in wet evergreen tropical forests, which are a classical field of application of neutral models. Different sampling strategies are possible with sample sizes ranging, for illustration purposes, from 0.1-ha plots including *N*_{k} = 50 individuals on average to 1-ha plots including 500 individuals or more (Munoz, Couteron, & Ramesh 2008). The mean square error MSE = Bias^{2} + Var can be used as an accuracy index measuring the performance of a given sampling design. The smaller the MSE, the better the estimation of immigration.
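
The decomposition MSE = Bias^{2} + Var can be verified on any set of replicate estimates. The following Python sketch is a generic illustration with made-up numbers and a hypothetical function name; it uses the population form of the variance (division by the number of replicates) so that the identity holds exactly.

```python
from typing import List, Tuple


def bias_variance_mse(estimates: List[float], true_value: float) -> Tuple[float, float, float]:
    """Return (bias, variance, mse) of replicate estimates of a known parameter."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse


# Made-up replicate estimates of an immigration number whose true value is 10.
bias, variance, mse = bias_variance_mse([9.0, 11.0, 13.0], 10.0)
print(bias, variance, mse, bias ** 2 + variance)
```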

In an application of the 2L-SINM of Hubbell (2001), Fig. 2A shows contours of constant MSE when the sample size *N*_{k} (abscissa) and the number of samples *N*_{c} (ordinate) are varying in a case where *I* = 10 and *θ* = 50, which are fairly realistic values for tropical forests in South India (Munoz *et al.* 2007; Munoz, Couteron, & Ramesh 2008). The log10 values of MSE are shown along with the contours. This figure shows that one cannot improve MSE over a certain limit for a given sample size because increasing the number of samples leads to an asymptote in MSE. Specifically, for 0.1-ha forest plots of about *N*_{k} = 50, log10(MSE) cannot go below 1.09, while for 1-ha forest plots of about *N*_{k} = 500, it can go down to 0.016, and the asymptotic difference in accuracy is a factor of about 12 in absolute scale. Hence, one cannot expect good estimation from community samples that are too small, even if there are many samples. For illustration, consider a fixed overall sampling effort of *N* = 5000 individuals, which can be achieved for different sample sizes *N*_{k} and sample numbers *N*_{c} such as, for instance, *N*_{c} = 50 samples of size *N*_{k} = 100 (Fig. 2A, point a), or *N*_{c} = 10 samples of size *N*_{k} = 500 (Fig. 2A, point b). In the former case, log10(MSE) = 0.8, while in the latter case, log10(MSE) = 0.1, so that the MSE is improved by a factor of 4.6 with fewer large samples. Furthermore, the arrow in Fig. 2A indicates that sampling 10 samples of 500 individuals is as accurate as sampling 30 samples of 440 individuals because both designs stand on the same isoline of log10(MSE) = 0.1. In the former case, only 5000 individuals are to be sampled, compared with 13 200 in the latter case. Therefore, it is better for estimation performance to rely on fewer large samples, as long as the assumption that the samples are smaller than the corresponding communities is correct.
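
The arithmetic behind this comparison of designs is simple enough to script. The Python sketch below only reuses numbers quoted in the paragraph above (the design sizes and the two asymptotic log10(MSE) values); the function names are hypothetical.

```python
def total_effort(n_samples: int, sample_size: int) -> int:
    """Total number of individuals to census for a given design."""
    return n_samples * sample_size


def mse_ratio(log10_mse_a: float, log10_mse_b: float) -> float:
    """Ratio MSE_a / MSE_b computed from two log10(MSE) values."""
    return 10.0 ** (log10_mse_a - log10_mse_b)


# Two designs lying on the same MSE isoline in the example above.
effort_few_large = total_effort(10, 500)
effort_many_medium = total_effort(30, 440)

# Asymptotic accuracy of 0.1-ha plots (N_k = 50) versus 1-ha plots (N_k = 500).
asymptotic_gain = mse_ratio(1.09, 0.016)

print(effort_few_large, effort_many_medium, asymptotic_gain)
```

Comparing designs by their total census effort at equal MSE, rather than by the number of plots alone, makes the cost of relying on many small samples explicit.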

It is also possible to use the derivatives of MSE to investigate the gain in performance when increasing sampling pressure. Let us fix, for instance, a qualitative limit on the derivative of MSE with respect to the number of samples, above which the gain is arbitrarily judged no longer worth it. Fig. 2B (solid line) shows the corresponding number of samples as a function of *N*_{k}. For instance, 24 samples of 50 individuals or seven samples of 500 individuals are located at the same level of this derivative, which illustrates again that the advantage of getting additional samples closely depends on sample size. One may use other isoclines of the derivative (e.g. in Fig. 2B, isoclines at −0.01 represented by the dashed line and at −0.005 represented by the dotted line) to reach a desired limit in estimation performance when increasing the sample number. Information on sampling properties is therefore useful for establishing a sampling strategy prior to field work. The approach is analogous, in principle, to using saturation curves and exploratory statistics to fix the size of sampling plots for the assessment of diversity in a local community (Gimaret-Carpentier *et al.* 1998).

### Simulation application

#### Simulated neutral community samples

We used the two-step simulation algorithm of Munoz *et al.* (2007, Appendix) in Matlab^{®} (Mathworks 2004) to simulate neutral community samples complying with the 2L-SINM of Hubbell (2001). First, we used the sequential algorithm of Etienne (2005) to generate a pool of migrant ancestors (migrant pool) as a very large metacommunity sample. The pool was characterized by the biodiversity parameter *θ*, which controls the speciation-drift equilibrium. We repeated the procedure for two values of *θ*, 50 and 100, which illustrate a range of biodiversity values for tree communities in semi-evergreen and evergreen tropical forests (Hubbell 2001; Munoz *et al.* 2007). For each migrant pool representing a shared biogeographical background (metacommunity), we generated 50 local community samples *k* with constant local immigration numbers *I*(*k*) (see Munoz *et al.* 2007 for more details). We repeated the procedure to generate 50 independent migrant pools as replicates, for each of which we simulated 50 community samples, so as to get a total of 2500 community samples for each value of *I*(*k*). We varied *I*(*k*) from 10 to 200 with increments of 10, and we considered two sample sizes: *N*_{k} = 400 and 800 (wet evergreen tropical forest plots of about 1 ha usually fall within this range). We ensured that the migrant pools included a significantly larger number of individuals than the derived local community samples, namely 50*N*_{k}, and checked that the results remained consistent when sizes ranging from 25*N*_{k} to 1000*N*_{k} were used.
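
A two-step simulation of this kind can be sketched in Python as follows. This is not the authors' Matlab code: it is a minimal illustration that assumes a sequential, urn-type scheme at both levels (a new species with probability *θ*/(*θ* + *j*) in the pool, a new immigrant ancestor with probability *I*/(*I* + *j*) in the local sample, *j* being the number of individuals already drawn), with ancestors drawn from the pool with replacement. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import List


def simulate_migrant_pool(theta: float, size: int, rng: random.Random) -> List[int]:
    """Step 1: sequentially build a metacommunity sample of species labels."""
    pool: List[int] = []
    next_species = 0
    for j in range(size):
        # The (j+1)-th individual founds a new species with probability theta / (theta + j).
        if rng.random() < theta / (theta + j):
            pool.append(next_species)
            next_species += 1
        else:
            pool.append(rng.choice(pool))  # copy the species of an earlier individual
    return pool


def simulate_local_sample(pool: List[int], immigration: float,
                          size: int, rng: random.Random) -> Counter:
    """Step 2: build one local community sample fed by the migrant pool."""
    sample: List[int] = []
    for j in range(size):
        # New immigrant ancestor with probability I / (I + j), else a local descendant.
        if rng.random() < immigration / (immigration + j):
            sample.append(rng.choice(pool))
        else:
            sample.append(rng.choice(sample))
    return Counter(sample)


rng = random.Random(1)
n_k = 400
pool = simulate_migrant_pool(theta=50.0, size=50 * n_k, rng=rng)
samples = [simulate_local_sample(pool, immigration=10.0, size=n_k, rng=rng)
           for _ in range(50)]
print(len(set(pool)), len(samples[0]))
```

Keeping the pool much larger than the local samples (here 50 times *N*_{k}, as in the text) limits the influence of the finite pool size on the simulated compositions.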

#### Simulated vs. predicted sampling characteristics

For given values of *I*(*k*), *N*_{k} and *θ*, we calculated the mean and standard deviation of $\hat{I}(k)$ using equations (5A), (5B) and (7) on the basis of 50 simulated community samples associated with a migrant pool. To avoid spurious effects because of some rare outliers, we calculated 95% trimmed statistics by excluding 2.5% of observations from each side. We repeated the procedure for each of the 50 replicate migrant pools and for each of the 20 values of *I*(*k*). Figure 3 shows the departures of observed bias and variance from our theoretical predictions. The results are presented for three combinations of parameters: *θ* = 50, *N*_{k} = 400 (a and b); *θ* = 50, *N*_{k} = 800 (c and d); and *θ* = 100, *N*_{k} = 400 (e and f).
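
Trimmed statistics of this kind are straightforward to compute. The Python sketch below is one possible implementation, not the authors' code: it assumes that the number of observations removed on each side is the trimming fraction times the number of observations, rounded down, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def trim(values: List[float], fraction: float = 0.025) -> List[float]:
    """Drop the lowest and highest `fraction` of the observations (rounded down)."""
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)
    return ordered[cut:len(ordered) - cut] if cut else ordered


def trimmed_mean_std(values: List[float], fraction: float = 0.025) -> Tuple[float, float]:
    """Mean and sample standard deviation of the trimmed observations."""
    kept = trim(values, fraction)
    mean = sum(kept) / len(kept)
    variance = sum((v - mean) ** 2 for v in kept) / (len(kept) - 1)
    return mean, math.sqrt(variance)


# Made-up estimates with one outlier on each side.
estimates = [-1000.0] + [float(v) for v in range(1, 99)] + [1000.0]
print(trimmed_mean_std(estimates))
```

With only 50 observations per migrant pool, a 2.5% trim removes a single observation on each side under this rounding rule, so the choice of rounding convention can matter.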

We conducted extensive simulation experiments (results not shown) to verify that the sampling properties remained consistent for varying sizes of the pool of migrants (from 25*N*_{k} to 1000*N*_{k}) and for varying numbers of community samples (from 50 to 250 samples) associated with a migrant pool. In spite of the approximations made, our analytical predictions of sampling variance and mean of the immigration parameter were globally consistent with simulation results (Fig. 3). The relative difference between the simulated and analytical means of $\hat{I}(k)$ (i.e. ΔMean/*I*(*k*)) fluctuated around 0 over the range of *I*(*k*) values, showing good agreement for varying configurations of *θ* and *N*_{k} (Fig. 3a,c,e). We nevertheless noted that the width of the scatter increased at values of *I*(*k*) above 75–100. On the other hand, despite a good overall fit, the sampling standard deviation was slightly underestimated at small *I*(*k*) (ca. under 75; Fig. 3b,d,f; ΔSTD/*I*(*k*) > 0). We found quite similar results for $\hat{m}(k)$ (results not shown).

### Discussion

Our central result here is that the estimation of the immigration parameters *I*(*k*) and *m*(*k*) is asymptotically unbiased for large enough samples, which is an important and original result as far as the novel conditional version *G*_{ST}(*k*) of *G*_{ST} is concerned. The unconditional *F*_{ST} and related statistics, such as *G*_{ST}, have been widely investigated and debated in population genetics, especially regarding how to get reliable enough estimates (Rottenstreich *et al.* 2007; Guillot 2010) and subsequently a correct inference of migration (Whitlock & McCauley 1999). A major interest of using the conditional form *G*_{ST}(*k*) to estimate immigration parameters *I*(*k*) is that it overcomes the problem of averaging immigration effects across all the communities, which Whitlock & McCauley (1999) pointed out for the unconditional *F*_{ST}-based approach.

Because *G*_{ST}(*k*) and *I*(*k*) are nonlinear functions of similarity statistics whose sampling variances are analytically established, we analysed their sampling accuracy using Taylor series expansions (delta method) to avoid cumbersome formulas and computation (Appendix S2). The delta method basically assumes that the sampling statistics are dominated by the lower-order terms of the Taylor series (variances and covariances), and we further neglected the variance of *F*_{global}(*k*) and the covariance of *F*_{global}(*k*) and *F*_{intra}(*k*) relative to the variance of *F*_{intra}(*k*). The simulation application for community samples complying with Hubbell’s model confirmed that the analytical results based on these approximations were reliable enough for a range of realistic parameters (sample sizes and values of *I*(*k*) such as those obtained from surveys in wet evergreen tropical forests). To that extent, the variance of the estimation is mostly sensitive to the variation in local similarity, *F*_{intra}(*k*). This result may be robust to departures from Hubbell’s model, but verifying this requires further simulation studies, which are beyond the scope of this paper.
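The delta-method reasoning above can be illustrated numerically. The sketch below is not the derivation of Appendix S2. It applies the generic second-order approximation of the mean and first-order approximation of the variance to a hypothetical mapping `g(x) = (1 - x) / x` from a similarity-type statistic to an immigration-type number, and compares them with a Monte Carlo estimate. The mapping, the input mean and standard deviation, and the normal distribution of the input are all assumptions made for illustration.

```python
import random
import statistics


def delta_method(g, dg, d2g, mu, var):
    """Approximate the mean and variance of g(X) for X with mean mu and variance var."""
    mean_approx = g(mu) + 0.5 * d2g(mu) * var  # second-order Taylor term (bias correction)
    var_approx = dg(mu) ** 2 * var             # first-order Taylor term
    return mean_approx, var_approx


# Hypothetical nonlinear mapping and its first two derivatives (illustration only).
def g(x):
    return (1.0 - x) / x


def dg(x):
    return -1.0 / x ** 2


def d2g(x):
    return 2.0 / x ** 3


mu, sd = 0.05, 0.004  # made-up mean and standard deviation of the input statistic
mean_dm, var_dm = delta_method(g, dg, d2g, mu, sd ** 2)

# Monte Carlo check with normally distributed inputs (also an assumption).
rng = random.Random(1)
draws = [g(rng.gauss(mu, sd)) for _ in range(200_000)]
mean_mc = statistics.fmean(draws)
var_mc = statistics.variance(draws)

print(f"delta method: mean={mean_dm:.3f}, variance={var_dm:.3f}")
print(f"Monte Carlo:  mean={mean_mc:.3f}, variance={var_mc:.3f}")
```

With these made-up inputs the two approaches should agree closely on the mean, while the first-order variance may differ from the simulated one by a few percent because higher-order terms are dropped.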

More broadly, the bias and variance formulas of equations 4 and 5 are independent of any underlying model of community dynamics; notably, their applicability is not restricted to neutral models. As such, *I*(*k*) and *m*(*k*) can be used as descriptive indices of community isolation, complementary to diversity statistics. We also note that our results (equations 4 and 5) are straightforwardly applicable to the *D* statistic proposed by Jost (2008) as an alternative to *G*_{ST}. In practice, the variation of *I*(*k*) values over a set of sampling sites can be compared with external environmental information (e.g. geology, climate) so as to detect possible environmental filtering of community composition. For this purpose, the analytical results on the sampling variance of *I*(*k*) will be useful for designing tests of departures from a regional mean. In this regard, a perspective will be to investigate in greater detail the distribution of the estimates to provide further insights into the confidence limits.
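As a rough illustration of how a sampling variance could feed a test of departure from a regional mean, the sketch below computes a standardized deviation for each site. It is only one possible design, not a procedure given in the paper. It assumes approximately normal estimates, uses an inverse-variance weighted regional mean, and ignores the uncertainty of that mean. The function name `departure_z_scores` and all numerical values are hypothetical.

```python
import math


def departure_z_scores(estimates, sampling_variances):
    """Standardized deviation of each site's estimate from the weighted regional mean."""
    weights = [1.0 / v for v in sampling_variances]  # inverse-variance weights
    regional_mean = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    z_scores = [
        (x - regional_mean) / math.sqrt(v)
        for x, v in zip(estimates, sampling_variances)
    ]
    return regional_mean, z_scores


# Made-up immigration estimates and sampling variances for four sites.
site_estimates = [42.0, 55.0, 38.0, 90.0]
site_variances = [36.0, 49.0, 25.0, 100.0]

regional_mean, z_scores = departure_z_scores(site_estimates, site_variances)
for site, z in enumerate(z_scores, start=1):
    print(f"site {site}: z = {z:+.2f}")
```

Sites with large absolute z values would be candidates for closer inspection against environmental covariates. Any formal threshold would depend on the actual distribution of the estimates.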

Furthermore, the mean square error (MSE) formula, based on sampling bias and variance, offers a synthetic measure of estimation performance, which is of practical interest for designing efficient sampling schemes (illustrated in Fig. 2). In our application, estimation performance as measured by MSE reaches a plateau when the size and/or the number of community samples is increased. The isolines of MSE in Fig. 2A can be used to select the sample number and size appropriate for a desired level of accuracy, and thus provide guidance in the design and evaluation of sampling schemes. This is quite similar to designing sampling schemes to correctly estimate species alpha and beta diversities in communities (Gimaret-Carpentier *et al.* 1998).
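The use of MSE to choose a sampling design can be sketched as follows, combining bias and variance as MSE = bias² + variance and searching a grid of designs for the one with the least total effort that meets a target MSE. The functions `toy_bias` and `toy_variance` are arbitrary placeholders chosen so that MSE levels off as effort grows. They are not equations 4 and 5 of the paper, and the grid values and target are made up.

```python
from itertools import product


def mse(bias, variance):
    """Mean square error from sampling bias and sampling variance."""
    return bias ** 2 + variance


def cheapest_design(bias_fn, var_fn, sample_numbers, sample_sizes, target_mse):
    """Return the (number, size) pair with the least total effort meeting target_mse, or None."""
    feasible = [
        (n * s, n, s)  # total effort = number of samples x individuals per sample
        for n, s in product(sample_numbers, sample_sizes)
        if mse(bias_fn(n, s), var_fn(n, s)) <= target_mse
    ]
    if not feasible:
        return None
    _, n, s = min(feasible)
    return n, s


# Placeholder bias and variance functions (hypothetical, for illustration only).
def toy_bias(n, s):
    return 50.0 / s


def toy_variance(n, s):
    return 2000.0 / (n * s) + 0.5  # the constant term produces a plateau


numbers = [10, 25, 50, 100]
sizes = [25, 50, 100, 200]
print(cheapest_design(toy_bias, toy_variance, numbers, sizes, target_mse=2.0))
```

Replacing the placeholders with analytical bias and variance expressions would turn the same grid search into a numerical counterpart of reading a design off MSE isolines.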

A further important message for community ecologists is that sampling issues are central to the analysis of community composition, when stochastic processes and estimation error are intertwined. Therefore, one should take into account both (i) the sampling nature of the neutral theory (Etienne & Alonso 2005), and more generally of the fundamental concept of composition drift (a process that is likely to be pervasive even when interacting with non-neutral processes); and (ii) the sampling error in parameter estimation as in any inference process.

Finally, the strict dichotomy between neutral and non-neutral models is progressively vanishing, and recent work has suggested that the scope of neutral models should be enlarged and that their predictions of relative species abundances are robust (Allouche & Kadmon 2009; Noble *et al.* 2010). Several authors (Hubbell 2006; Zillio & Condit 2007) have argued that the assumption of ecological equivalence of individuals is realistic insofar as the trade-off between life traits in a group of trophically similar species (a guild) may produce comparable levels of fitness. Furthermore, the three-level spatially implicit neutral model of Munoz, Couteron & Ramesh (2008) illustrates how non-neutral processes at regional scale can be incorporated into the traditional neutral approach. Hence, this 3L-SINM framework may be applied to a variety of contexts involving community isolation, including anthropogenic fragmentation in tropical forests, and it may become a useful approach for conservation issues (Pearse & Crandall 2004).

### Acknowledgements

We warmly thank the editor and an anonymous reviewer for their valuable comments and suggestions.

### References

- Allouche & Kadmon (2009) A general framework for neutral models of community dynamics. Ecology Letters, 12, 1287–1297.
- Beeravolu, Couteron, Pélissier & Munoz (2009) Studying ecological communities from a neutral standpoint: a review of models’ structure and parameter estimation. Ecological Modelling, 220, 2603–2610.
- Chave, Alonso & Etienne (2006) Comparing models of species abundance. Nature, 441, E1.
- Cormen, Leiserson, Rivest & Stein (2001) Indicator random variables. Introduction to Algorithms, 2nd edn, pp. 94–99. MIT Press and McGraw-Hill, Cambridge.
- Couteron & Pélissier (2004) Additive apportioning of species diversity: towards more sophisticated models and analyses. Oikos, 107, 215–221.
- Davison (2003) Statistical Models. Cambridge University Press, Cambridge, UK.
- Etienne (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
- Etienne (2007) A neutral sampling formula for multiple samples and an “exact” test of neutrality. Ecology Letters, 10, 608–618.
- Etienne (2009) Maximum likelihood estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation. Journal of Theoretical Biology, 257, 510–514.
- Etienne & Alonso (2005) A dispersal-limited sampling theory for species and alleles. Ecology Letters, 8, 1147–1156.
- Etienne & Olff (2004) A novel genealogical approach to neutral biodiversity theory. Ecology Letters, 7, 170–175.
- Etienne, Latimer, Silander & Cowling (2006) Comment on “Neutral Ecological Theory Reveals Isolation and Rapid Speciation in a Biodiversity Hot Spot”. Science, 311, 610b.
- Ewens (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
- Gimaret-Carpentier, Pélissier, Pascal & Houllier (1998) Sampling strategies for the assessment of the tree species diversity. Journal of Vegetation Science, 9, 161–172.
- Guillot (2010) Splendor and misery of indirect measures of migration and gene flow. Heredity, 106, 11–12.
- Hubbell (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton and Oxford.
- Hubbell (2006) Neutral theory and the evolution of ecological equivalence. Ecology, 87, 1387–1398.
- Jabot & Chave (2009) Inferring the parameters of the neutral theory of biodiversity using phylogenetic information and implications for tropical forests. Ecology Letters, 12, 239–248.
- Jost (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026.
- Lande (1996) Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos, 76, 5–13.
- Latimer, Silander & Cowling (2005) Neutral Ecological Theory Reveals Isolation and Rapid Speciation in a Biodiversity Hot Spot. Science, 309, 1722–1725.
- Leigh (2007) Neutral theory: a historical perspective. Journal of Evolutionary Biology, 20, 2075–2091.
- Mathworks (2004) Matlab 7.0. Mathworks Inc., Natick, MA.
- McGill (2003) A test of the unified neutral theory of biodiversity. Nature, 422, 881–885.
- Munoz, Couteron & Ramesh (2008) Beta-diversity in spatially implicit neutral models: a new way to assess species migration. The American Naturalist, 172, 116–127.
- Munoz, Couteron, Ramesh & Etienne (2007) Estimating parameters of neutral communities: from one Single Large to Several Small samples. Ecology, 88, 2482–2488.
- Nee & Stone (2003) The end of the beginning for neutral theory. Trends in Ecology & Evolution, 18, 433–434.
- Nei (1973) Analysis of Gene Diversity in Subdivided Populations. Proceedings of the National Academy of Sciences of the United States of America, 70, 3321–3323.
- Noble, Temme, Fagan & Keitt (2011) A sampling theory for asymmetric communities. Journal of Theoretical Biology, 273, 1–14.
- Pearse & Crandall (2004) Beyond F-ST: analysis of population genetic data for conservation. Conservation Genetics, 5, 585–602.
- Rottenstreich, Miller & Hamilton (2007) Steady state of homozygosity and Gst for the island model. Theoretical Population Biology, 72, 231–244.
- Rottenstreich, Miller & Hamilton (2007) Dynamics of Fst for the island model. Theoretical Population Biology, 72, 485–503.
- Simpson (1949) Measurement of diversity. Nature, 163, 688.
- Slatkin (1985) Gene flow in natural populations. Annual Review of Ecology and Systematics, 16, 393–430.
- Takahata & Nei (1984) FST and GST statistics in the finite island model. Genetics, 107, 501–504.
- Volkov, Banavar, Hubbell & Maritan (2003) Neutral theory and relative species abundance in ecology. Nature, 424, 1035–1037.
- Whitlock & McCauley (1999) Indirect measures of gene flow and migration: FST ≠ 1/(4Nm+1). Heredity, 82, 117–125.
- Wolfram (2003) Mathematica 5.2. Wolfram Research, Inc., Champaign, IL.
- Zillio & Condit (2007) The impact of neutrality, niche differentiation and species input on diversity and abundance distributions. Oikos, 116, 931–940.

### Supporting Information

**Appendix A.** Sampling properties of the similarities

**Appendix B.** Delta method to assess the sampling mean and variance of twice differentiable functions

**Appendix C.** Raw moments of the local species abundance distribution (SAD) in the context of the two-level spatially implicit neutral model (2L-SINM) of Hubbell (2001)

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Filename | Format | Size | Description |
---|---|---|---|
MEE3_133_sm_AppendixA-C.doc | | 288K | Supporting info item |