## Introduction

Since the seminal work of Hubbell (2001), the neutral theory of biodiversity has raised much interest by providing a nontrivial null hypothesis against which the nature of real-world biodiversity patterns may be tested and discussed (McGill 2003; Nee & Stone 2003; Volkov *et al.* 2003; Leigh 2007). Under the key assumption of ‘ecological equivalence’, namely that individuals have equal fitness whatever their species or habitat, Hubbell (2001) and followers (Etienne 2005) proposed a two-level spatially implicit neutral model (so-called 2L-SINM by Munoz, Couteron, & Ramesh 2008; see a synthesis in Beeravolu *et al.* 2009 and present Fig. 1a left) to decouple the dynamics of the large biogeographical background (i.e. the metacommunity), in which the effect of infrequent speciation and extinction events is quantified via the biodiversity parameter *θ* (speciation-drift balance), from local community dynamics, which is controlled by the immigration parameter, *I* (migration-drift balance, Etienne & Olff 2004). The theory has found some support from the investigation of very rich tree communities in wet evergreen tropical forests, where alternative models were not satisfactory for explaining some observed diversity patterns, such as the overrepresentation of rare species in the species abundance distributions (SAD) (Hubbell 2001; Chave, Alonso, & Etienne 2006). From a theoretical standpoint, neutral models can help us grasp how the interaction between limited immigration and local ‘ecological drift’ (Hubbell 2001) may influence taxonomic composition. Munoz, Couteron, & Ramesh (2008) have further proposed to relax the assumptions made by Hubbell (2001) at the upper level (i.e. the metacommunity) and to generalize the initial 2L-SINM into a 3L-SINM, by introducing an intermediate level, the regional pool of migrants, which may be shaped by non-neutral influences (Fig. 1a right). 
Under the 3L-SINM, the study of the local migration-drift equilibrium can be disconnected from non-neutral influences occurring at larger scales, via the estimation of the immigration parameter *I*.

Although inferring Hubbell’s parameters has motivated a great deal of research in recent years, leading to several estimation methods (Volkov *et al.* 2003; Etienne 2005, 2009; Munoz *et al.* 2007; Munoz, Couteron, & Ramesh 2008), the rapidly growing number of applications to real-world data sets has raised questions about how to assess the reliability of the inference, even in the strictly neutral case. Furthermore, published methods have frequently led to substantially differing parameter estimates when applied to the same data sets (see for instance Latimer, Silander, & Cowling 2005; Etienne *et al.* 2006; Etienne 2009, Table 1). The lack of knowledge on confidence limits has discouraged wider and more systematic applications of neutral theory through meta-analyses on parameter values. It is also hindering the development and application of more realistic models of community assemblages. Our aim is to provide a statistically informed perspective on the estimation of the immigration parameter under the 3L-SINM.

The immigration parameter *I* is central here: it represents the number of potential migrants competing with resident offspring at each mortality event within the community and, as such, is closely related to Hubbell’s (2001) migration probability, *m* = *I*/(*I* + *N* − 1) (where *N* denotes the size of a local community). The neutral theory basically invokes ‘dispersal limitation’ from the metacommunity to the local community to define *I* and *m* (Hubbell 2001; Etienne 2005). But when these parameters are estimated from real-world data, they may also include the effects of many processes that altogether determine the relative isolation and differentiation of the local community from its biogeographical background. *I* and *m* could therefore be taken as an integrative measure of community isolation and differentiation, even though the strictly neutral ‘dispersal limitation’ invoked by Hubbell is not the only process shaping their values. An estimation of those immigration parameters could therefore be especially useful for quantifying how the fragmentation of threatened habitats (e.g. tropical forests) may influence the diversity in isolated patches taken as different local communities. More generally, analyses comparing consistent estimates of *I* and *m* across biogeographical regions and biomes would be a valuable contribution to community ecology and conservation biology by providing a measure of apparent fragmentation, which could be related to the ecological, physiographical and historical peculiarities of the different regions. This research programme requires increasing our knowledge about the sampling properties of *I* and *m*, which is the purpose of the present paper.
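As a purely illustrative aid, the link between the two immigration parameters can be sketched in a few lines of Python. The sketch assumes the relation commonly used in the spatially implicit neutral model, *m* = *I*/(*I* + *N* − 1), equivalently *I* = *m*(*N* − 1)/(1 − *m*); the function names and the example values are hypothetical and do not come from any published package.

```python
def migration_probability(I: float, N: int) -> float:
    """Convert the immigration number I into a migration probability m.

    Assumes m = I / (I + N - 1), with N the local community size.
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    if I < 0:
        raise ValueError("I must be non-negative")
    return I / (I + N - 1)


def immigration_number(m: float, N: int) -> float:
    """Convert a migration probability m into the immigration number I.

    Inverse of the relation above: I = m (N - 1) / (1 - m).
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    if not 0 <= m < 1:
        raise ValueError("m must lie in [0, 1)")
    return m * (N - 1) / (1 - m)


# Hypothetical example: the same I gives very different m in samples
# of different sizes, which is why I is convenient for comparisons.
for N in (91, 901):
    print(N, migration_probability(10.0, N))
```

Under this assumed relation, *I* does not depend on the sample size whereas *m* does, so the same *I* translates into a smaller *m* in a larger sample.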

Neutral models used for inference of *I* and *m* initially focused on species abundances in a single community (e.g. Hubbell 2001; Etienne 2005; Latimer, Silander, & Cowling 2005), whereas more recent work (Etienne 2007, 2009; Munoz *et al.* 2007; Munoz, Couteron, & Ramesh 2008; Jabot & Chave 2009) has allowed neutral parameters to be estimated from species compositions in a set of spatially scattered sampled communities, and provides estimates of *I*(*k*) and *m*(*k*) for each sample *k*. In this regard, Munoz, Couteron, & Ramesh (2008) have proposed a conditional version of the classical *G*_{ST} statistic from population genetics (Nei 1973; Takahata & Nei 1984; Slatkin 1985), *G*_{ST}(*k*), and have shown that *I*(*k*) can be inferred from *G*_{ST}(*k*) under the more general three-level spatially implicit neutral model (3L-SINM, Munoz, Couteron, & Ramesh 2008; Beeravolu *et al.* 2009). Although this *G*_{ST}(*k*)-based estimation of neutral immigration performed well on the basis of simulated neutral communities (Munoz, Couteron, & Ramesh 2008), there is still a lack of theoretical knowledge regarding the sampling properties of the method (as for all of the published alternative methods). We bring here new insights into the estimation of immigration based on the *G*_{ST}(*k*) statistic, by (i) providing analytical formulas, independent of any community model, for the bias and variance of the similarity statistics underlying the computation of *G*_{ST}(*k*) and *I*(*k*) and (ii) analysing the sensitivity of the estimation process to variation in the target parameters and to the main features of the data, such as the number of samples and the sample sizes. These results will yield practical insights for sampling design by allowing for the choice of a favourable trade-off between the number and size of community samples, according to a desired level of estimation accuracy.