Inferring seed bank from hidden Markov models: new insights into metapopulation dynamics in plants


Summary

  1. Capturing metapopulation dynamics of plants that have seed banks is challenging, because of the difficulty in characterizing the seed bank in the field.
  2. To account for the presence of a seed bank, we developed a hidden Markov model, where the focus species can be present in two forms, both above-ground and below-ground, the latter being unobservable. We generated patch histories of presence–absence for a species with a one-year seed bank under different colonization–extinction dynamics and metapopulation sizes, using a mechanistic model that accounts for three different sources of seedlings (seed bank, newly locally produced seeds and migrant seeds) as well as a disturbance process reflecting extinction. Using the program e-surge, we analysed these simulated data to evaluate the statistical performance of the hidden Markov model in detecting the presence of a seed bank and providing accurate estimates of the model parameters for different sets of parameter values.
  3. Our simulation tests showed that the absence of a seed bank was very well detected when data sets were simulated with no seed bank, regardless of the size of the metapopulation. Similarly, the presence of a seed bank was well detected when data sets were simulated with a seed bank. In this latter case, detection of the seed bank improved with increasing size of the metapopulation.
  4. The quality of the estimates of the model parameters increased with the size of the metapopulation but still remained high for small metapopulation sizes. The two parameters reflecting the colonization process and seed dormancy were those best estimated. In addition, we showed that ignoring the presence of a seed bank invariably led to overestimates of colonization and extinction rates.
  5. Synthesis. Hidden Markov models offer a reliable way to estimate colonization and extinction rates for plant metapopulations with a seed bank using time series of presence–absence data. Therefore, these models have the potential to provide valuable insights into the metapopulation dynamics of many plant and animal species with an unobservable life form that have remained poorly studied because of methodological constraints.

Introduction

Since first proposed by Levins (1969), the metapopulation concept has triggered many developments in evolutionary biology (Ronce, Perret & Olivieri 2000) and ecology (Hanski & Gilpin 1997; Hanski 1999). The metapopulation framework considers that species distribution over space and time results from a balance between extinction of local populations inhabiting a discrete network of suitable patches and colonization of empty patches. While such a framework has been extremely influential in the study of animal species, its relevance for studying plant species has been much more controversial (Bullock et al. 2002; Freckleton & Watkinson 2002, 2003; Ehrlén & Eriksson 2003). On the one hand, Husband & Barrett (1996) argued that the patchy structure of plant populations, as well as their supposedly high turnover, made them ideal candidates for metapopulation studies. On the other hand, Freckleton & Watkinson (2002) argued that specific plant characteristics could make the metapopulation model inadequate for capturing the regional dynamics of many plant species. Among these characteristics is the seed bank, which is a prevalent trait of plant species (Harper 1977; Thompson, Bakker & Bekker 1997) and is known to have a major impact on regional dynamics. Indeed, by spreading seed germination and reproduction through time, prolonged seed dormancy that builds up the seed bank (Harper 1977) can represent a bet-hedging strategy that allows species to reduce temporal variation in fitness in unpredictably varying environments and thus mitigate the effects of unfavourable years (Cohen 1966; Evans et al. 2007; Venable 2007). Seed bank effects on regional dynamics may, however, not be captured when considering only colonization and extinction processes (Freckleton & Watkinson 2002; Ouborg & Eriksson 2004). As a result, and because the seed bank is difficult to characterize in the field, the metapopulation dynamics of species possessing a seed bank have been poorly documented.
The few empirical metapopulation studies conducted in plants either have ignored the potential effect of a documented seed bank on metapopulation persistence (Lesica 1992; see a review in Freckleton & Watkinson 2002) or have been developed for species with no seed bank (Dornier, Pons & Cheptou 2011). The specific problem associated with a seed bank is that species cannot be detected when present only below-ground. Consequently, such missing information makes it unclear whether a newly observed population is derived from colonization or from the germination of the seed bank. Similarly, it is not clear whether a previously occupied habitat corresponds to an extinction process or whether there are still some individuals left in the seed bank. Therefore, estimates of colonization and extinction probabilities are necessarily biased whenever prolonged seed dormancy is ignored for species with a seed bank.

In a monitoring context, field ecologists are able to obtain long-term presence–absence data from patch surveys above-ground. Such data sets have recently stimulated a large amount of work on the issue of imperfect detection (e.g. MacKenzie et al. 2003; Royle 2006; Royle & Kery 2007; and references therein). Repeated surveys within a season can be used to compensate for imperfect detection (MacKenzie et al. 2009). However, they do not help detect the presence of a seed bank, making standard patch occupancy models inappropriate. Patch occupancy surveys only allow occupancy to be assigned to the above-ground state, and thus leave patch occupancy below-ground uncertain. The question remains whether such occupancy data above-ground can provide information about the presence of the species below-ground through the existence of a seed bank. Hidden Markov models, such as multievent models (Pradel 2005) and patch occupancy models (MacKenzie et al. 2009), decouple the observation process from the state process and thus make it possible to account for uncertainty in the assignment of state. In the context of plant metapopulations with a seed bank, the observation corresponds to the presence or absence of the species above-ground, and the states correspond to the combination of the presence or absence of the species above-ground and below-ground.

In this study, we highlight how hidden Markov models can be used to address the longstanding problem of unobservable stages in the life cycle, a problem that has hampered empirical studies of plant metapopulations. To do so, we performed stochastic simulations using a mechanistic model to generate patch histories of presence–absence for a species with a one-year seed bank, utilizing different colonization–extinction dynamics and metapopulation sizes. Using the program e-surge (Choquet, Rouan & Pradel 2009), we analysed these simulated data to evaluate the statistical performance of our model in (i) detecting the presence of a seed bank and (ii) providing accurate estimates of the model parameters for different sets of parameter values.

Materials and methods

Plant life cycle

To explore the potential of our approach, we limited ourselves to a simple situation, although we suggest further extension of our model in the discussion. We considered a metapopulation of an annual plant species with a one-year seed bank, consisting of a finite number N of discrete suitable patches (Fig. 1). At the time of the census during the flowering period, if any plant was observed in a patch at time t, we assumed that those plants automatically produced seeds. We assumed that those seeds gave rise to at least one seedling between t and t + 1 with a probability g0, while some automatically entered the seed bank and produced at least one seedling between t + 1 and t + 2 with a probability g1. Thus, whenever a plant was observed above-ground in a patch at time t, this patch contained a seed bank at time t + 1. Given that seeds had a one-year longevity in the seed bank, seeds produced at time t could thus germinate at the latest at time t + 2. Each patch had a probability c of carrying at least one seedling following colonization between t and t + 1. We further assumed that migrant seeds did not enter the seed bank. Thus, whenever a patch was not occupied above-ground at time t, the species was not present in the seed bank at time t + 1. The three sources of seedling occurrence between t and t + 1 in each patch corresponding, respectively, to parameters g0, g1 and c were independent in our mechanistic model. Finally, we defined d as the probability of all seedlings from any origin dying due to the occurrence of a disturbance taking place between t and t + 1. It is worth noting that the probability of a successful colonization of a patch empty at time t, that is totally devoid of the focus species above- and below-ground, is c(1 − d) because the alien seedlings must escape extinction from disturbance to effectively reinstate the species in the patch at time t + 1.

Figure 1.

Life cycle graph and demographic survey for a metapopulation of an annual species with a one-year seed bank. Using an example of patch history over three occasions, we illustrate the link between the parameters g0, g1, c and d reflecting the mechanistic processes of the model, the observations {‘0’: no plant above-ground, ‘1’: plant above-ground} and the underlying patch states {AA, AP, PA, PP}.

Demographic model

Each patch was characterized by whether the species was observed above-ground (‘1’) or not (‘0’) at the time of the census. Thus, the data set consisted of the histories of encounters of the species in a set of N patches surveyed each year during the flowering period over a finite number of years. To account for the uncertainty of patch state inherent to the presence of a seed bank, we developed a hidden Markov model (Pradel 2005) based on the species’ life cycle described above. The model was characterized by four states, AA (absent above-ground and absent in the seed bank), AP (absent above-ground and present in the seed bank), PA (present above-ground and absent in the seed bank), PP (present above-ground and present in the seed bank) and two observations (no plant above-ground, plant above-ground). This implementation can thus be seen as defining a multistate occupancy model (MacKenzie et al. 2009), where the focus species can be present in two forms, one of which is unobservable. In the presence of a one-year seed bank, the metapopulation dynamics were fully described by the 4 × 4 Markovian transition matrix, Φt, which describes the fate of patches in states AA, AP, PA and PP from t to t + 1, and whose elements were expressed as a function of the parameters g0, g1, c and d. Given our assumptions of the species’ life cycle, patch occupancy was dependent upon the balance between the occurrence of seedlings arising from the three possible sources of seeds (seed bank g1, newly produced seeds g0, immigrant seeds c) and the disturbance process d. For the sake of clarity, we thus expressed each transition matrix as a function of those two steps. 
We thus defined a parameter fijk, reflecting the first step, as the probability for a patch in a given state to contain seedlings arising from the seed bank (i = 1 when the patch contained seeds in the bank at time t, 0 otherwise), from the seeds produced by plants above-ground at t (j = 1 when the patch was occupied above-ground at time t, 0 otherwise) and from immigrant seeds (k = 1, any patch, regardless of its state at time t, having the same probability c to contain at least one seedling arising from the colonization process between t and t + 1). For instance, f101 corresponded to the probability for a patch in state AP at time t to contain seedlings that emerged between t and t + 1. Under the four-state model with a one-year seed bank, hereafter referred to as Model 1, the transition matrix Φt from the state at t (in rows) to the state at t + 1 (in columns) can be written as:

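$$
\Phi_t =
\begin{pmatrix}
(1 - f_{001}) + f_{001}d & 0 & f_{001}(1 - d) & 0 \\
(1 - f_{101}) + f_{101}d & 0 & f_{101}(1 - d) & 0 \\
0 & (1 - f_{011}) + f_{011}d & 0 & f_{011}(1 - d) \\
0 & (1 - f_{111}) + f_{111}d & 0 & f_{111}(1 - d)
\end{pmatrix}
$$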

with f001 = c, f101 = (g1 + c − g1c), f011 = (g0 + c − g0c) and f111 = (g1 + g0 + c − g0g1 − g1c − g0c + g1g0c) (see Appendix S1 in Supporting Information for the detailed transition matrix as a function of g0, g1, c and d). Besides the transitions between states, probabilities of the two observations {no plant above-ground, plant above-ground} had to be specified as conditional on the underlying state. If no plant was observed above-ground at t (conventional code ‘0’), the patch was not occupied above-ground at t (states AA and AP). Conversely, if plants were observed above-ground at t (conventional code ‘1’), the patch was occupied above-ground at t (states PA and PP). We thus defined the following matrix of observation probabilities Et with states in rows and observations in columns:

$$
E_t =
\begin{pmatrix}
1 & 0 \\
1 & 0 \\
0 & 1 \\
0 & 1
\end{pmatrix}
$$
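The construction of the transition and observation matrices can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (independent seedling sources, a seed bank at t + 1 if and only if plants were present above-ground at t); the function names `seedling_prob` and `transition_matrix` are hypothetical and the parameter values are one of the simulated combinations.

```python
def seedling_prob(*sources):
    """Probability that at least one of several independent sources yields a seedling."""
    p_none = 1.0
    for p in sources:
        p_none *= (1.0 - p)
    return 1.0 - p_none


def transition_matrix(g0, g1, c, d):
    """Model 1 matrix; rows = state at t, columns = state at t + 1 (AA, AP, PA, PP)."""
    f001 = seedling_prob(c)          # state AA: immigrant seeds only
    f101 = seedling_prob(g1, c)      # state AP: seed bank + immigrants
    f011 = seedling_prob(g0, c)      # state PA: new seeds + immigrants
    f111 = seedling_prob(g1, g0, c)  # state PP: all three sources
    rows = []
    for f, had_plants in ((f001, False), (f101, False), (f011, True), (f111, True)):
        occupied = f * (1.0 - d)     # seedlings emerge and escape disturbance
        empty = 1.0 - occupied       # equals (1 - f) + f * d
        if had_plants:               # plants at t leave a seed bank at t + 1
            rows.append([0.0, empty, 0.0, occupied])   # to AP or PP
        else:                        # no plants at t: no seed bank at t + 1
            rows.append([empty, 0.0, occupied, 0.0])   # to AA or PA
    return rows


# Observation matrix: rows AA, AP, PA, PP; columns '0' (no plant), '1' (plant).
E = [[1, 0], [1, 0], [0, 1], [0, 1]]

PHI = transition_matrix(g0=0.5, g1=0.2, c=0.1, d=0.4)
for state, row in zip(("AA", "AP", "PA", "PP"), PHI):
    print(state, [round(p, 3) for p in row])
```

Each row sums to one, and the zero entries reflect the fact that the seed bank state at t + 1 is fully determined by above-ground occupancy at t.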

When g1 equalled 0, the model reduced to the classical metapopulation model for a species with no seed bank, with only two possible states A (absent above-ground) and P (present above-ground). By analogy with the four-state model, we defined a parameter fjk, as the joint probability for a patch to contain seedlings arising from the seeds produced by plants above-ground at t (j = 1 when the patch was occupied above-ground at time t, 0 otherwise) and from immigrant seeds (k = 1, see above). Under the two-state model with no seed bank, hereafter referred to as Model 2, the transition matrix Φ’t from the state at t (in rows) to the state at t + 1 (in columns) was as follows:

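$$
\Phi'_t =
\begin{pmatrix}
(1 - f_{01}) + f_{01}d & f_{01}(1 - d) \\
(1 - f_{11}) + f_{11}d & f_{11}(1 - d)
\end{pmatrix}
$$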

with f01 = c and f11 = (g0 + c − g0c). Note that the matrix elements f01(1 − d) and (1 − f11) + f11d correspond, respectively, to the apparent colonization rate ca and apparent extinction rate ea in classical metapopulation models with no seed bank (Etienne, ter Braak & Vos 2004). In the absence of a seed bank, there was no uncertainty in patch state and thus no need to resort to an observation matrix. However, using the method of Choquet & Cole (2012), we showed that Model 1 was not identifiable when g1 equalled 0; that is, g0, c and d could not be separately estimated. Indeed, in this case, the 2 × 2 transition matrix is parameterized with three parameters, one more than is allowed for a full rank model.
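The lack of identifiability when g1 = 0 can be illustrated numerically. The sketch below is not the formal method of Choquet & Cole (2012); it simply shows that, for an arbitrarily chosen alternative value of d, one can find values of g0 and c that yield exactly the same apparent rates ca and ea, so the data cannot distinguish the two parameter sets. The function names are hypothetical.

```python
def apparent_rates(g0, c, d):
    """Apparent colonization and extinction rates of the two-state model."""
    f01 = c
    f11 = g0 + c - g0 * c
    ca = f01 * (1.0 - d)
    ea = (1.0 - f11) + f11 * d
    return ca, ea


def matching_parameters(ca, ea, d):
    """For an arbitrary d, recover the (g0, c) pair that reproduces ca and ea."""
    c = ca / (1.0 - d)
    f11 = (1.0 - ea) / (1.0 - d)
    g0 = (f11 - c) / (1.0 - c)
    return g0, c


ca, ea = apparent_rates(g0=0.5, c=0.1, d=0.1)
g0_alt, c_alt = matching_parameters(ca, ea, d=0.2)   # a different disturbance rate
print("original   :", (0.5, 0.1, 0.1), "->", (ca, ea))
print("alternative:", (g0_alt, c_alt, 0.2), "->", apparent_rates(g0_alt, c_alt, 0.2))
```

Both parameter triples give the same 2 × 2 transition matrix, which is why only ca and ea are estimated under Model 2.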

Simulation study

We performed a simulation study to test (i) whether we could retrieve the presence or absence, respectively, of a seed bank by a model selection approach comparing Model 1 and Model 2 when patch occupancy data were simulated with g1 ≠ 0 or g1 = 0, and (ii) in the presence of a seed bank, the ability of Model 1 to provide precise estimates depending upon the values of the parameters that characterized the metapopulation dynamics. To that end, we simulated histories of encounters in a set of N patches with N = 200 and N = 1000, according to Model 1 where parameters g0, g1, c and d were set constant over time. The number of years was set to 10, a reasonable length for monitoring programmes relying on presence–absence data. We assumed that both the proportion of patches going extinct and the proportion of patches being colonized did not depend on occupancy rate and were constant over time (Gotelli 1991). We set π, the initial vector of frequencies πj (j = 1 to 4) of the four states, to the values expected at equilibrium for each transition matrix. Parameters c and d were each set to either 0.1 or 0.4. Parameter g0 was fixed to 0.5 or 0.8, while parameter g1 was fixed to 0, 0.2 or 0.5. Moreover, we restricted our simulations to data sets where g0 ≥ g1, given that most empirical studies dealing with species with a seed bank have shown that seed survival declines with age (e.g. Saatkamp et al. 2009). We therefore obtained a total of 24 combinations of parameters (g0, g1, c and d) for N = 200 and N = 1000 (Table 1). For each metapopulation size and each combination of parameters, we simulated 100 replicates to account for the stochastic nature of the processes. We thus performed 4800 simulations, using Matlab 7.
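The simulation procedure described above can be sketched as follows. This is a Python illustration, not the Matlab code used for the study; the function names, the power-iteration approach to the equilibrium frequencies and the random seed are assumptions made for the example.

```python
import random

STATES = ("AA", "AP", "PA", "PP")   # first letter: above-ground, second: seed bank


def transition_matrix(g0, g1, c, d):
    """Next-state probabilities for each state, as nested dictionaries."""
    f = {"AA": c,
         "AP": 1 - (1 - g1) * (1 - c),
         "PA": 1 - (1 - g0) * (1 - c),
         "PP": 1 - (1 - g1) * (1 - g0) * (1 - c)}
    phi = {}
    for s in STATES:
        up = f[s] * (1 - d)          # seedlings emerge and survive disturbance
        bank = s[0]                  # seed bank at t + 1 requires plants at t
        phi[s] = {"A" + bank: 1 - up, "P" + bank: up}
    return phi


def stationary(phi, n_iter=2000):
    """Equilibrium state frequencies obtained by repeated multiplication."""
    pi = {s: 0.25 for s in STATES}
    for _ in range(n_iter):
        new = {s: 0.0 for s in STATES}
        for s, p in pi.items():
            for s2, q in phi[s].items():
                new[s2] += p * q
        pi = new
    return pi


def simulate(n_patches, n_years, g0, g1, c, d, rng):
    """Return one 0/1 above-ground history per patch."""
    phi = transition_matrix(g0, g1, c, d)
    pi = stationary(phi)
    histories = []
    for _ in range(n_patches):
        state = rng.choices(STATES, weights=[pi[s] for s in STATES])[0]
        obs = []
        for _ in range(n_years):
            obs.append(1 if state[0] == "P" else 0)   # only above-ground is seen
            nxt = list(phi[state])
            state = rng.choices(nxt, weights=[phi[state][s] for s in nxt])[0]
        histories.append(obs)
    return histories


rng = random.Random(1)   # arbitrary seed, for reproducibility of the example
histories = simulate(200, 10, g0=0.5, g1=0.2, c=0.1, d=0.4, rng=rng)
print(histories[:3])
```

The hidden seed bank state drives the dynamics but never appears in the output, which contains only the above-ground observations that a field survey would record.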

Table 1. Results of the model selection for detecting the presence–absence of a seed bank

Data analysis

We conducted the analysis of simulated histories of encounters with program e-surge (Choquet, Rouan & Pradel 2009). The practical implementation of the hidden Markov model in e-surge is described in Appendix S2.
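For readers unfamiliar with hidden Markov models, the likelihood of a single encounter history can be sketched with the standard forward algorithm. The code below is a generic illustration and does not reproduce the e-surge implementation (see Appendix S2); the function names and the uniform initial vector are assumptions chosen for the example.

```python
from itertools import product


def model1_matrix(g0, g1, c, d):
    """Transition matrix of the four-state model (rows and columns: AA, AP, PA, PP)."""
    f = [c,
         1 - (1 - g1) * (1 - c),
         1 - (1 - g0) * (1 - c),
         1 - (1 - g1) * (1 - g0) * (1 - c)]
    up = [x * (1 - d) for x in f]
    return [[1 - up[0], 0.0, up[0], 0.0],
            [1 - up[1], 0.0, up[1], 0.0],
            [0.0, 1 - up[2], 0.0, up[2]],
            [0.0, 1 - up[3], 0.0, up[3]]]


# Observation probabilities: rows AA, AP, PA, PP; columns '0', '1'.
EMIT = [[1, 0], [1, 0], [0, 1], [0, 1]]


def history_likelihood(history, pi, phi, emit):
    """Forward algorithm: probability of one observed 0/1 history."""
    n = len(pi)
    alpha = [pi[s] * emit[s][history[0]] for s in range(n)]
    for obs in history[1:]:
        alpha = [sum(alpha[i] * phi[i][j] for i in range(n)) * emit[j][obs]
                 for j in range(n)]
    return sum(alpha)


PHI = model1_matrix(g0=0.5, g1=0.2, c=0.1, d=0.4)
PI = [0.25, 0.25, 0.25, 0.25]        # arbitrary initial state frequencies
print(history_likelihood([1, 0, 1], PI, PHI, EMIT))

# Sanity check: the probabilities of all possible 3-year histories sum to one.
total = sum(history_likelihood(h, PI, PHI, EMIT) for h in product((0, 1), repeat=3))
print(total)
```

The likelihood of a full data set would be the product of such terms over patches, maximized with respect to the model parameters.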

Evidence for a seed bank

Because Model 1 was not identifiable when g1 = 0, we could not perform the likelihood ratio test between Model 1 and Model 2. We thus tested the ability to retrieve the presence or absence of a seed bank from the patch occupancy data set by a model selection procedure (Burnham & Anderson 1998), comparing Model 1 (with a seed bank) and Model 2 (with no seed bank) when patch occupancy data were simulated with g1 ≠ 0 (respectively, g1 = 0). Models were ranked according to their AIC values calculated as:

AICi = −2 log Li + 2Ki

where Li is the likelihood of Model i (i = 1, 2) (−2 log Li being the deviance of Model i) and Ki is the number of parameters of Model i. Model 1 and Model 2 had, respectively, seven (c, g0, g1, d, π1, π2, π3) and three parameters (ca, ea, π1). Model 2 was selected whenever −2 log L1 = −2 log L2. When −2 log L1 ≠ −2 log L2, we computed ΔAIC, which was the difference of AIC values between Model 2 and Model 1. We considered that (i) when ΔAIC > 2, the best model was that having the lowest AIC value, and (ii) when ΔAIC ≤ 2, both models had substantial support.
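The selection rule can be written out as a short Python sketch. The deviance values in the usage lines are hypothetical, the function names are illustrative, and the rule is applied to the absolute value of ΔAIC, which is our reading of the criterion rather than a detail stated in the text.

```python
import math


def aic(deviance, n_params):
    """AIC = -2 log L + 2K, with the deviance standing for -2 log L."""
    return deviance + 2 * n_params


def select_model(dev1, dev2, k1=7, k2=3, threshold=2.0):
    """Compare Model 1 (seed bank, k1 parameters) with Model 2 (no seed bank, k2)."""
    if math.isclose(dev1, dev2):
        return "Model 2"                    # equal deviances: keep the simpler model
    delta = aic(dev2, k2) - aic(dev1, k1)   # AIC(Model 2) - AIC(Model 1)
    if abs(delta) <= threshold:
        return "both"                       # both models have substantial support
    return "Model 1" if delta > 0 else "Model 2"


# Hypothetical deviances, for illustration only.
print(select_model(1500.0, 1520.0))   # AIC1 = 1514, AIC2 = 1526
print(select_model(1500.0, 1507.0))   # AIC1 = 1514, AIC2 = 1513
```

Because Model 1 carries four more parameters than Model 2, its deviance must be lower by more than 10 units for it to be preferred outright under this rule.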

Estimation of the parameters

In a second step, to assess the quality of e-surge estimates of the model parameters for each metapopulation size, we focused on the 16 combinations of parameters for which Model 1 was identifiable (i.e. g1 ≠ 0). For each set (N, g0, g1, c, d), we estimated the mean of g0, g1, c and d over the 100 replicates. To assess the quality of each estimator, we computed the following statistics over the 100 replicates: (i) the standard error, (ii) the bias, calculated as the difference between the mean and the true value of the parameter and (iii) the mean square error of the estimator (mse = bias² + variance of the estimator).
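These summary statistics can be computed as in the following sketch. The estimates listed are hypothetical numbers used only to show the calculation; we assume here that the standard error is the standard deviation of the estimates across replicates and that the population variance is used, so that the mean square error equals the mean squared deviation from the true value.

```python
from math import sqrt
from statistics import mean, pvariance


def estimator_quality(estimates, true_value):
    """Bias, standard error and mean square error of an estimator over replicates."""
    bias = mean(estimates) - true_value
    variance = pvariance(estimates)      # population variance across replicates
    return {"bias": bias,
            "se": sqrt(variance),
            "mse": bias ** 2 + variance}


# Hypothetical estimates of g1 from four replicates, true value 0.2.
stats = estimator_quality([0.18, 0.22, 0.20, 0.16], true_value=0.2)
print(stats)
```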

Bias in colonization and extinction rates when ignoring the seed bank

To estimate the bias in apparent colonization rate (ca) and extinction rate (ea) when neglecting the seed bank, we selected the simulated data sets with a seed bank (g1 ≠ 0) for each metapopulation size and computed for each of the 16 combinations of parameters, the mean and the standard error of parameters ca and ea estimated from Model 2 (i.e. assuming no seed bank) over the 100 replicates. We then compared the mean apparent colonization rate ca with the expected effective colonization rate ce calculated from the four-state model. The expected colonization rate ce corresponded to the probability of an empty patch at time t (i.e. in state AA) to be occupied at time t + 1 (i.e. to be in state PA) as a consequence of the colonization process. This was equal to c(1 − d). Similarly, we compared the mean apparent extinction rate ea with the expected effective extinction rate ee calculated from the four-state model. The expected effective extinction rate ee corresponded to the sum of the probability of an occupied patch at time t (i.e. in states AP, PA and PP), to be empty at time t + 1 (i.e. to be in state AA). In our model, patches in states PA and PP at time t could not go extinct at time t + 1. Thus, ee corresponded to the weighted probability of patch AP at time t to be in state AA at time t + 1 and was calculated as follows:

$$
e_e = \frac{\pi_2}{\pi_2 + \pi_3 + \pi_4}\,\bigl[(1 - f_{101}) + f_{101}d\bigr]
$$

with π2, π3 and π4 being the frequency at equilibrium of the states AP, PA and PP, respectively.
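As a worked illustration, the expected effective rates can be computed numerically from the model parameters. The sketch below obtains the equilibrium frequencies by repeated multiplication of a starting vector by the transition matrix, which is an assumption of the example rather than the procedure used in the study; the function name is hypothetical.

```python
def effective_rates(g0, g1, c, d, n_iter=5000):
    """Expected effective colonization (ce) and extinction (ee) rates under Model 1."""
    f = [c,
         1 - (1 - g1) * (1 - c),
         1 - (1 - g0) * (1 - c),
         1 - (1 - g1) * (1 - g0) * (1 - c)]
    up = [x * (1 - d) for x in f]
    # Rows and columns ordered AA, AP, PA, PP.
    phi = [[1 - up[0], 0.0, up[0], 0.0],
           [1 - up[1], 0.0, up[1], 0.0],
           [0.0, 1 - up[2], 0.0, up[2]],
           [0.0, 1 - up[3], 0.0, up[3]]]
    pi = [0.25] * 4
    for _ in range(n_iter):              # converge to the equilibrium frequencies
        pi = [sum(pi[i] * phi[i][j] for i in range(4)) for j in range(4)]
    ce = c * (1 - d)                     # AA -> PA
    # Only AP patches can become empty (AA); weight by their share of occupied patches.
    ee = pi[1] / (pi[1] + pi[2] + pi[3]) * phi[1][0]
    return ce, ee, pi


ce, ee, pi = effective_rates(g0=0.5, g1=0.2, c=0.1, d=0.4)
print("ce =", ce, "ee =", ee)
```

These expected values are the benchmarks against which the apparent rates ca and ea estimated under Model 2 are compared.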

Results

Evidence for a seed bank

Increasing the number of surveyed patches from 200 to 1000 greatly increased the ability to detect the presence of a seed bank with a model selection approach (Table 1). Indeed, when g1 ≠ 0, the model with the seed bank was selected in more than 95% of the replicates for 9 of the 16 data sets when N = 200, whereas this was the case for 15 of the 16 data sets when N = 1000. The incorrect model, that is the one with no seed bank, was predominantly selected for three data sets when N = 200 (although in far fewer than 95% of the replicates), whereas this never occurred when N = 1000. Moreover, whatever the size of the metapopulation, the eight data sets for which the correct model was selected in fewer than 95% of the replicates were simulated with the smallest g1 value (g1 = 0.2). When data sets were simulated with no seed bank (g1 = 0), the correct model was selected for all data sets in more than 95% of the replicates whatever the size of the metapopulation (Table 1).

Estimation of the parameters with e-surge

For each data set simulated with a seed bank (g1 ≠ 0), the mean of g0, g1, c and d and their standard error calculated over the 100 replicates are provided in Table S1. The quality of the estimates of the model parameters increased with the size of the metapopulation (Fig. 2). That is, the mean square error of each estimator g0, g1, c and d decreased markedly when the number of patches increased from 200 to 1000. The higher quality of the estimates of the model parameters for N = 1000 resulted from both a lower bias and a lower variance of the estimators (Table S1). Although of poorer quality, the model parameters were still well estimated for N = 200 (mse < 0.113). Overall, for each set of fixed values {g0, g1}, the quality of the estimators of each parameter was highest when disturbance was low (d = 0.1) (see for instance data sets 1, 2, 3 and 4, where g0 = 0.5 and g1 = 0.2). For the following results, we only show graphical results for N = 200, a metapopulation size likely to be closer to those of patch occupancy studies (MacKenzie & Royle 2005), given that results for N = 1000 were qualitatively similar (Table S1). Among the four parameters of the model, c and to a lesser extent g1 were very well estimated (mse < 0.053), whereas estimates of g0 and d were of poorer quality (mse < 0.113) (Figs 2 and 3; Table S1). The higher quality of the estimates of c and g1 resulted both from a lower bias and a lower variance of their estimators compared with g0 and d (Fig. 3). Overall, parameters g0, g1, c and d tended to be underestimated when N equalled 1000 (Table S1). This pattern was only marked for d when N = 200 (Fig. 3).

Figure 2.

Mean square errors of the estimates of the parameters g0, g1, c and d when data sets were simulated with a seed bank (g1 > 0). For each parameter and each data set, the mean square error is given for N = 200 (○) and N = 1000 (●). Data sets are numbered as in Table 1.

Figure 3.

Quality of the estimates of the model parameters g0, g1, c and d when data sets were simulated with a seed bank (g1 > 0) for a metapopulation of size N = 200 patches. For each data set, the size of the box represents the absolute bias calculated as the difference between the estimator's expectation and the true value of the parameter. A black box corresponds to a data set where the parameter is underestimated, whereas an unfilled box corresponds to a data set where the parameter is overestimated. The error bar represents the mean of the parameter +/− its standard error. Data sets are numbered as in Table 1.

Bias in the colonization and extinction rates when ignoring the seed bank

Colonization rates and extinction rates were both overestimated when data sets simulated with a one-year seed bank were analysed with the two-state model that ignores the seed bank. In fact, the apparent colonization rate (ca) and the apparent extinction rate (ea) were always higher than the expected effective colonization rate (ce) and the expected effective extinction rate ee, respectively (Fig. 4). Furthermore, the bias for the extinction rate tended to be larger than that of the colonization rate (Fig. 4).

Figure 4.

Bias in the colonization and extinction rates when neglecting the seed bank for a species with a one-year seed bank and a metapopulation size of 200 patches. For each data set, we plotted the mean apparent colonization rate ca (●), estimated with the two-state model, and the expected effective colonization rate ce (○), calculated as c(1 − d). Similarly, we plotted the mean apparent extinction rate ea (●), estimated with the two-state model, with the expected effective extinction rate ee (○), calculated as π2[(1 − f101) + f101d]/(π2 + π3 + π4), with π2, π3 and π4 being the frequency at equilibrium of the states AP, PA and PP, respectively. Error bars represent the mean of the parameters ca and ea +/− their standard error. Data sets are numbered as in Table 1.

Discussion

Performance of the model selection approach for detecting the presence–absence of a seed bank

Issues of uncertainty in state assignment have hampered metapopulation studies in plants with a seed bank because of the inability to separate true and false absences of species at survey locations (Freckleton & Watkinson 2002). Using hidden Markov models that account for uncertainty in state assignment, we show that time-series data of occupancy above-ground can provide information about the presence of a species below-ground, whenever there is an unobservable stage in the life cycle. The existence and properties of seed banks are one of the least well-understood stages in plant populations. Indeed, seed banks have proved very difficult to measure for studying population dynamics or life-history trait evolution without resorting to complex ecological experiments and demographic surveys (Evans et al. 2007; David et al. 2010; Siewert & Tielbörger 2010; Tielbörger, Petrů & Lampei 2012) or genetic analyses (Tellier et al. 2011). Additionally, seed banks have been shown to be highly variable at the intraspecific level (Thompson, Bakker & Bekker 1997; Baskin & Baskin 1998), making it difficult to generalize results among survey locations.

In this context, our model selection approach offers a reliable way to test for the existence of a one-year seed bank on the sole basis of time series of patch occupancy data and, if found, to account for it in metapopulation models. Given the model parameter values used in our study, the reliability of the model selection approach for detecting the presence of a seed bank increased with metapopulation size. Indeed, in the presence of a seed bank, the seed bank model was selected in 15 of 16 data sets (94%) for N = 1000, whereas it was selected in 9 of 16 data sets (56%) for N = 200. Survey efforts in patch occupancy studies are highly dependent on economic constraints as well as observational constraints arising from occupancy rate and detectability (Field, Tyre & Possingham 2005). However, optimal designs for occupancy studies usually involve a much larger total sampling effort than surveying 200 patches once a year (MacKenzie & Royle 2005). When data sets were simulated with g1 = 0, the absence of a seed bank was very well detected whatever the size of the metapopulation. Our model selection approach is thus likely to provide reliable results regarding the presence or absence of a seed bank in the vast majority of patch occupancy studies in plants.

Apart from the size of the metapopulation, the reliability of our approach also depends upon the parameter g1. Indeed, the eight data sets for which the seed bank model was not selected (seven for N = 200 and one for N = 1000) were simulated with the smallest g1 value (g1 = 0.2). In our model, g1 corresponds to the probability that seeds having entered the seed bank at t − 1 will produce at least one seedling between t and t + 1. This probability is the product of many terms representing different processes in the life cycle, including seed production, the fraction of seeds entering the seed bank, the survival of seeds in the soil during the first year and the probability of dormant seeds germinating during the second year conditional on their survival. Because these quantities are very difficult to estimate in the field, the distribution of g1 values in species having a one-year seed bank is unknown. Gathering quantitative data in the field on a large number of species would thus help to assess the power of our model selection approach for detecting the presence of a seed bank from presence–absence data collected in natural conditions. Overall, our results show that the model selection approach greatly reduces the risk of analysing patch occupancy data with an inadequate metapopulation model.

Performance of hidden Markov models for estimating the parameters of the model in the presence of a seed bank

We show that hidden Markov models perform very well for estimating the four parameters of the demographic model, each representing a mechanistic process of the plant metapopulation dynamics. The quality of the estimates of the model parameters increased with the size of the metapopulation but still remained high for small metapopulation sizes. As a consequence, the methodology we propose offers a reliable way to estimate colonization and extinction rates from time series of presence–absence data, using the mathematical expressions linking colonization and extinction to the four underlying parameters of the model. To our knowledge, our study is the first to address the issue of unobservable state in plants within the metapopulation framework. Lamy et al. (2013) also use multistate occupancy models to quantify metapopulation dynamics for species that possess unobservable stages in their life cycle. However, their approach appears difficult to extend to plants for two reasons. First, it relies on the idea that unobservable life stages are often associated with a particular habitat characteristic, making the habitat temporally unsuitable for the presence of the species above-ground. This is, for instance, the case of many freshwater species, for which resistant eggs are a way to survive habitat desiccation. In plants, prolonged seed dormancy corresponds to the production of mature seeds that do not emerge even when the environmental conditions are favourable (Evans & Dennehy 2005) and can represent a bet-hedging strategy in temporally varying environments (Evans et al. 2007; Venable 2007). Thus, in contrast to resistant forms of many animal species, the seed bank is not associated with unsuitable habitat conditions and may thus coexist with above-ground life stages, as in our model. Second, Lamy et al. (2013) assume that the presence of the unobservable stage is associated with an unambiguously observable habitat characteristic. 
In plants, such a relationship seems very difficult to ascertain in the field.

As already shown in other studies dealing with imperfect detection (MacKenzie et al. 2003; Lamy et al. 2013), ignoring the presence of a seed bank in the life cycle invariably led to overestimates of colonization and extinction rates. Our approach thus has the potential to provide invaluable insights into the metapopulation dynamics of plant species that have remained poorly studied because of methodological constraints (Freckleton & Watkinson 2002). Besides metapopulation dynamics studies, the methodology we propose can also be applied to address important issues regarding the evolution of life-history traits. Seed dormancy has been classically viewed as a way to disperse in time (Venable & Brown 1988). It has thus been suggested that seed dormancy and seed dispersal may respond to similar evolutionary forces, those strategies representing alternative responses to environmental heterogeneity (e.g. Olivieri 2001). Accordingly, because those strategies entail some cost, most theoretical studies have predicted a negative covariation between dormancy and dispersal (Olivieri 2001; but see Snyder 2006; Vitalis et al. 2013). Such predictions have never been tested in natural populations because of the difficulty in quantifying dormancy and dispersal (see Siewert & Tielbörger 2010 for a test using experimental populations). Using a long-term series of presence–absence data in habitats with varying ecological characteristics such as fragmentation level, our approach would allow testing of whether there is a trade-off between dispersal and dormancy in natural conditions. Indeed, parameters c and g1, which are those best estimated among the four model parameters, represent the contribution of migrant seeds and dormant seeds to the emergence of new seedlings and can thus be used as proxies for dispersal and dormancy, respectively.

Extensions of the model

To explore the potential of our approach, we limited ourselves to the case of annual plant species with a one-year seed bank. The present hidden Markov model can be easily extended to account for different characteristics in the plant life cycle. Extending our theoretical framework to species with longer seed bank longevity, as well as to perennial species, represents the next step in broadening the applicability of our approach. Our approach is highly flexible in terms of data analysis. Although we considered in this study that colonization and extinction rates were constant over time and did not depend on occupancy patterns, these hypotheses can be relaxed when specifying the structure of the model in e-surge to account for alternative metapopulation models (Gotelli 1991; Dornier & Cheptou 2012). For example, a rescue effect can be incorporated using the distance between patches as a covariate for the colonization process (Moilanen 2004). Time dependency can also be included for the four parameters of the model. Finally, the parameterization of our model makes it directly applicable to many animal species with an unobservable stage in their life cycle. Our approach has, therefore, the potential to provide valuable insights into the metapopulation dynamics of many species, making the distinction between plants and animals rather arbitrary in the metapopulation literature.

Acknowledgements

We thank G. Tuleu for contributing to a preliminary stage of the project. We are very grateful to Roberto Salguero-Gómez and two anonymous reviewers for suggestions that greatly improved the manuscript. We also thank Susan Lambrecht for carefully copy-editing the manuscript. This work was funded by a ‘Chercheurs d'Avenir’ grant to P-O. Cheptou, R. Choquet, H. Fréville and R. Pradel from the Région Languedoc-Roussillon.
