How best to collect demographic data for population viability analysis models



    1. Institute of Botany, Academy of Sciences of the Czech Republic, CZ-252 43 Průhonice, Czech Republic and
    2. Department of Botany, Faculty of Sciences, Charles University, Benátská, CZ-128 01 Prague, Czech Republic; and

    1. Department of Botany, Stockholm University, SE-106 91 Stockholm, Sweden

Zuzana Münzbergová, Institute of Botany, Academy of Sciences of the Czech Republic, CZ-252 43 Průhonice, Czech Republic (e-mail


  • 1. Matrix population models have become important tools in many fields of ecology and conservation biology, and are the most commonly used method in population viability analysis (PVA). There is a large literature concerned with different aspects of matrix model analysis, but relatively little attention has been paid to how data are collected.
  • 2. In most demographic population viability studies, data are sampled in permanent plots, often resulting in poor representation of some stages. It has been suggested that by using previous knowledge of species’ demography it is possible to sample demographic data more efficiently. Here we propose an alternative method that is much simpler and does not rely on any prior assumptions about the species’ demography, namely sampling an equal number of individuals per stage.
  • 3. Using demographic data from 32 species, we show that sampling an equal number of individuals per stage provides more precise estimates of both population growth rate and elasticity than the traditional plot-based method. In some cases it also gives better estimates than the method based on previous knowledge of the species’ demography. The performance of the latter method is very sensitive to the quality of that previous knowledge. In contrast, collecting demographic data from an equal number of individuals per stage does not depend on any such assumptions.
  • 4. Synthesis and applications. A central aim for the management of threatened species is to develop robust and accurate methods for assessing population viability that are also efficient in terms of cost and labour. A key issue is how to collect the data on which viability assessments are based. We show that it is possible to increase considerably the accuracy and robustness of demographic PVA without increasing sampling effort, by using a simple method based on sampling an equal number of individuals per life-cycle stage. Improved PVA model performance will be important in guiding conservation efforts and evaluating different management options.


Matrix population models are a standard method used to assess the viability of structured populations (Morris & Doak 2002). Repeated iterations of the matrix model result in the projection of a population's equilibrium growth rate and extinction risk, providing a measure of the overall performance of populations. Moreover, sensitivity and elasticity analyses of matrix models can identify the life-history stages most critical for the persistence of a species. The results of matrix model analysis and simulation are often used to assess the vulnerability of a population to extinction and to evaluate different management options (Freckleton et al. 2003; Garcia 2004; Beverly & Martell 2004; van Mantgem et al. 2004).
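For readers less familiar with these quantities, the following is a minimal sketch (not the authors' Matlab code) of how λ and elasticities can be obtained from a projection matrix. λ is the dominant eigenvalue, found here by power iteration, i.e. repeated iteration of the matrix; elasticities follow the standard formula e_ij = (a_ij/λ) v_i w_j / ⟨v, w⟩, where w and v are the right (stable-stage) and left (reproductive-value) eigenvectors. It assumes a non-negative, primitive matrix stored as a list of rows, with a_ij the transition from stage j to stage i; the function names are illustrative.

```python
def power_iteration(A, tol=1e-12, max_iter=10_000):
    """Dominant eigenvalue and right eigenvector (scaled to sum to 1)
    of a non-negative, primitive square matrix given as a list of rows."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)  # x sums to 1, so sum(A x) -> lambda at convergence
        y = [v / lam for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return lam, y
        x = y
    raise RuntimeError("power iteration did not converge")


def transpose(A):
    return [list(row) for row in zip(*A)]


def elasticities(A):
    """Return lambda and the elasticity matrix e_ij = (a_ij / lambda) * v_i * w_j / <v, w>."""
    lam, w = power_iteration(A)           # stable-stage distribution
    _, v = power_iteration(transpose(A))  # reproductive values (left eigenvector)
    vw = sum(a * b for a, b in zip(v, w))
    n = len(A)
    E = [[A[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]
    return lam, E
```

For the toy matrix [[0, 1], [0·5, 0·5]], λ = 1 and the three non-zero elasticities are each 1/3; elasticities always sum to 1, which is what makes them convenient for comparing transitions.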

Several recent studies have investigated how data sampling, parameter estimation and model construction influence the predictions of demographic population viability models (population viability analysis, PVA; Ludwig 1999; Easterling, Ellner & Dixon 2000; Gross 2002; Morris & Doak 2002; Kaye & Pyke 2003). These studies show that model output depends sensitively upon estimated parameters, and that the data used to parameterize models are frequently insufficient. As a result, predictions of future population status will be uncertain (Ludwig 1999; Fieberg & Ellner 2000). There is a large literature concerned with how the techniques used to analyse collected data can be adapted to make predictions more accurate (Horvitz, Schemske & Caswell 1997; Ehrlén & van Groenendael 1998; Ludwig 1999; Mills, Doak & Wisdom 1999; de Kroon, van Groenendael & Ehrlén 2000; Caswell 2001; Ehrlén, van Groenendael & de Kroon 2001; Calder et al. 2003; Wilcox & Elderd 2003; Hodgson & Townley 2004). In contrast, very little attention has been paid to how the primary data are actually sampled. This is surprising, as the accuracy of the data is critical for the quality of predictions of population viability (Lindborg & Ehrlén 2002) and no post-collection procedure can fully compensate for shortcomings in the data.

Almost all recent plant demographic studies have sampled all individuals within plots (Z. Münzbergová, unpublished data). The most probable explanation for this is that sampling within plots is easy to apply in the field and also provides information on the actual stage distribution. However, for any given sampling effort this strategy is unlikely to provide the most accurate estimates of demographic parameters, because it often results in very unequal numbers of individuals per stage. In the literature it is, in fact, not rare to encounter transition probabilities that are estimated from only one or a few individuals (Z. Münzbergová, unpublished data). A sampling strategy that prevents strongly unbalanced sample sizes already at the data-collection stage therefore has much greater potential to increase the accuracy of estimates than any post-collection model adjustment.

The only published suggestion on how to sample demographic data other than by a plot-based method is that of Gross (2002). He suggests a method in which the number of individuals to be sampled in each stage depends on how much the stages contribute to the growth rate of the population. To apply the method, one needs to make an educated guess about the relative importance of different transitions in the life cycle for population growth; this can be based on data from studies of the same or a related organism, the investigator's prior knowledge, pilot data or comparative demographic studies (Gross 2002).

In this study we suggest a third method for sampling demographic data that is based on sampling an equal number of individuals per stage. It can reduce the problems of plot-based sampling and requires no previous knowledge. We compared the accuracy and precision of this method with both the traditional plot-based sampling method and the alternative method suggested by Gross (2002), using demographic data from 32 plant species under different overall sampling efforts.


To estimate the effect of different sampling strategies on the accuracy and precision of the properties of the resulting matrix, we used data from 32 different plant species collected from the literature (see Appendix S1). We selected species for which at least three different matrices were available.

Three strategies were used to determine sampling proportions for each stage. First, the conventional plot-based strategy was investigated using a vector of proportions derived from the projected stable-stage distribution of the target matrix. We used these proportions as estimates of the actual stage distribution in the field (which equals the average expected proportion in a demographic plot) because the latter was not available for all species. Secondly, we sampled an equal number of individuals per stage, i.e. all entries of the vector of sampling proportions were the same. Thirdly, we estimated the sampling proportions for each stage using the algorithm provided by Gross (2002). To do this we used one matrix (the last one in each paper) as the target matrix and all the other matrices of that species as reference matrices. One vector of sampling proportions was calculated for each reference matrix. We used each of these vectors, as well as their mean, as separate sampling strategies. Sampling an equal number of individuals per stage and plot-based sampling always resulted in one vector of sampling proportions per strategy and species. The Gross (2002) method, on the other hand, resulted in as many vectors per species as there were reference matrices (n = 2–27), plus the one vector based on their mean.
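The two assumption-free proportion vectors can be sketched as follows (illustrative names; the Gross (2002) algorithm is not reproduced here). As in the text, the plot-based vector is approximated by the stable-stage distribution of the target matrix, computed by power iteration; the equal-number vector assigns 1/d to each of the d stages. The example matrix in the comment is hypothetical.

```python
def stable_stage_distribution(A, tol=1e-12, max_iter=10_000):
    """Right eigenvector of the dominant eigenvalue, scaled to sum to 1 (power iteration)."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(y)
        y = [v / s for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


def sampling_proportions(A, strategy):
    """Sampling proportions per stage for the 'plot' or 'equal' strategy."""
    n = len(A)
    if strategy == "plot":
        # Expected stage structure in a plot, approximated by the stable-stage distribution
        return stable_stage_distribution(A)
    if strategy == "equal":
        return [1.0 / n] * n
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical 3-stage matrix (seedling, juvenile, adult):
# A = [[0.0, 0.0, 5.0], [0.3, 0.6, 0.0], [0.0, 0.2, 0.9]]
# sampling_proportions(A, "plot"), sampling_proportions(A, "equal")
```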

To assess the quality of the matrices resulting from the different strategies, we simulated 2000 hypothetical data sets in which individuals of each stage had average transition probabilities equal to those in the target matrix (Gross 2002). This was done by drawing an individual of a given stage in year 1, drawing a random number and using this random number to assign the fate of that individual (stage in year 2). The fate was determined by creating a cumulative distribution of transition probabilities of that stage in year 1 to all possible stages in year 2 and calculating where the random number would fall.

A separate transition matrix was created for each simulation run. Because the transitions involving reproduction were often estimated in separate experiments and were not subject to the same sampling concerns, these transitions were not subjected to the sampling procedure above and were kept constant across all simulation runs.
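The fate-assignment step and the construction of one simulated matrix could look roughly like the sketch below. It assumes that the target matrix has been split into a transition part T (survival and growth, where each column sums to at most 1 and the remainder is the probability of death) and a reproduction part F that is copied unchanged, as described above; this split, the handling of death and the function names are our assumptions rather than details given in the text. Each stage is assumed to have at least one sampled individual.

```python
import random


def draw_fate(column, rng):
    """Assign a year-2 fate to one individual from the cumulative distribution
    of its stage's transition probabilities. Returns the index of the new
    stage, or None if the random number falls beyond the summed survival
    probabilities (i.e. the individual dies)."""
    u = rng.random()
    cumulative = 0.0
    for stage, p in enumerate(column):
        cumulative += p
        if u < cumulative:
            return stage
    return None


def simulate_matrix(T, F, counts, rng):
    """Build one simulated projection matrix.

    T      -- survival/growth part of the target matrix (list of rows)
    F      -- reproduction part, copied unchanged into every simulated matrix
    counts -- number of sampled individuals per stage (each >= 1)
    """
    d = len(T)
    est = [[0.0] * d for _ in range(d)]
    for j in range(d):
        column = [T[i][j] for i in range(d)]
        for _ in range(counts[j]):
            fate = draw_fate(column, rng)
            if fate is not None:
                est[fate][j] += 1
        for i in range(d):
            est[i][j] /= counts[j]  # observed transition frequency
    return [[est[i][j] + F[i][j] for j in range(d)] for i in range(d)]


# Hypothetical usage: 2000 simulated matrices for one sampling allocation.
# rng = random.Random(2024)
# runs = [simulate_matrix(T, F, counts, rng) for _ in range(2000)]
```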

The median number of individuals sampled per stage in published demographic studies 1977–2001 was 17 (n = 45; Z. Münzbergová, unpublished data). For the simulations we used 0·5, 1, 2 and 8 times this value to represent different sampling intensities, and thus sampled 8, 17, 34 and 136 individuals times the dimension of the matrix (= number of stages) for each matrix. These individuals were distributed among stages according to the sampling proportions derived by the three basic strategies. The plot-based and Gross methods occasionally resulted in a very low number (< 5) of sampled individuals in a stage. To avoid strongly skewed sampling distributions, we therefore applied an arbitrary correction: any stage for which a strategy suggested fewer than five individuals was sampled with five, and the numbers in the other stages were reduced proportionally to keep the total number of sampled individuals constant. This adjustment could increase the accuracy of the plot-based and Gross methods. It was never applied to the equal-number strategy.
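One way to implement the minimum-of-five correction is sketched below (hypothetical function `allocate`). Stages allocated fewer than five individuals are raised to five and the remaining stages are rescaled proportionally, repeating until no stage falls below the minimum. The final rounding to whole individuals (largest-remainder rounding) is our own choice; the text does not say how fractional numbers were handled.

```python
import math


def allocate(props, total, minimum=5):
    """Split `total` sampled individuals among stages in proportion to `props`,
    raising any stage below `minimum` to `minimum` and shrinking the others
    proportionally so that the total is unchanged."""
    if total < minimum * len(props):
        raise ValueError("total too small to give every stage the minimum")
    s = sum(props)
    raw = [p / s * total for p in props]
    floored = [False] * len(raw)
    while True:
        free_total = total - minimum * sum(floored)
        free_weight = sum(r for r, f in zip(raw, floored) if not f)
        alloc = [minimum if f else r / free_weight * free_total
                 for r, f in zip(raw, floored)]
        newly_low = [not f and a < minimum for a, f in zip(alloc, floored)]
        if not any(newly_low):
            break
        floored = [f or low for f, low in zip(floored, newly_low)]
    # Round to whole individuals while preserving the total (largest remainder first)
    counts = [math.floor(a) for a in alloc]
    shortfall = total - sum(counts)
    by_remainder = sorted(range(len(alloc)), key=lambda i: alloc[i] - counts[i], reverse=True)
    for i in by_remainder[:shortfall]:
        counts[i] += 1
    return counts
```

For example, with three stages at the lowest intensity (8 × 3 = 24 individuals) and plot-based proportions of 0·9, 0·05 and 0·05, this gives 14, 5 and 5 individuals, whereas equal proportions give 8 per stage and never trigger the correction.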

The precision of the sampled matrices was estimated as the variation (coefficient of variation) in population growth rate (λ) and as the mean variation of elasticity values. We used elasticities rather than sensitivities because the former are much more frequently used in population viability assessments. Because the accuracy, and not only the precision, of population growth rate and elasticity values is a main concern in PVA, we also examined the accuracy of the model predictions by calculating the mean deviation of the simulated values from the λ and elasticity values of the target matrix. The deviation (dev) was calculated as dev = (S − T)/T, where S is the simulated value and T is the value of λ or elasticity for the target matrix.
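The two quality measures can be written as small functions (illustrative names). Precision is the coefficient of variation across simulation runs and accuracy is the mean of dev = (S − T)/T. The text does not spell out how the "mean variation of elasticity values" was computed, so averaging per-element coefficients of variation while skipping structurally zero elements is an assumption; likewise, the signed deviation is used as written, although an absolute deviation |S − T|/T would be a plausible alternative.

```python
import statistics


def precision_cv(values):
    """Coefficient of variation of simulated values (lower = more precise)."""
    return statistics.stdev(values) / statistics.mean(values)


def accuracy_dev(values, target):
    """Mean proportional deviation dev = (S - T) / T over simulation runs."""
    return statistics.mean([(s - target) / target for s in values])


def mean_elasticity_cv(runs):
    """Average, over matrix elements, of the CV of each elasticity across runs.

    `runs` is a list of flattened elasticity matrices (one per simulation run);
    elements whose mean is 0 (structural zeros) are skipped (an assumption).
    """
    cvs = []
    for element in zip(*runs):
        m = statistics.mean(element)
        if m > 0:
            cvs.append(statistics.stdev(element) / m)
    return statistics.mean(cvs)
```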

To compare the accuracy and precision of the basic sampling strategies, we used the results of the plot-based and equal-number strategies, the median results of the Gross method, and the results of sampling with the mean of the proportions derived using the Gross method. We also recorded the best and poorest estimates obtained with the Gross method, to investigate how sensitive the Gross method is to the quality of our previous knowledge of the species’ demography. This is important because often only one or a few reference matrices are available and we have no a priori knowledge of how appropriate they are.

To assess the significance of pair-wise differences between the different sampling strategies, we used a non-parametric Wilcoxon paired test applied to each pair of methods. All the simulations were done using Matlab version 5·3·1 (MathWorks Inc. 1999). Statistical testing was carried out with Statistica 5·1 (StatSoft Inc. 1998).
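The analyses used Statistica; as a rough, self-contained illustration of the paired test, the sketch below computes the Wilcoxon signed-rank statistic with zero differences dropped, average ranks for tied absolute differences, and a normal approximation for the two-sided P-value (no continuity correction). Software packages differ in these details, so P-values may not match Statistica exactly. In this study each pair would be, for example, the coefficients of variation of λ for two strategies across the 32 species.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (W+, z, p). Zero differences are dropped, tied |differences|
    receive average ranks (with the usual tie correction of the variance),
    and p comes from the normal approximation without continuity correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j + 2) / 2  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p
```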

To test whether the success of the different methods differed among types of species, we calculated elasticities for all the reference matrices. We summed the elasticities for fecundity, growth and survival, as suggested by Silvertown et al. (1993). These sums allowed the species to be compared by type of life cycle and seemed to be the best way to compare the matrices of the studied species. We then tested whether there was a relationship between the elasticity of fecundity and the success of the different methods in deriving precise and accurate λ and elasticity values. The success of a method was measured as its rank for each particular species. We tested this relationship for the success ranks of equal-number sampling, the median of the Gross method and the strategy based on the mean of the Gross proportions. The tests were done separately for each sampling intensity and for each of the four measures (precision and accuracy of λ and of elasticity).
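The summation can be sketched as below (hypothetical names). Which entries count as fecundity must be supplied as a mask, because it depends on the life cycle. Among the remaining entries we assume that stages are ordered from smallest to largest, count transitions to a later stage as growth, and lump stasis and retrogression into survival; this is a common reading of Silvertown et al. (1993), but the exact classification used in the study is an assumption.

```python
def elasticity_components(E, fecundity):
    """Sum an elasticity matrix into fecundity, growth and survival components.

    E          -- elasticity matrix (list of rows); E[i][j] is the transition
                  from stage j to stage i, stages ordered small -> large
    fecundity  -- same-shaped boolean mask marking reproduction entries
    Non-fecundity entries below the diagonal (to a later stage) count as growth;
    the diagonal (stasis) and entries above it (retrogression) count as survival.
    """
    totals = {"fecundity": 0.0, "growth": 0.0, "survival": 0.0}
    for i, row in enumerate(E):
        for j, e in enumerate(row):
            if fecundity[i][j]:
                totals["fecundity"] += e
            elif i > j:
                totals["growth"] += e
            else:
                totals["survival"] += e
    return totals
```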


Sampling equal numbers of individuals per stage provided more precise and equally accurate measures of λ and elasticity compared with plot-based sampling; the difference decreased with increasing sampling effort (Fig. 1). Equal-number sampling was also often better than the median results of the Gross method for both precision and accuracy of λ and elasticity. Again, the difference was strongest at the lowest sampling effort (Fig. 1).

Figure 1.

The effect of different sampling strategies on (a) precision and (b) accuracy of the population growth rate and (c) precision and (d) accuracy of elasticity values for matrices representing 32 plant species. Precision is expressed as the coefficient of variation and accuracy as the mean proportional difference between simulated and actual values based on the target matrix. Different letters denote pair-wise differences between groups that are significant at P = 0·05, estimated using the non-parametric Wilcoxon paired test. The graph shows median and non-outlier minimum and maximum. Gross is the method proposed by Gross (2002).

The best estimates obtained using the Gross method were significantly more precise and accurate for λ and elasticity than the results of any other strategy for most sampling efforts. On the other hand, the poorest estimates using the Gross method were significantly less precise and accurate than any other estimates for both λ and elasticity (Fig. 1). The strategy based on the mean of all the proportions derived using the Gross method was significantly better than equal-number sampling for the precision of elasticity at the two highest sampling efforts, but did not differ significantly from it in any other comparison.

None of the tests of the relationship between the success of the different sampling strategies and the elasticity of fecundity was significant. This indicates that the importance of fecundity in a species’ matrix cannot be used to decide which sampling strategy to use.


Our simulations show that sampling an equal number of individuals per stage overall provides better model predictions than plot-based sampling and sampling protocols following Gross (2002). The only case in which the Gross method performs similarly well, or sometimes even better, is when multiple sampling proportions based on different matrices of the species are averaged. This suggests that the simple method of sampling an equal number of individuals per stage can significantly improve the quality of PVA, and that the Gross method can also be very useful, but only when multiple reference matrices are available.

The advantages of sampling an equal number of individuals are also evident from the comparison of this method with the extreme estimates obtained using the Gross method. While the best estimates are often better than equal-number sampling, the poorest estimates are always worse than estimates using any other sampling strategy. This shows that while the Gross method has the potential to provide very good predictions, it is risky when knowledge is less than perfect. In contrast, sampling an equal number of individuals per stage is very robust and provides reasonably good estimates in all cases. The Gross method provides reliable estimates only where multiple matrices are available for the species and the average proportion based on all the matrices is used. Such an approach is very data-demanding. If such data are available, however, the Gross method can sometimes provide estimates that perform better than equal-number sampling.

The need for an optimal sampling protocol is obvious from the fact that the quality of any PVA critically depends on how well it can be parameterized. Sampling individuals in a more efficient way will therefore greatly improve our ability to make predictions (Fig. 2).

Figure 2.

Effect of different sampling strategies on predictions of population size. The true value is a prediction based on the lambda of the original matrix. We show low (L) and high (H) ends of a confidence interval based on 1000 simulation runs over 10 transition intervals using matrices derived by three of the sampling strategies. The initial matrices are for Saxifraga cotyledon (Dinnetz & Nilsson 2000).
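The caption does not detail how the intervals were built. Assuming that each simulated matrix is summarized by its λ and that population size is projected as N_t = N_0 λ^t, an interval after 10 transition intervals could be taken from percentiles of the projected sizes, as sketched below. The 95 % level (2·5th and 97·5th percentiles) and the function names are assumptions; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def projected_size_interval(lambdas, n0, steps=10):
    """Project N_t = N_0 * lambda**t for each simulated lambda and return the
    2.5th and 97.5th percentiles of the projected sizes (an assumed 95 % interval)."""
    sizes = [n0 * lam ** steps for lam in lambdas]
    cuts = statistics.quantiles(sizes, n=40)  # 39 cut points in 2.5 % steps
    return cuts[0], cuts[-1]


def true_size(lam_target, n0, steps=10):
    """Projection based on lambda of the original (target) matrix."""
    return n0 * lam_target ** steps
```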

The results of comparisons of the different sampling strategies were similar when comparing estimates based on population growth rate and elasticity. However, differences between sampling strategies were more often significant for elasticities than for population growth rates. We expect that this is because elasticities are sensitive to sampling differences in each individual transition rate, while population growth rate averages the sampling effects over all transitions.

To be useful for PVA studies, the suggested protocol of sampling an equal number of individuals per stage must be easily applicable in the field. Compared with the easily applicable plot-based method, sampling an equal number of individuals raises two practical difficulties. First, it requires a priori knowledge of the stages. Information on the optimal stage division is usually the outcome of an analysis and is not known at the beginning of the study. However, we believe that a basic knowledge of the species is sufficient to delimit preliminary size categories in most cases. This preliminary decision on size classes will motivate a careful search for individuals of all sizes and will thus increase the probability that no important stage is overlooked or underrepresented. Unlike the decision on the optimal sampling proportions in the Gross method, this decision will always improve the quality of the resulting data. Secondly, sampling in a plot-based manner provides an easy protocol for selecting individuals and ensuring their relocation in the subsequent year. A statistically rigorous way to sample an equal number of individuals per stage would be to sample individuals in a stratified random manner. However, locating randomly chosen individuals over the whole locality may become intractable. A compromise solution could be to use several plots randomly distributed over the population and to sample a certain number of individuals per stage in each of these.
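The compromise protocol could be organized as in the sketch below, which assumes a hypothetical census list of (plot, stage, individual) records: plots are chosen at random, and within each chosen plot up to a fixed number of individuals per stage are drawn at random without replacement. Stages that are rare in a plot simply contribute all their individuals; topping them up from additional plots is not handled here.

```python
import random
from collections import defaultdict


def select_individuals(census, n_plots, per_stage, rng):
    """census: list of (plot_id, stage, individual_id) records (hypothetical format).

    Randomly choose n_plots plots, then in each chosen plot draw up to
    per_stage individuals of every stage present, without replacement.
    Returns {(plot_id, stage): [individual_id, ...]}.
    """
    by_plot = defaultdict(lambda: defaultdict(list))
    for plot, stage, ind in census:
        by_plot[plot][stage].append(ind)
    plots = sorted(by_plot)  # sorted so that a seeded rng gives reproducible picks
    selected = {}
    for plot in rng.sample(plots, n_plots):
        for stage, members in sorted(by_plot[plot].items()):
            k = min(per_stage, len(members))  # rare stages contribute all they have
            selected[(plot, stage)] = rng.sample(members, k)
    return selected
```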

We were not able to detect any significant relationship between the importance of reproduction for each particular species and the performance of the different methods. Most of our examples, however, are herbs. For other groups of species, one would generally expect that the larger the differences in reproductive value among stage classes, the larger the potential advantage of prior knowledge of the sensitivity structure. Thus, provided that correct information on sensitivity is available, the Gross approach is likely to be more advantageous in these cases. Still, the critical point is that such precise prior knowledge is only rarely available, and without it the Gross approach is more risky.

In conclusion, sampling an equal number of individuals per stage appears to be a relatively simple and accurate way of collecting data for demographic studies. Applying this method, instead of the common practice of sampling all individuals within plots, is likely to improve considerably the quality of demographic information and the predictions of population viability models for most species, especially where sampling effort is limited and there is no extensive prior knowledge of the study species.


We thank Tomáš Herben and Hans de Kroon for initiating the discussion on sampling data for demographic studies and for comments on the manuscript. We also thank Ove Eriksson, Kevin Gross and two anonymous referees for comments on a previous version of the manuscript. This study was done while Z. Münzbergová held a scholarship from the Swedish Institute; it was also partly supported by grant GAČR no. 206/02/05 to T. Herben and by grants from the Swedish Research Council (VR) to J. Ehrlén. It was also partly supported by MSMST 0021620828, AV0Z6005908 and KSK6005114.