1. Matrix population models are widely used to describe population dynamics, conduct population viability analyses and derive management recommendations for plant populations. For endangered or invasive species, management decisions are often based on small demographic data sets. Hence, there is a need for population models which accurately assess population performance from such small data sets.
2. We used demographic data on two perennial herbs with different life histories to compare the accuracy and precision of the traditional matrix population model and the recently developed integral projection model (IPM) in relation to the amount of data.
3. For large data sets both matrix models and IPMs produced identical estimates of population growth rate (λ). However, for small data sets containing fewer than 300 individuals, IPMs often produced smaller bias and variance for λ than matrix models despite different matrix structures and sampling techniques used to construct the matrix population models.
4. Synthesis and applications. Our results suggest that the smaller bias and variance of λ estimates make IPMs preferable to matrix population models for small demographic data sets with a few hundred individuals. These results are likely to be applicable to a wide range of herbaceous, perennial plant species where demographic fate can be modelled as a function of a continuous state variable such as size. We recommend the use of IPMs to assess population performance and management strategies particularly for endangered or invasive perennial herbs where little demographic data are available.
Introduction
Matrix population models, where individuals are divided into distinct classes based on their size, age or life history stage, are widely used in plant demographic studies to assess population performance. These models allow the long-term population growth rate (λ) and other population parameters to be estimated from individual-level data on survival, growth and fecundity (Caswell 2001). In addition to the traditional use of matrix models for population viability analyses (Menges 2000), matrix models have been used to analyse complex population dynamics and species interactions (e.g. Smith, Caswell & Mettler-Cherry 2005; Ramula, Toivonen & Mutikainen 2007), to produce harvest and restoration recommendations (e.g. Freckleton et al. 2003; Linares, Coma & Zabala 2008), and to assess alternative management strategies for invasive plant species (reviewed in Ramula et al. 2008).
Despite the popularity of matrix models, they have some limitations, which may decrease the accuracy and precision of estimated population parameters. First, all individuals within the same class are assumed to have identical demographic rates (Caswell 2001). This simplification makes matrix models sensitive to the selected matrix structure and therefore, different matrix structures may produce divergent estimates of λ (Ramula & Lehtilä 2005). The second limitation of matrix models is the large number of individuals required in each class to estimate demographic rates accurately. Small sample sizes often lead to the pooling of individuals with very different demographic rates within classes.
Using a large (n > 600 individuals) data set Easterling et al. (2000) found that matrix models and IPMs produced identical estimates of λ for a perennial herb, and recent applications of IPMs are usually based on data sets with hundreds of individuals (Childs et al. 2004; Ellner & Rees 2006, 2007; Hesse et al. 2008; Kuss et al. 2008). For endangered or invasive plant species, such large data sets are often lacking and population dynamics and management strategies must be assessed based on the available small data sets (Menges 2000; Simberloff 2003; Buckley et al. 2005). Hence, there is a need for models which reliably predict population dynamics from small demographic data sets, making the application of IPMs of great interest. We might generally expect IPMs to be more reliable than matrix models because they require fewer parameters to be estimated and these are estimated from the complete data set, rather than by dividing the data into classes. However, the magnitude of this effect has not been quantified, and we currently have no evidence that IPMs are more suitable for small data sets than matrix population models.
Many factors affect the accuracy and precision of λ estimates, including distance from the stable stage distribution, the variability of demographic rates and sample size (Caswell 2001). Small sample size may produce inaccurate estimates of λ because of large sampling error (Fiske, Bruna & Bolker 2008). One possibility to minimize sampling error for matrix model estimates is to focus the greatest sampling effort on the life stage(s) to which λ is most sensitive (Gross 2002). If no a priori knowledge of the importance of different demographic transitions to λ is available, the best accuracy for matrix models is achieved by sampling an equal number of individuals for all matrix classes (Münzbergová & Ehrlén 2005). In addition to model accuracy, the precision of the model is important. A model that produces precise but biased estimates is still useful if the magnitude of the bias is known and can therefore be corrected.
We explore the accuracy and precision of matrix population models and IPMs in relation to the amount of demographic data using two perennial herbs with different life histories (Cirsium palustre and Primula veris). We also compare two different techniques to parameterize an IPM, first using a constant regression model structure derived from the full data set and second, allowing the regression model structure to vary according to the data set at hand, which is sub-sampled from the full data set. We start by constructing a matrix population model and an IPM from the full data sets. We then reduce the number of individuals by sub-sampling the full data sets with and without replacement, and compare the accuracy and precision of λ in relation to full data sets. For the matrix models we use two alternative matrix structures and two different sampling techniques, a random sampling from the observed stage distribution and an equal sampling for all matrix classes. We concentrate on λ because it is commonly used to quantify population performance.
Materials and methods
Cirsium palustre L. (Asteraceae) is a short-lived, monocarpic herb that forms a 50–200 cm high flowering stem in its third summer or later, and reproduction is usually fatal (S. Ramula, personal observation). Primula veris L. (Primulaceae) is a long-lived, iteroparous herb. Both species are rosette-forming, mainly sexually reproducing, and have a persistent seed bank. We used demographic data collected from two Cirsium palustre populations during 2002–2005 in Sweden, and five Primula veris populations during 1996–1998 in Finland. For data description, see Appendix S1 (Supporting Information).
Since our aim was to compare the accuracy and precision of matrix models and IPMs in relation to the amount of data, not to predict population dynamics, we pooled the data across the populations and years within the species to increase sample size. Pooling was done for modelling purposes and is rarely recommended for demographic studies (Jongejans & de Kroon 2005). After pooling, we had 1040 individuals for Cirsium and 2155 individuals for Primula. We then sub-sampled these full data sets with samples of 100, 200, 300…1000 individuals. For each sample size, we used 500 replicates, which were randomly drawn from the full data sets without replacement. In addition, we used an equal sampling, where matrix classes within a species contained equal numbers of individuals, excluding the seed bank. This sampling technique is recommended to minimize sampling error for matrix transitions when the importance of different life stages to λ is unknown (Münzbergová & Ehrlén 2005). Our equal sampling resulted in total sample sizes approximately similar to those gained using sampling from the observed distributions. Because some size classes contained few individuals, we sub-sampled individuals from the full data sets with replacement for the equal sampling.
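The two sub-sampling schemes described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code (the original analyses were run in R): the function names, the integer stand-in data and the class labels are all hypothetical.

```python
import random

def subsample_without_replacement(individuals, sample_size, n_replicates, seed=0):
    """Draw n_replicates random sub-samples, each drawn without replacement."""
    rng = random.Random(seed)
    return [rng.sample(individuals, sample_size) for _ in range(n_replicates)]

def equal_sample_with_replacement(individuals_by_class, per_class, seed=0):
    """Draw the same number of individuals from every class, with replacement."""
    rng = random.Random(seed)
    sample = []
    for members in individuals_by_class.values():
        sample.extend(rng.choices(members, k=per_class))
    return sample

# Hypothetical stand-in data: integer IDs for the 1040 pooled Cirsium individuals.
full_data = list(range(1040))
replicates = subsample_without_replacement(full_data, sample_size=100, n_replicates=500)

# Hypothetical classes of unequal size; sampling with replacement lets a small
# class supply as many individuals as a large one.
classes = {"seedling": list(range(0, 600)),
           "vegetative": list(range(600, 1000)),
           "flowering": list(range(1000, 1040))}
equal_sample = equal_sample_with_replacement(classes, per_class=50)
```

Sampling with replacement is what makes the equal design feasible here: a class with only 40 individuals can still contribute 50 draws.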
Matrix population models
A deterministic matrix population model to predict the population state at time t + 1 is denoted as $\mathbf{n}_{t+1} = \mathbf{A}\mathbf{n}_t$, where $\mathbf{A}$ is the matrix and $\mathbf{n}_t$ is the proportion of individuals in each class at time t. The matrix consists of matrix elements ($a_{ij}$), which describe the average contribution of an individual in stage j to stage i over one time step. To automate the construction of matrices from the sub-samples, we used a slightly different matrix structure from earlier studies for these species (Lehtilä et al. 2006; Ramula 2008). For Cirsium, our transition matrix consisted of six classes nearly identical to the original publication (Appendix S1, Supporting Information). For Primula, we pooled small- and medium-sized vegetative plants, resulting in a 5 × 5 matrix (Appendix S1, Supporting Information). For both species, seed bank transitions were estimated separately from the field data by calculating averages across the populations and years.
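As a minimal sketch of how λ follows from a projection matrix, the example below iterates the projection equation until the per-step growth factor stabilizes (power iteration), which converges to the dominant eigenvalue for a primitive matrix. The 3 × 3 matrix and its entries are hypothetical and are not the study matrices from Appendix S1.

```python
def project(A, n):
    """One time step of the projection: returns A multiplied by n."""
    return [sum(a * v for a, v in zip(row, n)) for row in A]

def dominant_eigenvalue(A, max_iter=10000, tol=1e-12):
    """Estimate lambda and the stable stage distribution by power iteration."""
    n = [1.0 / len(A)] * len(A)          # start from an even distribution
    lam = 0.0
    for _ in range(max_iter):
        projected = project(A, n)
        new_lam = sum(projected)         # n sums to 1, so this is the growth factor
        n = [v / new_lam for v in projected]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam, n

# Hypothetical matrix; columns are the source stage j, rows the target stage i.
# Stages: seedling, vegetative, flowering.
A = [[0.0, 0.0, 5.0],   # seedlings produced per flowering plant
     [0.2, 0.5, 0.0],   # seedling -> vegetative, vegetative stasis
     [0.0, 0.3, 0.6]]   # vegetative -> flowering, flowering survival

lam, stable_stage = dominant_eigenvalue(A)
print(lam, stable_stage)
```

Power iteration is used here only to keep the sketch dependency-free; any eigenvalue routine would serve the same purpose.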
In addition to the matrices described above, for both species we used the smallest possible, biologically meaningful matrix dimension consisting of seed bank, seedlings, vegetative plants and flowering plants (Appendix S1, Supporting Information). Further, we reviewed 63 published demographic studies to explore the relationship between the size of demographic data sets and matrix dimensionality using Pearson’s correlation coefficient. For species with multiple matrices available, we used the average sample size.
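For reference, Pearson's correlation coefficient between data set size and matrix dimension can be computed as in the sketch below. The paired values are hypothetical placeholders and are not taken from the 63 reviewed studies.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical pairs: (number of individuals, matrix dimension) per study.
sample_sizes = [80, 150, 300, 700, 2000]
matrix_dimensions = [3, 4, 5, 6, 8]
r = pearson_r(sample_sizes, matrix_dimensions)
print(r)
```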
Integral projection models
An IPM that contains a seed bank is described using two equations. The first equation describes the number of seeds in the seed bank at time t + 1, i.e. seeds remaining in the seed bank + fresh seeds entering the seed bank, as
( eqn 1)
where $s_s$ is the constant seed survival in the seed bank, $s_r$ is recruitment from the seed bank and $s_e$ is the establishment rate for fresh seeds. The fecundity function is described as $f_s(x) = f_p(x) f_n(x)$, where $f_p(x)$ is the probability of flowering and $f_n(x)$ is the number of seeds produced by plants of size $x$.
The second equation describes the density of individuals of size (y) at time t + 1 in the established population, including seedlings that germinate from the seed bank (first part), as
( eqn 2)
where the kernel $k(y,x)$ describes all possible transitions from plant size $x$ to plant size $y$, integrated over all sizes ($\Omega$). Similar to other studies (Rees & Rose 2002; Rose et al. 2005), we used integration limits of 0·9 times the minimum and 1·1 times the maximum rosette size observed; for details on evaluating the integrals, see Table S1 (Supporting Information). The kernel consists of a survival-growth function, $p(y,x)$, and a fecundity function, $f(y,x)$, which both depend on plant size. For the monocarpic Cirsium, where flowering is fatal, the survival-growth function is $p(y,x) = s(x)[1 - f_p(x)]g(y,x)$, where $s(x)$ is the probability of survival for a plant of size $x$, $f_p(x)$ is the probability of flowering for a plant of size $x$ and $g(y,x)$ is the probability of a plant of size $x$ growing to size $y$. For the iteroparous Primula, the survival-growth function is $p(y,x) = s(x)g(y,x)$. For both species, the growth function, $g(y,x)$, is a normal probability density function with mean and variance. The fecundity function is described as $f(y,x) = f_p(x) f_n(x) s_e f_d(y)$, where $f_p(x)$ is the probability of flowering, $f_n(x)$ is the number of seeds produced by plants of size $x$, and $f_d(y)$ is the probability distribution of seedling size with constant mean and variance. As a result of a lack of empirical data, we adopted the same procedure as others (Rees & Rose 2002; Childs et al. 2003; Rose et al. 2005; Williams & Crone 2006) and assumed that seedling size was independent of maternal plant size; matrix models make the same assumption. For the kernel parameters and equations, see Table S1 (Supporting Information).
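The kernel definitions above can be made concrete with a small numerical sketch. It evaluates the kernel for established plants on a mesh using the midpoint rule and estimates λ by power iteration. Several things here are assumptions rather than the authors' implementation: the midpoint rule itself, the omission of the seed bank, and every parameter value and function name, which are hypothetical and unrelated to Table S1.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical vital-rate functions on a log-size scale.
def survival(x):      return logistic(-1.0 + 1.2 * x)      # s(x)
def flowering(x):     return logistic(-8.0 + 2.5 * x)      # f_p(x)
def seeds(x):         return math.exp(1.0 + 1.5 * x)       # f_n(x)
def growth(y, x):     return normal_pdf(y, 0.8 + 0.75 * x, 0.4)  # g(y,x)
def seedling_size(y): return normal_pdf(y, 0.7, 0.3)       # f_d(y)
ESTABLISHMENT = 0.01                                        # s_e

def kernel(y, x, monocarpic=True):
    """k(y,x) = p(y,x) + f(y,x); flowering is fatal if monocarpic."""
    stay = (1.0 - flowering(x)) if monocarpic else 1.0
    p = survival(x) * stay * growth(y, x)
    f = flowering(x) * seeds(x) * ESTABLISHMENT * seedling_size(y)
    return p + f

def discretize(min_size, max_size, n_mesh=40, monocarpic=True):
    """Midpoint-rule matrix over [0.9 * min_size, 1.1 * max_size]."""
    lower, upper = 0.9 * min_size, 1.1 * max_size
    h = (upper - lower) / n_mesh
    mesh = [lower + (i + 0.5) * h for i in range(n_mesh)]
    K = [[h * kernel(y, x, monocarpic) for x in mesh] for y in mesh]
    return K, mesh

def dominant_eigenvalue(K, max_iter=2000, tol=1e-10):
    """Power iteration on the discretized kernel."""
    n = [1.0 / len(K)] * len(K)
    lam = 0.0
    for _ in range(max_iter):
        nxt = [sum(k * v for k, v in zip(row, n)) for row in K]
        new_lam = sum(nxt)
        n = [v / new_lam for v in nxt]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam

# Hypothetical observed size range (log scale).
K_mono, mesh = discretize(0.5, 4.0, monocarpic=True)
K_itero, _ = discretize(0.5, 4.0, monocarpic=False)
print(dominant_eigenvalue(K_mono), dominant_eigenvalue(K_itero))
```

With identical vital-rate functions, the iteroparous kernel is at least as large as the monocarpic one everywhere, so its λ cannot be smaller; the only difference between the two is whether flowering plants are removed from the survival-growth term.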
To calculate the kernels from the data, we constructed regression models with plant size (rosette diameter for Cirsium and leaf length for Primula) at time t + 1 and seed production at time t as response variables and plant size at time t as an explanatory variable. Plant size and seed production were log-transformed in all the models. We estimated the dependence of plant survival and flowering probability on plant size using a generalized linear model with a logit link function (Table S1, Supporting Information). For each model, we included a quadratic size term and then selected the best model according to Akaike’s information criterion (Burnham & Anderson 1998). The selection of the best model for each sub-sample (termed the best model) allowed regression equations to vary from linear to quadratic depending on the sub-sample of data at hand. An exception was seed production, for which we always used a linear function to avoid drastic overestimates of seed production for small plants resulting from quadratic functions that sometimes fitted best for small data sets. The parameterization of the kernel from the data at hand is preferable for large data sets but it may not be the best solution for small data sets where additional information from other studies may be useful. In such a situation a priori knowledge of the species could be used to define the forms of the regression models. Therefore, we also used a constant model (termed the constant model), in which the forms of regression models were parameterized from the full data set and were kept fixed (Table S1, Supporting Information), while the parameters were estimated from the sub-samples.
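The choice between a linear and a quadratic size term by AIC can be illustrated as follows. This sketch covers only an ordinary least-squares regression with Gaussian errors, as might be used for growth; the logit models for survival and flowering are not shown. The AIC expression used, n·ln(RSS/n) + 2k, is the usual least-squares form up to an additive constant, and the simulated data and function names are hypothetical.

```python
import math
import random

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients and residual SS."""
    X = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coef = solve(XtX, Xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(X, y))
    return coef, rss

def aic(rss, n_obs, n_coef):
    """Least-squares AIC up to a constant; +1 counts the residual variance."""
    return n_obs * math.log(rss / n_obs) + 2 * (n_coef + 1)

def best_growth_model(x, y):
    """Return the degree (1 or 2) with the lower AIC, plus both fits."""
    fits = {}
    for degree in (1, 2):
        coef, rss = fit_polynomial(x, y, degree)
        fits[degree] = (aic(rss, len(x), degree + 1), coef)
    return min(fits, key=lambda d: fits[d][0]), fits

# Hypothetical data with a genuinely quadratic size relationship.
rng = random.Random(1)
size_t = [i * 0.06 for i in range(50)]
size_t1 = [1 + 2 * s + 3 * s ** 2 + rng.gauss(0, 0.1) for s in size_t]
degree, fits = best_growth_model(size_t, size_t1)
print(degree, fits[degree][1])
```

The "constant model" described above would correspond to calling `fit_polynomial` with a degree fixed in advance, so that only the coefficients change between sub-samples.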
To examine the accuracy of the demographic models, we calculated λ from each sub-sample and compared the mean $\lambda_{\text{sub-sample}}$ with λ estimated from the full data set (hereafter $\lambda_{\text{full-data}}$) for each model. This reveals whether the models on average produce biased estimates of λ in relation to $\lambda_{\text{full-data}}$ and, if so, in which direction. For equal sampling, we used mean $\lambda_{\text{full-data}}$ for the matrix model and IPM calculated from the full data sets with replacement (for estimates see Fig. S1, Supporting Information). The precision of the models was examined from variances for $\lambda_{\text{sub-sample}}$ estimates in relation to variances for $\lambda_{\text{full-data}}$. We conducted all calculations in R 2·4·1 (R Development Core Team 2006).
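A minimal sketch of the accuracy and precision summaries is given below, assuming bias is measured as the difference between the mean sub-sample λ and the full-data λ, and precision as the sample variance of the sub-sample estimates. The function name and the λ values are hypothetical.

```python
import statistics

def bias_and_variance(lambda_subsamples, lambda_full):
    """Bias = mean sub-sample lambda minus full-data lambda; variance across replicates."""
    mean_lambda = statistics.fmean(lambda_subsamples)
    bias = mean_lambda - lambda_full
    variance = statistics.variance(lambda_subsamples)
    return bias, variance

# Hypothetical lambda estimates from five replicate sub-samples.
estimates = [1.18, 1.25, 1.21, 1.15, 1.27]
bias, variance = bias_and_variance(estimates, lambda_full=1.23)
print(bias, variance)
```

A negative bias indicates that the sub-samples underestimate λ on average relative to the full data set.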
Results
The matrix models and IPMs constructed from the full data sets produced approximately similar estimates of λ (1·234 and 1·221 for Cirsium; 1·331 and 1·301 for Primula respectively), as would be expected. Both models produced quite unbiased and precise estimates of λ for large data sets with the constant IPM usually being most accurate and most precise (Fig. 1). For all the models, the variance of the λ estimates increased with a decreasing amount of data and most rapidly so for the matrix model (Fig. 1c,d). For the smallest data set of 100 individuals, the matrix model resulted in 2·4 times greater variance than the constant IPM for Cirsium and 1·6 times greater variance for Primula (Fig. 1c,d). For small data sets containing fewer than 300 individuals, IPMs thus produced smaller bias and variance for λ than matrix models which generally underestimated λ (Fig. 1).
Equal sampling of individuals for the matrix classes did not qualitatively affect the results, and IPMs still tended to produce smaller bias and variance for λ than matrix models for small data sets (Fig. S1, Supporting Information). However, equal sampling somewhat reduced bias and variance in λ for the matrix models of Primula but not for Cirsium (Fig. 1 and S1 in Supporting Information).
For both study species, the smallest possible, biologically meaningful matrices overestimated λ (Fig. 2). Interestingly, a review of published demographic studies revealed that matrix dimensionality increased with an increasing number of individuals in data sets (Fig. 3 and Appendix S2 in Supporting Information). Demographic data sets contained fewer than 300 individuals for 52% of the 63 examined plant species (minimum = 62, median = 263, maximum = 5765 individuals), and such small data sets occurred for both common and rare species (55% and 45% respectively).
Discussion
Using demographic data from two perennial herbs with different life histories, we found that IPMs were more suitable for estimating λ from small data sets (<300 individuals) than traditional matrix population models, even if sampling error in demographic transitions was minimized by sampling equal numbers of individuals for matrix classes. A negligible effect of equal sampling on λ for Cirsium can be explained by the fact that most data were collected for early life stage transitions (seedlings and vegetative plants) to which λ of Cirsium was most sensitive (Ramula 2008). Sampling focussed on the most variable and most sensitive life stages tends to reduce the sampling error for demographic transitions (Gross 2002).
For both the matrix model and IPM, variance in λ estimates increased with a decreasing amount of data. Compared with large data sets, the matrix model underestimated λ for the smallest data sets (<300 individuals), while such a systematic bias did not occur for the IPM. These underestimates probably arose from parameterization problems for the seedling class, which had the lowest survival probability (<23%) for both study species (see Appendix S1), a typical feature of many perennials. The direction of bias produced by matrix models is likely to vary among species depending on life history. For instance, matrix models overestimated λ for small data sets of the perennial herb Heliconia acuminata, which had quite high survival (>70%) across the life stages (Fiske et al. 2008). There are at least three factors that may affect the magnitude of the bias and variance of λ produced by matrix models. First, the magnitude of the bias and variance of λ for small data sets depends on the magnitude of vital rates in matrix classes. Fiske et al. (2008) found that only survival close to one produced unbiased λ for H. acuminata, while the magnitude of fecundity transitions had no effect on bias. Survival was generally greater for Primula than Cirsium, which may explain the smaller relative bias in λ for Primula. Secondly, the magnitude of the bias and variance of λ produced by matrix models partly depends on matrix dimensionality (Ramula & Lehtilä 2005). The number of individuals per class declines more rapidly for a large matrix than for a small matrix, leading to mis-estimates of demographic rates when all sampled individuals in a class survive or die. These mis-estimates are likely to have a considerable effect if they occur in transitions with the greatest contributions to λ. However, smaller matrices are not necessarily a solution for small data sets because of a potential bias in λ estimates.
Finally, the magnitude of the bias of λ in relation to matrix dimensionality depends on a plant’s life form, with λ estimates of herbaceous species being more sensitive to matrix dimensionality than those of woody species (Ramula & Lehtilä 2005). Therefore, IPMs generally perform better than matrix models especially for herbaceous species, which usually exhibit lower post-seedling survival than woody species. Demographic data on herbaceous and woody species are needed to further examine the role of life form in model reliability. Although our results are limited to two perennial herbs, they are likely to be applicable to other herbaceous species where demographic fate can be modelled as a function of a continuous state variable such as size or age. This is often the case for monocarpic perennials and other perennial herbs (Metcalf, Rose & Rees 2003; Lauenroth & Adler 2008). Moreover, our findings can be applied to stochastic and density-dependent population models irrespective of the magnitude of population growth rate because IPMs allow the inclusion of density dependence and stochasticity in demographic processes in similar ways to matrix models (Childs et al. 2004; Ellner & Rees 2006; Rees & Ellner in press).
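The earlier point that sparsely populated classes can yield survival estimates of exactly zero or one can be illustrated with a short calculation. It assumes that individuals in a class survive independently with the same probability (a binomial model, which is an assumption made here and not a statement from the study); the survival value of 0.2 is hypothetical, chosen to lie in the range reported for seedlings.

```python
def prob_all_or_none_survive(survival, n_individuals):
    """P(estimated survival is exactly 0 or 1) under binomial sampling."""
    all_survive = survival ** n_individuals
    none_survive = (1.0 - survival) ** n_individuals
    return all_survive + none_survive

# Hypothetical class sizes; the chance of a degenerate estimate shrinks
# quickly as more individuals are sampled per class.
for n in (5, 10, 20, 50):
    print(n, prob_all_or_none_survive(0.2, n))
```

Because splitting a fixed sample across more classes lowers the number of individuals per class, larger matrices are more exposed to such degenerate estimates.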
Species’ life history and the purpose of the study obviously affect model selection. For annual plants, relatively simple equations can often be used to describe population dynamics (e.g. Buckley et al. 2001; Freckleton et al. 2008), while they are unavailable for perennials where researchers must choose an approach from multiple population models with different structures. Both monocarpic and iteroparous perennial herbs with small demographic data sets (<300 individuals) are likely to benefit from the use of IPMs because matrix models have the greatest probability of producing biased population estimates for such data sets. Bayesian estimates or bootstrapping can be used to incorporate uncertainty into population estimates for matrix models (Caswell 2001; Clark 2003) but these techniques do not reduce bias in estimates. We therefore recommend the use of IPMs to assess population performance and management strategies particularly for rare or invasive perennial herbs where little demographic data are available. In general, IPMs should be preferred over matrix models if a data set contains a few hundred individuals, population dynamics of the study species are clearly unequally sensitive to different demographic transitions, and data represent the observed stage distribution in the field. Although IPMs can usually be parameterized from the data at hand by fitting the best model for each kernel variable, for small and sparse data sets, a priori knowledge of the demography of the study species should be used to avoid spurious regression equations (i.e. quadratic and cubic relationships). Spurious regression equations caused by a few exceptional observations may erroneously estimate demographic rates for individuals of certain sizes, increasing the bias of IPMs. 
For the current data sets, the greater bias and variance of λ produced by the best IPM compared with the constant IPM resulted from the quadratic regression equations that were allowed when parameterizing the best IPM from the sub-sample of data at hand.
We conclude that while the bias of all demographic models is likely to increase with decreasing amounts of data, for herbaceous species where demographic fate changes with a continuous state variable, IPMs usually perform better than matrix population models producing less biased and less variable estimates of λ. Therefore, IPMs should be used when assessing population dynamics and management strategies for perennial herbs based on small demographic data sets.
Acknowledgements
The authors thank K. Lehtilä, R. Leimu and K. Syrjänen for letting us use Primula data, and two anonymous reviewers for helpful comments. This work was supported by the Swedish Research Council Formas, the Academy of Finland (SR) and an Australian Research Council Australian Research Fellowship DP0771387 (YMB).