IPMpack: an R package for integral projection models


Corresponding author. E-mail: charlotte.metcalf@zoo.ox.ac.uk


  1. Structured demographic models offer powerful methods for addressing important questions in ecology and evolution.
  2. Integral Projection Models (IPMs) are related to classic matrix models, but are more appropriate for modelling structured populations when the variable describing individuals' demography is continuous (e.g. size or weight).
  3. We present IPMpack, a free, open-source R package for building IPMs. The package estimates key population characteristics from IPMs, such as population growth rate in both deterministic and stochastic environments, age-specific trajectories of survival and reproduction, and sensitivities and elasticities to changes in underlying vital rates.
  4. IPMpack can be used for species across a range of life cycle complexity and can include continuous and discrete (e.g. seed bank, hibernation) state variables, as well as environmental covariates of interest. Methods for diagnostics, sensitivity analyses, plotting, model comparison and many other features allow users to move from data input through analysis to inference using an array of internal functions.
  5. IPMpack fills a need for readily usable tools for constructing and analysing IPMs. It is designed both to facilitate their use by experts and to open them up to researchers with little experience in the details of population models. A standardized IPM modelling framework will also facilitate cross-study and cross-species comparative demography, encouraging the exploration of broader ecological and evolutionary questions that can be addressed with population models.


Determining whether a population is growing or declining is central to conservation biology, species' range dynamics, invasion biology and biogeography. Although population trends can be estimated from the densities of individuals, understanding the mechanisms that drive those trends requires quantifying basic vital rates (growth, survival and fecundity). Matrix population models (Caswell 1988, 2001) have provided population biologists with an intuitive and powerful tool for estimating parameters important to population persistence and dynamics, by modelling commonly collected demographic data on stage and/or age transitions. These models can, however, produce biased results when the underlying state variable is continuous (Picard, Ouédraogo & Bar-Hen 2010; Salguero-Gómez & Plotkin 2010), for example height, weight or biomass. Integral projection models (IPMs) can incorporate stage, age and continuous states into a similar analysis of population dynamics (Easterling, Ellner & Dixon 2000; Ellner & Rees 2006). Although IPMs have been applied to a number of organisms and questions (e.g. Rees et al. 2004; Ozgul et al. 2010; Coulson et al. 2011; Jongejans et al. 2011; Miller et al. 2012), for many ecologists the construction of IPMs is not as transparent as the parameterization of classic matrix models. Several publications by Ellner and co-workers (Easterling, Ellner & Dixon 2000; Ellner & Rees 2006) include appendices with MATLAB and R code, but there is still a great need for an open-access package that assists researchers, both beginners and expert modellers, in constructing IPMs and generating basic and advanced output through a standardized workflow. An open-source platform for the construction, diagnostics and analysis of IPMs will be important if this approach is to reach interested scientists. Experienced users of IPMs may also wish to use an accessible package for parts of an analysis, for initial diagnostics or for teaching.

Here, we present IPMpack, an R package intended to fill this need and to support a broader application of IPMs to important questions in ecology and evolutionary biology. In the following sections, we detail the theory and construction of IPMs as implemented in IPMpack and give an example of how IPMpack can be applied to a herbaceous perennial. We conclude by describing developments planned for future versions of IPMpack.

Integral projection models

An IPM is defined by a kernel, K, which combines the probability densities of growth between (discrete or continuous) stages, conditional on survival, with the production of offspring. In the simplest case, where the population is structured by a single continuous state variable such as size, then

$$n(y, t+1) = \int_{L}^{U} K(y, x)\, n(x, t)\, \mathrm{d}x \qquad \text{(eqn 1)}$$

where n(y, t+1) is the distribution over size y of both established and newly recruited individuals at census time t+1, n(x, t) is the distribution over size x of individuals at census time t, and L and U are the lower and upper size limits modelled in the IPM. The kernel K can be broken down into two sub-kernels, K(y, x) = P(y, x) + F(y, x), where P represents transitions attributable to survival and growth, and F describes the per-capita contributions of reproductive individuals of size x to recruits of size y at the next census. To construct K, the growth, survival and fertility functions underlying P and F are obtained from statistical models of the data. The model is then implemented by applying the midpoint rule (Ellner & Rees 2006; Zuidema et al. 2010) for numerical integration, which yields a large matrix (often >100 × 100). The basic framework can be extended to include clonal reproduction and transitions to, from and between discrete stages (Ellner & Rees 2006). Note that the theory behind IPM equivalents of some tools widely used for matrix population models (e.g. passage time) is still under development, but in practice many of the developments from matrix population models can be applied to IPMs.
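To make the discretization concrete, the following minimal Python sketch (not IPMpack code) shows how the midpoint rule turns eqn 1 into a matrix projection. The vital-rate functions and their parameter values are hypothetical, chosen only for illustration. The kernel is evaluated at bin midpoints and multiplied by the bin width h, and the dominant eigenvalue λ (the asymptotic population growth rate) is found by power iteration.

```python
import math

# Hypothetical vital-rate functions, chosen only to illustrate the mechanics;
# they are not IPMpack output and are not fitted to any data.
def survival(x):
    """Probability of surviving one time step at size x (logistic in x)."""
    return 1.0 / (1.0 + math.exp(1.0 - 0.1 * x))


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def growth_density(y, x):
    """Density of size y at t+1 for a survivor of size x at t."""
    return normal_pdf(y, 2.0 + 0.9 * x, 2.0)


def fecundity(x):
    """Expected number of recruits produced per individual of size x."""
    return 0.05 * x


def recruit_density(y):
    """Size distribution of new recruits."""
    return normal_pdf(y, 3.0, 1.0)


def midpoint_kernel(L, U, n_bins):
    """Discretize K(y, x) = P(y, x) + F(y, x) on [L, U] with the midpoint rule."""
    h = (U - L) / n_bins
    mesh = [L + h * (i + 0.5) for i in range(n_bins)]
    # Row i is the destination bin (size y); column j is the source bin (size x).
    P = [[h * survival(x) * growth_density(y, x) for x in mesh] for y in mesh]
    F = [[h * fecundity(x) * recruit_density(y) for x in mesh] for y in mesh]
    K = [[P[i][j] + F[i][j] for j in range(n_bins)] for i in range(n_bins)]
    return mesh, P, F, K


def project(K, n):
    """One time step, n(t+1) = K n(t): the discrete analogue of eqn 1."""
    return [sum(K[i][j] * n[j] for j in range(len(n))) for i in range(len(K))]


def dominant_eigen(K, iterations=1000):
    """Power iteration for lambda and the stable size distribution (sums to 1)."""
    v = [1.0 / len(K)] * len(K)
    lam = 0.0
    for _ in range(iterations):
        w = project(K, v)
        lam = sum(w)  # v sums to 1, so sum(K v) estimates lambda
        v = [wi / lam for wi in w]
    return lam, v
```

For example, `midpoint_kernel(0.0, 40.0, 40)` followed by `dominant_eigen(K)` gives λ for a 40-bin discretization. Power iteration is used here only because it is short and converges for a strictly positive kernel; in a real application the number of bins and size limits would also need checking, as discussed under diagnostics.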

Integral projection models in IPMpack

The simplest IPM described above requires statistical models of growth, survival, fecundity and offspring size distribution. The data required to parameterize an IPM, which must be supplied to IPMpack, therefore include: the size of individuals at two censuses, to estimate growth; a record of which of those individuals die (and, conversely, survive) between censuses; information on the processes that lead to reproduction (e.g. presence or number of flowers, number of seeds, eggs or offspring), generally as a function of the continuous variable of interest; and information on the size distribution of offspring. Because it relies on statistical models, an IPM typically contains fewer parameters and requires less data than an equivalent matrix model (Ramula, Rees & Buckley 2009). However, this reliance also makes appropriate statistical model selection, and correct expression of the conditionalities between vital rates, key elements of IPM construction; this is discussed further in the context of IPMpack in the section ‘Building vital rate models’.

IPMpack is flexible and can incorporate a wide variety of life histories, including both continuous and discrete life stages, as well as dependence of vital rates on covariates, but the overall structure and the functions used depend on the details of the life history. It is therefore worthwhile to consider explicitly the structure of the life history of interest (Caswell 2001). Drawing a life cycle diagram (see Appendix S1) can help reveal all the pathways through which individuals in one stage might contribute to the number of individuals in another stage at the next time step, usually a year later. We now describe in more detail some of the components of building IPMs in IPMpack. The rest of this section is more technical; readers looking for a more accessible example can skip it without losing the ability to use IPMpack.

The challenge in developing a generic package for building IPMs is that a huge array of statistical models is possible for constructing the kernel, reflecting a diversity of functional forms, error structures and transformations of response variables. Additionally, the model defined in eqn 1 may be combined with a number of discrete stages, reflecting, for example, a seed bank. To meet this challenge, IPMpack relies partly on object-oriented code: growth, survival and fertility classes are defined using the S4 object-oriented features of R. The associated objects usually contain some form of linear or generalized linear model relating transformations of size (and possibly other covariates) to the vital rate of interest. For growth and survival objects, methods are defined that implement the model by applying the midpoint rule to obtain the P component of the IPM (returning a P matrix). Fertility objects may include multiple size-dependent or size-independent vital rates reflecting statistical models of, for example, reproductive probability, the number of reproductive structures (e.g. flowers in plants, basidia in fungi) and the number of propagules per reproductive structure (e.g. seeds in plants, eggs in birds). It is crucial that users set up the data so that they correctly reflect the conditionality in the fertility kernel. For example, if there are two columns, one recording whether an individual flowered (0s and 1s) and the other its seed output (integers), seed output must be set to NA wherever flowering is 0; otherwise, meaningless 0s in the seed output column will bias the regression. A range of constants can also be incorporated into the fertility object (e.g. probability of seed establishment).
The fertility object must also include at least one probability density function describing the size of offspring recruiting into the population (several may be needed if there are multiple discrete stages). Functions are available that implement the F component of the IPM from the definition of the fertility object, returning an F matrix. A key step in defining the F kernel is conditioning reproduction appropriately on survival. In some cases, fertility may be measured pre-census, so that survival to the next census does not need to be accounted for when evaluating reproductive output; in other cases, fertility may be measured post-census, so that parental survival must be included. The function that builds the F matrix has arguments that distinguish between these two scenarios.
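The difference between the two scenarios can be sketched as follows. This is a hypothetical Python illustration, not IPMpack's implementation: the vital-rate functions, the constants and the `pre_census` flag are all assumptions made for the example. Following the description above, the pre-census kernel omits parental survival, whereas the post-census kernel multiplies per-capita reproduction by the probability that the parent survives. Note also that fruit number enters only through its mean conditional on flowering, which is why non-flowering plants must not contribute 0s to that model.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical constants and vital rates (illustrative values only)
SEEDS_PER_FRUIT = 20.0      # size-independent constant
P_ESTABLISH = 0.01          # probability that a seed establishes as a recruit
RECRUIT_MEAN, RECRUIT_SD = 3.0, 1.0


def survival(x):
    return 1.0 / (1.0 + math.exp(1.0 - 0.1 * x))


def p_flower(x):
    """Probability of flowering at size x."""
    return 1.0 / (1.0 + math.exp(4.0 - 0.3 * x))


def fruits_if_flowering(x):
    """Expected fruit number, conditional on flowering."""
    return math.exp(0.5 + 0.1 * x)


def fertility_kernel(y, x, pre_census):
    """F(y, x): expected recruits of size y at t+1 per parent of size x at t."""
    per_capita = p_flower(x) * fruits_if_flowering(x) * SEEDS_PER_FRUIT * P_ESTABLISH
    if not pre_census:
        # Post-census case: the parent must survive before it reproduces
        per_capita *= survival(x)
    return per_capita * normal_pdf(y, RECRUIT_MEAN, RECRUIT_SD)
```

Because survival is a probability below one, the post-census kernel is never larger than the pre-census one for the same vital rates; choosing the wrong scenario therefore biases λ systematically.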

To obtain objects of these classes (survival, growth and fertility), IPMpack contains utilities that construct them from data supplied in a particular format. Growth, survival and fertility objects may all include dependence of vital rates on covariates (reflecting, for example, spatial or temporal variation and environmental conditions). The covariate levels or values that the IPM should reflect must then be supplied to the functions that build the P and F matrices. If discrete stages are also required, IPMpack contains a utility function that directly constructs the required object, which contains a matrix of discrete transitions as well as the parameters required to define discrete-to-continuous transitions. These features are demonstrated in the example below with the herb Hypericum cumulicola (Quintana-Ascencio, Menges & Weekley 2003). Once the P and F matrices have been constructed, a number of higher-level functions are available to run diagnostics, supply population summary statistics and explore projections of future population states in deterministic and stochastic environments (Metcalf et al. 2009). Figure 1 shows a complete workflow for demographic modelling using IPMpack.
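A discrete stage such as a seed bank can be handled by bordering the continuous matrix with an extra row and column. The Python sketch below illustrates that block structure under stated assumptions; it is not the object returned by IPMpack's discrete-transition utility, and all parameter names (`p_stay`, `p_enter`, `p_emerge`, `p_direct`) and values are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def build_mixed_matrix(mesh, h, P, seeds, p_stay, p_enter, p_emerge, p_direct,
                       recruit_mean, recruit_sd):
    """Border a continuous P matrix with one discrete seed-bank stage at index 0.

    All arguments are hypothetical illustration parameters:
    mesh, h  : bin midpoints and bin width of the continuous part
    P        : survival-growth matrix, P[i][j] from bin j to bin i
    seeds    : seeds[j] = expected seeds produced by a plant in bin j
    p_stay   : seed-bank seeds that remain viable in the bank
    p_enter  : newly produced seeds that enter the seed bank
    p_emerge : seed-bank seeds that germinate and establish
    p_direct : newly produced seeds that establish without entering the bank
    """
    n = len(mesh)
    # Midpoint-rule probability that a new recruit falls in each size bin
    recruit = [h * normal_pdf(y, recruit_mean, recruit_sd) for y in mesh]
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    M[0][0] = p_stay                              # seed bank -> seed bank
    for j in range(n):
        M[0][j + 1] = seeds[j] * p_enter          # plant -> seed bank
    for i in range(n):
        M[i + 1][0] = p_emerge * recruit[i]       # seed bank -> plant
        for j in range(n):
            # survival-growth plus direct establishment of new seeds
            M[i + 1][j + 1] = P[i][j] + seeds[j] * p_direct * recruit[i]
    return M
```

For example, with `h = 1.0`, `mesh = [h * (i + 0.5) for i in range(40)]`, a 40 × 40 placeholder `P` of zeros and `seeds = [2.0] * 40`, the call `build_mixed_matrix(mesh, h, P, seeds, 0.6, 0.3, 0.1, 0.05, 3.0, 1.0)` returns a 41 × 41 matrix whose first row and column hold the seed-bank transitions. The bordered matrix can be analysed with the same eigen-analysis as a purely continuous one.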

Figure 1.

Workflow diagram for IPMpack. The core progression from data input through analyses is shown in the boxes (middle row); the lower text elaborates on key steps in using IPMpack; optional output produced while building IPMs is shown above.

Implementation of IPMpack in the case of Hypericum cumulicola

In this section, we describe the construction and analysis of an integral projection model using IPMpack. The model is built around Hypericum cumulicola (Clusiaceae), a fire-dependent, short-lived herbaceous species endemic to open areas in xeric Florida rosemary scrub. Demographic data for H. cumulicola are fairly well resolved because annual censuses of several populations have been conducted at different locations within Archbold Biological Station, Highlands County, Florida (USA) since 1994. Here, we use the 1997–1998 census data from site ‘bald 1’ reported in Quintana-Ascencio, Menges & Weekley (2003); at that time, the site had not burned for >21 years. This subset includes 188 individuals (the data are available as part of IPMpack and can be accessed using data(hyperDataSubset)). For each individual, size (maximum stem height; the continuous stage) was measured in both years of data collection, and the number of fruits was counted in 1997. H. cumulicola individuals reproduce between June and September. Seeds can enter a persistent soil seed bank (a discrete stage) or germinate the next spring (Quintana-Ascencio, Dolan & Menges 1998). Seed bank stasis and emergence probabilities were inferred from a combination of experiments and field measurements. More details on the biology of the species and the experimental design are given in Quintana-Ascencio, Menges & Weekley (2003).

The data were formatted according to the requirements of IPMpack, as illustrated below, and entered as a data frame (a table of rows and columns). Each row describes one or more individuals at the start and end of a time step, in this case the 1997–1998 annual period. The column number gives the number of individuals that a row refers to (e.g. a number of seeds). The column stage gives the stage of an individual at the start of the census interval: for example, ‘dormant’ if it is in the seed bank, or ‘continuous’ if it is established and its stem length was therefore measured; NA indicates that the individual did not exist at the start of the period, that is, it had not yet been recruited. The column size gives the size of individuals in the continuous stage (and is NA otherwise). The columns stageNext and sizeNext do the same for individuals at the end of the annual period. The column surv contains binary data (0 or 1) on whether an individual present at the start survived to the end of the period. The fec columns contain reproductive data recorded for each established individual: whether or not it flowered (fec0Flowering) and, if so, how many fruits it produced (fec1Fruits).

  stage  stageNext  surv  size  sizeNext  fec0Flowering  fec1Fruits  number

A data frame can also include factor or continuous covariates for each row, reflecting, for example, environmental variables. Note that the stage and stageNext columns can contain any user-specified stages (e.g. ‘1-year-old seeds’, ‘hibernating adults’) as long as at least some individuals are categorized as ‘continuous’. If the user has no continuous data, classic matrix models are more appropriate (Caswell 2001).
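Because fertility conditionality is easy to get wrong at the data-entry stage, it can help to check rows programmatically before fitting. The following Python sketch works on rows with the column names used above, stored as dictionaries with None standing in for R's NA. The two helper functions and the toy rows are hypothetical; they are not part of IPMpack or of the Hypericum data set.

```python
def enforce_fertility_conditionality(rows):
    """Set fruit counts to None (NA) for plants that did not flower."""
    for row in rows:
        if row.get("fec0Flowering") == 0:
            row["fec1Fruits"] = None
    return rows


def check_rows(rows):
    """Return a list of formatting problems (an illustrative check only)."""
    problems = []
    if not any(r["stage"] == "continuous" or r["stageNext"] == "continuous" for r in rows):
        problems.append("no individuals are in the 'continuous' stage")
    for k, r in enumerate(rows):
        # A size is recorded if, and only if, the individual is in the continuous stage
        if (r["stage"] == "continuous") != (r["size"] is not None):
            problems.append(f"row {k}: size does not match stage")
        if (r["stageNext"] == "continuous") != (r["sizeNext"] is not None):
            problems.append(f"row {k}: sizeNext does not match stageNext")
        if r.get("fec0Flowering") == 0 and r.get("fec1Fruits") is not None:
            problems.append(f"row {k}: fruit count recorded for a non-flowering plant")
    return problems


# Hypothetical toy rows (made-up values, for illustration only)
rows = [
    {"stage": "continuous", "stageNext": "continuous", "surv": 1, "size": 25.0,
     "sizeNext": 31.0, "fec0Flowering": 1, "fec1Fruits": 40, "number": 1},
    {"stage": "continuous", "stageNext": "continuous", "surv": 1, "size": 8.0,
     "sizeNext": 12.0, "fec0Flowering": 0, "fec1Fruits": 0, "number": 1},
    # Seeds remaining in the seed bank; surv left unspecified in this toy example
    {"stage": "dormant", "stageNext": "dormant", "surv": None, "size": None,
     "sizeNext": None, "fec0Flowering": None, "fec1Fruits": None, "number": 100},
    # A new recruit: it did not exist at the start of the period
    {"stage": None, "stageNext": "continuous", "surv": None, "size": None,
     "sizeNext": 2.5, "fec0Flowering": None, "fec1Fruits": None, "number": 1},
]
```

Running `check_rows(rows)` on these toy rows flags the second row, where a 0 fruit count was recorded for a plant that did not flower; after `enforce_fertility_conditionality(rows)` the check returns an empty list.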

The next step is to quantify the relationships of surv, sizeNext and the fec columns with size, and to build survival, growth and fertility objects that reflect them (vital rate models are discussed in more detail below). The IPMpack functions makeSurvObj, makeGrowthObj and makeFecObj allow the user to specify the desired combination of size-related covariates (e.g. size, size², size³, log(size)) used to predict the vital rates. For growth, the user can also specify the response variable (sizeNext for size at the next census, or incr for the increment in size between two censuses, as is typical in tree demography; see Zuidema et al. 2010) and the form of the variance (constant or size dependent, e.g. where the variance declines with size; Metcalf et al. 2009). For fertility objects, transformations of the response variable(s) and particular error distributions can also be defined. For life histories that also include multiple discrete stages, makeDiscreteTrans constructs an appropriate discrete transition object from appropriately formatted data. P and F matrices with the required size range and resolution can then be constructed with the functions createIPMPmatrix and createIPMFmatrix.
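When growth is linear in size, the two response options give the same mean growth function: modelling the increment as a + b·size implies a mean sizeNext of a + (1 + b)·size. The Python sketch below checks this by fitting both with ordinary least squares and returning a Gaussian growth density g(y | x) with constant variance. The function names are hypothetical and this is not IPMpack's makeGrowthObj; a size-dependent variance would replace the constant standard deviation with a function of x.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for ys = a + b * xs; returns (a, b, residual sd)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (n - 2))


def growth_model(size, size_next, response="sizeNext"):
    """Return g(y | x) and (a, b, sd), fitting either sizeNext or the increment."""
    if response == "incr":
        increments = [z - s for s, z in zip(size, size_next)]
        a, b, sd = fit_line(size, increments)
        b = b + 1.0  # mean sizeNext = x + (a + b_incr * x)
    else:
        a, b, sd = fit_line(size, size_next)

    def g(y, x):
        # Gaussian density of next size y with mean a + b x and constant sd
        mean = a + b * x
        return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    return g, (a, b, sd)
```

The residuals of the two fits are identical, so the estimated standard deviation is the same as well; the choice between sizeNext and incr starts to matter once the variance, or a transformation of the response, depends on which variable is modelled.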

For H. cumulicola, we fitted several candidate models for each vital rate. The best-fitting models (i.e. those with the lowest AIC) are plotted in Fig. 2. Survival is highest in individuals of intermediate size (Fig. 2a). The size of individuals that survive until the next year is positively related to their initial size, although smaller plants (e.g. seedlings) are more likely to grow than larger individuals (Fig. 2b). The probability that plants flower at the first census is strongly related to their size, with most plants taller than 20 cm flowering (Fig. 2c). Among flowering individuals, the number of fruits increases exponentially with size (Fig. 2d). Other vital rates were not measured for every individual but were pooled from field experiments (Quintana-Ascencio, Menges & Weekley 2003), and are therefore included in the IPM as constants (i.e. size-independent): the mean and variance of seedling size, the number of seeds per fruit, the probabilities of seeds entering the seed bank, staying there or establishing as seedlings, and the probability of seedlings surviving their first months until the 1998 census.
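Model choice by lowest AIC can be illustrated with a small Python sketch that compares linear and quadratic polynomial regressions of size at the next census on current size. It assumes Gaussian errors with constant variance and counts the residual variance as an estimated parameter (the convention R uses for linear models). The function names and test data are hypothetical, and this is not how IPMpack's own model comparison functions are implemented.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns (coef, rss)."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(degree + 1)]
           for i in range(degree + 1)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(degree + 1)]
    coef = solve(XtX, Xty)
    rss = sum((y - sum(c * x ** p for p, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    return coef, rss


def gaussian_aic(rss, n, n_coef):
    """AIC of a Gaussian regression evaluated at its maximum-likelihood fit."""
    k = n_coef + 1  # regression coefficients plus the residual variance
    return n * math.log(2.0 * math.pi * rss / n) + n + 2 * k


def best_degree(xs, ys, degrees=(1, 2)):
    """Return the polynomial degree with the lowest AIC, and all AIC values."""
    aics = {}
    for d in degrees:
        _, rss = fit_poly(xs, ys, d)
        aics[d] = gaussian_aic(rss, len(xs), d + 1)
    return min(aics, key=aics.get), aics
```

As the next section cautions, the lowest-AIC model is not automatically the right one for an IPM: a term supported by few data at extreme sizes can still matter a great deal for the projected dynamics.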

Figure 2.

Some of the standard IPMpack output for an analysis of the herbaceous perennial plant Hypericum cumulicola. Survival to 1998 (a), growth (b), probability of flowering (c, solid line) and per-capita fruit production (d, dashed line) as a function of size in 1997. Panels (e) and (f) show the IPM kernel (survival–growth transitions are small relative to fertility transitions and therefore do not appear) and the elasticity kernel, respectively. Age-specific trajectories of survival (lx) and force of mortality (qx) (g), mean (solid line) and variance (dashed line) of life expectancy (h), and passage time to a size threshold of 30 cm stem length (i).

A schematic representation of our IPM for H. cumulicola, full details of the structure of this IPM and the IPMpack code used can be found in Appendix S1. The resulting IPM kernel for H. cumulicola is shown in Fig. 2e. The kernel plot omits the discrete stage (seed bank) for display reasons, but this stage is included in all further analyses. The remaining panels show some of the potential output of IPMpack: an elasticity kernel (Fig. 2f), age-specific survivorship (lx) and reproduction (mx) curves (Fig. 2g), size-specific mean and variance of life expectancy (Fig. 2h) and passage time to a threshold size (Fig. 2i). More detailed step-by-step instructions for building IPMs with IPMpack can be found in the package's vignette. Several diagnostic tools can help check that the final IPM is sound; see, for example, diagnosticsPmatrix in the code in Appendix S1, discussed below. Whereas version 1.4 of IPMpack already has the option to include clonal propagation (de Kroon, Plaisier & van Groenendael 1987) and dependence of offspring size on maternal size, future versions will facilitate the analysis of even more complex life histories (e.g. two continuous state variables or periodic models (Caswell & Trevisan 1994)), and the number of analysis tools will continue to grow (e.g. life table response experiment analyses (Caswell 2001), advanced transient dynamics (Stott, Townley & Hodgson 2011) and population viability analyses (Morris & Doak 2002)).
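Size-specific life expectancy of the kind shown in Fig. 2h can be computed from the survival–growth matrix P alone, through the fundamental matrix N = (I - P)^-1 of standard matrix-model theory (Caswell 2001). The Python sketch below is an illustration of that approach, not IPMpack's code, and its function names are hypothetical. For each starting size bin it returns the mean and variance of the number of time steps an individual is alive (counting the current step), using the row vectors eta1 = e'N and eta1(2N - I); it also computes a cohort survivorship curve lx.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def life_expectancy(P):
    """Mean and variance of time to death, by starting bin (columns of P).

    Solves (I - P)' z = rhs instead of forming N = (I - P)^-1 explicitly.
    """
    n = len(P)
    # Transpose of (I - P): entry [i][j] is delta_ij - P[j][i]
    A_t = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] for i in range(n)]
    eta1 = solve(A_t, [1.0] * n)          # row vector e' N
    eta1_N = solve(A_t, eta1)             # row vector eta1 N
    second = [2.0 * u - w for u, w in zip(eta1_N, eta1)]   # eta1 (2N - I)
    variance = [s - m * m for s, m in zip(second, eta1)]
    return eta1, variance


def survivorship(P, n0, ages):
    """lx: fraction of an initial cohort n0 still alive after x steps, x = 0..ages."""
    total0 = sum(n0)
    n = list(n0)
    lx = []
    for _ in range(ages + 1):
        lx.append(sum(n) / total0)
        n = [sum(P[i][j] * n[j] for j in range(len(n))) for i in range(len(P))]
    return lx
```

As a check, a single stage with survival 0.5 gives a mean of 2 steps and a variance of 0.5 / 0.5² = 2, the moments of a geometric distribution. Solving linear systems rather than inverting I - P is a standard choice that is cheaper and numerically more stable for large IPM matrices.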

Building vital rate models

Vital rate models underlying the IPM are obtained by regressing growth, survival and fecundity on the relevant state variable (e.g. size at time t). Importantly, it is generally possible to construct an IPM from poorly fitting vital rate models, but such an IPM may be dangerously misleading. Vital rate models therefore deserve all the methods and attention that any statistical model requires. As mentioned above, this attention should extend particularly to the appropriate representation of key conditionalities inherent in the demography (e.g. fertility is conditional on flowering). In some cases, a vital rate model for an IPM requires more careful fitting than one focused only on the vital rate relationships themselves, because parts of the IPM may be highly sensitive to regions of the vital rate model that are fitted with few data. For example, a polynomial term in a survival model that indicates a slight decline in survival probability at large sizes is generally significant only if there are sufficient data at large sizes, and may therefore not appear in a statistically parsimonious model. The absence of this term can lead to unrealistically high estimates of longevity, as large individuals approach unrealistically high survival rates. We therefore warn users that sound statistical practice and attention to parameter uncertainty are critical to building IPMs correctly. IPMpack has two features to aid in model selection. The first is a range of model comparison functions that can fit any number of models for growth and survival and plot the results on a single figure. The second is a set of functions that build a list of IPMs reflecting the variance–covariance of the parameters of the fitted models (getListRegObjects, getListRegObjectsFec, getIPMOutputDirect), allowing results of interest to be bootstrapped across parameter uncertainty.
Other functions, such as sensParams(), output the sensitivities of the population growth rate λ to changes in the coefficients of the vital rate objects, so that the importance of each model coefficient to inference can be quantified explicitly.
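The idea behind coefficient-level sensitivities can be sketched numerically: perturb one coefficient, rebuild the matrix and recompute λ. The Python sketch below uses a central finite difference and reports both the sensitivity dλ/dθ and the elasticity (θ/λ)·dλ/dθ. It is a generic illustration, not the algorithm used by sensParams; `build_matrix` stands for any hypothetical user-supplied function that maps a coefficient value θ to a non-negative projection matrix.

```python
def dominant_eigenvalue(K, iterations=2000):
    """Dominant eigenvalue of a non-negative projection matrix by power iteration."""
    v = [1.0 / len(K)] * len(K)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]
        lam = sum(w)  # v sums to 1, so sum(K v) estimates lambda
        v = [wi / lam for wi in w]
    return lam


def coefficient_sensitivity(build_matrix, theta, delta=1e-5):
    """Central-difference sensitivity and elasticity of lambda to coefficient theta."""
    lam_up = dominant_eigenvalue(build_matrix(theta + delta))
    lam_down = dominant_eigenvalue(build_matrix(theta - delta))
    sens = (lam_up - lam_down) / (2.0 * delta)
    lam = dominant_eigenvalue(build_matrix(theta))
    return sens, theta * sens / lam
```

For example, `coefficient_sensitivity(lambda a: [[a, 0.5], [0.3, 0.2]], 0.5)` returns the sensitivity and elasticity of λ to the top-left entry of a 2 × 2 matrix; with an IPM, `build_matrix` would rebuild the full kernel from the perturbed vital-rate coefficient. Finite differences are used here only for transparency; their accuracy depends on the step size `delta`.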

IPM diagnostics

IPMpack includes a function, diagnosticsPmatrix, that provides a series of plots indicating whether the choice of bins and size range is adequate. Applying this function as a preliminary step, before obtaining demographic and evolutionary output from IPMs, can help identify basic problems in the creation of the IPM matrices. The output consists of two separate plots. In the first, the left-hand panel shows the range of the data and, in black, the range of the state variable covered by the current IPM P matrix. If these are mismatched, the size limits used in building the P matrix can be adjusted with the minSize and maxSize arguments of createIPMPmatrix. This panel also shows two other IPM P matrices that are constructed with the same vital rate models and used for comparison: one with an extended size range (in red) and one with more bins (in blue). A common problem in constructing IPMs is the loss of parts of the continuous distribution during binning or at the boundaries of the IPM (Williams, Miller & Ellner 2012), with the result that the column sums of the matrix do not match the fitted survival. The middle panel shows that this discrepancy is occurring if the black, red or blue lines do not overlay the grey line showing where x = y. To address this, createIPMPmatrix has an argument, correction. The option correction = ‘constant’ ensures that the columns sum to the fitted survival by multiplying each column of the IPM by the factor required to achieve this; the option correction = ‘discretizeExtremes’ adds to the smallest and largest bins of the IPM any part of the probability density functions defining the IPM that lies beyond these extremes (these corrections are also available in createIPMFmatrix).
The right-hand panel indicates whether extending the size range of the IPM P matrix, or increasing the number of bins (by increasing nBigMatrix and thereby narrowing the bins), alters basic predictions. The second plot shows the discretized IPM P matrix (histograms) and the theoretical density function for the current P matrix (top row) and for an IPM P matrix with more bins (bottom row). If the theoretical density curve departs markedly from the histograms, increasing the nBigMatrix argument may correct the discrepancy.
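The eviction problem and the two corrections can be illustrated with a small Python sketch of a survival–growth matrix with Gaussian growth. The vital rates and function names are hypothetical, and the code follows the verbal description above rather than IPMpack's implementation: ‘constant’ rescales each column to the fitted survival, and ‘discretizeExtremes’ adds the growth probability that falls below L or above U to the first or last bin.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical vital rates (illustrative values only)
GROWTH_SD = 2.0


def survival(x):
    return 1.0 / (1.0 + math.exp(1.0 - 0.1 * x))


def growth_mean(x):
    return 2.0 + 0.9 * x


def p_matrix(L, U, n_bins, correction=None):
    """Midpoint-rule P matrix on [L, U], with an optional eviction correction."""
    h = (U - L) / n_bins
    mesh = [L + h * (i + 0.5) for i in range(n_bins)]
    P = [[h * survival(x) * normal_pdf(y, growth_mean(x), GROWTH_SD) for x in mesh]
         for y in mesh]
    for j, x in enumerate(mesh):
        s = survival(x)
        if correction == "constant":
            # Rescale column j so that it sums exactly to the fitted survival
            col_sum = sum(P[i][j] for i in range(n_bins))
            for i in range(n_bins):
                P[i][j] *= s / col_sum
        elif correction == "discretizeExtremes":
            # Put growth probability beyond the limits into the extreme bins
            mu = growth_mean(x)
            P[0][j] += s * normal_cdf((L - mu) / GROWTH_SD)
            P[-1][j] += s * (1.0 - normal_cdf((U - mu) / GROWTH_SD))
    return mesh, P


def eviction(mesh, P):
    """Fitted survival minus column sum for each bin (positive = individuals lost)."""
    return [survival(x) - sum(row[j] for row in P) for j, x in enumerate(mesh)]
```

On the range [0, 20] with 40 bins, the uncorrected matrix loses a large share of the survivors from the top bins, because their mean next size lies near the upper limit U; both corrections bring the column sums back to the fitted survival. The constant correction does so exactly, while the extreme-bin correction leaves only the small midpoint-rule error.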


IPMpack is a new, flexible R package designed to facilitate the implementation of IPMs for a variety of demographic data. The package includes functions for optimizing the functional form of the models incorporated in an IPM, diagnostics that test the structure of the IPM, a number of plotting functions, and both deterministic and stochastic projection options. IPMpack provides scientists new to IPMs and experienced population biologists alike with a suite of tools that facilitate the development, diagnostics and implementation of IPMs across a broad range of research questions.


We thank Justin Lessler for guidance on S4 objects and Cory Merow for helpful discussions and comments on previous versions of IPMpack and this manuscript. P.F. Quintana-Ascencio (University of Central Florida) and E. Menges (Archbold Biological Station) kindly provided the individual-level Hypericum cumulicola data used here to illustrate the use of the R package. We thank the Evolutionary Biodemography Laboratory and the Modeling the Evolution of Ageing Independent Group of the Max Planck Institute for Demographic Research (Rostock, Germany) for supporting the working group where this package was finished.