### Introduction

Determining whether a population is growing or declining is central to conservation biology, species' range dynamics, invasion biology and biogeography. Although population trends can be estimated from the densities of individuals, understanding the mechanisms that drive those trends requires the quantification of basic vital rates (growth, survival and fecundity). Matrix population models (Caswell 1988, 2001) provided an intuitive and powerful tool for population biologists to estimate parameters important to population persistence and dynamics by modelling commonly collected demographic data on stage and/or age transitions. These models may be biased, however, where the underlying state variables are continuous (Picard, Ouédraogo & Bar-Hen 2010; Salguero-Gómez & Plotkin 2010), for example height, weight or biomass. Integral projection models (IPMs) offer tools that can incorporate stage, age and continuous states into similar analyses of population dynamics (Easterling, Ellner & Dixon 2000; Ellner & Rees 2006). Although IPMs have been applied to a number of organisms and questions (e.g. Rees *et al*. 2004; Ozgul *et al*. 2010; Coulson *et al*. 2011; Jongejans *et al*. 2011; Miller *et al*. 2012), for many ecologists, the construction of IPMs is not as transparent as the parameterization of classic matrix models. Several publications by Ellner and co-workers (Easterling, Ellner & Dixon 2000; Ellner & Rees 2006) include appendices with MATLAB and R code, but there is still a great need for an open-access package that assists both beginning and expert modellers in IPM construction and in the generation of basic and advanced output through a standardized workflow. An open-source platform for the construction, diagnostics and analysis of IPMs will be important for this approach to reach interested scientists. Further, experienced users of IPMs may wish to use an accessible package for subsets of analyses, initial diagnostics or teaching.

Here, we present *IPMpack*, an R package intended to fill this need and assist a broader application of IPMs to important questions in ecology and evolutionary biology. In the following sections, we detail the theory and construction of IPMs relative to *IPMpack* and give an example of how *IPMpack* can be applied to a herbaceous perennial. We conclude by describing current and future developments of IPMs to be incorporated in future versions of *IPMpack*.

### Integral projection models

An IPM is defined by a kernel, *K*, which represents probability densities of growth between discrete or continuous stages conditional on survival, and the production of offspring. In the simplest case, where the population is structured by a single continuous state variable such as size, then

$$n(y, t+1) = \int_{L}^{U} K(y, x)\, n(x, t)\, \mathrm{d}x \qquad \text{(eqn 1)}$$

where *n*(*y*, *t* + 1) is the distribution across size *y* of both established and newly recruited individuals at census time *t* + 1, *n*(*x*, *t*) is the distribution across size *x* of individuals at census time *t*, and *L* and *U* are the respective lower and upper size limits modelled in the IPM. The kernel *K* can be broken down into two sub-kernels, *P* and *F* (*K* = *P* + *F*), where *P* represents transitions attributable to survival and growth, and the *F* kernel describes *per-capita* contributions of reproductive individuals given the recruit density function at the next census. To construct *K*, growth, survival and fertility functions underlying the *P* and *F* kernels are obtained from statistical models of the data. The model is then implemented by applying the midpoint rule (Ellner & Rees 2006; Zuidema *et al*. 2010) for numerical integration to obtain a high-dimensional matrix (>100 × 100). The basic framework can be extended to include clonal reproduction, and transitions to, from and between discrete stages (Ellner & Rees 2006). Note that the details of the theory for IPM tools equivalent to those broadly in use for matrix population models (e.g. passage time) are still under development, but in practice many of the developments from matrix population models can be applied to IPMs.
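The midpoint rule mentioned above can be illustrated with a short, language-agnostic sketch in Python. This is not *IPMpack* code: the survival and growth functions, their coefficients, the size limits and the names `midpoint_p_matrix` and `project` are all hypothetical and chosen only for illustration. The sketch discretises the *P* kernel on equal-width size classes and projects a size distribution one time step forward.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal probability density evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def survival(x):
    """Hypothetical logistic survival probability at size x."""
    return 1.0 / (1.0 + math.exp(-(-1.0 + 0.1 * x)))

def growth_density(y, x):
    """Hypothetical density of size y at t + 1 given size x at t."""
    return normal_pdf(y, 2.0 + 0.9 * x, 3.0)

def midpoint_p_matrix(lower, upper, n_bins):
    """Approximate P(y, x) = s(x) g(y, x) on n_bins equal-width size classes."""
    h = (upper - lower) / n_bins                            # bin width
    mesh = [lower + (i + 0.5) * h for i in range(n_bins)]   # bin midpoints
    # Rows index size at t + 1 (y); columns index size at t (x).
    matrix = [[h * survival(x) * growth_density(y, x) for x in mesh] for y in mesh]
    return mesh, matrix

def project(matrix, n_t):
    """One time step of eqn 1 in discretised form: n(t + 1) = matrix times n(t)."""
    return [sum(row[j] * n_t[j] for j in range(len(n_t))) for row in matrix]

mesh, P = midpoint_p_matrix(lower=0.0, upper=60.0, n_bins=120)
# Each column sum should be close to the fitted survival at that size,
# provided the size range is wide enough to contain the growth distribution.
column_sums = [sum(P[i][j] for i in range(len(mesh))) for j in range(len(mesh))]
```

Comparing `column_sums` with `survival` evaluated at the midpoints gives a quick check that the chosen size limits and number of bins are adequate for the assumed functions.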

### Integral projection models in IPMpack

The simplest IPM described above requires statistical models of growth, survival, fecundity and offspring size distribution. The data required to parameterize IPMs, which must be supplied to *IPMpack*, therefore include the size of individuals at two censuses to estimate growth rates; a record of which of those individuals die (and conversely, survive) between censuses; information on the processes that lead to reproduction (e.g. presence or number of flowers, number of seeds, eggs or offspring), generally as a function of the continuous variable of interest; and information on the size distribution of offspring. The reliance on statistical models means that an IPM typically contains fewer parameters and requires less data than an equivalent matrix model (Ramula, Rees & Buckley 2009); however, it also means that a key element of constructing an IPM is appropriate statistical model selection and expression of conditionalities. This is discussed further below, in the context of *IPMpack*, in the section ‘Building vital rate models’.

*IPMpack* is flexible and can incorporate a wide variety of life histories, including both continuous and discrete life stages, as well as dependence of vital rates on covariates, but the overall structure and functions used depend on the details of these life histories. It is therefore worthwhile to consider the explicit structure of the life history of interest (Caswell 2001). Drawing a life cycle (see Appendix S1) can help reveal all relevant pathways through which individuals in one stage might contribute to the number of individuals in another stage at the next time step, usually a year later. We now describe in more detail some of the components of building IPMs in *IPMpack*. The rest of this section is more technical; readers who prefer a more accessible example can skip ahead to the worked example without losing the ability to use *IPMpack*.

The challenge in developing a generic package for building IPMs is that a huge array of statistical models is possible for construction of the kernel, reflecting a diversity of functional forms as well as error structures and transforms of response variables. Additionally, the model defined in eqn 1 may be combined with a number of discrete stages, reflecting for example a seed bank stage in the population. To meet this challenge, *IPMpack* relies partly on object-oriented code. Growth, survival and fertility classes are defined within *IPMpack* using the S4 object-oriented language features of R. The associated objects usually contain some form of linear or generalized linear model relating transforms of size (and possibly other covariates) to the vital rate of interest. For growth and survival objects, appropriate methods are defined that implement the model by applying the midpoint rule to obtain the *P* component of the IPM (returning a *P matrix*). Fertility objects may include multiple size-dependent or size-independent vital rates reflecting statistical models of, for example, reproductive probability, number of reproductive structures (e.g. flowers in plants, basidia in fungi) and number of propagules within each reproductive structure (e.g. seeds for plants, eggs for birds). Note that it is crucial that users set up the data to adequately reflect conditionality in the fertility kernel; for example, if there are two columns, with one recording whether each individual flowered (0s and 1s) and the other recording seed output (integers), then seed output must be set to NA wherever flowering is 0, as otherwise meaningless 0s in the seed output column will bias the regression. A range of constants can also be incorporated into the fertility object (e.g. probability of seed establishment).
The fertility object must also include at least one probability density function describing the size of offspring recruiting into the population (several are possible if many discrete states are present). From the definition of the fertility object, functions exist to implement the *F* component of the IPM, returning an *F matrix*. A key step in defining the *F* kernel is conditioning reproduction on survival appropriately. In some cases, fertility may be measured pre-census, so that survival to the next census period does not need to be accounted for in evaluating reproductive output; in other cases, fertility may be measured post-census, so that survival must be considered. The function that builds the *F* matrix has arguments that distinguish between these two scenarios.
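The effect of miscoding conditional fertility data can be seen in a small Python sketch. It uses the flowering indicator and fruit counts of the six example rows shown in the data table later in this paper, and it replaces the regressions with simple means, which is a deliberate simplification rather than what *IPMpack* does. The variable and function names are hypothetical.

```python
# Flowering indicator and fruit count for six plants; None plays the role of NA.
records = [(1, 15), (1, 184), (0, None), (1, 152), (0, None), (1, 80)]

def mean_ignoring_missing(values):
    """Mean of the non-missing values."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)

# First vital rate: probability of flowering.
p_flower = sum(flowered for flowered, _ in records) / len(records)

# Correct coding: fruit output is estimated only among flowering plants.
fruits_if_flowering = mean_ignoring_missing([fruits for _, fruits in records])

# Miscoding: non-flowering plants are given 0 fruits instead of NA.
miscoded = [0 if fruits is None else fruits for _, fruits in records]
fruits_miscoded = mean_ignoring_missing(miscoded)

# The fertility kernel multiplies the conditional vital rates together.
per_capita_correct = p_flower * fruits_if_flowering
per_capita_biased = p_flower * fruits_miscoded  # non-flowering is counted twice
```

With the zeros left in, the failure to flower is penalised once in the flowering probability and again in the fruit model, so `per_capita_biased` is lower than `per_capita_correct`.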

To obtain objects of these classes (survival, growth and fertility classes), *IPMpack* contains utilities that allow users to submit data structured in a particular way from which survival, growth and fertility objects will be constructed. Growth, survival and fertility objects may all reflect dependence of vital rates on covariates (reflecting, for example, spatial or temporal variation and the environment). The levels or values of the covariates that the IPM should reflect must then be supplied to the functions that implement the *P* and *F* matrices. If discrete stages are also required, *IPMpack* contains a utility function that will directly construct the required object that contains a matrix of discrete transitions, as well as parameters required to define discrete to continuous transitions. These features are demonstrated in the example below with the herb *Hypericum cumulicola* (Quintana-Ascencio, Menges & Weekley 2003). With the *P* and the *F* matrices constructed, a number of higher-level functions are available that can run diagnostics, supply population summary statistics and explore projections of future population states in deterministic and stochastic environments (Metcalf *et al*. 2009). Figure 1 shows a complete workflow for demographic modelling using *IPMpack*.

### Implementation of *IPMpack* in the case of *Hypericum cumulicola*

In this section, we describe the construction and analysis of an integral projection model using *IPMpack*. This model is built around a fire-dependent, short-lived herbaceous species endemic to open areas in xeric Florida rosemary scrub, *Hypericum cumulicola* (Clusiaceae). Demographic data for *H. cumulicola* are fairly well resolved, as annual censuses of several populations have been conducted at different locations within Archbold Biological Station, Highlands County, Florida (USA) since 1994. Here, we use the 1997–1998 census data from site ‘bald 1’ reported in Quintana-Ascencio, Menges & Weekley (2003), which at the time had been unaffected by fire for >21 years. This subset includes 188 individuals (data are available as part of *IPMpack* and can be accessed using *data(hyperDataSubset)*). For each individual, the size (stem maximum height; continuous stage) was measured in both years of data collection, and the number of fruits was counted in 1997. *H. cumulicola* individuals reproduce between June and September. Seeds can enter a permanent soil seed bank (which is a discrete stage) or germinate the next spring (Quintana-Ascencio, Dolan & Menges 1998). Seed bank stasis and emergence probabilities were inferred from a combination of experiments and field measurements. More details on the biology of the species and experimental design are described in Quintana-Ascencio, Menges & Weekley (2003).

The data were formatted according to the requirements of *IPMpack*, illustrated below, and entered as a data-frame object (a table of columns and rows). Each row in the data-frame describes one or more individuals at the start and end of a time step, the 1997–1998 annual period in this case. The column *number* indicates the number of individuals that are referred to in a row (e.g. number of seeds). The column *stage* indicates the type of stage that an individual is in at the start of the census interval (e.g. ‘dormant’ if it is in the seed bank, or ‘continuous’ if the individual is established and thus stem length was measured); *NA* indicates that the individual did not exist at the start of that annual period, that is, it had not yet been recruited. The column *size* gives the value for individuals in the continuous stage class (or *NA* otherwise). The columns *stageNext* and *sizeNext* do the same for individuals at the end of the annual period. The column *surv* contains binomial data on whether an individual that was present at the start survived to the end of the period. The *fec* columns contain reproduction rates that have been recorded per established individual: whether or not a plant is flowering, and if so, how many fruits are produced *per-capita*.

| | *stage* | *stageNext* | *surv* | *size* | *sizeNext* | *fec0Flowering* | *fec1Fruits* | *number* |
|---|---|---|---|---|---|---|---|---|
| 1 | continuous | dead | 0 | 25 | NA | 1 | 15 | 1 |
| 2 | continuous | continuous | 1 | 31 | 29 | 1 | 184 | 1 |
| 3 | continuous | dead | 0 | 5 | NA | 0 | NA | 1 |
| 4 | continuous | continuous | 1 | 34 | 35 | 1 | 152 | 1 |
| 5 | continuous | continuous | 1 | 11 | 14 | 0 | NA | 1 |
| 6 | continuous | dead | 0 | 16 | NA | 1 | 80 | 1 |
| … | | | | | | | | |

A data-frame file can also include factorial or continuous covariates reflecting, for example, environmental variables, corresponding to each row. Note that the *stage* and *stageNext* columns can contain any user-specified stages (e.g. ‘1-year-old seeds’, ‘hibernating adults’) as long as at least some individuals are categorized as ‘continuous’. If the user has no continuous data, then classic matrix models are more appropriate (Caswell 2001).
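Before fitting any models, it can be useful to check that a data table follows the conventions described above. The Python sketch below encodes the six example rows and applies a few consistency checks. The checks and the function name `check_rows` are hypothetical and reflect our reading of the format described in the text; they are not part of *IPMpack*.

```python
COLUMNS = ("stage", "stageNext", "surv", "size", "sizeNext",
           "fec0Flowering", "fec1Fruits", "number")

# The six example rows; None plays the role of NA.
raw_rows = [
    ("continuous", "dead", 0, 25, None, 1, 15, 1),
    ("continuous", "continuous", 1, 31, 29, 1, 184, 1),
    ("continuous", "dead", 0, 5, None, 0, None, 1),
    ("continuous", "continuous", 1, 34, 35, 1, 152, 1),
    ("continuous", "continuous", 1, 11, 14, 0, None, 1),
    ("continuous", "dead", 0, 16, None, 1, 80, 1),
]
rows = [dict(zip(COLUMNS, values)) for values in raw_rows]

def check_rows(rows):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not any(r["stage"] == "continuous" for r in rows):
        problems.append("no row has stage 'continuous'")
    for i, r in enumerate(rows, start=1):
        if r["stage"] == "continuous" and r["size"] is None:
            problems.append(f"row {i}: continuous individual has no size")
        if r["surv"] == 0 and r["sizeNext"] is not None:
            problems.append(f"row {i}: dead individual has a sizeNext value")
        if r["fec0Flowering"] == 0 and r["fec1Fruits"] is not None:
            problems.append(f"row {i}: non-flowering plant has a fruit count")
    return problems

problems = check_rows(rows)
```

An empty `problems` list indicates that the rows pass these particular checks; it does not guarantee that the data are otherwise correct.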

The next step is to quantify relationships that *surv*, *sizeNext* and *fec* have with *size* and build survival, growth and fertility objects that reflect these (we discuss vital rate models in more detail below). The *IPMpack* functions *makeSurvObj*, *makeGrowthObj* and *makeFecObj* allow the user to specify the desired combination of covariates related to size (e.g. size, size², size³, log(size)…) used in predicting the vital rates, and for growth, the exact definition of the response variable (*sizeNext* for size at the next census period, *incr* for increment in size occurring between two census periods as typically done in tree demography, see Zuidema *et al*. 2010), as well as the form of the variance (constant or size dependent, for example for situations where the variance declines with size, Metcalf *et al*. 2009). For fertility objects, transformations of the response variable(s) and/or particular error distributions can also be defined. For life histories that also include multiple discrete stages, *makeDiscreteTrans* will construct an appropriate discrete transition object with appropriately formatted data. *P* and *F* matrices with the required size range and resolution can then be constructed via the functions *createIPMPmatrix* and *createIPMFmatrix*.

For *H. cumulicola*, we fitted several different vital rate models. The best-fitting models (i.e. lowest AIC) are plotted in Fig. 2. Survival is highest in individuals of intermediate size (Fig. 2a). The size of individuals that survive until the next year is positively related to their size at the beginning, although smaller plants (e.g. seedlings) are more likely to grow than larger individuals (Fig. 2b). The probability that plants flower at the first census is strongly related to their size, with most plants taller than 20 cm flowering (Fig. 2c). Among flowering individuals, the number of fruits is exponentially related to their size (Fig. 2d). Other vital rates were not measured for every individual but pooled from field experiments (Quintana-Ascencio, Menges & Weekley 2003) and are therefore included in the IPM as constants (i.e. size-independent): the mean and variance of seedling sizes, the number of seeds per fruit and the probabilities of seeds entering the seed bank, staying there or establishing as seedlings and of seedlings surviving their first months until the 1998 census.

A schematic representation of our IPM for *H. cumulicola*, all details of the structure of this particular IPM, and the *IPMpack* code used can be found in Appendix S1. The resulting IPM kernel for *H. cumulicola* is shown in Fig. 2e. Note that the kernel does not show the discrete stage (seed bank) for display reasons, but it is included in all further analyses. The remaining panels show some of the potential output of *IPMpack*: an elasticity kernel (Fig. 2f), age-specific survivorship (*l*<sub>x</sub>) and reproduction (*m*<sub>x</sub>) curves (Fig. 2g), size-specific mean and variance of life expectancy (Fig. 2h), and passage time to a threshold size (Fig. 2i). More detailed step-by-step instructions for building IPMs with *IPMpack* can be found in the package's vignette. Several diagnostic tools can help check that the final IPM is sound; see for example *diagnosticsPmatrix* in the code in Appendix S1, which is discussed below. Whereas version 1.4 of *IPMpack* already has the option to include clonal propagation (de Kroon, Plaisier & van Groenendael 1987) and dependence of offspring size on maternal size, future versions will facilitate the analysis of even more complex life histories (e.g. two continuous state variables and periodic models; Caswell & Trevisan 1994), while the number of analysis tools will continue to grow (e.g. life table response experiment analyses (Caswell 2001), advanced transient dynamics (Stott, Townley & Hodgson 2011) and population viability analyses (Morris & Doak 2002)).
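As an illustration of how output such as mean life expectancy follows from a discretised *P* matrix, the Python sketch below sums expected survivorship over time for an individual starting in each class, which is equivalent to taking the column sums of the fundamental matrix (*I* − *P*)<sup>−1</sup>. The two-class matrix is hypothetical, and the function is a sketch of the general technique rather than the *IPMpack* implementation. Note that conventions differ on whether the census at which an individual is first observed counts as a time step; here it does.

```python
def life_expectancy(p_matrix, tol=1e-12, max_steps=100000):
    """Expected number of censuses at which an individual is alive,
    for an individual starting in each class (column) of p_matrix."""
    n = len(p_matrix)
    result = []
    for start in range(n):
        # Start with one individual in class `start`.
        state = [1.0 if i == start else 0.0 for i in range(n)]
        total = 0.0
        for _ in range(max_steps):
            alive = sum(state)          # probability of still being alive
            if alive < tol:
                break
            total += alive
            # Advance one time step: state <- P times state.
            state = [sum(p_matrix[i][j] * state[j] for j in range(n))
                     for i in range(n)]
        result.append(total)
    return result

# Hypothetical two-class P matrix (columns: class at t; rows: class at t + 1).
P_small = [[0.5, 0.0],
           [0.25, 0.5]]
expectancy = life_expectancy(P_small)  # approximately [3.0, 2.0]
```

The same loop applies unchanged to a large matrix obtained from an IPM, although a direct matrix inversion is usually faster in practice.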

### Building vital rate models

Vital rate models underlying the IPM are obtained by regressing growth, survival and fecundity on the relevant state variable (e.g. size at time *t*). It is key to note that it is generally possible to construct an IPM with poorly fit vital rate models, but this IPM may be dangerously misleading. Consequently, the vital rate models should use all of the available methods and require all of the attention that any statistical model requires. As mentioned above, this attention should extend particularly to appropriate representation of key conditionalities inherent in the demography (i.e. fertility is predicated on flowering, etc.). In some cases, a vital rate model for an IPM requires more attention to fitting than one focused only on the vital rate relationships themselves, because parts of the IPM may be highly sensitive to regions of the vital rate model that are fit with few data. For example, a polynomial term in a survival model that indicates a slight decline in the probability of survival at large sizes is generally significant only if there are sufficient data at large sizes and therefore may not appear in a statistically parsimonious model. The absence of this term can lead to unrealistically high estimates of longevity, as large individuals approach unrealistically high survival rates. We therefore warn users that sound statistical practices and attention to parameter uncertainty are critical to building IPMs correctly. *IPMpack* has two features to aid in model selection. The first is a range of model comparison functions that can fit any number of models for growth and survival functions and plot the results on a single figure. The second is a set of functions that build a list of IPMs representing the variance–covariance of parameters indicated by the fitted models (*getListRegObjects*, *getListRegObjectsFec*, *getIPMOutputDirect*), thus allowing results of interest to be bootstrapped across uncertainty in parameters.
Other functions, such as *sensParams()*, output sensitivities of *λ* to changes in coefficients in the vital rate objects, so that the importance of model coefficients to inference can be quantified explicitly.
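The idea of a sensitivity of *λ* to a vital rate coefficient can be sketched by perturbing each coefficient in turn and recomputing *λ*. The Python example below does this with forward differences for a hypothetical single-state IPM; the coefficients, the fecundity form, the recruit size distribution and all function names are assumptions made for illustration, and this is not how *sensParams()* is implemented.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal probability density evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical vital rate coefficients (not estimates for any real species).
PARAMS = {"surv_intercept": -1.0, "surv_slope": 0.1,
          "grow_intercept": 2.0, "grow_slope": 0.9, "grow_sd": 3.0,
          "recruits_per_unit_size": 0.02}

def build_kernel(p, lower=0.0, upper=60.0, n_bins=30):
    """Midpoint-rule discretisation of K = P + F for one continuous state."""
    h = (upper - lower) / n_bins
    mesh = [lower + (i + 0.5) * h for i in range(n_bins)]
    kernel = []
    for y in mesh:
        row = []
        for x in mesh:
            surv = 1.0 / (1.0 + math.exp(-(p["surv_intercept"] + p["surv_slope"] * x)))
            grow = normal_pdf(y, p["grow_intercept"] + p["grow_slope"] * x, p["grow_sd"])
            # Assumed recruit sizes: normal with mean 5 and standard deviation 2.
            recruits = p["recruits_per_unit_size"] * x * normal_pdf(y, 5.0, 2.0)
            row.append(h * (surv * grow + recruits))
        kernel.append(row)
    return kernel

def dominant_eigenvalue(kernel, tol=1e-12, max_iter=2000):
    """Population growth rate lambda by power iteration."""
    n = len(kernel)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(row[j] * v[j] for j in range(n)) for row in kernel]
        new_lam = sum(w)                 # v sums to 1, so this estimates lambda
        v = [value / new_lam for value in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam

def parameter_sensitivities(params, delta=1e-4):
    """Forward-difference d(lambda)/d(coefficient) for every coefficient."""
    base = dominant_eigenvalue(build_kernel(params))
    out = {}
    for name in params:
        bumped = dict(params)
        bumped[name] += delta
        out[name] = (dominant_eigenvalue(build_kernel(bumped)) - base) / delta
    return base, out

lam, sens = parameter_sensitivities(PARAMS)
```

Coefficients with large absolute values in `sens` are those whose estimation error would matter most for inference about *λ* under these assumed models.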

### IPM diagnostics

*IPMpack* includes a function *diagnosticsPmatrix* that provides a series of plots indicating whether bin choice and size range are adequate. Applying this function as a preliminary step before obtaining demographic and evolutionary output from IPMs can help identify basic problems in the creation of the IPM matrices. The output figure has two separate plots. On the first plot, the left-most panel shows the range of the data and, in black, the range of the state variable fitted in the current IPM P matrix. If these are mismatched, the size limits used in building the P matrix can be adjusted with the *minSize* and *maxSize* arguments in *createIPMPmatrix*. This first panel also indicates two other IPM P matrices that are constructed with the same vital rate models and will be used for comparison: one with an extended size range (in red) and one with an extended number of bins (in blue). A common problem in constructing IPMs is the loss of parts of the continuous distribution when binning, or at the boundaries of the IPM (Williams, Miller & Ellner 2012). The result is that the sum of the columns of the matrix will not match the fitted survival. The middle panel of the output figure indicates that this discrepancy is occurring if the black, red or blue lines do not overlay the grey line showing where x = y. To address this, *createIPMPmatrix* has an argument *correction*. The option *correction* = *‘constant’* ensures that the columns sum to the fitted survival by multiplying every column in the IPM by the value that will ensure this; the option *correction* = *‘discretizeExtremes’* adds to the smallest and largest bin of the IPM any part of the probability density functions defining the IPM that goes beyond these extremes (these corrections are also available in *createIPMFmatrix*).
The right-hand panel indicates whether extending the size range included in the IPM P matrix or increasing the number of bins (by increasing *nBigMatrix* and thereby having narrower bins) alters basic predictions. The next plot shows the discretized IPM P matrix (histograms) and the theoretical density function for the current P matrix (top row) and for the IPM P matrix with a higher number of bins (bottom row). If the theoretical density function curve is very distant from the histograms, increasing the *nBigMatrix* argument may correct this discrepancy.
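The column-sum check and the ‘constant’ correction described above can be sketched in a few lines of Python. The vital rate functions, the deliberately narrow size range and the function names are hypothetical, and the rescaling shown is our reading of the correction as described in the text, not the *IPMpack* source code.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal probability density evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def survival(x):
    """Hypothetical logistic survival probability at size x."""
    return 1.0 / (1.0 + math.exp(-(-1.0 + 0.1 * x)))

def growth_density(y, x):
    """Hypothetical density of size y at t + 1 given size x at t."""
    return normal_pdf(y, 2.0 + 0.9 * x, 3.0)

def p_matrix(lower, upper, n_bins, correction=None):
    """Midpoint-rule P matrix, optionally rescaled so columns sum to survival."""
    h = (upper - lower) / n_bins
    mesh = [lower + (i + 0.5) * h for i in range(n_bins)]
    matrix = [[h * survival(x) * growth_density(y, x) for x in mesh] for y in mesh]
    if correction == "constant":
        for j, x in enumerate(mesh):
            col_sum = sum(matrix[i][j] for i in range(n_bins))
            scale = survival(x) / col_sum   # restores the mass lost at the edges
            for i in range(n_bins):
                matrix[i][j] *= scale
    return mesh, matrix

def max_survival_gap(mesh, matrix):
    """Largest absolute difference between a column sum and fitted survival."""
    return max(abs(sum(row[j] for row in matrix) - survival(x))
               for j, x in enumerate(mesh))

# A size range that is too narrow, so growth is lost beyond the upper limit.
mesh, raw = p_matrix(0.0, 25.0, 50)
_, fixed = p_matrix(0.0, 25.0, 50, correction="constant")
gap_raw = max_survival_gap(mesh, raw)      # clearly greater than zero
gap_fixed = max_survival_gap(mesh, fixed)  # zero up to rounding error
```

Rescaling guarantees that no individuals are lost, but it also reshapes the growth distribution near the boundaries, so widening the size range is often the first remedy to try.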