Stage duration distributions in matrix population models

Abstract Matrix population models are a standard tool for studying stage‐structured populations, but they are not flexible in describing stage duration distributions. This study describes a method for modeling various such distributions in matrix models. The method uses a mixture of two negative binomial distributions (parametrized using a maximum likelihood method) to approximate a target (true) distribution. To examine the performance of the method, populations consisting of two life stages (juvenile and adult) were considered. The juvenile duration distribution followed a gamma distribution, lognormal distribution, or zero‐truncated (over‐dispersed) Poisson distribution, each of which represents a target distribution to be approximated by a mixture distribution. The true population growth rate based on a target distribution was obtained using an individual‐based model, and the extent to which matrix models can approximate the target dynamics was examined. The results show that the method generally works well for the examined target distributions, but is prone to biased predictions under some conditions. In addition, the method works uniformly better than an existing method whose performance was also examined for comparison. Other details regarding parameter estimation and model development are also discussed.


| 7937
OKUYAMA an egg stage, some eggs hatch (e.g., become larvae in some insects) before other eggs, even when they were laid at the same time (e.g., Fang, Okuyama, Wu, Feng, & Hsu, 2011;King, Brewer, & Martin, 1975). The distribution of stage duration is another important detail that affects population growth (de Valpine, Scranton, Knape, Ram, & Mills, 2014). However, in matrix models, it is uncommon to explicitly think of a probability distribution and instead use a method that captures some components (e.g., mean and variance) of a distribution.
One difficulty in building matrix models is that even when we know the true distribution of stage duration, incorporating the distribution precisely might not be possible. Although matrix models can describe accurate dynamics when within-stage age distributions are stable (Caswell, 2001), this assumption is not necessarily satisfied (Runge & Roff, 2000). More importantly, the effects of the distributions on population growth rate cannot be examined when specific distributions of interest cannot be expressed.
To illustrate the difficulty in modeling stage duration, a species consisting of two stages (juvenile and adult) is considered in which we are interested in modeling the distribution of juvenile duration.
A standard method for describing stage duration assumes that a juvenile either leaves the juvenile stage (i.e., becomes an adult) with probability γ or remains as a juvenile with probability 1 − γ for a given time step if it survives (Figure 1a). In other words, the stage transition (per time step) is a Bernoulli process with success probability γ, with the stage duration of a juvenile realized by the number of Bernoulli trials required to have one success, which is known as a geometric distribution. A geometric distribution might be appropriate in some cases, but is highly restrictive. For example, the expected duration of juvenile stage in the whitefly Bemisia argentifolii is similar when they are raised on eggplant (17.31 days) and on tomato (17.96 days), but the associated variances are different: 44.47 on eggplant and 77.00 on tomato (Tsai & Wang, 1996). A geometric distribution whose mean is 17.5 must have its variance as 288.75, which suggests that it is inappropriate for both cases. Furthermore, as shown in the example, distributions can have the same mean while having different variances. Geometric distributions cannot have different variances when they have the same mean. As such, cases in which the use of a geometric distribution is appropriate are limited.
Extensions of the geometric distribution are used to describe stage duration more flexibly. A natural extension is the sum of geometric distributions known as the negative binomial distribution (Caswell, 2001). A negative binomial distribution can be interpreted as the number of Bernoulli trials required to have k successes. Figure 1b shows an example with k = 3. To become an adult, a newly born juvenile must go through identical Bernoulli trials until three successes are achieved. In this example, the juvenile stage contains three stages known as pseudostages. Pseudostages are created for convenience (e.g., simply to make k > 1) and typically do not represent biologically meaningful stages such as age. Ages are implicit in the models considered in this study, but matrix models that consider both age and stage explicitly have also been developed (Caswell, 2012;Roth & Caswell, 2016, 2018. Mixtures of two negative binomial distributions have also been suggested to create even more flexible distributions (Birt et al., 2009). In the example shown in Figure 1c, there are two independent negative binomial distributions describing the duration of the juvenile stage, and each juvenile follows one of the two distributions for which the probabilities that a newborn enters the first and second negative binomial distributions are p and 1 − p, respectively.
Each negative binomial distribution is characterized by two parameters (k 1 ,γ 1 ) and (k 2 ,γ 2 ). Having five parameters (p, k 1 , γ 1 , k 2 , and γ 2 ), mixtures of negative binomial distributions are the most flexible among the distributions described here. In fact, the geometric distribution is a special case of the negative binomial distribution (i.e., k = 1) and the negative binomial distribution is a special case of the mixture distribution (i.e., p = 1 or p = 0). In this study, mixtures of two negative binomial distributions are referred to as a mixture distribution unless otherwise stated.
Although a mixture distribution is more flexible than geometric and negative binomial distributions, actual distributions of stage F I G U R E 1 Diagrams describing stage transitions given that individuals survive. Juvenile duration follows a (a) geometric distribution, (b) negative binomial distribution, and (c) mixture of two negative binomial distributions. Each arrow indicates an event, and the associated value (e.g., γ) is the probability that the event takes place given that an individual survives duration might differ significantly from it. Therefore, it is important to know how well a mixture distribution can approximate other potential duration distributions. A previous study found that a mixture distribution cannot approximate common distributions such as gamma and lognormal distributions effectively when the parameters of the mixture distribution are estimated heuristically (Lee & Okuyama, 2017). However, the fact that a mixture distribution does not perform well with heuristic parameter estimation (described in Appendix) does not necessarily indicate the failure of the mixture distribution when the parameters are estimated differently. The heuristic method is similar to the method of moments, which uses only information contained in moments (e.g., the mean and variance), and it might make little sense when the assumed distribution is known to be wrong (e.g., using a mixture distribution to approximate a gamma distribution), and when information not contained in moments influences population dynamics. Maximum-likelihood estimation accounts for other properties of distributions through a fuller utilization of data (e.g., not just the mean and variance). This study examined the performance of a mixture distribution in approximating other target distributions when model parameters were estimated using a maximumlikelihood method.

| Matrix population model
A species that experiences two life stages, such as the one shown in Figure 1, is considered. The matrix model uses a postbreeding census formulation (Case, 2000;Caswell, 2001) to keep track of the number of individuals in each stage and assumes that the duration of juvenile stage T J follows a mixture distribution. For example, Figure 1c describes the stage transitions when k 1 = 2 and k 2 = 3.
Additional parameters describing survival and reproduction must be specified to complete a full demographic model. σ J and σ A are the survival probabilities for juvenile and adult, respectively, and m is the expected number of female offspring produced by an adult female.
All parameters describe demographic processes that take place in one discrete time step (e.g., day, week, or year), and an appropriate time step should depend on organisms (Cull, 1980). The model assumes that males do not limit reproduction and keeps track only of females (Caswell, 2001).
Matrix population models can be

| Model parameters and analysis
Maximum-likelihood estimates (MLEs) of (p, k 1 , γ 1 , k 2 , and γ 2 ) were used to create matrix models. When f is the true distribution of stage durations (i.e., T J ~ f), a matrix model uses a mixture distribution to approximate f. For a given true (or target) distribution of juvenile duration f (specific distributions will be described below), the maximum-likelihood parameters of a mixture distribution were estimated from 1,000 random samples generated from f.
Once the MLEs are determined, a population matrix (e.g., Equation 2) can be fully specified with the three additional parameters σ J , σ A , and m. Specific parameter values are discussed below. In this study, A is an irreducible primitive matrix that makes the population ergodic according to the Perron-Frobenius theorem (Caswell, 2001). In other words, the population growth rate eventually converges to a fixed value regardless for any positive initial condition, and the asymptotic population growth rate (i.e., the finite rate of increase) is represented by the dominant eigenvalue of A. In one simulation run, random samples from f are used to parametrize A, and the finite rate of increase is estimated from A. Because the finite rate of increase fluctuates as a result of random sampling from f, the average from 100 simulation runs was used to represent the expected value of the finite rate of increase.

| Individual-based models (IBMs)
The matrix model described above assumes that juvenile duration follows a mixture distribution. When the true distribution, f, is not a mixture distribution, the matrix model is an approximation. Examining the performance of matrix models when f is not a mixture distribution requires knowing the true population growth under f. An IBM was created to obtain population growth for various instances of f.
Because matrix models describe simple demographic processes that (1) OKUYAMA occur in discrete time steps, corresponding IBMs can be created by simulating the demographic processes. For example, the survival of individuals is simulated as a Bernoulli trial with the survival probabilities σ J (for juveniles) and σ A (for adults). For each newly born individual, the duration of its juvenile stage is simulated from f. If the stage duration of a juvenile is x, the juvenile becomes an adult if it survives x time steps. Each adult reproduces m offspring on average, and the number of offspring was simulated from a Poisson distribution with mean m.
The finite rate of increase of a population can be estimated by simulating the IBM for many time steps. In particular, N(t + 1)/N(t) converges to the finite rate of increase, where N(t) is the number of individuals at time t (the sum of all individuals at time t). N(t + 1)/N(t) will fluctuate some even after convergence as a result of the stochastic nature of the model. The geometric mean of N(t + 1)/N(t) from the last 1,000 time steps of 2,000 total time steps was used to represent the finite rate of increase. It was confirmed that a burn-in period of 1,000 was sufficient to obtain convergence.

| Comparison of the matrix model and IBM results
Estimates which was assumed to be impossible in this study.
Negative binomial distributions can be defined in multiple manners. One form of negative binomial distribution was already described above (e.g., Figure 1b; also see Appendix). Another formulation uses two parameters μ and k, where the mean is μ and the variance is μ + μ 2 /k (Bolker, 2008). This form of negative binomial distributions is referred as over-dispersed Poisson distributions in this study to avoid confusion between the two negative binomial distributions.
For gamma and lognormal distributions, random samples were rounded up to the nearest integer (i.e. ceiling). Taking the ceiling eliminates zero and results in discrete random samples. Because matrix models are discrete time models, the realized stage durations must be discrete. As discussed above, a previous study found that mixture distributions (based on the heuristic method) fail to approximate gamma and lognormal distributions, and thus, these distributions present good tests for the current study. Furthermore, these distributions are among the most commonly used distributions for describing nonnegative random variables.
To examine the performance a mixture model in approximating OKUYAMA below, although σ J , σ A , and m influence the finite rate of increase, they do not qualitatively influence how the distribution of duration influences the finite rate of increase. Therefore, the survival and fecundity parameters are not crucial factors in this study. The source code in R used in the analysis is provided as Supporting Information.

| RE SULTS
When the distribution of juvenile duration in an IBM, f, is a mixture distribution whose parameters are determined by the heuristic method (Appendix), the IBM and matrix models result in identical population growth rates for all parameter combinations Both the heuristic and likelihood models performed generally well when f is a zero-truncated Poisson distribution (Figure 3). Among offspring that are born at the same time, those with short juvenile durations contribute to the population growth disproportionately more than those with longer juvenile durations. For example, if we compare a juvenile with a duration of 1 day and another with a duration of 5 days, the former juvenile will start producing offspring at the next breeding event, and there will already be grandchildren in the following time step even when the latter individual is still juvenile. Consequently, producing two offspring whose juvenile durations are 1 and 5 days, respectively, can result in a greater F I G U R E 4 Relationship between the distribution of juvenile duration and the finite rate of increase. The true distribution used in the IBM is a zero-truncated over-dispersed Poisson distribution with mean E(T J ) and variance Var(T J ). m = 10, σ A = 0.95, s J = 0.5

Var(T J ) E(T J )
Finite rate of increase model IBM likelihood heuristic population growth rate than producing two offspring whose juvenile durations are both 3 days, even though the average juvenile duration For a given combination of mean and variance, the gamma distribution results in a higher population growth rate than the lognormal distribution ( Figure 5). The heuristic model could not describe the difference between the two distributions because its parameters are completely determined by the mean and variance of f. In contrast, the likelihood model predicted the difference. Both models generally overestimate population growth rates, but the biases are much greater for the heuristic model than for the likelihood model. When  where g is the probability mass function (e.g., a mixture of two negative binomial distributions) that is used in a matrix model. Supposing that there are N individuals initially, and that S individuals survive to the next stage, this process can be described as where the second argument of the binomial distribution is the probability parameter that describes the probability that an individual survives the focal stage. The maximum-likelihood parameters of both g and σ can be obtained from these relationships. Ergon et al.
(2009) provide a method for estimating the latent distribution T from capture-recapture data. When a matrix model is defined based on a latent distribution (e.g., Equation 2), the estimation of the latent distribution is essential.
Relatively poor performance of the likelihood model when E (T J ) is short is a valid concern, because none of the parametric distributions considered in this study is unrealistic. Although this study combined all sexually immature stages (e.g., egg and larva stages) into one stage, stage-structured models may consider these stages explicitly (e.g., Bommarco, 2001;Lončarić & Hackenberger, 2013), making the duration of each stage short. Even when there are no distinct life stages such as larval and pupal stages, size-dependent mortality is commonly reported (e.g., Grutter et al., 2017;Remmel & Tammaru, 2009). To capture size-dependent mortality rates, a stage (e.g., juvenile stage) may be further subdivided into size classes (e.g., Crouse, Crowder, & Caswell, 1987), which also makes the duration of each class short. One way to improve approximation is to extend the mixture distribution. For example, mixtures of more than two distributions can be considered. It is useful to recognize that when there are n distinct stage durations (e.g., n = 4 when observed durations are always between 5 and 8 days), a multinomial distribution with probability parameters matching the proportions of observed durations can be considered a full model. Because a multinomial distribution with n possible outcomes can be expressed as a mixture of n constants (e.g., Figure 1c describes a binomial distribution with two possible outcomes (i.e., two or three) when γ 1 = γ 2 = 1), mixtures of sufficiently many distributions will describe any observed data accurately. One can conduct model selection against the full model to determine the complexity of the required model.
When modeling a distribution, using a commonly used model (including the mixture distribution) for convenience might not be well advised. A distribution can be customized when information regarding it is available (e.g., the multinomial distributions discussed above).
For example, there may be a minimal duration required for some stages (e.g., Dzierzbicka-Głowacka, 2004;Oyarzun & Strathmann, 2011). If a model predicts some individuals stay only 2 days in a stage although at least 3 days are needed to develop into the next stage due to some biological constraints, this inaccuracy can be a cause of important mismatch between model predictions and observations.
In a situation like this, adding a constant can assure the required duration in the stage and may describe the target distribution reasonably well. For example, even though the geometric distribution is very restrictive as discussed above, the sum of a constant and a geometric distribution is more flexible and can be expressed in matrix models. Although it is currently customary to report only the mean and variance of data, more detailed examination of stage duration would assist us in identifying important properties of duration distributions beyond the mean and variance, as well as appropriate models for approximating target distributions in matrix models.

ACK N OWLED G M ENTS
I thank Dr. Torbjørn Ergon and anonymous reviewers for insightful comments. This study was supported by the Ministry of Science and Technology of Taiwan (Grant ID: 105-2311-B-002-019-MY3).

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N
TO performed all work presented in this study.

DATA ACCE SS I B I LIT Y
The computer code in R used in the analysis is available in Supporting Information.

APPENDIX
This section describes the mixture distribution and the heuristic parameter estimation method discussed in the main text. When T is a random variable that follows a mixture distribution, where Y 1 and Y 2 are random variables that follow independent negative binomial distributions such that Y 1 ~ NegBin(k 1 ,γ 1 ) and Y 2 ~ NegBin(k 2 ,γ 2 ) in which the probability mass function of NegBin(k,γ) is that can be interpreted as the sum of k geometric distributions with parameter γ. p ∈ [0,1] is the mixture probability such that T is a single negative binomial distribution when p = 0 or p = 1. The mean and variance of the mixture distribution (Equation 5), respectively, are where E(Y i ) = k i /γ i and Var(Y i ) = k i (1γ i )/γ 2 i (i ∈ {1,2}). The heuristic method determines the parameters as follows. A mixture distribution with mean E(T) and Var(T) has where ⌊x⌋ and ⌈x⌉ are the floor and ceiling of x, respectively. The method further assumes that γ = γ 1 = γ 2 , leaving two remaining free parameters (p and γ) that can be determined by solving Equations 7 and 8 (i.e., two equations and two unknowns). When the data are available, E(T) and Var(T) are replaced by the sample mean and variance. (5)