Population dynamics of an Arctiid caterpillar–tachinid parasitoid system using state-space models


Correspondence author. E-mail: rkarban@ucdavis.edu


1. Population dynamics of insect host–parasitoid systems are important in many natural and managed ecosystems and have inspired much ecological theory. However, ecologists have a limited knowledge about the relative strengths of species interactions, abiotic effects and density dependence in natural host–parasitoid dynamics. Statistical time-series analyses would be more informative if they incorporated multiple factors, measurement error and noisy dynamics.

2. We use a novel maximum likelihood and model-selection analysis of a state-space model for host–parasitoid dynamics to examine 21 years of annual census data for woolly bear caterpillars (Platyprepia virginalis) and their locally host-specific tachinid parasitoids (Thelaira americana).

3. Caterpillar densities varied by three orders of magnitude and were driven by density dependence and precipitation from the previous March but not detectably by parasitoids, despite variable and sometimes high (>50%) parasitism.

4. Fly fluctuations, as estimated from per cent parasitism, were affected by density dependence and precipitation from the previous July. There was marginal evidence that host abundance drives fly fluctuations as a generic linear effect but no evidence for classical Nicholson–Bailey coupling.

5. The state-space model analysis includes new methods for likelihood calculation and allows a balanced consideration of effect magnitude and statistical significance in a nonlinear model with multiple alternative explanatory variables.


Populations of herbivores are well known to fluctuate through time; in some years they flourish, whereas in other years they seem absent from the landscape. Parasitoids are clearly important sources of mortality for many insect herbivores (Southwood 1975; Cornell & Hawkins 1995), and the potentially tight coupling between herbivore hosts and parasitoid reproduction has led host–parasitoid systems to be a model for mathematical theory and the integration of theory and data in population dynamics (Murdoch 1994; Hassell 2000; Murdoch, Briggs & Nisbet 2003). However, despite an abundance of studies showing strong effects of parasitoids on insect hosts, we have little sense of the relative importance of parasitism compared with other ecological factors that also influence host populations. Factors may exert a strong influence on a population and yet may not be regulating if they fail to respond in a density-dependent manner (Murdoch 1994; Walker & Jones 2001). Long-term field studies of both host and parasite abundances are required that estimate and test population models, such as those of several well-studied forest insects (Turchin et al. 2003; Kendall et al. 2005). In this study, we use a 21-year host–parasitoid time series to ask how strongly the host–parasitoid interaction, along with impacts of weather and intrinsic density dependence, appears to affect the dynamics of each. To accomplish this, we illustrate and provide new tools for the state-space or hierarchical modelling framework for the analysis of population time series, which incorporates variability in both sampling and dynamics. These tools allow classical model selection and hypothesis testing rather than the Bayesian results that are often given for hierarchical models. This study is organized so readers can focus on either the ecological or methodological aspect.

Ecological introduction

Ecologists have long debated the prevalence of consumer-control vs. donor-control, one aspect of top-down vs. bottom-up effects on community organization. Some have argued that herbivore numbers are controlled by their predators and parasites (Hairston, Smith & Slobodkin 1960). Analyses of life tables have supported this view (Southwood 1975; Cornell & Hawkins 1995), but these are limited to causes of mortality such as predation, disease and starvation while deemphasizing factors that affect birth rates such as quality of food and oviposition sites (Price et al. 1990), and they have involved problematic statistical analyses (Royama 1996). Time-series analyses combined with other approaches strongly suggest cases of herbivore populations that are controlled by higher trophic levels (Berryman 2002). However, these models are correlational and have not adequately incorporated realistic measures of plant quality and defence (Haukioja 2005).

Other workers have been impressed that herbivores are controlled by the resources available to them and that predator and parasite numbers follow those of their prey (Elton 1927; Lindeman 1942). Hawkins (1992) gives several arguments that hosts may drive parasitoid dynamics (control from below) rather than the other way around. Different workers can reach different conclusions even when examining the best studied systems, such as lynx and hare cycles in boreal Canada, depending upon whether they choose to focus on food quality for the herbivores (Keith 1983) or on predation and food quantity (Boutin et al. 2002). Despite many years of work and hundreds of studies, we still lack a consensus about when predators primarily control their insect prey and when availability of herbivorous prey primarily controls populations of their predators and parasitoids. This becomes a difficult empirical problem because the importance of species interactions compared with other abiotic and biotic effects varies from year to year.

Many previous workers have considered the importance of single factors that affect insect populations [e.g. weather (Andrewartha & Birch 1954), interspecific competition (Lawton & Strong 1981)]. More recent studies have considered multiple factors and alternative hypotheses (e.g. see Reynolds et al. 2007 for a study that tests several different climatic variables on caterpillar numbers). Studies that have compared long time-series of univoltine host–parasitoid systems to population models representing hypotheses of several different effects, such as abiotic factors (weather), density dependence and species interactions, have yielded important insights (e.g. Turchin 2003; Turchin et al. 2003; Bonsall et al. 2004; Kendall et al. 2005; Munster-Swendsen & Berryman 2005), but there are relatively few such studies. Our study considers the relative importance of several factors in the long-term dynamics of a natural univoltine host–parasitoid interaction; it is the first to use a state-space framework to incorporate both measurement error and stochastic dynamics for univoltine host–parasitoid models (see Gross, Ives & Nordheim 2005, for a multivoltine host–parasitoid state-space model).

In our system, both herbivore density and parasitism levels fluctuate greatly (over three orders of magnitude for the herbivore, with parasitism from nearly 0% to 70%), giving an impression that their dynamics must be connected. Despite the appearance of coupling, other factors could drive dynamics of either or both species. Rather than supporting or rejecting the existence of a role for parasitism, we assess its relative importance. We ask how strong a role the host–parasitoid interaction plays in the dynamics of each species in comparison with other factors, known (weather and density) or unknown.

Statistical introduction

Time-series analysis can evaluate the viability of hypotheses that predators drive prey populations and/or prey drive predator populations, but a major limitation of most time-series analyses has been the difficulty of incorporating multiple sources of variation. Estimates of predators (parasitoids) and prey (hosts) have uncertainty due to sampling variability, and estimates of population parameters must allow for variability in dynamics, such as from environmental stochasticity. Recently, state-space models have been developed in Bayesian and maximum likelihood frameworks to estimate models of noisy ecological situations (de Valpine & Hastings 2002; Calder et al. 2003; de Valpine 2003; Clark et al. 2005). The cited papers and a large statistical literature provide a strong case based on simulations and theory that the state-space approach is a solid foundation for incorporating multiple sources of variation, but the approach is still maturing and has not been applied to a univoltine host–parasitoid system.

Development of state-space models has moved contemporaneously with a broadening of views about statistical evidence in ecology. Earlier workers used inferential statistics to reject null hypotheses (Popper 1959; Platt 1964). More recently, ecologists have recognized that many hypotheses about populations and communities cannot be rejected in a meaningful way (Quinn & Dunham 1983; Gotelli & Ellison 2004). Regardless of whether we reject null hypotheses that weather, density dependence and parasitism do not affect herbivore populations, we all agree that all of these factors play some role, whether strong or weak. Ecologists recognize that many factors must be operating simultaneously to affect insect populations (Karban 1989; Hunter & Price 1992; Walker & Jones 2001), so estimates of the magnitude of each factor can be more informative than P-values. Only recently have methods become available to fit realistic models to data incorporating variation in both sampling and population dynamics.

To a large extent, such philosophical considerations are tied to practical issues: a Bayesian state-space model implementation yields lots of information but not maximum likelihoods or frequentist P-values, whereas maximum likelihood estimation, which supports approximate likelihood ratio P-values and AIC (Akaike Information Criterion) model comparisons, requires different computational methods from a Bayesian posterior analysis. Although Bayesian and frequentist approaches have philosophical differences, they can nevertheless be interpreted together (Efron 2005; de Valpine 2009). We present frequentist results using maximum likelihood estimation, model selection using AIC, likelihood ratio hypothesis tests and likelihood profiles for confidence intervals. Maximum likelihood estimation of a hierarchical model can just as well be viewed as an ‘empirical Bayes’ approach. Accomplishing this for a nonlinear and/or non-Gaussian state-space model involves new and recently developed analysis steps for calculating likelihood values and using one-step-ahead predictions to give a simple summary of model fit (see below), which have often been omitted from state-space model results. Our analysis attempts to balance considerations of effect size, statistical significance, practicality and multiple testing (e.g. Ellison 2004; Stephens, Buskirk & del Rio 2007), neither throwing out nor overemphasizing one or another type of statement about models and data.

Study system

We have examined the population dynamics of Platyprepia virginalis caterpillars (woolly bears; Lepidoptera: Arctiidae) in 21 annual censuses from two habitats at the Bodega Marine Reserve in northern California. Platyprepia virginalis is a native univoltine moth found in isolated, locally dense populations throughout western North America. Eggs are placed in early summer on vegetation or on the ground, hatch in a few weeks and early instar caterpillars spend the late summer, autumn and early winter in the leaf litter (English-Loeb, Brody & Karban 1993). Early instars at our study site feed on decaying lupine leaves and flowers in the litter and occasionally eat live green leaves low in the vegetation. Late instar caterpillars feed on living vegetation in late winter and spring and become extremely polyphagous; individuals feed on multiple species during the course of a few hours. Last instars are highly mobile and readily move between host plants, feeding on those that they prefer (English-Loeb et al. 1993). At our study site, late instars prefer lupine (Lupinus arboreus), poison hemlock (Conium maculatum), thistles (Carduus, Cirsium, Sonchus 6 spp.) and fiddleneck (Amsinckia menziesii). In a survey of numerous P. virginalis populations, scattered widely across California and Nevada, we observed no host plant species that was found at all of the sites, although some lupine species was common at most sites suggesting that it is an important host (Karban & English-Loeb 1999). We have observed populations of P. virginalis at sites that vary considerably in winter temperature, making it unlikely that winter temperature plays a strong role in limiting population size at our study site. All of the sites where we have found caterpillars are relatively wet, suggesting that precipitation could influence annual variation in caterpillar abundance.

Lupine is the dominant woody vegetation in the grassland and native dunes at our study site (Barbour et al. 1973). Individual lupine bushes are short-lived and lupine cover fluctuates markedly among years (Davidson & Barbour 1977; Strong et al. 1995). Even in years when lupine is least abundant, there is never an absolute shortage of food and P. virginalis caterpillars do not defoliate existing lupine bushes. It is possible that less mobile early instars can experience food limitation in areas where lupine bushes die or if food quality rather than absolute shortage becomes limiting (Murdoch 1966). Common garden experiments suggest that lupine quality can affect densities of P. virginalis (Karban & Kittelson 1999). In addition, the local pattern of lupine abundance (particularly L. arboreus) matches the local distribution of caterpillars. Those areas of the reserve that have the most consistent cover of lupine stands estimated from aerial photographs since 1965 (Strong et al. 1995) also support the highest densities of caterpillars.

Platyprepia virginalis caterpillars are attacked by several species of parasitoids although only a tachinid fly (Thelaira americana) becomes abundant at the study site (Karban & English-Loeb 1999). Unlike most parasitoids, T. americana is only fatal about 50% of the time, although non-fatal attacks delay caterpillar development and reduce pupal weight (English-Loeb, Karban & Brody 1990; English-Loeb et al. 1993; Karban & English-Loeb 1997). This interaction contrasts with most host–parasitoid mathematical models, which assume parasitoids always kill hosts. Thelaira americana parasitoids are reported to use a range of hosts and to occur throughout North America, although the taxonomy of these flies is ambiguous (Arnaud 1978; O’Hara & Wood 2004). Considerable searching and rearing at the study site has failed to reveal an alternative host species locally. This parasitoid is a very strong flier, so the possibility remains that local populations could be affected by the dispersal of individuals developing in unknown alternate hosts at other locations.

Adult flies scatter young larvae (or eggs that quickly hatch) on leaves or on the soil surface apart from the host (Askew 1971). Adult flies are present at the study site from spring through late summer. Female flies of related species oviposit or larviposit on leaves on which the host has fed, or in response to odour of the host’s frass (Askew 1971). Young larvae must survive during the dry summer within a shield of sclerotized plates. Caterpillars do not ingest fly larvae; rather the first instar flies must rely on a host passing close to them. This is likely to be a stage of extremely high mortality for fly larvae. Young fly larvae enter the bodies of both suitable and unsuitable host species. They feed on haemolymph and fat inside their host’s body.

Materials and methods

Censuses of caterpillars and parasitoids

We estimated the densities of caterpillars from two habitats using two different census methods during the last week in March in each year from 1987 to 2007. We selected 10 large bushes of Lupinus arboreus on the foredune east of the boat house at the Bodega Marine Reserve. As bushes are short-lived (Davidson & Barbour 1977), the precise location of the plots shifted every year. For each bush, the number of caterpillars was visually estimated and the area occupied by its canopy was approximated by measuring the largest perpendicular radii. We also estimated the abundance of caterpillars found in five transects (10 m × 4 m, 20 m × 4 m in 1993) through habitat of mixed vegetation that was dominated by poison hemlock (Conium maculatum) approximately 150–300 m north of the lupine bushes. Conium maculatum is a host plant that is not used by most early instars but becomes a preferred species later in development, particularly if caterpillars are parasitized (Karban & English-Loeb 1997). As explained below, the counts and areas were used directly in the analysis (rather than converted to density per m2), with negative binomial count distributions related to the area of each sample and a yearly random effect allowing variation in the ratio of lupine : poison hemlock caterpillar density.

We estimated the percentage of caterpillars that were parasitized by dissecting caterpillars and scoring whether they contained fly larvae or not. We collected and dissected a mode of 30 caterpillars from both of the census sites during the last week in March of each year from 1985 to 2006. The number dissected and number parasitized were used directly in the analysis via a binomial model, so differences in the accuracy of estimates in different years are reflected in the model estimation.

We recorded monthly precipitation using a rain gauge at the study site (US Weather Bureau type manual rain gauge prior to 1992 and an optical rain gauge ORG-815, Optical Scientific, Gaithersburg, MD since 1992 with a Hydrological Services TB4 tipping bucket, Campbell Scientific, Ogden, UT since 2003). These estimates of precipitation were included in our models of caterpillar and fly dynamics as described below.

Modelling and estimation: state-space models to incorporate measurement error

We used a state-space model to fit caterpillar and parasitoid data that incorporates stochastic population dynamics and measurement variability of both caterpillar counts and percent parasitism. We employed two separate analyses: one to ask whether parasitism drives temporal population dynamics of caterpillars, and the other to ask whether the abundance of caterpillar hosts drives temporal population dynamics of flies. In the state-space framework, we combined a probability model for the true (unknown) dynamics of the population with models for the probability of observed data given any true population levels. The state-space method considers all possible true population trajectories from which the data might have been measured, weighted by the variations between the data and each population trajectory and the environmental stochasticity required to generate each trajectory. In each of the separate analyses, we included the measurement variability of caterpillar counts and percent parasitism, but in the caterpillar model we did not model fly dynamics, and vice versa. In our framework, it would be feasible to combine dynamics of both species into one analysis, but we kept them separate to simplify the interpretation of results. Twenty-one years is long for an ecological data set but short for statistical estimation, so we keep our suite of models accordingly limited.

Although their name has a mathematical ring that may seem confusing at first, state-space models are simply the logical extension of a statistical likelihood approach to incorporate measurement error in a time-series model. Extensive simulation and theoretical work has demonstrated that they are a better choice than ignoring measurement error. Without a state-space model, the unknown impact of measurement error would top our list of doubts about any results. Thus, the promise of state-space models is not to give higher predictive power or otherwise cleanse noise from data, but rather to give more believable conclusions in the face of noisy data by modelling the noise more explicitly. Because we use likelihood calculations for the state-space models, use of AIC for model comparisons is justified.
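To make the idea of weighting candidate population trajectories concrete, the following is a minimal sketch of a bootstrap particle filter, a standard simulation method for approximating the likelihood of a state-space model. It is not the Monte Carlo Kernel Likelihood method used in this study. The Poisson observation model, the initial-state distribution, the function name and all parameter values are illustrative assumptions.

```python
import math
import random


def particle_filter_loglik(counts, areas, r, beta_w, sigma, n_particles=2000, seed=1):
    """Approximate log-likelihood of a Ricker state-space model (illustrative only).

    Process:     w_t = w_{t-1} + r + beta_w * exp(w_{t-1}) + eps_t, eps_t ~ N(0, sigma^2)
    Observation: count_t ~ Poisson(area_t * exp(w_t))  (assumed here for brevity)
    """
    rng = random.Random(seed)
    # Assumed initial distribution: centred on the first observed log density.
    w0 = math.log(max(counts[0], 0.5) / areas[0])
    particles = [rng.gauss(w0, 1.0) for _ in range(n_particles)]
    loglik = 0.0
    for t, (y, area) in enumerate(zip(counts, areas)):
        if t > 0:
            # Propagate each candidate state with environmental stochasticity.
            particles = [
                w + r + beta_w * math.exp(w) + rng.gauss(0.0, sigma)
                for w in particles
            ]
        # Weight each candidate state by the probability of the observed count.
        log_weights = []
        for w in particles:
            log_lam = math.log(area) + w
            log_weights.append(y * log_lam - math.exp(log_lam) - math.lgamma(y + 1))
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        # The average weight estimates the likelihood contribution of year t.
        loglik += top + math.log(sum(weights) / n_particles)
        # Resample so that well-supported states are carried forward.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik
```

Calling the function with different parameter values gives Monte Carlo estimates of the log-likelihood that can be compared across candidate models, which is the quantity that any state-space likelihood method must approximate.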

Results are presented with a balance of considerations of effect size, statistical significance and predictability. One criticism of hypothesis testing in ecology has been that null hypotheses are sometimes biologically uninteresting (Quinn & Dunham 1983; Yoccoz 1991; Gotelli & Ellison 2004). Here the null hypotheses that parasitism does not affect caterpillar dynamics and vice versa are on their face false, but nevertheless viewing them as hypotheses allows consideration of whether these effects are statistically lost amid noisy population dynamics. Estimates of statistical significance are complemented with more meaningful estimated effect sizes and confidence intervals, and overall predictability of dynamics is assessed by correlations of one-step-ahead predicted vs. observed population values. This is one of the first ecological time-series analyses to use maximum likelihood estimation of nonlinear state-space models (de Valpine & Hilborn 2005; de Valpine & Rosenheim 2008), and the first to present AIC model comparisons and likelihood ratio hypothesis tests for such models. Although we use model selection and hypothesis testing in the same study, we do not attempt to test the significance of models in the AIC ranking (Burnham & Anderson 2002); rather, it is necessary to conduct the planned hypothesis tests in some model framework, and the best AIC model provides a sensible choice.

Caterpillar population model

For the caterpillar models, we considered intrinsic density dependence (i.e. not delayed density dependence due to the host–parasitoid interaction), precipitation and parasitism as factors that could affect population dynamics. Density dependence was considered as either a Ricker or Gompertz model, with dependence on either pre- or post-parasitism caterpillar population density, giving four possible models. Density dependence is a negative exponential function of population size in the Ricker model and of log population size in the Gompertz model, so that the Ricker model can more easily describe over-compensating and unstable dynamics than can the Gompertz model. Precipitation was included as monthly precipitation for each of the 18 months (one at a time) up to and including the March of each survey. Precipitation data were always centred around zero by subtracting the mean from all values. Parasitism was modelled as <100% fatal to caterpillars, which is appropriate for this system but differs from most theoretical models of host–parasitoid dynamics. The statistical significance of parasitism was evaluated by a likelihood ratio test comparing models with m = 0 (equivalent to omitting parasitism from the model) to models with the maximum likelihood estimate of m.

The state variables for the model are the density of woolly bear caterpillars per square meter of lupine, Wt, and the per cent parasitism, PPt, both at time t. The log density of caterpillars is defined as wt = log(Wt). These represent the true, unknown state of the populations. The population dynamics parameters are as follows: r is the intrinsic population growth rate; m is the fraction of parasitized caterpillars that die due to parasitism; βW is a density-dependence coefficient; and βP is a precipitation coefficient (subscripts W and P stand for woolly bears and precipitation). The coefficients may take different roles in different models (e.g. βW denotes the density-dependent coefficient for either a Ricker or Gompertz model). The precipitation data (considered separately for each month) for year t are PRECIPt. Environmental stochasticity at time t is εt, which is normally distributed with mean 0 and variance σ2.

The model of caterpillar dynamics on a log scale is:


The function g(wt−1, PPt−1, m, βW) represents density dependence, for which the four models were: Ricker acting on post-parasitism caterpillars, with g(wt−1, PPt−1, m, βW) = βW exp[wt−1] (1 − m PPt−1) (the best model); Ricker acting on pre-parasitism caterpillars, with g(wt−1, PPt−1, m, βW) = g(wt−1, βW) = βW exp[wt−1]; Gompertz acting on post-parasitism caterpillars, with g(wt−1, PPt−1, m, βW) = βW wt−1 (1 − m PPt−1); and Gompertz acting on pre-parasitism caterpillars, with g(wt−1, PPt−1, m, βW) = g(wt−1, βW) = βW wt−1. E[wt] is the expected value of wt, so the first equation gives the model prediction for time t, and the second equation states how environmental stochasticity enters the dynamics. Under the hypothesis that parasitoids do not drive caterpillar fluctuations, the model with m = 0 (and average mortality due to parasitoids interpreted as part of r) would fit the data nearly as well as a model that uses actual parasitism levels via m ≥ 0.
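The four density-dependence terms can be sketched in code as follows. The one-step prediction shown here assumes an additive form, E[wt] = wt−1 + r + log(1 − m PPt−1) + g(·) + βP PRECIPt, with negative values of βW giving negative density dependence. That arrangement and sign convention are assumptions made for illustration, consistent with the definitions in the text but not taken from the displayed equations. The function names are hypothetical.

```python
import math


def density_dependence(w_prev, pp_prev, m, beta_w, form="ricker", stage="post"):
    """The four density-dependence terms g() described in the text."""
    # Post-parasitism models discount the population by the fraction killed.
    survival = 1.0 - m * pp_prev if stage == "post" else 1.0
    if form == "ricker":
        return beta_w * math.exp(w_prev) * survival
    if form == "gompertz":
        return beta_w * w_prev * survival
    raise ValueError("form must be 'ricker' or 'gompertz'")


def expected_log_density(w_prev, pp_prev, precip, r, m, beta_w, beta_p,
                         form="ricker", stage="post"):
    """One-step prediction E[w_t] under the ASSUMED additive form."""
    g = density_dependence(w_prev, pp_prev, m, beta_w, form, stage)
    return w_prev + r + math.log(1.0 - m * pp_prev) + g + beta_p * precip


def simulate_step(rng, sigma, *args, **kwargs):
    """Add environmental stochasticity: w_t = E[w_t] + eps_t, eps_t ~ N(0, sigma^2)."""
    return expected_log_density(*args, **kwargs) + rng.gauss(0.0, sigma)
```

Setting m = 0 removes both the mortality term and the post-parasitism discount, so the pre- and post-parasitism models coincide, which corresponds to the null hypothesis that parasitism does not drive caterpillar fluctuations.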

Fly population model

For the fly dynamics, we considered density dependence, precipitation and caterpillar populations, with either a tightly coupled (Nicholson–Bailey) host–parasitoid model or a generic linear formulation of the role of caterpillar densities. Before considering a model for fly dynamics, we needed to relate per cent parasitism to fly population density. Following the classic logic of the Nicholson–Bailey model, the relationship between per cent parasitism and fly density is: PP = 1 − exp[−a × fly density], where a is an attack rate parameter. This model assumes random mixing between parasitoids and hosts, i.e. a linear functional response. As flies distribute eggs or larvae to be encountered by caterpillars, the random mixing model is reasonable. A difficulty is that we had no absolute scale for fly densities based on per cent parasitism, so we defined our fly units to be ‘number of potential parasitism events’. This corresponds to modelling fly density in units of ‘a × fly density’ and recognizing that other parameters are implicitly scaled accordingly. This measure of fly density at time t is denoted Ft, and PPt = 1 − exp[−Ft]. It is conceivable that attack rate would have year-to-year variation, but we do not incorporate that into the models here.
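The relationship between scaled fly density and per cent parasitism, PPt = 1 − exp[−Ft], can be inverted directly, as in this short sketch. The function names are hypothetical, and parasitism is expressed as a proportion between 0 and 1.

```python
import math


def parasitism_from_fly_density(f_scaled):
    """Proportion parasitized, PP = 1 - exp(-F), for scaled fly density F >= 0."""
    return 1.0 - math.exp(-f_scaled)


def fly_density_from_parasitism(pp):
    """Scaled fly density F = -log(1 - PP) implied by a proportion parasitized."""
    if not 0.0 <= pp < 1.0:
        raise ValueError("proportion parasitized must be in [0, 1)")
    return -math.log1p(-pp)
```

For example, 50% parasitism corresponds to F = log 2 ≈ 0.69 potential parasitism events per host, and F grows without bound as parasitism approaches 100%.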

For the models of fly dynamics, the state variables are Ft and Wt, with log fly density defined as ft = log(Ft). (Given any value of Ft, the resulting value of PPt enters the likelihood equation above for the parasitism dissection data.) Re-using the notation from the caterpillar model (with new definitions here), the parameters of the fly model are as follows: r is the intrinsic population growth rate; βF is a density-dependence coefficient; βP is a precipitation coefficient; βW is a coefficient for the effect of caterpillar density; and qt is the ratio of caterpillar density in lupine : poison hemlock habitats in year t (see Measurement models). Precipitation data and environmental stochasticity are defined in the same way as for the caterpillar model.

The fly population model on a log scale is:


In this model, the function g() defines fly density dependence and the function h() defines the role (if any) of caterpillar density. Ricker density dependence is given by g(ft−1, βF) = βF exp[ft−1], and Gompertz density dependence is given by g(ft−1, βF) = βF ft−1. Nicholson–Bailey dynamics are given by h(ft−1, qt−1, wt−1, βW) = h(ft−1, wt−1) = wt−1 + log(1 − exp[−exp[ft−1]]). Note that the Nicholson–Bailey model forces a term into the model with no parameter to control its strength. The generic linear model includes a term for fly reproduction and an effect of caterpillars, either via their density in lupine or poison hemlock habitat. The generic model with caterpillar density in lupine is h(ft−1, qt−1, wt−1, βW) = ft−1 + βW exp[wt−1], and in poison hemlock is h(ft−1, qt−1, wt−1, βW) = ft−1 + βW qt−1 exp[wt−1].
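The alternative host terms h() and density-dependence terms g() can be sketched as follows. The one-step prediction assumes an additive form, E[ft] = r + h(·) + g(·) + βP PRECIPt, which is an illustrative assumption consistent with the definitions above rather than a transcription of the displayed equations. The function names are hypothetical.

```python
import math


def h_nicholson_bailey(f_prev, w_prev):
    """Log of hosts parasitized: log(W * PP), with PP = 1 - exp(-F)."""
    return w_prev + math.log(1.0 - math.exp(-math.exp(f_prev)))


def h_linear(f_prev, w_prev, beta_w, q_prev=1.0):
    """Generic linear host effect; q_prev = 1 uses lupine density directly."""
    return f_prev + beta_w * q_prev * math.exp(w_prev)


def g_fly(f_prev, beta_f, form="ricker"):
    """Fly density dependence on the scaled density (Ricker) or its log (Gompertz)."""
    if form == "ricker":
        return beta_f * math.exp(f_prev)
    if form == "gompertz":
        return beta_f * f_prev
    raise ValueError("form must be 'ricker' or 'gompertz'")


def expected_log_fly_density(h_value, f_prev, precip, r, beta_f, beta_p, form="ricker"):
    """One-step prediction E[f_t] under the ASSUMED additive form."""
    return r + h_value + g_fly(f_prev, beta_f, form) + beta_p * precip
```

The contrast between the two host terms is visible in the signatures: the Nicholson–Bailey term takes no coefficient, whereas the linear term reduces to simple fly reproduction (ft−1) when βW = 0.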

For both caterpillar and fly models, we judged that our model sets cover an ecologically reasonable range of considerations, and additional attempts to ‘tweak’ models would represent over-fitting.

Measurement models

We model the number of caterpillars in an area A as following a negative binomial distribution, with different dispersion parameters for lupine and poison hemlock habitats, defined as sl and sh respectively. We chose the negative binomial because, in preliminary analyses, Poisson models with maximum likelihood estimation of density in each year fitted poorly in each habitat (P < 0·0001), as judged by a parametric bootstrap using likelihood as the goodness-of-fit measure. Bootstrapped goodness-of-fit was acceptable for each habitat with negative binomial models (P = 0·10 for lupine; P = 0·08 for poison hemlock). The negative binomial model also gave large AIC improvements over Poisson for each habitat (ΔAIC = 29·1 for lupine; ΔAIC = 25·6 for poison hemlock).
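A sketch of how counts and areas can enter the likelihood directly is given below. It assumes the common mean–dispersion parameterization of the negative binomial, in which the expected count is density × area and the variance is mean + mean²/s. The study's exact parameterization is not stated here, so this choice, the function names and the example counts are assumptions.

```python
import math


def negbin_logpmf(count, density, area, size):
    """Negative binomial log-probability with mean = density * area, dispersion = size."""
    mu = density * area
    return (math.lgamma(count + size) - math.lgamma(size) - math.lgamma(count + 1)
            + size * math.log(size / (size + mu))
            + count * math.log(mu / (size + mu)))


def poisson_logpmf(count, density, area):
    """Poisson log-probability with mean = density * area."""
    mu = density * area
    return count * math.log(mu) - mu - math.lgamma(count + 1)


def sample_loglik(counts, areas, density, size=None):
    """Log-likelihood of one year's counts; size=None selects the Poisson model."""
    if size is None:
        return sum(poisson_logpmf(y, density, a) for y, a in zip(counts, areas))
    return sum(negbin_logpmf(y, density, a, size) for y, a in zip(counts, areas))
```

For strongly clumped counts, such as many empty samples alongside a few very large ones, the negative binomial log-likelihood is far higher than the Poisson one at the same density, which is the pattern that AIC comparisons between the two observation models detect.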

We also needed to combine information from the two habitats. The simplest approach would be to assume a constant ratio of mean caterpillar density (per m²) in lupine : poison hemlock habitat, but the bootstrapped goodness-of-fit for this model was weak (P = 0·02), suggesting that the lupine : poison hemlock ratio was not constant across years. Therefore, we incorporated a random year effect in the ratio on a log scale. The ratio for year t is qt, and log(qt) follows a normal distribution with mean μq and standard deviation σq. The variance of the random effect was significantly greater than zero (χ² = 6·3, P = 0·006, based on a 0·5χ²₀ + 0·5χ²₁ mixture for testing a variance term; Pinheiro & Bates 2004, p. 86), although the model still yielded a weak bootstrapped goodness-of-fit (P = 0·04).
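The mixture P-value for a variance term can be checked with a few lines of code. Under an equal mixture of chi-squared distributions with 0 and 1 degrees of freedom, the P-value is half the upper tail probability of a chi-squared distribution with 1 degree of freedom. The function name is hypothetical.

```python
import math


def boundary_variance_pvalue(lr_stat):
    """P-value under a 0.5 * chi2(0 d.f.) + 0.5 * chi2(1 d.f.) mixture."""
    if lr_stat <= 0.0:
        # The chi2(0 d.f.) component is a point mass at zero.
        return 1.0
    # Upper tail of chi-squared with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lr_stat / 2.0))
```

For the likelihood ratio statistic of 6·3 reported above, this gives a value of approximately 0·006, matching the reported P-value.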

A negative binomial distribution and random year effects represent the two major standard approaches to modelling over-dispersion (quasi-Poisson estimation is not an option because the methods below, like a Bayesian analysis, require proper probability distributions), so we proceeded to analyse dynamics with this observation model. All analyses below were also run with a Poisson observation model without year effects and yielded similar biological conclusions, suggesting low sensitivity to the observation model. Including random year effects adds to the computational complexity of the model because it requires an additional unobserved state variable (log lupine : poison hemlock ratio) for each year.

Finally, we modelled the number of parasitized caterpillars from a dissected sample as binomially distributed.
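A minimal sketch of this binomial observation model follows, with the probability of parasitism computed from the scaled fly density as PP = 1 − exp[−F]. The function name and the example sample sizes are hypothetical.

```python
import math


def parasitism_loglik(n_dissected, n_parasitized, f_scaled):
    """Binomial log-likelihood of dissection data given scaled fly density F > 0."""
    if f_scaled <= 0.0:
        raise ValueError("scaled fly density must be positive")
    log_pp = math.log(1.0 - math.exp(-f_scaled))  # log probability of parasitism
    log_not_pp = -f_scaled                        # log(1 - PP) = -F
    log_choose = (math.lgamma(n_dissected + 1)
                  - math.lgamma(n_parasitized + 1)
                  - math.lgamma(n_dissected - n_parasitized + 1))
    return (log_choose
            + n_parasitized * log_pp
            + (n_dissected - n_parasitized) * log_not_pp)
```

Because the sample size enters the likelihood, a year with few dissected caterpillars constrains F more weakly than a year with many, which is how differences in the accuracy of yearly estimates are carried into the model estimation.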

Computational steps

For each model, we estimated maximum likelihood parameters using the Monte Carlo Kernel Likelihood method (de Valpine 2003, 2004). This method uses a Bayesian posterior sample from a Markov chain Monte Carlo algorithm as an intermediate step. After estimating parameters, the actual likelihood values are needed for likelihood ratio and AIC calculations, and obtaining them efficiently is not straightforward. This was accomplished using the method of Mira & Nicholls (2004, based on Chib & Jeliazkov 2001; see de Valpine 2008). After estimation of all parameters, these steps were repeated as each parameter was varied to obtain profile likelihood confidence intervals. Finally, the one-step-ahead prediction for time t was defined as the mean predicted state at time t given the data prior to t, using the maximum likelihood parameters. For caterpillars, the observed density was the average lupine density given both lupine and poison hemlock data for the maximum likelihood parameters of the observation model. This observed density was compared with that predicted by the model. Further details of model formulation and numerical methods are given in Appendices S1 and S2, respectively. Analyses were implemented in R (R Development Core Team 2009) and C++.
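The profile likelihood step can be illustrated with a short sketch. It assumes a user-supplied function that returns the maximized log-likelihood with the focal parameter held fixed, and it finds by bisection the values at which that profile falls 1.92 units (half the 95% critical value of a chi-squared distribution with 1 degree of freedom) below its maximum. The function names and the toy profile are hypothetical, and the study's own numerical procedure is described in its appendices rather than here.

```python
import math


def profile_interval(profile_loglik, mle, lower, upper, drop=1.92, tol=1e-8):
    """Approximate 95% profile likelihood interval for one parameter.

    profile_loglik(value) must return the log-likelihood maximized over all
    other parameters with the focal parameter fixed at `value`.
    `lower` and `upper` must bracket the interval around `mle`.
    """
    target = profile_loglik(mle) - drop
    if profile_loglik(lower) >= target or profile_loglik(upper) >= target:
        raise ValueError("search bounds do not bracket the interval")

    def crossing(inside, outside):
        # Bisection between a point inside the interval and one outside it.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if profile_loglik(mid) >= target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return crossing(mle, lower), crossing(mle, upper)


def toy_profile(value):
    """Toy quadratic profile with maximum at 2.0 and standard error 0.5."""
    return -0.5 * ((value - 2.0) / 0.5) ** 2
```

For the toy quadratic profile, the interval is close to the familiar estimate ± 1.96 standard errors. For a nonlinear state-space model the profile need not be symmetric, which is one reason to compute it rather than rely on a normal approximation.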


Caterpillar dynamics

Caterpillar densities varied by three orders of magnitude during the 21 years that we collected data (Fig. 1). Caterpillar densities on lupine (their primary host at this location) were highly correlated with populations in the same year found in the mixed habitat dominated by poison hemlock (R2 = 0·74, P < 0·001, log scale).

Figure 1.

 Caterpillar densities (top) and per cent parasitism (bottom) for 21 years. Caterpillar densities represent the mean conditional density in each year using the maximum likelihood negative binomial parameters and mean and variance for random year effects in the lupine : poison hemlock ratio. Circles: data; triangles: one-step-ahead predictions from the best models. For illustration, one-step-ahead predictions are shown starting from each data point and giving a prediction at the next time. However, predictions from the state-space model incorporate all previous data and do not assume that the previous data points are exact estimates of the populations.

The best model of caterpillar dynamics included Ricker density dependence, precipitation from the previous March and no important role of fly parasitism (Table 1). The confidence interval for βW strongly supported density-dependent population regulation under either a Ricker or Gompertz model (Table 2). Our goal here was not to test this hypothesis against a null hypothesis of a random walk, and standard likelihood ratio chi-squared statistics would not be valid for this purpose (Dennis & Taper 1994). Comparing density-dependent models using AIC revealed strong support for the Ricker over the Gompertz (Δlog-likelihood = 4·8; ΔAIC = 9·6, models B and E), so the Ricker was used for subsequent models. Models with density dependence acting post- or pre-parasitism fitted virtually identically. The result that the Ricker model is more predictive than the Gompertz model is important because the Gompertz model is often used simply because it keeps state-space calculations linear (in wt); the methodology here allowed both models to be compared, and choosing the Gompertz for convenience would have been a mistake.
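The two forms of density dependence and the AIC comparison can be sketched in a few lines of Python. The sketch assumes that lower-case w is the natural logarithm of density W, and it covers only the models without parasitism or precipitation; the function names are illustrative and are not taken from the analysis code.

```python
import math

def ricker_expected_log_density(w_prev, r, beta_w):
    """Ricker form: E[w_t] = w_{t-1} + r + beta_w * W_{t-1},
    assuming W_{t-1} = exp(w_{t-1})."""
    return w_prev + r + beta_w * math.exp(w_prev)

def gompertz_expected_log_density(w_prev, r, beta_w):
    """Gompertz form: E[w_t] = w_{t-1} + r + beta_w * w_{t-1},
    which is linear in the log density."""
    return w_prev + r + beta_w * w_prev

def delta_aic(delta_loglik, delta_params):
    """AIC difference (model minus reference), given the differences in
    maximized log-likelihood and in number of estimated parameters."""
    return -2.0 * delta_loglik + 2.0 * delta_params

# Gompertz vs. Ricker with equal parameter counts: a log-likelihood
# deficit of 4.8 corresponds to an AIC penalty of 9.6.
gompertz_vs_ricker = delta_aic(-4.8, 0)
```

With equal parameter counts the AIC difference is simply twice the log-likelihood difference, which is why a gap of 4·8 in log-likelihood translates into 9·6 AIC units.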

Table 1.   Model comparisons for caterpillar dynamics

Label   Parasitism   Density dependence     Precipitation   ΔLog-likelihood   ΔAIC
A       Yes          Ricker (post-par)      None            −3·0              3·2
        E[w_t] = w_{t−1} + r + log(1 − m PP_{t−1}) + β_W W_{t−1}(1 − m PP_{t−1})
B       No           Ricker                 None
        E[w_t] = w_{t−1} + r + β_W W_{t−1}
C       Yes          Ricker (pre-par)       None            −3·0              6·0
        E[w_t] = w_{t−1} + r + log(1 − m PP_{t−1}) + β_W W_{t−1}
D       Yes          Gompertz (post-par)    None            −7·8              15·6
        E[w_t] = w_{t−1} + r + log(1 − m PP_{t−1}) + β_W W_{t−1}(1 − m PP_{t−1})
E       No           Gompertz               None
        E[w_t] = w_{t−1} + r + β_W w_{t−1}
F       Yes          Gompertz (pre-par)     None            −7·7              15·4
        E[w_t] = w_{t−1} + r + log(1 − m PP_{t−1}) + β_W w_{t−1}
G       Yes          Ricker (post-par)      March           0                 2·0
        E[w_t] = w_{t−1} + r + log(1 − m PP_{t−1}) + β_W W_{t−1}(1 − m PP_{t−1}) + β_P PRECIP_{t−1}
H       No           Ricker                 March           0                 0
        E[w_t] = w_{t−1} + r + β_W W_{t−1} + β_P PRECIP_{t−1}

Model H is used as the reference model to which the other models are compared. The Δlog-likelihood and ΔAIC values are the differences (in log-likelihood and AIC) between each model and model H. Higher log-likelihood but lower AIC values indicate better model fits.

Table 2.   Parameter estimates and profile likelihood confidence intervals for selected host and parasitoid models and the caterpillar sampling model

Model parameter   Maximum likelihood estimate   95% Confidence interval
Caterpillar (host) dynamics
  r               1·58                          (0·73, 2·30)
  β_W             −2·71                         (−3·59, −1·79)
  β_P             0·11                          (0·03, 0·20)
  σ               0·71                          (0·46, 1·11)
Fly (parasitoid) dynamics
  r               −1·92                         (−2·67, −1·14)
  β_F             −1·24                         (−1·60, −0·88)
  β_W             1·28                          (−0·31, 2·89)
  β_P             −1·60                         (−2·67, −0·55)
  σ               0·99                          (0·58, 1·30)
Caterpillar sampling model
  μ_q             −0·81                         (−1·20, −0·52)
  σ_q             0·60                          (0·35, 0·90)
  s_l             3·35                          (1·99, 6·23)
  s_h             8·33                          (4·39, 16·95)

Precipitation during March of the year before each survey stood out with the highest statistical significance (Fig. 2; Table 1 models B vs. H, log-likelihood difference = 2·9, P = 0·02). Because we evaluated 18 possible precipitation variables, a strict view of multiple testing would prevent an overall conclusion of statistical significance. However, it is biologically sensible that wet Marches, during which caterpillars are most active and most polyphagous, could be favourable for caterpillars. Furthermore, March is the last of the wet months so that March precipitation may influence the quality of food available to early instar caterpillars during late spring and summer. For comparing models with and without parasitoids, we chose the previous March precipitation model based on parsimony, although we also tried models with no precipitation and found consistent results.
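The link between a log-likelihood difference and its P-value can be reproduced with the standard chi-squared approximation for nested models that differ by one parameter. The Python sketch below is an illustration under that assumption, with hypothetical function names; it uses the fact that a chi-squared tail probability with one degree of freedom can be written with the complementary error function.

```python
import math

def lrt_p_value_df1(loglik_difference):
    """P-value of a likelihood ratio test with one degree of freedom.
    The test statistic is chi_squared = 2 * loglik_difference, and
    P(X > x) for one degree of freedom equals erfc(sqrt(x / 2))."""
    chi_squared = 2.0 * loglik_difference
    return math.erfc(math.sqrt(chi_squared / 2.0))

def loglik_gain_for_p(p_value):
    """Log-likelihood improvement that gives `p_value`, found by bisection
    (valid for p-values that are not astronomically small)."""
    lower, upper = 0.0, 50.0
    for _ in range(200):
        middle = 0.5 * (lower + upper)
        if lrt_p_value_df1(middle) > p_value:
            lower = middle   # gain too small, P still above target
        else:
            upper = middle
    return 0.5 * (lower + upper)

p_march = lrt_p_value_df1(2.9)          # March precipitation comparison
threshold_05 = loglik_gain_for_p(0.05)  # gain needed for P = 0.05
```

Under this approximation a log-likelihood gain of 2·9 gives P of about 0·016, and a gain of about 1·92 is needed to reach P = 0·05.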

Figure 2.

 Log-likelihood improvement from including monthly precipitation effects on caterpillar dynamics, relative to model B (Table 1), which lacks precipitation as a factor. Each of 18 months, counting back from the March of the surveys (far right) until two autumns prior to each survey, was considered separately in a model. The relationship between log-likelihood differences and chi-squared hypothesis tests is shown with threshold lines for P = 0·20, 0·10, 0·05 and 0·02.

We found no support for any of the models involving fly parasitism driving the number of caterpillars (Table 1). The log-likelihoods were nearly identical (within 0·1, including minor Monte Carlo error inherent in a stochastic algorithm) between models with and without parasitism mortality, whether including precipitation or not and for either Ricker or Gompertz density dependence. In fact, the maximum likelihood estimate of the parameter m was essentially zero, corresponding to no impact of parasitism at all, and the confidence interval extends to 1, indicating the lack of information in the data about the role of parasitism. In summary, parasitism by definition affects host caterpillars, but the effect is sufficiently weak that we lack statistical power to detect its impact clearly. We conclude that there is no evidence for even a weak driving role of fly parasitism on caterpillar fluctuations, despite the fact that parasitism levels were observed to reach 70% and obviously killed large numbers of caterpillars.

Comparison of the estimated magnitudes of the effects of density dependence and precipitation (Fig. 3) shows that density and precipitation had intermittent but sometimes strong effects. Each type of effect is centred around the median (across years) of the mean posterior effect in each year. The most informative ways to read Fig. 3 are to compare the spread of each effect across years and of the two effects within years. For example, the density effects, when compared across years, span a difference of about 5 (for the means), indicating a substantial impact of low density vs. high density on population growth.
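The centring used for these effect sizes can be illustrated with a simplified Python sketch. It assumes that the density effect in a given year is the Ricker term β_W W_{t−1} evaluated at a point estimate of density, whereas the analysis uses the mean posterior effect in each year; the densities below are hypothetical and the function names are illustrative.

```python
import statistics

def ricker_density_effects(densities, beta_w):
    """Density term beta_w * W_{t-1} of the Ricker model for each year."""
    return [beta_w * density for density in densities]

def centre_on_median(effects):
    """Subtract the across-year median so that effects are zeroed on it."""
    median_effect = statistics.median(effects)
    return [effect - median_effect for effect in effects]

# Hypothetical densities (not data from the study), with the Table 2
# estimate of beta_W for the caterpillar model.
example_densities = [0.05, 0.4, 1.2, 0.8, 0.1]
centred_effects = centre_on_median(
    ricker_density_effects(example_densities, beta_w=-2.71))

# Spread of the effect across years, the quantity compared in the text.
effect_range = max(centred_effects) - min(centred_effects)
```

Centring changes only the zero point, so the across-year range of an effect is unaffected and remains the natural measure of its magnitude.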

Figure 3.

 Effect sizes in the caterpillar model. For each year, the effects of density (red ‘X’, left symbol) and precipitation in the previous March (green dot, right symbol) are shown. For example, the results for 1988 show the effects of 1987 density and March 1987 precipitation on predicting late-March 1988 caterpillar densities. Effects are centred (zeroed) relative to the median across years of the mean posterior effect size, so by definition half of the years show means above zero and half below zero for each type of effect. Symbols and coloured bars show effect at maximum likelihood estimate and boundaries of 95% confidence interval respectively. For the density effect, these are calculated using the mean conditional population state of the model. Black bars (over red symbols) with narrow end ticks show uncertainty in effect size due to uncertainty in the population state; they show the effect sizes from the central 95% of the conditional state distribution, using the maximum likelihood parameters.

Figure 4.

 Log-likelihood improvement from including monthly precipitation effects on fly dynamics, relative to model F (Table 3), which lacks precipitation as a factor. Each of 18 months, counting back from the March of the surveys (far right) until two autumns prior to the surveys, was considered separately in a model. The relationship between log-likelihood differences and chi-squared hypothesis tests is shown with threshold lines for P = 0·20, 0·10, 0·05, 0·01 and 0·003.

A final evaluation of the caterpillar model is to use one-step-ahead predictions to assess model prediction error and diagnostics (Fig. 1). For this step, observed densities are the conditional average of wt at each time, using the estimated mean and standard deviation of the random year effects, so they are combined from all individual lupine and poison hemlock transect counts. One-step-ahead predictions are an imperfect measure of model performance because the model has been optimized in a way that incorporates the structure of the data more fully, but nevertheless they are useful as a practical, understandable metric of model fit.

The predictive ability of the best one-step-ahead model was 30% better than simply using the average log population (residual standard deviation of 0·93 vs. 1·33 respectively, on a log scale). The time-series version of ‘R2’, (total sum of squares − residual sum of squares)/(total sum of squares), is 0·49, but for time series the prediction accuracy (residual standard deviation) is more directly meaningful. Lack of precise predictions for a wild population affected by many possible ecological factors is not surprising; rather, a 30% gain from two basic factors (density and precipitation) and the surprising conclusion that parasitism offers no gain in predictive ability give a useful baseline understanding of the system.
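These summaries are simple to compute from observed and one-step-ahead predicted log densities. The Python sketch below is illustrative only: the function names are hypothetical, and the divisor used for the residual standard deviation (n rather than n − 1 or a degrees-of-freedom correction) is an assumption, since the text does not specify it.

```python
import math

def residual_sd(observed, predicted):
    """Root mean squared residual, using divisor n (an assumption)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def time_series_r2(observed, predicted):
    """(total sum of squares - residual sum of squares) / total sum of squares."""
    mean_observed = sum(observed) / len(observed)
    total_ss = sum((o - mean_observed) ** 2 for o in observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return (total_ss - residual_ss) / total_ss

def relative_improvement(model_sd, baseline_sd):
    """Fractional reduction in residual standard deviation."""
    return 1.0 - model_sd / baseline_sd

# Reported residual standard deviations: best model vs. average log population.
improvement = relative_improvement(0.93, 1.33)
```

With the reported values of 0·93 and 1·33, the relative improvement is about 0·30, matching the 30% gain stated in the text.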

Fly dynamics

The best model of fly population dynamics included Gompertz density dependence and July precipitation (Table 3). Both Gompertz and Ricker models had unambiguous support compared with no density dependence by inspection of the confidence intervals (Table 2 for Gompertz). Comparison of AIC values moderately supports the Gompertz over the Ricker, regardless of how caterpillar or precipitation effects were included (Table 3; for models C and D, with no caterpillar or precipitation effects, Δlog-likelihood = 2·6, ΔAIC = 5·2). The Gompertz was used for further analyses.

Table 3.   Model comparisons for parasitoid dynamics. Model H is the final selected model, so all values of Δlog-likelihood and ΔAIC are relative to model H. Higher log-likelihood but lower AIC values indicate better model fits

Label   Host coupling              Density dependence   Precipitation   ΔLog-likelihood   ΔAIC
A       Nicholson–Bailey           Gompertz             None
        E[f_t] = r + w_{t−1} + log(PP_{t−1}) + β_F f_{t−1}
B       Nicholson–Bailey           Ricker               None
        E[f_t] = r + w_{t−1} + log(PP_{t−1}) + β_F F_{t−1}
C       None                       Gompertz             None
        E[f_t] = r + β_F f_{t−1}
D       None                       Ricker               None
        E[f_t] = r + β_F F_{t−1}
E       Linear (Lupine)            Gompertz             None            −5·6              9·2
        E[f_t] = r + β_W W_{t−1} + β_F f_{t−1}
F       Linear (Poison hemlock)    Gompertz             None            −4·5              7·0
        E[f_t] = r + β_W q_{t−1} W_{t−1} + β_F f_{t−1}
G       None                       Gompertz             July
        E[f_t] = r + β_F f_{t−1} + β_P PRECIP_{t−1}
H       Linear (Poison hemlock)    Gompertz             July            0                 0
        E[f_t] = r + β_W q_{t−1} W_{t−1} + β_F f_{t−1} + β_P PRECIP_{t−1}

The strongest precipitation effect was July of the summer prior to each survey (Fig. 4; Table 3 models F vs. H, log-likelihood difference = 4·5, P = 0·003). This is statistically significant at type I error rate 0·05 even when corrected for 18 tests using a Dunn–Šidák or Bonferroni (conservative) procedure (Quinn & Keough 2002). Although July precipitation is extremely low at our site, it may serve as a surrogate for other climate conditions, which can range from cold and foggy to warm and sunny. Seven out of 18 precipitation months had significance levels <0·10, leaving little doubt that some aspects of precipitation are truly important, but for parsimony we retained the single clearly strongest month. Although we lacked a specific a priori hypothesis regarding the relationship between weather and fly dynamics, a relationship between these variables seems plausible.
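The multiple-testing correction can be checked with a short Python sketch. It assumes the standard one-degree-of-freedom chi-squared approximation for the likelihood ratio test and uses the rounded log-likelihood difference of 4·5, so the result is indicative only; the function names are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

def sidak_threshold(alpha, n_tests):
    """Per-test significance level under the Dunn-Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

def lrt_p_value_df1(loglik_difference):
    """Chi-squared (1 d.f.) tail probability for statistic 2 * difference."""
    return math.erfc(math.sqrt(loglik_difference))

p_july = lrt_p_value_df1(4.5)                  # models F vs. H
bonferroni = bonferroni_threshold(0.05, 18)    # 18 precipitation months
sidak = sidak_threshold(0.05, 18)
passes_both = p_july < bonferroni and p_july < sidak
```

With these inputs the P-value is about 0·0027, just below the Bonferroni threshold (about 0·0028) and the slightly less conservative Dunn–Šidák threshold, consistent with the statement that the July effect survives correction for 18 tests.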

Forcing caterpillars into the fly model in a Nicholson–Bailey formulation gave a worse fit than simply omitting caterpillars. However, including caterpillar densities from transects through poison hemlock as a linear predictor of fly densities gave a marginally significant effect when precipitation was not included (Table 3 models C vs. F, log-likelihood difference = 1·8, χ2 = 3·6, d.f. = 1, P = 0·06) and a weaker, non-significant effect when July precipitation was included (Table 3 models G vs. H, log-likelihood difference = 1·1, χ2 = 2·2, d.f. = 1, P = 0·14). We conclude that the evidence leans towards a role of host abundance in fly dynamics, but the effect is only weakly distinguishable from noise and remains inconclusive.

Effect sizes in the fly model show that precipitation and caterpillar densities may play roles of comparable magnitudes in fly dynamics, with strengths varying across years, although precipitation shows some years with the largest effects (Fig. 5). Thus, this is a case where the statistical clarity of the role of caterpillars depended on the test used but its estimated effect magnitude clearly should not be ignored relative to precipitation and density dependence. It is difficult to interpret residual standard deviation for per cent data on a biologically meaningful scale, but Fig. 1 shows that many predictions capture the fluctuations reasonably, whereas some years are clear misses (e.g. 1994, 1995, 2007). The proportion of variance explained by the best model, on a scale of arcsine-square-root per cent parasitism, was 0·41. As above, these are imperfect summaries because the state-space likelihood calculations account for measurement variation and different dissection sample sizes. One-step-ahead residuals showed no obvious additional patterns in the data (Fig. 1).

Figure 5.

 Effect sizes in the best fly model. For each year the effects of fly density (red ‘X’, left symbol), caterpillar density (blue triangle, right symbol) and precipitation in the previous July (green dot, middle symbol) are shown. For example, the results for 1988 show the effects of 1987 fly density and caterpillar density and July 1987 precipitation on predicting 1988 fly density. Effects are centred around zero as in Fig. 3. Coloured bars give 95% confidence intervals due to parameter uncertainty, for mean conditional population states. For density, black bars with narrow end ticks give 95% confidence intervals due to population state uncertainty, for maximum likelihood parameters.


Ecologists are often in our situation of having a noisy time series and wanting to know what hypotheses it supports to motivate further studies, including experiments. Time-series analyses of insect populations have been important (Kendall et al. 1999), but they are fundamentally observational and sometimes controversial (Yoccoz, Nichols & Boulinier 2001; Lambin et al. 2002). Depending upon one's expectations, a model explaining 49% of the variance in caterpillar populations and 41% of the variance in parasitism might appear very successful or very insignificant (Moller & Jennions 2002). Our goal was not to develop a highly predictive model but rather to account for sources of variation that are always present but usually ignored in an attempt to find patterns in very noisy field data. The biological insights reported in this study were possible because we used state-space modelling.

Over 21 years, caterpillar densities varied by three orders of magnitude, and parasitism levels were often high (Fig. 1). State-space models revealed significant density dependence in caterpillar numbers with an overcompensating Ricker model, indicating a tendency to overshoot the mean. We found a positive relationship between precipitation in March and caterpillar population growth (Figs 2 and 3). Density dependence, July precipitation and the availability of host caterpillars were comparable in the size and consistency of their effects on fly numbers (Fig. 5).

That parasitism tended to be driven by host abundance suggests that the size of fly populations is limited locally by the abundance of P. virginalis hosts and supports our observations that T. americana has no alternative host at this site. However, the Nicholson–Bailey model, which assumes that one generation of parasitoids results directly from the previous number of parasitized caterpillars, was not supported by the data. A more general model of parasitism received better support. The indirect transmission of parasites involving a vulnerable period in the environment and the observation that wet summers have more flies are consistent with this result of weak coupling between parasites and hosts. These empirical results suggest interesting directions for host–parasitoid theory, which has focused largely on tightly coupled dynamics.

Our a priori hypothesis, that parasitism drives host population dynamics, found no support in the data, leading us to consider alternatives for future work. The positive correlation of our data from two neighbouring habitats suggests that the sampling is effective and the dynamics are real. In space, caterpillars were generally found around wet vegetation at this and other sites, consistent with our precipitation results but suggesting that more detailed study of the role of precipitation may be informative. Other exogenous factors do not fit as well with our knowledge of the natural history. Caterpillar populations were not limited by an absolute shortage of food as lupine bushes were never defoliated or uncommon. Caterpillars may be affected by food quality (Karban & Kittelson 1999), although this complicated interaction is likely to vary with plant history, ontogeny, tissue type and environmental variables (Duffey & Stout 1996). Other Arctiids benefit by mixing their diets (Singer 2001) and mixed diets may increase the survival of P. virginalis when individuals are parasitized or under other conditions (Karban & English-Loeb 1997; Adler 2004). Experimentally excluding mammalian and invertebrate herbivores increased population densities of P. virginalis and may also affect dynamics (Huntzinger, Karban & Cushman 2008), although the mechanisms responsible for this result are not known. Early instar caterpillars are preyed on by ants, crab spiders, starlings and possibly deer mice, and may also be susceptible to entomopathogenic nematodes (R. Karban & D. Gruner, pers. obs.). In theory, multiple natural enemy interactions have the potential to generate erratic timing of outbreaks (Dwyer, Dushoff & Yee 2004), superficially consistent with the data here. All of these factors have the potential to play important roles in caterpillar dynamics.

These data had been recalcitrant to more traditional methods. The state-space method allowed us to evaluate the relative importance of density dependence, weather and parasitism on the population dynamics of the host caterpillar and parasitoid fly while incorporating multiple sources of variability. Although ecologists recognize that multiple factors simultaneously influence population and community dynamics, they have been limited in their ability to evaluate the relative contributions of these factors. The state-space modelling techniques used here should allow researchers to evaluate the causes of population dynamics of herbivore–parasitoid systems with greater power and resolution than methods that ignore important sources of variability in the data or that consider only limited types of statistical evidence and single factor explanations.


Many former students and teaching assistants of the Terrestrial Field Ecology class helped collect census data and helped develop hypotheses, particularly John Gross, Susan Harrison, Greg Loeb, Alison Brody, Anurag Agrawal and Lynn Adler. Mikaela Huntzinger, Claire and Jesse Karban, and Dan Gruner also helped do field work to unravel the natural history. Sandy Liebhold and Marcel Holyoak were extremely gracious and helpful in interpreting the patterns and Dan Gruner, Leo Polansky and Kohji Yamamura improved the manuscript. This is a contribution from the UC Bodega Marine Reserve. This work has been supported by NSF, most recently DEB 0639885, and by UC intercampus research funds.