Monitoring abundance and phenology in (multivoltine) butterfly species: a novel mixture model



  1. Data from ‘citizen science’ surveys are increasingly valuable in identifying declines in widespread species, but require special attention in the case of invertebrates, with considerable variation in number, seasonal flight patterns and, potentially, voltinism. There is a need for reliable and more informative methods of inference in such cases.
  2. We focus on data consisting of sample counts of individuals that are not uniquely identifiable, collected at one or more sites. Arrival or emergence and departure or death of individuals take place during the study. We introduce a new modelling approach, which borrows ideas from the ‘stopover’ capture–recapture literature, that permits the estimation of parameters of interest, such as mean arrival times and relative abundance, or in some cases, absolute abundance, and the comparison of these between sites.
  3. The model is evaluated using an extensive simulation study which demonstrates that the estimates for the parameters of interest obtained by the model are reliable, even when the data sets are sparse, as is often the case in reality.
  4. When applied to data for the common blue butterfly Polyommatus icarus at a large number of sites, the results suggest that mean emergence times, as well as the relative sizes of the broods, are linked to site northing, and confirm field experience that the species is bivoltine in the south of the UK but practically univoltine in the north.
  5. Synthesis and applications. Our proposed ‘stopover’ model is parameterized with biologically informative constituents: times of emergence, survival rate and relative brood sizes. Estimates of absolute or relative abundance that can be obtained alongside these underlying variables are robust to the presence of missing observations and can be compared in a statistically rigorous framework. These estimates are direct indices of abundance, rather than ‘sightings’, implicitly adjusted for the possible presence of repeat sightings during a season. At the same time, they provide indices of change in demographic and phenological parameters that may be of use in identifying the factors underlying population change. The model is widely applicable and this will increase the utility of already valuable and influential long-standing surveys in monitoring the effects of environmental change on phenology or abundance.


The size of wildlife populations, particularly of many insect populations, at a study site can change daily, with new individuals being added through birth and/or immigration and removed through death and/or emigration. Studying these changes is of great interest in conservation and monitoring, but a complete census is rarely possible and a sampling survey is all that can be achieved. For widespread species, ‘citizen science’ schemes are increasingly adopted, with potentially large numbers of volunteers able to cover large geographical areas over a longer period of time than is practical by any other means. Trends from such data are now available from across many (though by no means all) taxonomic groups. Modelling invertebrate data poses both problems and opportunities, with seasonal occurrence and individuals, possibly of several generations, having life spans considerably shorter than the sampling period.

Such data usually comprise a number of simple series of counts collected to a standardized protocol. Where individuals can be uniquely marked or identified by unique physical characteristics, a number of capture–recapture (or mark–recapture) modelling approaches have been developed for estimating the size of a population. Collectively, these have become known as ‘Jolly–Seber-type’ models (Jolly 1965; Seber 1965). Furthermore, recently developed ‘stopover’ models (Pledger et al. 2009; Matechou et al. 2013), which build on the Schwarz & Arnason (1996) mark–recapture approach (SA), explicitly model the unknown times of arrival and departure of the individuals and provide estimates of the arrival rates at the site during the sampling period, as well as indirect estimates of the average duration of stay of the individuals at the site, referred to as mean stopover duration. However, these models do not directly apply to data where the individuals are not uniquely identifiable. In these cases, the data commonly consist only of sample counts obtained on each of a number of occasions within a survey period, and the number of times an individual is detected is unknown.

In this paper, we introduce a new approach for data of this type. The merits of this new model are that underlying biological processes are explicitly parameterized. This permits the flexible estimation of parameters of interest such as absolute (or relative) abundance, mean arrival times and probabilities of retention and the formal comparison of these across a number of sites and/or years. Whereas existing models essentially estimate the numbers of sightings, we estimate relative or absolute abundance, with an implicit adjustment made for multiple encounters while estimating the relative sizes of the arrival groups, their mean arrival times and the mean stopover duration in a season. These additional parameters are potentially useful in explaining changes in abundance or phenology and can be functionally related to environmental covariates.

In the following sections, we introduce the model, with particular attention to issues of model-fitting and parameter identifiability, and we explore its performance via a large-scale simulation study. We present an illustrative example based on data from the UK Butterfly Monitoring Scheme (UKBMS) (Botham et al. 2011) where we consider the case of seasonally emerging butterflies which are only countable – in adult form – during a fixed period. Specifically, we consider multivoltine species in which more than one generation appears during each season, with potentially considerable overlap in their times of flight.

Material and methods

The model

The model presented in this section borrows ideas from models for capture–recapture data and specifically from the SA parameterization of the Jolly–Seber model. SA uses the idea of a superpopulation, N, first introduced by Crosbie & Manly (1985), which is the total number of unique individuals that were present at the site during the study. For a study assumed to consist of T occasions, for example days or weeks, they explicitly model the arrival of the individuals at the site using the entry parameters $\beta_{j-1}$, $j = 1, \ldots, T$, which are the proportions of N that were new arrivals at occasion j, with $\sum_{j=1}^{T} \beta_{j-1} = 1$.

We assume here, without loss of generality, that the T sampling occasions are equally spaced. The data set, y, is a vector of sample counts of length T collected at the site with K ≤ T non-missing entries. For an individual animal to contribute to a count obtained on occasion j, $y_j$, it has to have arrived before occasion j, to have remained until j and to be detected given that it is present on occasion j. The probability of remaining at the site until the next occasion, referred to as retention probability, can be time- or age-dependent, where age is defined to be the unknown time since entry to the site. We denote by $\phi_{j,a}$ the probability that individuals that have been at the site for a occasions, and are present on occasion j, will remain until occasion j + 1. The probability of detecting an individual that is present on occasion j, referred to as detection probability, is denoted by $p_j$.

It is natural to treat entry j in y as the realization of a Poisson distribution with expectation $\lambda_j$. Each $\lambda_j$ is a function of the super-population size and of the entry, retention and detection probabilities. Specifically, the expected number of individuals counted at the site on occasion j is equal to $\lambda_j = N p_j \sum_{b=1}^{j} \beta_{b-1} \prod_{k=b}^{j-1} \phi_{k,a}$, a = k − b + 1, where b = 1, …, j are the possible times of entry to the population for an individual detected on occasion j and the empty product (b = j) is equal to 1. For example, $\lambda_3 = N p_3 (\beta_0 \phi_{1,1} \phi_{2,2} + \beta_1 \phi_{2,1} + \beta_2)$, that is, individuals detected on occasion 3 entered the population either before occasion 1 but remained until occasion 3, or between occasions 1 and 2 and remained until occasion 3, or between occasions 2 and 3.
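
The sum over possible entry times described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code (which was written in R); the function name `expected_counts` and the toy parameter values are assumptions made here for the example, and `phi` is passed as a function so that retention may depend on occasion k and on age a = k − b + 1.

```python
from math import prod

def expected_counts(N, p, beta, phi):
    """Expected counts lambda_1..lambda_T for a single site.

    N    : super-population size
    p    : list of detection probabilities p_1..p_T
    beta : list of entry proportions beta_0..beta_{T-1} (should sum to 1)
    phi  : function phi(k, a) giving the probability of remaining from
           occasion k to k + 1 after a occasions at the site
    """
    T = len(beta)
    lam = []
    for j in range(1, T + 1):
        total = 0.0
        for b in range(1, j + 1):  # possible entry times
            # Retained on every occasion from b to j - 1;
            # the product is empty (equal to 1) when b == j.
            stay = prod(phi(k, k - b + 1) for k in range(b, j))
            total += beta[b - 1] * stay
        lam.append(N * p[j - 1] * total)
    return lam

# Toy values (hypothetical): three occasions, constant p and phi.
lam = expected_counts(N=100, p=[0.2, 0.2, 0.2],
                      beta=[0.5, 0.3, 0.2], phi=lambda k, a: 0.6)
print(lam)
```

With constant retention 0·6, the third value is 100 × 0·2 × (0·5 × 0·6² + 0·3 × 0·6 + 0·2), which mirrors the three entry routes listed for occasion 3.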

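The model likelihood is $L(\theta; y) = \prod_{j=1}^{T} \exp(-\lambda_j) \lambda_j^{y_j} / y_j!$, where θ denotes the vector of model parameters. If sample j was missed, then $p_j = 0$ and hence $\lambda_j = y_j = 0$ and observation j does not contribute to the likelihood calculation.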
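
As a hedged sketch of how the Poisson likelihood treats missed samples, the function below works on the log scale and simply skips any occasion whose count is missing. Coding missing counts as `None` and the name `poisson_loglik` are choices made for this illustration only.

```python
from math import lgamma, log

def poisson_loglik(y, lam):
    """Poisson log-likelihood for one site.

    y   : list of counts, with None marking a missed sample
    lam : list of Poisson means (must be > 0 wherever y is observed)
    """
    ll = 0.0
    for y_j, lam_j in zip(y, lam):
        if y_j is None:  # missed sample: contributes nothing
            continue
        # log of exp(-lam) * lam^y / y!, using lgamma(y + 1) = log(y!)
        ll += y_j * log(lam_j) - lam_j - lgamma(y_j + 1)
    return ll

# Toy example (hypothetical values): the second occasion was missed.
print(poisson_loglik([2, None, 0], [1.5, 3.0, 0.5]))
```

Maximizing this quantity over the parameters that determine the Poisson means gives the maximum-likelihood estimates; for several sites the per-site log-likelihoods are summed.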

As will be explained in the following section, the total number of parameters or combinations of parameters that can be estimated by the model is equal to K. Allowing the entry probabilities to freely vary by time introduces T–1 parameters to the model, a number of which can be practically equal to 0 for data sets of the type considered in this paper, since the period during which individuals arrive at the site may be much shorter than, and is assumed to be encompassed by, the sampling period. Therefore, we suggest that the entry probabilities are modelled using a mixture of M normal distributions instead. Each of these distributions relates to one arrival group, for example one distinct brood, and has its own (relative) weight $w_m$, m = 1, …, M, and mean, $\mu_m$, and possibly its own variance $\sigma_m^2$. Consequently, the proportion of N that were new arrivals on occasion j, j = 1, …, T, is equal to $\beta_{j-1} = \sum_{m=1}^{M} w_m \{F_m(j) - F_m(j-1)\}$, where $F_m(j) = P(X \le j)$ when $X \sim N(\mu_m, \sigma_m^2)$. By the definition of the β parameters, $F_m(0) = 0$ and $F_m(T) = 1$; therefore, $\beta_0 = \sum_{m=1}^{M} w_m F_m(1)$ and $\beta_{T-1} = \sum_{m=1}^{M} w_m \{1 - F_m(T-1)\}$. An illustrative example with M = 2 and overlapping arrival times for the two groups is given in Fig. 1. Hence, regardless of the size of T, the model requires the estimation of 2M parameters in the case of homoscedastic mixture distributions (M means, M − 1 free weights and one common variance) and 3M–1 parameters in the heteroscedastic case. This modelling approach for the entry parameters provides a smooth representation of the arrival pattern of the individuals to the site as well as estimates of the mean arrival times and relative sizes of the different arrival groups.

Figure 1.

(a) Two normal densities with μ1 = 4, μ2 = 7 and σ1 = σ2 = 1 with corresponding weights equal to 0·7 and 0·3. If an observation, x1, is drawn randomly from a N(4, 1) distribution, then 0·7 × P(4 ≤ x1 ≤ 5) is given by the grey shaded area (0·7 × 0·34), while if an observation x2 is drawn from a N(7, 1) distribution, then 0·3 × P(4 ≤ x2 ≤ 5) is given by the area inside the black lines (0·3 × 0·02). (b) The resulting β parameters. For example, β4, which is shaded in grey, is given by 0·7 × 0·34 + 0·3 × 0·02 = 0·244.
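
The Fig. 1 example can be checked numerically. The sketch below computes the entry proportions from a normal mixture, fixing the cumulative distribution at 0 before the first occasion and at 1 at the last so that the proportions sum to one. The function names and the study length T = 10 are assumptions made for this illustration; the weights, means and standard deviations are those given in the caption.

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def entry_proportions(T, w, mu, sigma):
    """Return [beta_0, ..., beta_{T-1}] for an M-component normal mixture."""
    beta = []
    for j in range(1, T + 1):
        total = 0.0
        for w_m, mu_m, s_m in zip(w, mu, sigma):
            upper = 1.0 if j == T else norm_cdf(j, mu_m, s_m)      # F_m(T) = 1
            lower = 0.0 if j == 1 else norm_cdf(j - 1, mu_m, s_m)  # F_m(0) = 0
            total += w_m * (upper - lower)
        beta.append(total)
    return beta

# Values from the Fig. 1 caption; T = 10 is assumed here.
beta = entry_proportions(T=10, w=[0.7, 0.3], mu=[4.0, 7.0], sigma=[1.0, 1.0])
print(beta[4], sum(beta))
```

Without intermediate rounding, β4 comes out at roughly 0·245; the caption's 0·244 follows from rounding the two probabilities to 0·34 and 0·02 before multiplying by the weights.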

Similarly, we propose to model retention probabilities using parametric curves. An example is the logistic curve, where logit($\phi_j$) is a linear function of $x_j$, and $x_j$ can correspond to the value of a time-varying environmental covariate such as temperature or simply to calendar time. Another option is to use the more flexible quadratic form, where logit($\phi_j$) is a quadratic function of $x_j$. Alternatively, as mentioned above, φ can be modelled as a function of age, a.

Finally, detection probabilities can be modelled either as constant over time, as appropriate, or as dependent on a time-varying covariate v, such as temperature or sampling effort at the time of sampling, with logit($p_j$) modelled as a linear function of $v_j$. As will be shown in the next section, in the first case, parameters N and p are only estimable as a product, Np, and therefore, the model does not provide estimates of the super-population size. In the latter case, simulations suggest that the model becomes more ‘data-hungry’ and richer data sets with higher counts from more sites and maybe better separated groups are required for its estimates to be reliable.

If data sets from multiple sites are available, then one can use an integrated modelling approach to analyse them simultaneously. The data set now consists of matrix Y with entry $y_{ij}$ equal to the count obtained at site i on sampling occasion j. Specifically, for data sets collected at S sites, the likelihood becomes $L(\theta; Y) = \prod_{i=1}^{S} \prod_{j=1}^{T} \exp(-\lambda_{ij}) \lambda_{ij}^{y_{ij}} / y_{ij}!$, where $\lambda_{ij}$ is the Poisson mean for site i on occasion j.

The number of estimable parameter combinations now increases, with a maximum of ST if there are no empty cells in Y, which allows for some of the assumptions for the model parameters to be relaxed. For example, the mean arrival times can be modelled in terms of a site-specific covariate, z: log (μim) = αm + βμ · zi, where αm, m = 1, …, M is the log mean arrival time of group m with zi = 0 and βμ is the shift of all means to a direction indicated by its sign when zi changes. A similar approach can be employed for modelling the relative weights of the arrival groups. The precision of the estimates is expected to increase as the number of sites increases if the expectations of the model for each site have a number of parameters in common. However, parameters N1, …, NS are again only estimated each as a product with p, N1p, …, NSp, when the latter is assumed constant for all sites and sampling occasions.

All of the simulation and data analysis results presented in this paper were obtained using R (R Core Team 2013), and computer code to perform the model-fitting is available from the first author upon request.

Parameter redundancy

If one or more parameters in a model cannot be estimated, then the model is termed parameter redundant and is non-identifiable. Catchpole & Morgan (1997) showed that for a model from the exponential family of distributions, such as the Poisson model described in the previous section, one can identify whether it will be parameter redundant for all data sets by calculating the rank of a matrix of first derivatives. Specifically, the number of estimable parameter combinations in the model for data from a single site is equal to the rank of matrix D with entries

$$D_{ij} = \frac{\partial \lambda_j}{\partial \theta_i}, \qquad i = 1, \ldots, q, \quad j = 1, \ldots, T,$$

where λ is the vector of means of the model, in this case of length T, and θ is the vector of parameters, of length q. The rank of D, r, is less than or equal to T if there are no missing data and less than or equal to K otherwise. If r is less than q, then the model is parameter redundant and it is not possible to identify unique maximum-likelihood estimators for at least some of its parameters. If on the other hand r = q, then the model is termed full rank.

Symbolic algebra packages, such as Maple, can be used to calculate the entries of D as well as its rank. However, if the model structure is too complex, Maple can run out of memory when calculating r symbolically. To deal with this limitation, Choquet & Cole (2012) proposed a hybrid symbolic-numerical approach where the entries of D are found symbolically but its rank is calculated numerically for values of the parameters randomly chosen from the parameter space. If the calculated r is less than q, then any zero entries in the numerical estimate of the left kernel of D indicate the parameters that are estimable. As Choquet & Cole (2012) point out, a point chosen at random from the parameter space, especially if this choice is poor, can result in r being estimated as smaller than the actual model rank, and they therefore suggest choosing around five sets of random values and repeating the procedure for each set. The model rank is equal to the largest value for r obtained from these repetitions.
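
The following sketch illustrates the idea of evaluating the derivative matrix at random parameter points, with two loud caveats: it is fully numerical (central finite differences) rather than the hybrid symbolic-numerical method of Choquet & Cole (2012), and it does not compute the full rank. Instead it exhibits a single left-kernel vector for the simplest single-site model (M = 1, constant p and φ), showing that the rows of D for log N and logit p are proportional, so D cannot have full rank. The parameterization, function names and sampling ranges are all assumptions made for this example.

```python
import random
from math import erf, exp, sqrt

def expected_counts(theta, T=10):
    """Poisson means for theta = (log N, logit p, mu, log sigma, logit phi)."""
    N = exp(theta[0])
    p = 1.0 / (1.0 + exp(-theta[1]))
    mu, sigma = theta[2], exp(theta[3])
    phi = 1.0 / (1.0 + exp(-theta[4]))
    cdf = lambda x: 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))
    F = [0.0] + [cdf(j) for j in range(1, T)] + [1.0]   # F(0) = 0, F(T) = 1
    beta = [F[j] - F[j - 1] for j in range(1, T + 1)]
    return [N * p * sum(beta[b - 1] * phi ** (j - b) for b in range(1, j + 1))
            for j in range(1, T + 1)]

def derivative_matrix(theta, h=1e-5):
    """D[i][j] approximates d lambda_j / d theta_i by central differences."""
    D = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        D.append([(u - d) / (2.0 * h)
                  for u, d in zip(expected_counts(up), expected_counts(dn))])
    return D

random.seed(2024)
residuals = []
for _ in range(5):  # around five random points, as suggested in the text
    theta = [random.uniform(5.0, 7.0), random.uniform(-2.0, 2.0),
             random.uniform(2.0, 8.0), random.uniform(-0.5, 1.0),
             random.uniform(-1.0, 2.0)]
    D = derivative_matrix(theta)
    p = 1.0 / (1.0 + exp(-theta[1]))
    scale = max(abs(v) for v in D[0])
    # alpha = (1 - p, -1, 0, 0, 0) should satisfy alpha^T D = 0
    residuals.append(max(abs((1.0 - p) * dN - dp)
                         for dN, dp in zip(D[0], D[1])) / scale)
print(residuals)
```

The only non-zero entries of this kernel vector sit on N and p, which agrees with the statement that these two parameters are confounded when detection is constant, while the remaining parameters are unaffected by this particular deficiency.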

For data collected at S sites, the number of estimable parameter combinations cannot exceed ST. If detection probability is assumed constant across all sites and all sampling occasions, then N1, …, NS only appear as a product with p and therefore are not estimable separately from it. This result is also verified by adopting the aforementioned symbolic-numerical methods for all specifications of the other model parameters introduced in the previous section.

Parameters N1, …, NS and p do, however, become separately estimable if p is modelled using a time-varying covariate. This finding is similar to that shown by Cole & Choquet (2013) who incorporated random effects to separate confounded parameters in capture–recapture models. Similarly to Cole & Choquet (2013), we have found that if either the covariate used to model p does not vary considerably across the samples, or its effect on p is not statistically significant, then the model becomes near-parameter redundant, which means that even though it is theoretically full rank, it actually behaves like a parameter redundant model in practice.

Although these results do not need to be reproduced when the models are fitted to data, the Maple (Maplesoft, Waterloo, Canada) code used to derive these results is available upon request from the first author. Simulations, performed to explore the different model specifications, suggest that as the model becomes more complicated, and especially when N and p are separately estimated, the data set needs to be richer for the model to perform adequately. As is usually the case with sparse data sets, results that hold in theory might not be true in practice in terms of the estimable parameter combinations in the model and results obtained by analysing sparse data sets should be treated with caution.

Model-fitting considerations

Different starting values for the parameters in mixture models can yield different local maxima since the surface of a mixture model likelihood may be multimodal. This implies that in this case different starting values for the mean arrival times of the groups could lead to different results. It is recommended that the optimization algorithm is started from a number of different values to ensure a wide search and to obtain a number of different local maxima from which to choose the best, that is, the one that results in the highest likelihood value.

The starting value for each arrival mean can be randomly sampled from the possible arrival times, which are all values between 1 and T. Starting values for the standard deviations of the arrival groups can be chosen to be large, for example 5–6 depending on the length of the study. This eliminates as much as possible the appearance of spurious maximizers which may result from the fact that the likelihood for mixtures of heteroscedastic normal distributions does not have a global maximum value and continues to increase when one, or more, of the values of the variances of the groups decreases. These maximizers often lead to singularities in the variance–covariance matrix.
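
The multi-start strategy can be sketched generically. In the toy example below, a one-dimensional function with two local minima stands in for the negative log-likelihood and a crude gradient descent stands in for the local optimizer; both are hypothetical placeholders, not the mixture likelihood or the optimizer used in the paper. Minimizing the negative log-likelihood is equivalent to choosing the run with the highest likelihood.

```python
import random

def local_minimise(f, x0, step=0.01, iters=5000, h=1e-6):
    """Crude gradient descent; a stand-in for any local optimizer."""
    x = x0
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x

def multi_start(f, starts):
    """Run the local optimizer from every start and keep the best optimum."""
    fits = [local_minimise(f, x0) for x0 in starts]
    return min(fits, key=f)

def toy_negloglik(x):
    """Toy objective with a global minimum near -1 and a local one near +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

random.seed(1)
starts = [random.uniform(-2.0, 2.0) for _ in range(10)]  # random starting values
best = multi_start(toy_negloglik, starts)
print(best, toy_negloglik(best))
```

A single run started to the right of the hump settles in the inferior local minimum near +1; running from several starts and comparing objective values is what protects against reporting it.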

For a detailed description of the issues of multimodality and spurious maximizers, see McLachlan & Peel (2000).


The expected number of individuals detected at site i on occasion j, $\hat{\lambda}_{ij}$, is equal to $\hat{N}_i \hat{p}_{ij} \sum_{b=1}^{j} \hat{\beta}_{i,b-1} \prod_{k=b}^{j-1} \hat{\phi}_{i,k,a}$, a = k − b + 1, where $\hat{\beta}_{i,b-1}$ and $\hat{\phi}_{i,k,a}$ are the estimated entry and retention probabilities for site i, respectively.

The residual deviance of the fitted Poisson model can be used to assess its fit. However, when a number of cell counts in Y are low, the asymptotic distribution of the residual deviance may not be χ2 anymore, but the fit can be assessed less formally by plotting the observed and fitted values against or alongside one another.
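
For reference, the residual deviance of a Poisson model and the dispersion estimate derived from it can be computed as below. This is a sketch under the standard definitions (deviance 2 Σ [y log(y/ŷ) − (y − ŷ)], with the logarithmic term taken as 0 when y = 0, and dispersion equal to deviance divided by residual degrees of freedom); the function names are assumptions, and missing counts are again coded as `None`.

```python
from math import log

def poisson_deviance(y, fitted):
    """Residual deviance of a Poisson fit; missing counts (None) are skipped."""
    dev = 0.0
    for y_k, f_k in zip(y, fitted):
        if y_k is None:
            continue
        term = y_k * log(y_k / f_k) if y_k > 0 else 0.0
        dev += 2.0 * (term - (y_k - f_k))
    return dev

def dispersion(deviance, n_obs, n_par):
    """Deviance divided by the residual degrees of freedom."""
    return deviance / (n_obs - n_par)

# Values reported for the common blue application later in the paper.
print(dispersion(3952, 924, 61))
```

With the figures reported in the application (deviance 3952, 924 observations, 61 parameters), this gives a dispersion of roughly 4·58.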

Stopover duration

The mean stopover duration at site i, MSDi, is equal to math formula a = j − b + 1, where d = b, …, T are the possible exit times from the population for an individual that entered on occasion b. In the case of bivoltine insect species, this is the average duration across both broods. In cases where the population is closed to migration, this will generally be the average (adult) life span of an individual, although for the few species that overwinter as adults individuals emerge from, and may leave the study into, a state of diapause.



This section presents an extensive simulation study which examines the performance of the model for a wide range of assumptions for the parameters. The simulations are divided in two sections: in section A, the fitted models have pij = p ∀ij and therefore parameters N1, …, NS are only estimated as a product with p, while in section B, detection probabilities are logistically regressed on an artificial covariate, generated from a Unif[5,15] for all sites/occasions, and estimates of N1, …, NS, separate from detection probabilities, are obtained.

Section A: Constant detectability

Simulation A1 sets S = 10, T = K = 15, N = (609, 869, 659, 848, 553, 346, 871, 875, 227, 545), M = 2 with math formula, wi,1 = 0·4, wi,2 = 0·6 and σi,1 = σi,2 = 1 ∀i, pi,j = 0·2 and φi,j = 0·6 ∀ij. Figure 2(a) shows the counts obtained in one simulation run for all sites. Figure 2(b,c) demonstrates that the model provides satisfactory estimates for both sets of parameters.

Figure 2.

Simulation A1: (a) Obtained counts from one simulation run for all sites. (b, c) Box plots of derived estimates for Np and β from 100 replications and true values, indicated by the black diamonds.

This simulation is used as a baseline for evaluating several extensions of the model. The results are shown in Appendix S1 in the Supporting Information. In simulation A2, observations are deleted at random and hence at any site K ≠ T, resulting in around 20% of the data being missing (Fig. S1, Supporting Information). In simulations A3, A4 and A5, retention probabilities are, respectively, a function of calendar time, a function of age and a function of the square of calendar time (Figs S2, S3 and S4, Supporting Information). Note that in simulation A4, T is set equal to 20 since the second group remains for longer than in simulation A3. In simulation A6, w1 is logistically regressed on a fictitious covariate (Fig. S5, Supporting Information), while in simulation A7, the logarithms of μ1 and μ2 are regressed on a fictitious covariate (Fig. S6, Supporting Information). The case of heteroscedastic arrival groups is examined in simulation A8 (Fig. S7, Supporting Information), while, finally, simulation A9 sets M = 3 with μ = (2, 6, 10) and w = (0·4, 0·5, 0·1) (Fig. S8, Supporting Information).

The results suggest that the model performs well in all of these cases. When the relationship between logit(φ) and calendar time is quadratic, there is greater uncertainty for the part of the curve that corresponds to the early sampling occasions, compared to the case when logit(φ) is linearly dependent on time. Similarly, when logit(φ) depends linearly on age, there is more uncertainty in the part of the curve that corresponds to the older individuals.

The validity of model selection criteria, such as the Akaike information criterion (AIC) (Akaike 1973), in choosing the number of mixture components is doubtful because of violation of regularity conditions (McLachlan & Peel 2000; Chapter 6). However, their use has gained support in the literature, for example in Cubaynes et al. (2012). We performed a small simulation study to examine the performance of AIC in choosing the right value for M when M is set equal to 1, 2 or 3. Specifically, we simulated data with M = 1, μ = 5, σ = 1 or with M = 2, w1 = 0·4,  w2 = 0·6, μ1 = 3,  μ2 = 7 and σ1 = σ2 = 1, or with M = 3, w1 = 0·4, w2 = w3 = 0·3, math formula and σ1 = σ2 = σ3 = 1 and all other parameters set as in the baseline simulation. For each set, we fitted models with M = 1, 2, 3 and used AIC to choose between them. The number of times each model is selected for each set, out of 100 simulations, given in Table 1, suggests that in this case AIC successfully selects the right value for M in the majority of cases.

Table 1. Each cell corresponds to the number of times each value of M was chosen by Akaike information criterion, out of 100 simulations, when the true value of M is the one indicated in the first column
True M | Chosen M

Section B: Varying detectability

If detection probabilities are allowed to vary according to a fictitious site- and time-varying covariate, then the model also requires richer data sets with higher counts in order to perform adequately. For example, if the average detection probability is 0·2 and all other parameters are as in simulation A1, then the median relative bias (MRB) in the estimates for N is around 9% for all sites. If detection probabilities are set on average equal to 0·7, then the MRB falls to around 5% for all sites. If N doubles for all sites compared to simulation A1, then MRB is around 3·5% when average detection probability is 0·2, and only 0·1% when average detection probability is 0·7. The results of the latter simulation (B1) are presented in Fig. 3 together with the counts obtained in one of the simulation runs at all sites.
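
The text does not spell out how the median relative bias is computed; assuming the usual definition, the median over simulation replicates of (estimate − true value) / true value, it can be sketched as follows. The function name and the toy estimates are hypothetical.

```python
from statistics import median

def median_relative_bias(estimates, truth):
    """Median over replicates of (estimate - truth) / truth."""
    return median([(est - truth) / truth for est in estimates])

# Toy replicate estimates of a super-population size whose true value is 100.
print(median_relative_bias([90, 100, 120], 100))
```

Multiplying the result by 100 expresses it as a percentage, the scale used for the figures quoted above.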

Figure 3.

Simulation B1: (a) Obtained counts from one simulation run for all sites. (b, c) Box plots of derived estimates for N and β from 100 replications and true values, indicated by the black diamonds.

Further simulation results are shown in Appendix S2 in the Supporting Information. In particular, simulation B2 explores the case when 20% of the counts are missing (Fig. S9, Supporting Information), simulation B3 has φ logistically regressed on time (Fig. S10, Supporting Information), and simulation B4 has φ logistically regressed on age (Fig. S11, Supporting Information). Note that in simulations B3 and B4, the groups are better separated for the model to perform well with μ1 = 2 and μ2 = 9 and also in simulation B4 T = 20, as was the case in simulation A4.

Application to UKBMS count data

Butterfly counts are characterized by their high variability throughout the season, representing the different patterns of emergence for each species. Different species of butterfly exhibit varying levels of voltinism, with one, two or more generations per year.

The UKBMS consists of counts made weekly from the beginning of April until the end of September using the transect method, which is described in depth in Pollard & Yates (1993). Transects are typically 2–4 km long and walked within specified periods of the day and when weather conditions are suitable for butterfly activity. The scheme design allows for counts to be made throughout the season for butterfly activity, during which abundance will vary according to different seasonal patterns of emergence.

We apply our model to UKBMS data for the common blue Polyommatus icarus, collected in 2010. This species is known to exhibit bivoltine populations in the south of the UK, while populations become single-brooded in the north. However, neither the precise latitude at which this occurs nor how this boundary may have changed over time is known (Asher et al. 2001).

We considered M = 2 homoscedastic normal mixture distributions for the arrival of the butterflies at the sites. For computational efficiency, data were limited to a random sample of 50 monitored sites, excluding sites where more than 6 counts were missing from the season or the sum of the counts made was less than 10. Common blue overwinters as a caterpillar and is therefore not seen in flight until late spring. The start of the season was defined as the week with the first positive count, with season length totalling 21 weeks.

Model comparison was made for varying parameter assumptions using AIC. The mixture means and weights were estimated as either constant or as a function of site northing. Retention probabilities were modelled as constant or as logistically dependent on calendar time, age or calendar time squared. Detection probabilities were set either as constant and common across sites or as logistically dependent on temperature at the site on the day of sampling, which is also recorded by the data collector. Missing temperature records were replaced by the average of neighbouring sites. Each model was started from ten different random starts for the parameters to determine the optimal local maximum and all covariates were standardized.

Table 2 provides the AIC values and the number of parameters of the models considered. The two models with the greatest support have the weights and mixture means dependent on northing and the logits of retention probabilities as dependent on the square of calendar time. The most favoured model also has the logits of detection probabilities dependent on temperature and an AIC value considerably lower than that of the second most favoured model, which has a constant detection probability across sites and time.

Table 2. Model comparisons from fitting a range of models to the common blue data. The log-likelihood column gives the value of the log likelihood evaluated at the maximum-likelihood estimate of the parameters. north, northing; temp, temperature

| Model | Log likelihood | No. of parameters | Akaike information criterion |
| --- | --- | --- | --- |
| w_m(north) μ_m(north) σ(.) φ(t + t²) p(temp) | −2880·01 | 61 | 5882·03 |
| w_m(north) μ_m(north) σ(.) φ(t + t²) p(.) | −2923·42 | 59 | 5964·85 |
| w_m(.) μ_m(north) σ(.) φ(t + t²) p(.) | −3161·16 | 58 | 6438·31 |
| w_m(north) μ_m(north) σ(.) φ(a + a²) p(temp) | −3232·41 | 61 | 6586·81 |
| w_m(north) μ_m(north) σ(.) φ(a + a²) p(.) | −3280·17 | 59 | 6678·34 |
| w_m(.) μ_m(north) σ(.) φ(t + t²) p(temp) | −3356·25 | 60 | 6832·50 |
| w_m(north) μ_m(.) σ(.) φ(t + t²) p(temp) | −3374·32 | 60 | 6868·63 |
| w_m(north) μ_m(.) σ(.) φ(t + t²) p(.) | −3559·7 | 58 | 7235·39 |
| w_m(.) μ_m(.) σ(.) φ(t + t²) p(temp) | −3837·59 | 59 | 7793·17 |
| w_m(.) μ_m(.) σ(.) φ(t + t²) p(.) | −4000·97 | 57 | 8115·94 |
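
As a quick consistency check on Table 2, the AIC values can be recomputed from the tabulated log likelihoods and parameter counts using AIC = −2 × log likelihood + 2 × number of parameters. The snippet below is illustrative only; the list `table2` simply transcribes the table in row order, and small gaps of about 0·01 are expected because the log likelihoods are rounded to two decimals.

```python
def aic(loglik, n_par):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_par

# (log likelihood, number of parameters, reported AIC), in Table 2 row order
table2 = [
    (-2880.01, 61, 5882.03), (-2923.42, 59, 5964.85),
    (-3161.16, 58, 6438.31), (-3232.41, 61, 6586.81),
    (-3280.17, 59, 6678.34), (-3356.25, 60, 6832.50),
    (-3374.32, 60, 6868.63), (-3559.70, 58, 7235.39),
    (-3837.59, 59, 7793.17), (-4000.97, 57, 8115.94),
]

recomputed = [aic(ll, k) for ll, k, _ in table2]
max_gap = max(abs(rep - rec) for (_, _, rep), rec in zip(table2, recomputed))
delta_top_two = table2[1][2] - table2[0][2]  # AIC gap between the two best models
print(max_gap, delta_top_two)
```

The gap of about 83 AIC units between the two best models is what the text describes as a considerably lower AIC for the model with temperature-dependent detection.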

Parameter estimates and associated standard errors for the preferred model are given in Table 3, and they are similar to those derived from the second best model, shown in Table S1 in Appendix S3 in the Supporting Information.

Table 3. Parameter estimates from the most favoured model in terms of Akaike information criterion

| Parameter | Estimate | Standard error |
| --- | --- | --- |
| Logit of detection probabilities, as a function of temperature (standardized) | | |
| Logit of relative weight of the first brood, as a function of northing (standardized) | | |
| Log of mean emergence times of the two broods, as a function of northing (standardized) | | |
| Intercept (1) | 1·688 | 0·023 |
| Intercept (2) | 2·748 | 0·013 |
| σ | 1·251 | 0·055 |
| Logit of retention probabilities, as a function of time (t) and time squared (t²) (standardized) | | |
| Slope for t | 1·562 | 0·266 |
| Slope for t² | −3·200 | 0·297 |
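
Because the mean emergence times are modelled on the log scale and northing is standardized, the two intercepts in Table 3 can be back-transformed to mean emergence times, in weeks since the start of the season, at a site of average northing. The sketch below does this and attaches a first-order delta-method standard error, SE[exp(a)] ≈ exp(a) × SE[a]. It is an illustration only and assumes that Intercept (1) and Intercept (2) are the log-scale intercepts for the first and second brood, respectively.

```python
from math import exp

def back_transform_log(estimate, se):
    """Return exp(estimate) and its first-order delta-method standard error."""
    value = exp(estimate)
    return value, value * se

intercepts = {"brood 1": (1.688, 0.023), "brood 2": (2.748, 0.013)}
results = {name: back_transform_log(est, se)
           for name, (est, se) in intercepts.items()}
for name, (weeks, se_weeks) in results.items():
    print(f"{name}: mean emergence about week {weeks:.1f} (SE {se_weeks:.2f})")
```

Under these assumptions, the first brood emerges on average around week 5·4 and the second around week 15·6 of the 21-week season at average northing.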

The residual deviance of the selected model is D = 3952, with (n – p) = 924–61 = 863, which implies a moderate lack of fit and dispersion estimated as approximately 4·58. However, comparison of the observed counts with estimated fitted counts from the model shows reasonable correspondence for most sites, implying overdispersion rather than a failure in the model structure (Fig. S12, Supporting Information), and all standard errors have been adjusted for overdispersion.

The estimated retention probabilities, shown in Fig. 5(a), peak around week 11 of 21, before dropping off towards the end of the season. They are estimated as approximately zero for the initial weeks of the season. The 95% confidence intervals constructed around the logit of retention probabilities demonstrate the greater uncertainty for the part of the curve that corresponds to the start of the season (Fig. S13, Supporting Information). This is because the obtained counts are considerably lower in the first few weeks and the first few columns of the data set are very sparse. A similar result was observed for simulated data.

The weighting of the first normal distribution increases with northing, with the second brood almost disappearing in the north (Fig. 5b). The means of the two normal distributions suggest a later time of emergence in the north (Fig. 5c). This is also demonstrated by the entry parameters: two relatively even broods at southern sites, with the first brood dominating at high northing, in addition to a later emergence (Fig. 4). The 95% confidence intervals shown in Fig. 5(b,c) are constructed using the Delta method in R package msm (Jackson 2011).

Figure 4.

Estimated entry parameters for common blue in the UK, summer 2010, at a sample of northing values.

Figure 5.

Point estimates obtained by the selected model together with 95% confidence intervals: (a) estimated retention probabilities (common across sites), (b) relative size of the first brood, (c) mean arrival times of the two broods, by northing, and (d) estimated super-population sizes (circles), N, of the 50 sites. The black dots in (d) are the point estimates of Np obtained by the second best model in terms of Akaike information criterion.

Finally, the estimated super-population sizes for all 50 sites are shown in Fig. 5(d), together with their asymptotic 95% confidence intervals back-transformed from the log-scale. The black dots are the estimates of Np, that is, the product of the super-population sizes and detection probability, when that is assumed constant and common across all sites, derived by the second best model, as shown in Table 2. As expected, and especially for sites with higher estimated N, these two point estimates are far apart, with N greater than Np.


Simple series of annually replicated counts, made to a standardized protocol, are essential for conservation monitoring. Butterfly counts of this kind have, for example, identified not only widespread declines in some species but also several marked successful responses to targeted management (Thomas, Simcox & Hovestadt 2011). Optimal methods of analysing such data would also shed light on the factors behind the population changes, by estimating demographic and phenological changes alongside the population trends.

Many approaches have been adopted for comparing relative abundances at different locations, or different points in time. Simple Poisson models are often applied to data in the form of multiple, often short and incomplete, time series arising from standardized survey protocols. A specific problem with butterfly data, such as those of the UKBMS, is the need to account for the seasonal patterns – inevitable in the counts – which means that expected counts, even at an individual site, vary greatly within a season. To date, this seasonal pattern has usually been estimated via a generalized additive model (GAM: Rothery & Roy (2010); Dennis et al. (2013)). Such models, however, rely on interpolating any missing values and, as such, estimate the total numbers of sightings, rather than individuals.

A GAM approach is clearly nonparametric and empirical. Our proposed method is still Poisson-based (although other distributions might be considered if preferred), but seeks to explain the changing counts within a season via models which are both biologically realistic and of considerable value in conservation management. Missed visits within the season are easily encompassed due to the use of parametric functions to constrain the model parameters, and we can now estimate more flexibly a number of quantities, in addition to abundance or relative abundance. Additionally, we can ensure that indices of abundance are not biased due to differences in seasonal flight periods or multiple sightings of individuals within a season. The retention/survival probability φ is a demographic variable that can be converted to the estimated flight period (adult life expectancy) of an individual; through the arrival parameter β alone, we are informed about phenological change (the average time of arrival or emergence) and, importantly, the relative strengths of two (or more) broods in multivoltine species. Extensive simulations have shown that the model can be expected to perform well with data of a scale readily achievable in practice.
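The roles of these parameters can be illustrated with a minimal sketch of the expected count on each visit, taken here as the product of super-population size, detection probability and the proportion of individuals that have entered and been retained up to that visit. This is an assumed stopover-style formulation for illustration, not necessarily the exact likelihood used, and all input values are hypothetical. The conversion of a constant weekly retention probability to a mean flight period uses the continuous-time approximation −1/log φ, which is also an assumption.

```python
import math


def expected_counts(super_population, detection, entry, retention):
    """Expected count on each visit.

    entry[b]     -- probability that an individual arrives at occasion b
    retention[k] -- probability of remaining from occasion k to k + 1
                    (one fewer element than `entry`)
    """
    counts = []
    for j in range(len(entry)):
        present = 0.0
        for b in range(j + 1):
            # Probability of surviving from arrival at b up to visit j.
            survive = 1.0
            for k in range(b, j):
                survive *= retention[k]
            present += entry[b] * survive
        counts.append(super_population * detection * present)
    return counts


def expected_lifetime_weeks(phi):
    """Mean adult lifetime for a constant weekly retention probability phi."""
    return -1.0 / math.log(phi)


# Hypothetical five-week season with a single brood.
entry = [0.1, 0.4, 0.3, 0.15, 0.05]
retention = [0.6, 0.6, 0.6, 0.6]
print(expected_counts(200.0, 0.5, entry, retention))
print(expected_lifetime_weeks(0.6))
```

Because only the product of super-population size and detection probability enters the expected count, the two cannot be separated without further information, such as a covariate for detectability.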

Note that the method includes the pioneering work of Zonneveld (1991) as a special case, in which data from single sites are modelled individually with a constant rate of survival. The number of parameters estimated restricts the capacity for useful inference at a single site and season (see Calabrese 2012). Extension to simultaneous analyses at multiple, potentially many, sites which may be expected to share at least some parameters allows ‘borrowing strength’ and improved inference on key ecological parameters – the assumption of constant survival can be tested in a robust framework, for example. Although we have chosen normal distributions for the arrival times, alternatives are readily adopted: Zonneveld (1991) considered a logistic distribution, and Cornulier et al. (2009) also used asymmetric distributions within mixture models to permit a degree of skewness in the hatching dates of birds from monitored nests. Clearly, the model is also straightforward to apply to univoltine species, though we have concentrated on the bivoltine case here due to its special interest and difficulties, and to illustrate the connection with the widely used stopover models in other contexts. In the latter, arrival and departure of (marked) individuals from a location are usually considered as immigration and emigration, though the mathematics is clearly analogous. The butterflies in our data are of course not individually identifiable, and this prohibits the estimation of genuine abundance when detection probability is assumed constant, something which is possible in conventional stopover models. Nonetheless, in this case the model confounds abundance and detectability, so if the latter can be assumed constant, comparable measures of relative abundance arise and are of use in management and monitoring.

The ability to estimate changes in phenology, demography and voltinism along with abundance provides a rigorous statistical basis for comparisons of these, and their relationships with environmental covariates or with one another. Thus, for example, in a species such as the common blue, we are able not only to estimate abundance or relative abundance at each site, but to apportion this between two broods. Use of a covariate (site northing) confirms, and quantifies, long-standing field experience that the species is effectively single-brooded in the far north of its range although the two broods are comparable in the south, and that the broods emerge later with increasing northing, presumably as a consequence of later spring/summer conditions. Given that, for simplicity, survival is often assumed constant in modelling butterfly populations, for example Soulsby & Thomas (2012), and in the absence of evidence for senescence, it is interesting that, for the common blue, models with age dependence in survival fare poorly. However, the evidence for variation with time across the season is pronounced.

Butterfly data are regarded as especially useful environmental indicators. They are, for example, relatively easy to collect via ‘citizen science’ schemes as they are visible, popular with the public and, in the UK at least, species are few and largely easy to locate and identify, compared to many invertebrate groups. The method is, however, equally applicable to other seasonal invertebrate species.

The sensitivity of butterflies to climatic or land-use changes makes them useful indicators of the effects on wider biodiversity, as shown by the adoption of the UKBMS into governmental indicators of biodiversity trends in the UK (Defra 2011) and beyond (van Swaay et al. 2008). Given the demonstrated utility of butterfly surveys in studying, for example, climate change (Roy & Sparks 2000; Roy et al. 2001) or consequences of agricultural practice (Woodcock et al. 2012; Jonason et al. 2011) and, for specialist species, habitat fragmentation (Brückmann, Krauss & Steffan-Dewenter 2010), we believe that the greater flexibility and robustness of the models described here will greatly increase the value of such surveys in future.

The method has important applications in conservation biology as it enables the absolute abundance of an insect species' seasonal population to be estimated from strip transects. This is without the need for intrusive and labour-intensive marking techniques or technically demanding distance sampling that involves counting butterflies in distance bands. Phenology and abundance are modelled simultaneously, and mean date of emergence, which is a new statistic for butterflies, can be used to monitor species responses to climate change. Altitude and aspect are also thought to affect butterfly phenology, and it is straightforward to include these in the model in future. Detectability was modelled as a function of temperature at the site on the day of sampling, but other covariates such as habitat type, recorder effort, experience or age can also be incorporated if available.


We are grateful to Diana Cole for providing us with Maple code for the Choquet and Cole (2012) hybrid symbolic-numerical method and for all the useful discussions on parameter identifiability, and to Steve Buckland and Byron Morgan for all their insightful suggestions. This work was part-funded by EPSRC grants EP/I000917/1 and EP/P505577/1. The UKBMS is operated by the Centre for Ecology & Hydrology and Butterfly Conservation and funded by a multi-agency consortium including Natural Resources Wales, Defra, the Joint Nature Conservation Committee, Forestry Commission, Natural England, the Natural Environment Research Council and Scottish Natural Heritage. The UKBMS is indebted to all volunteers who contribute data to the scheme.