Insect population curves: modelling and application to butterfly transect data

Authors

  • Richard L. Soulsby (corresponding author)

    1. Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
      Correspondence author. E-mail: richard.soulsby@eng.ox.ac.uk
  • Jeremy A. Thomas

    1. Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford OX1 3PS, UK
    2. Centre for Ecology and Hydrology, Wallingford, UK


Summary

1. Monitoring the abundance of insect populations is increasingly valuable to understand their population dynamics, plan management strategies and assess the attainment of conservation targets. As monitoring is often constrained in time and space, it is important to have a model of the temporal variation in numbers to enable limited field data to be fitted into a broader framework, and to estimate the total detectable population on a site. We use adult butterflies as a study system, because they are widely monitored, sensitive indicators of environmental change.

2. We derive a generic algebraic expression for the shape of the population curve that describes how the number of adult butterflies observed at a site varies through the flight period. This novel expression contains parameters corresponding to total detectable population, start-date and length of eclosion period, and mean life span. These can be optimised to fit observed series of counts (e.g. butterfly transect data, moth light-trap counts) for individual species, sites and years, using a reliable grid-refinement optimisation procedure.

3. Assumptions about survivorship and death rate are supported by field observations of the life span of two species tested. The modelled population curve (POPFIT) closely fits observed transect data for three further species.

4. We give two simplified methods of estimating the total transect population: from the Index of Abundance and from the peak population. We discuss extrapolation of transect results to estimate total site population, taking account of observed sampling efficiencies.

5. Our new method compares favourably with previous models for assessing patterns of population change at monitored sites and can assist with the planning of optimal monitoring strategies. As well as population estimates, it allows standardised phenological parameters to be derived from field data and provides a standard curve for theoretical studies. The method is applicable to all insect species whose adults have distinct, non-overwintering generations and to all other monitored organisms with similar demographies.

Introduction

There is a need to obtain accurate measurements of the abundance of multiple species of insects across regions and ranges, for example to describe patterns of change, to identify and understand the drivers of species’ population dynamics (Nowicki et al. 2008; Thomas, Simcox & Hovestadt 2011) or to monitor the attainment of national and international conservation targets (Nayar 2010). Whereas freshwater systems are well monitored by professional entomologists in many developed countries, extensive monitoring of terrestrial insects depends largely on schemes that utilise enthusiastic, skilled amateur recorders. In practice, this often restricts wide-scale, long-term monitoring to butterflies; although in the UK, a national scheme exists for moths and is proposed for dragonflies and bumblebees (Thomas 2005; Conrad, Fox & Woiwod 2007).

The transect walk is the most widely used scientific method of monitoring butterfly numbers at a site (Pollard et al. 1975). A fixed route through the site is walked once a week under prescribed weather conditions, from 1st April to 29th September inclusive in the UK, by a recorder who counts the number of each species seen within a constant distance (usually a moving 5 × 5 × 5 m box). The UK Butterfly Monitoring Scheme (UKBMS) started in 1976 and currently samples c. 900 sites annually: results, analysis and interpretation of transect walks over the first 15 years were presented by Pollard & Yates (1993). The method is increasingly applied across Europe, where comprehensive schemes have been established in 14 nations, as well as in China and, more locally, in the US and elsewhere (Van Swaay et al. 2008).

From species data in standard field guidebooks, about 80% of the resident British butterfly species, up to 87% of European species, and a similar proportion in other temperate latitudes exhibit discrete flight periods for each generation, with daily counts of the number of adults in a sufficiently large population forming a well-defined and relatively smooth curve vs. time (the population curve). Moreover, the shape of this curve is rather similar for a wide variety of cases: for different species, generations, sites, years, and population sizes (Pollard & Yates 1993; Rothery & Roy 2001). The timing, duration and peak number vary with species, site and year (e.g. according to annual climatic conditions) but can be treated as parameters in a single general-purpose shape of curve.

In current butterfly and moth monitoring schemes, the results are expressed as relative changes in abundance obtained by summing the weekly mean count for each species, providing an Index of Abundance for every generation. The advantages of this simple method were considered to outweigh its disadvantages, such as the uncertainties generated by missing counts. Pollard & Yates (1993) advocated devising a generic mathematical formula for the time-variation in counts, which could be fitted to data with missing weeks so that the Index of Abundance could still be calculated. Such a formula would also be valuable for exploring the reliability of population indexes based on smaller numbers (Zonneveld 1991; Rothery & Roy 2001; Gross et al. 2007; Nowicki et al. 2008) and for calculating key demographic parameter values used in pure and applied ecology.

A model of this type was presented by Manly (1974; subsequently M74), to represent the population curves of individuals at successive life stages of insects. He assumed that the time at which insects enter a life stage follows a normal distribution, and that the survival rate per day after entry to the stage follows an exponential decay with time, that is, a constant age-specific death rate. He obtained a mathematical solution for the generalised form of the population curve expressed as an integral (also see Rothery & Roy 2001). Zonneveld (1991; subsequently Z91), who devised another model of this class, also assumed a constant death rate but chose a logistic distribution for the rate at which adult butterflies emerge (eclose). He assumed that the stochastic variation in individual counts is Poisson-distributed and used a maximum-likelihood method to fit the model to field data. Use of the Z91 model was greatly eased by its incorporation into free software, the Insect Count Analyzer (INCA, 2002).

Generalised additive models (Rothery & Roy 2001) serve a similar purpose of smoothing and interpolating transect data, but they neither yield a ‘universal’ mathematical formula for the population curve nor allow quantities such as total population size and life span to be estimated.

Applications of the M74 and Z91/INCA models fall into two types:

However, in a review of butterfly monitoring methods, Nowicki et al. (2008) commented that the Z91 model remains difficult to apply and is consequently rarely used. It has yet to be adopted by major schemes, perhaps because, while providing a good fit to the population curves of several example species, it fails to generate estimates when numbers are low or counts are irregular or infrequent, a common occurrence in transect data sets. One practical and conceptual disadvantage of the M74 and Z91 models, which may affect their adoption, is that their population-curve equations were presented as integrals that can only be evaluated numerically.

Here, building on the M74 and Z91 models and in the hope of promoting wider use of such models, we present a new algebraic equation to describe the shape of the population curve (POPFIT). It contains four readily interpretable parameters that can be fitted to transect data from individual species, sites and years. As with earlier models, our mathematical representation is not applicable to the 13–20% of temperate butterfly species that are continuously brooded, are predominantly migratory, have overlapping broods, or over-winter as adults, nor to sites with very small observable populations (peak daily count <15). The model is equally applicable to other taxa, including UK moths (Conrad, Fox & Woiwod 2007), whose discrete populations are regularly sampled.

Materials and methods

Derivation of POPFIT model

The objective is to derive a general-purpose mathematical expression defining a curve that approximates closely to observed sequences of insect counts vs. time, and to find a method of optimising the parameters involved to obtain the best fit to data.

Consider a colony of a species with discrete non-overwintering adult generations, occupying an isolated area of suitable habitat with a statistically adequate population. INCA (2002) recommends peak daily count ≥15, but smaller numbers may suffice if the counts form an orderly curve. We assume no immigration or emigration so that the only processes determining the daily numbers of adults are eclosion and death. Observed emigration rates for non-migratory species are typically <5–10% of the population (Thomas 1983). A low rate of emigration or immigration would cause a small increase or decrease in the apparent death rate, respectively. A single generation is modelled, whose total detectable population N is defined as the number of adults successfully eclosing during the flight period (excluding deaths before their wings are fully functional). Typically, as adults start to eclose, their number increases rapidly over several days to a peak (npk ≤ N) and then decreases slowly as individuals die, until none remains.

For brevity, we will call N the ‘population’, but by this, we mean the detectable population. We note that this may be smaller than the true population by a factor of between one and five (Isaac et al. 2011), depending on species and site, because the probability of detection decreases with distance from the observer (and may vary in dynamic habitats), some insects may be hidden by vegetation (depending on habitat structure), and in some species, the females are harder to detect than the males. Thus, N (and the transect population N*) could be thought of as a population index, especially for rare species where some may be undetected before dying. In many cases, the main interest is in relative comparisons, where such concerns are less relevant.

The population curve is represented mathematically by n(t), the population n present at time t. Time is measured in days and is treated as a continuum. The number alive on a given day depends on both the previous eclosion history and the survival rates of adults. Following similar reasoning to M74 and Z91, the rate of change of n(t) can be expressed as a differential equation:

\[
\frac{dn}{dt} = \frac{N}{T_E}\,E\!\left(\frac{t'}{T_E}\right) - \frac{n}{T}
\tag{eqn 1}
\]

where t′ = t − t0 is the time since the start of eclosion (t0), TE is the length of time over which eclosion occurs, and (1/TE)E(t′/TE) is an eclosion function whose integral over the eclosion period 0 ≤ t′ ≤ TE is one. The first term on the right-hand side corresponds to the eclosion rate, and the second term to the death rate for exponentially decreasing survival [n(t) ∼ exp(−t/T) if E(t′/TE) = 0] with mean life span T.

We express the normalised eclosion rate as:

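\[
E\!\left(\frac{t'}{T_E}\right) =
\begin{cases}
\dfrac{3\pi}{4}\,\sin^{3}\!\left(\dfrac{\pi t'}{T_E}\right), & 0 \le t' \le T_E \\[1ex]
0, & \text{otherwise}
\end{cases}
\tag{eqn 2}
\]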

The sine-cubed function differs only slightly from the normal (M74, Rothery & Roy 2001) and logistic (Z91, INCA) distributions within the central 90% of the cumulative curve (Fig. 1a) but has certain advantages over them:

  1. It is zero outside the eclosion period 0 ≤ t′ ≤ TE and hence eliminates the possibility of an individual eclosing before it has pupated: normal and logistic distributions have a finite (albeit small) probability that this could occur.
  2. The solution of eqn 1 becomes an algebraic expression, which is conceptually more attractive than previous integral expressions, and confers appreciable computational benefit.

Figure 1.

 (a) Comparison of three cumulative eclosion distributions: normal, logistic and sine-cubed. The curves are matched at the 5th and 95th percentiles. (b) Family of population curves (eqn 3) showing scaled population n(t)/N vs. scaled time t′/TE, for values of T/TE = 0·05–0·3.

Differences between the eclosion distributions are so slight (Fig. 1a) that we do not expect the results of the model to depend strongly on which one is used, as found by M74 with three alternative distributions in his model. It is also unlikely that a best-fit distribution could be selected by comparison with field observations (e.g. from emergence traps), which are scarce and variable. The benefits outlined above therefore outweigh the greater familiarity of other distributions.
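As an illustrative check (not part of the original analysis), the short Python sketch below evaluates a sine-cubed eclosion function. It assumes the form E(x) = (3π/4)·sin³(πx) on 0 ≤ x ≤ 1, which is the sine-cubed function consistent with the unit integral stated above and with the value E(½) = 3π/4 used later in the text. The function names are hypothetical.

```python
import math

def eclosion(x):
    """Normalised eclosion rate E(x), x = t'/TE (assumed sine-cubed form)."""
    if x < 0.0 or x > 1.0:
        return 0.0  # no eclosion outside the eclosion period
    return 0.75 * math.pi * math.sin(math.pi * x) ** 3

def cumulative_eclosion(x):
    """Fraction of adults eclosed by scaled time x (analytic integral of E)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    c = math.cos(math.pi * x)
    return 0.5 - 0.75 * c + 0.25 * c ** 3

# Midpoint-rule check that E integrates to one over the eclosion period.
steps = 10000
area = sum(eclosion((i + 0.5) / steps) for i in range(steps)) / steps
print("integral of E:", area)
print("E(1/2):", eclosion(0.5), "vs 3*pi/4:", 0.75 * math.pi)
print("fraction eclosed at t'/TE = 0.24:", cumulative_eclosion(0.24))
```

The closed-form cumulative curve follows from integrating sin³ directly, so percentiles of the eclosion period can be found without numerical quadrature.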

The population curve, n(t), is given by the solution of eqn 1 (see Appendix S1 in Supporting Information), with eqn 2 for E(t′/TE), and an initial condition of n(t0) = 0 (i.e. no previous-generation adults present at start of eclosion). Written in terms of a scaled time X = πt′/TE, and a dimensionless parameter a = TE/(πT), the population curve is obtained as a three-part equation, for times before, during and after the eclosion period:

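\[
\frac{n(t)}{N} =
\begin{cases}
0, & X \le 0 \\[1ex]
\dfrac{3}{16}\left[\dfrac{3\left(a\sin X - \cos X + e^{-aX}\right)}{a^{2}+1} - \dfrac{a\sin 3X - 3\cos 3X + 3e^{-aX}}{a^{2}+9}\right], & 0 \le X \le \pi \\[2ex]
\dfrac{9\left(1+e^{a\pi}\right)e^{-aX}}{2\left(a^{2}+1\right)\left(a^{2}+9\right)}, & X \ge \pi
\end{cases}
\tag{eqn 3}
\]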

The population curve is a function of four physical parameters: the total population N, the start-date (t0) and length (TE) of eclosion, and the mean life span T. The proportion of the population alive at time t, n(t)/N, is a function of only the scaled time X and parameter a, or alternatively t/TE and T/TE. The shape of the curve is determined only by a (or T/TE), as noted by Z91.

The family of curves of n(t)/N for six values of T/TE (Fig. 1b) has a broadly similar shape to a set illustrated by Z91. Taking T/TE = 0·15 as an example, n(t)/N rises to a maximum of 0·29 at time 0·60TE and falls to effectively zero at 1·5TE. Thus, 29% of the total population are alive on the day of peak numbers, which occurs slightly later than the time of peak eclosion (0·5TE), and the decline in numbers takes about 50% longer than the rise. Larger values of T/TE have progressively larger ratios of peak to total population, a later time of the peak population and a greater asymmetry between rise and fall; smaller values of T/TE have the converse.
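The worked example for T/TE = 0·15 can be reproduced with a short Python sketch. The closed form used here is our own working, obtained by solving dn/dt = (N/TE)·E(t′/TE) − n/T with an assumed eclosion function E(x) = (3π/4)·sin³(πx) and n = 0 at the start of eclosion. It should be read as a hedged reconstruction rather than the authors' published code, and the function name is hypothetical.

```python
import math

def popfit_fraction(t_scaled, life_ratio):
    """Return n(t)/N at scaled time t'/TE for a given ratio T/TE."""
    a = 1.0 / (math.pi * life_ratio)   # a = TE / (pi * T)
    x = math.pi * t_scaled             # X = pi * t' / TE
    q1, q9 = a * a + 1.0, a * a + 9.0
    if x <= 0.0:
        return 0.0                     # before eclosion starts
    if x >= math.pi:
        # After eclosion ends: exponential decay of the value reached at X = pi.
        at_end = 4.5 * (1.0 + math.exp(-a * math.pi)) / (q1 * q9)
        return at_end * math.exp(-a * (x - math.pi))
    decay = math.exp(-a * x)
    first = 3.0 * (a * math.sin(x) - math.cos(x) + decay) / q1
    third = (a * math.sin(3.0 * x) - 3.0 * math.cos(3.0 * x) + 3.0 * decay) / q9
    return 0.1875 * (first - third)

# Locate the peak of the T/TE = 0.15 curve on a fine grid of scaled times.
grid = [i / 1000.0 for i in range(2001)]
peak_time = max(grid, key=lambda s: popfit_fraction(s, 0.15))
peak_value = popfit_fraction(peak_time, 0.15)
print(f"peak n/N = {peak_value:.3f} at t'/TE = {peak_time:.3f}")
print(f"n/N at t'/TE = 1.5: {popfit_fraction(1.5, 0.15):.4f}")
```

Because the curve is algebraic, scanning a fine grid for the peak is cheap; no numerical integration is involved.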

The four parameters (N, t0, TE, T) can be optimised to fit eqn 3 to a sequence of observed counts. Following Z91, we assume that each count is an independent Poisson-distributed sample of the population and optimise the four parameters so as to maximise the overall likelihood that for each count the mean (and variance) are equal to the theoretical value n(t) given by eqn 3. We use a global search in the four-dimensional parameter-space with successive grid refinements to locate the global maximum of the log-likelihood function. This procedure reduces the possibility of locating only a local maximum, which might occur with gradient-based optimisation methods, and it does not need prescribed starting values other than lower and upper bounds for the parameters. Standard errors and cross-correlations of the optimised parameters were calculated by inverting the information matrix, following Z91. Appendix S1 gives computational details.
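The fitting procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Appendix S1: the expected count uses our own closed-form solution for a sine-cubed eclosion function E(x) = (3π/4)·sin³(πx), the grid size (7 points per axis), the number of refinement rounds and the shrink rule are arbitrary choices, and standard errors from the information matrix are not computed. All function names are hypothetical.

```python
import itertools
import math

def expected_count(t, N, t0, TE, T):
    """Expected count on day t (closed form for assumed sine-cubed eclosion)."""
    a = TE / (math.pi * T)
    x = math.pi * (t - t0) / TE
    if x <= 0.0:
        return 0.0
    q1, q9 = a * a + 1.0, a * a + 9.0
    if x >= math.pi:
        at_end = 4.5 * (1.0 + math.exp(-a * math.pi)) / (q1 * q9)
        return N * at_end * math.exp(-a * (x - math.pi))
    d = math.exp(-a * x)
    first = 3.0 * (a * math.sin(x) - math.cos(x) + d) / q1
    third = (a * math.sin(3.0 * x) - 3.0 * math.cos(3.0 * x) + 3.0 * d) / q9
    return N * 0.1875 * (first - third)

def log_likelihood(params, days, counts):
    """Poisson log-likelihood, omitting the parameter-free log(count!) term."""
    total = 0.0
    for t, k in zip(days, counts):
        mu = max(expected_count(t, *params), 1e-9)  # floor avoids log(0)
        total += k * math.log(mu) - mu
    return total

def grid_refine_fit(days, counts, bounds, points=7, rounds=8):
    """Search a 4-D grid, then repeatedly shrink the grid around the best node."""
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    best, best_ll = None, -math.inf
    for _ in range(rounds):
        step = [(h - l) / (points - 1) for l, h in zip(lo, hi)]
        axes = [[l + s * i for i in range(points)] for l, s in zip(lo, step)]
        for node in itertools.product(*axes):
            ll = log_likelihood(node, days, counts)
            if ll > best_ll:            # keep the incumbent unless strictly beaten
                best, best_ll = node, ll
        # New box: one grid spacing either side of the best node, within bounds.
        lo = [max(b[0], p - s) for b, p, s in zip(bounds, best, step)]
        hi = [min(b[1], p + s) for b, p, s in zip(bounds, best, step)]
    return best, best_ll

# Self-consistency check on noise-free synthetic "counts" (hypothetical values).
true_params = (500.0, 175.0, 40.0, 4.0)          # N*, t0, TE, T
days = list(range(170, 240, 4))
counts = [expected_count(t, *true_params) for t in days]
bounds = [(200.0, 800.0), (160.0, 190.0), (25.0, 55.0), (1.0, 7.0)]
fit, fit_ll = grid_refine_fit(days, counts, bounds)
print([round(v, 2) for v in fit])
```

Only lower and upper bounds are supplied, mirroring the point made above that no starting values are needed. With real, noisy counts a shrinking grid can still miss a narrow likelihood ridge, so a denser grid or a slower shrink rate may be preferable.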

Extension from site to transect data

The approach can be applied to transect data, as well as to an entire site (Z91). Suppose that the transect count, n*(t), on a particular day represents a proportion c of the detectable population on the whole site, n(t), so that n*(t) = c.n(t). If c is constant throughout the flight period, it follows that the total detectable population N* in the transect corridor is related to the total site population by N* = c.N. The coefficient c combines the ratio of the area of the transect corridor (width W, length L) to the area A of the site, and the search efficiency ε (discussed later). Extending Thomas’s (1983) approach,

\[
c = \frac{N^{*}}{N} = \frac{\varepsilon\,W L}{A}
\tag{eqn 4}
\]

An estimate of the total transect population, N*, of a species can be converted via eqn 4 into the (more useful) total site population, N, provided that the value of ε is known. We assume that t0, TE and T deduced from transect data are representative of the whole site, so they are not identified separately.

Index of abundance

The Index of Abundance, IA, defined by the sum of weekly transect counts of a species over a generation (Pollard & Yates 1993), is often taken to be a surrogate measure of the total site population N. It can be shown (Appendix S1) that, for a transect walked every Δt days, N* and IA are related through:

\[
N^{*} = \frac{\Delta t}{T}\,I_A
\tag{eqn 5}
\]

Combining eqns 4 and 5 yields N = [AΔt/(εWLT)].IA, where Δt = 7 days for the standard Pollard Walk.
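The two conversions can be chained in a few lines of Python. This is a sketch that assumes eqn 4 has the form N* = (εWL/A)·N and eqn 5 the form IA = N*T/Δt, the latter being the 1 : 1 relationship between N*T/7 and IA reported in the Results. The function names and all numerical inputs are hypothetical.

```python
def transect_population(index_of_abundance, mean_life_span, interval_days=7.0):
    """Assumed eqn 5 rearranged: N* = IA * dt / T."""
    return index_of_abundance * interval_days / mean_life_span

def site_population(transect_pop, site_area, corridor_width, corridor_length,
                    efficiency=1.0):
    """Assumed eqn 4 rearranged: N = N* * A / (eps * W * L)."""
    return transect_pop * site_area / (efficiency * corridor_width * corridor_length)

# Hypothetical inputs: IA = 140, T = 4.5 days, weekly walks,
# a 2 ha site (20 000 m^2) sampled by a 5 m wide, 1000 m long corridor.
n_star = transect_population(140.0, 4.5)
n_site = site_population(n_star, 20000.0, 5.0, 1000.0)
print(f"N* = {n_star:.0f}, N = {n_site:.0f}")
```

The default ε = 1 is only a placeholder; a species-specific search efficiency should be passed where one is available.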

Proportion of total population alive on peak day

The number of adults alive at the peak of the flight period, npk, is a key parameter when assessing the resilience of populations to stochastic events and Allee effects. Moreover, if npk is obtained through transect counts at a site, it provides a quick method of estimating total population size. Thomas (1983) noted that N is approximately three times npk, based on observations of npk/N for four species at a variety of sites. This approximation was extended empirically by Nowicki et al. (2005), who found for six butterfly and moth species over a range of sites and years that the ratio npk/N had an approximately linear relationship with the ratio of mean life span to flight period length.

In eqn 1, the peak population occurs for dn/dt = 0, when the two terms on the right-hand side balance. This does not have a simple solution, but npk/N can be obtained from eqn 3 by computation for selected values of a. An approximate relationship can be obtained by noting in Fig. 1b that for small values of T/TE, the value of n at t′/TE = 0·5 is only slightly smaller than npk. Thus, for small T/TE, eqn 1 reduces to npk/T ≈ (N/TE).E(½), and hence, substituting E(½) = 3π/4 (eqn 2),

\[
\frac{n_{pk}}{N} \approx \frac{3\pi}{4}\,\frac{T}{T_E}
\tag{eqn 6}
\]

Equation 6 thus offers an alternative method of estimating N from npk. Thomas’s (1983) typical value of npk/N = 1/3 corresponds to T/TE = 0·141 in eqn 6.
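For reference, the arithmetic behind this correspondence can be written out. It relies only on the balance of the two terms of eqn 1 at the peak and on E(½) = 3π/4 as stated above; the rearrangement for N is our own restatement of the same relation.

```latex
\[
\frac{n_{pk}}{T} \approx \frac{N}{T_E}\,E\!\left(\tfrac{1}{2}\right)
= \frac{3\pi}{4}\,\frac{N}{T_E}
\quad\Longrightarrow\quad
N \approx \frac{4}{3\pi}\,\frac{T_E}{T}\,n_{pk}
\]
\[
\frac{n_{pk}}{N} = \frac{1}{3}
\quad\Longrightarrow\quad
\frac{T}{T_E} = \frac{4}{9\pi} \approx 0.141
\]
```

Because the approximation takes n at t′/TE = 0·5 in place of the true peak, it is best suited to small T/TE, as noted above.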

Results

Testing the model with field data

To illustrate the use of the POPFIT model, we fit eqn 3 to observed transect counts for three species for various sampling regimes, sites and numbers of years, by optimising the four parameters N*, t0, TE and T. The goodness-of-fit of the optimised curves to the data is measured by the relative root-mean-square error, RRE = (r.m.s. error)/(r.m.s. count). Note that t0 is the date at which the sine curve is zero: the actual date on which the first adult ecloses will be some days later, especially for small populations.
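The goodness-of-fit measure is simple to compute. A minimal Python sketch, with a hypothetical function name and made-up counts, is:

```python
import math

def relative_rms_error(observed, fitted):
    """RRE = r.m.s.(observed - fitted) / r.m.s.(observed)."""
    n = len(observed)
    rms_error = math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)
    rms_count = math.sqrt(sum(o ** 2 for o in observed) / n)
    return rms_error / rms_count

# Hypothetical counts and fitted values, for illustration only.
observed = [2, 15, 40, 31, 12, 3]
fitted = [3.1, 13.8, 37.5, 33.0, 13.2, 2.4]
print(f"RRE = {100 * relative_rms_error(observed, fitted):.1f}%")
```

Normalising by the r.m.s. count rather than the mean keeps the measure dimensionless and comparable between large and small populations.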

Pollard et al. (1975) reported a series of counts of Aphantopus hyperantus L. made in 1973 on a woodland transect route in eastern England (Fig. 2a). Several counts were made each week and, on some days, more than one count by different recorders. We fitted eqn 3 to their data, obtaining optimised values (±SE) of N* = 616 ± 81, t0 = day-number 174 (23 June) ± 0·4 day, TE = 35 ± 1·4 days and T = 3·6 ± 0·5 days, with RRE = 18%. The maximum observed count was 134, so 22% of the total population were alive on the peak day.

Figure 2.

 Transect counts (symbols) and fitted curves (eqn 3) for (a) Aphantopus hyperantus in 1973 (data from Pollard & Yates 1993); (b) Hesperia comma in 2007; (c–f) Melanargia galathea in 1997 at UK Butterfly Monitoring Scheme (UKBMS) sites 32, 117, 2245 and 1059, respectively. Note 5 × expanded Count axis in (f).

In 2003–2010, R.L.S. counted Hesperia comma L. approximately twice-weekly on a transect route that sampled an isolated English chalk grassland site. Fitting eqn 3 to each year’s data showed that over these 8 years (Table 1, Fig. S2), N* varied between 133 and 455 individuals, t0 between day-numbers 191 (10 July) and 211 (30 July), TE between 23 and 50 days, and T between 3·0 and 10·3 days. The SE of T and t0 are 1–2 days, of TE are 3–5 days and the CVs of N* are 0·1–0·6. The RREs of the fitted curves are 6–29%. Figure 2b illustrates a good fit of the curve to data, and the other 7 years (Fig. S2) also give reasonably good fits.

Table 1.   Estimated parameters ± SE obtained by fitting eqn 3 to transect data for Hesperia comma

Year   N*          t0 (day-no.)   TE (days)   T (days)      RRE (%)
2003   269 ± 110   199 ± 1        40 ± 4      3·4 ± 1·4     29
2004   175 ± 53    200 ± 1        35 ± 4      4·4 ± 1·3     13
2005   135 ± 53    211 ± 2        36 ± 5      4·1 ± 1·6     21
2006   340 ± 194   195 ± 2        42 ± 4      3·0 ± 1·7     20
2007   244 ± 46    205 ± 1        29 ± 3      4·5 ± 0·8     11
2008   133 ± 15    199 ± 1        23 ± 3      10·3 ± 1·0    17
2009   455 ± 195   196 ± 1        50 ± 3      3·2 ± 1·4     18
2010   285 ± 41    191 ± 2        34 ± 5      7·5 ± 1·1     6

N*, total transect population; t0, start-date of eclosion; TE, length of eclosion period; T, mean life span; and RRE, relative root-mean-square error of fit.

We applied the model to weekly transect counts of Melanargia galathea L. obtained from the UKBMS for four sites in southern England. High quality data are available for 1997 for all four sites and cover a wide range of population sizes. Equation 3 gave generally good fits to the data (Fig. 2c–f, RRE = 1·7–28%), with estimates of the parameters varying between sites of N* = 84–1940, t0 = day-number 155–170, TE = 42–55 days and T = 3·3–7·1 days. Thus, the method works satisfactorily for a wide range of population sizes, although stochastic scatter of the counts around the curve is larger in relative terms for small populations (e.g. Fig. 2f).

UKBMS site 32 has data for M. galathea for every year since 1976. We fitted eqn 3 to the weekly transect counts for 33 years (1976–2008), yielding time series of the fitting parameters (Fig. 3). (Parameters t0 and TE are plotted in a modified form, for comparison with outputs from the INCA model, discussed later.) Fits were obtained for every year, although in 4 years, the CVs of one or more parameters exceeded 0·5, and in 1 year they could not be calculated (Appendix S1). Nonetheless, the parameter estimates in all those years conformed reasonably well to the general pattern.

Figure 3.

 Comparison between models POPFIT and INCA of estimated parameters from fits to Melanargia galathea data at UK Butterfly Monitoring Scheme (UKBMS) site 32. (a) Transect population N* (with SE for POPFIT), (b) time t5 at which 5% of adults have eclosed, (c) time interval (t95 − t5) between 5% and 95% of adults eclosing, (d) mean life span T (POPFIT, with SE) and inverse death rate 1/α (INCA).

Index of abundance and total population

Using the 33 years of transect data for M. galathea at site 32, plus the three other sites for 1997, we found (Fig. 4a) that the 1 : 1 relationship between N*T/7 from the fitted curves and IA calculated from the counts data holds closely (R2 = 0·992, P < 0·01), as expected mathematically (eqn 5). A direct comparison of N* with IA (Fig. 4b) also shows reasonably close correspondence (R2 = 0·766, P < 0·01) and is equivalent to a constant T of 7/1·48 = 4·7 days. Figure 4b has less scatter than an equivalent plot by Haddad et al. (2008) for Neonympha mitchelli (R2 = 0·55). An adequate practical estimate of N* could thus be obtained, for species like M. galathea, directly from the transect IA and converted to the site population N using eqns 4 and 5.

Figure 4.

 Population estimates for Melanargia galathea at site 32, 1976–2008, and sites 117, 1059 and 2245 for 1997 showing good correlations between (a) N*T/Δt and IA, with Δt = 7 days; (b) N* and IA.

Intercomparison of POPFIT and INCA models

We used the 33-year M. galathea data set in a comparative test of our model (POPFIT) and the freely available Insect Count Analyzer software INCA (2002). INCA failed to fit a curve for 12 of the 33 years, giving an error message: ‘Estimation failed during second phase’, whereas POPFIT provided a fit with parameters estimated for every year (but without SE in 1 year).

Both models have four parameters, of which N* appears in both models, while μ, β and α in INCA fulfil similar roles to t0, TE and T, respectively, in POPFIT. To compare the models, we take 1/α to be analogous to T, and calculate t5 and (t95 − t5) from both models, where t5 and t95 are the times at which the 5th and 95th percentiles of the cumulative eclosion functions occur. Thus, t5 is a measure of the timing, and (t95 − t5) a measure of the length, of the eclosion period. For POPFIT, t5 = t0 + 0·240TE and (t95 − t5) = 0·520TE; for INCA, t5 = μ − 2·94β and (t95 − t5) = 5·89β. The four parameters N*, t5, (t95 − t5) and T agree reasonably well between the two models (Fig. 3), for those cases where INCA provides a fit. The difference between the estimates of N* from the two models is smaller than the sum of their SE in all but three of the 21 comparable years.
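The percentile coefficients quoted above can be checked numerically. The Python sketch below assumes a sine-cubed eclosion function E(x) = (3π/4)·sin³(πx) for POPFIT and a logistic distribution with location μ and scale β for INCA. It finds the sine-cubed percentiles by bisection on the analytic cumulative curve; the function names are hypothetical.

```python
import math

def sine_cubed_cdf(x):
    """Cumulative eclosion at scaled time x = t'/TE (assumed sine-cubed form)."""
    c = math.cos(math.pi * min(max(x, 0.0), 1.0))
    return 0.5 - 0.75 * c + 0.25 * c ** 3

def sine_cubed_quantile(p):
    """Scaled time at which a fraction p has eclosed, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sine_cubed_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def logistic_quantile(p):
    """Offset from the logistic mean, in units of the scale parameter beta."""
    return math.log(p / (1.0 - p))

x5, x95 = sine_cubed_quantile(0.05), sine_cubed_quantile(0.95)
z5, z95 = logistic_quantile(0.05), logistic_quantile(0.95)
print(f"POPFIT: t5 = t0 + {x5:.3f} TE, t95 - t5 = {x95 - x5:.3f} TE")
print(f"INCA:   t5 = mu {z5:+.2f} beta, t95 - t5 = {z95 - z5:.2f} beta")
```

Expressing both models through t5 and (t95 − t5) puts their timing parameters on a common footing even though the underlying eclosion distributions differ.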

The SE of N* generated by POPFIT are smaller than those for INCA in 13 of these 21 years. Standard errors for other parameters from the two models are not directly comparable: accordingly, we compare the CVs for N*, TE and T from POPFIT with those for corresponding parameters N*, β and α from INCA, and the SE of t0 with μ (for which CVs are not meaningful). The frequency distributions of the CVs and SE (Fig. 5) are fairly similar for the two models, although POPFIT has more cases than INCA with CV < 0·2 (or SE < 1 day). For both models, the distributions of the CVs of the highly correlated parameters N* and T (and α) are very similar (Fig. 5a,d), with almost half the POPFIT cases having CV < 0·2 for both N* and T.

Figure 5.

 Comparison between models POPFIT and INCA of uncertainties in parameter estimates (frequency distributions) and failure rates for fits to 33 Melanargia galathea data sets. (a) Transect population N*, (b) measures of timing of start (t0, POPFIT) and mean (μ, INCA) of eclosion, (c) measures of duration of eclosion (TE, POPFIT; β, INCA), (d) life span (T, POPFIT) and death rate (α, INCA).

Discussion

POPFIT is not the first model to describe insect population curves, but we consider it has certain advantages owing to its simplicity of expression without loss of accuracy, its straightforward interpretation of the four population parameters involved, and its more reliable method of fitting the curve to a sequence of counts. Here, we compare it with some attributes and applications of earlier models. We also examine the assumption utilised in all these models of a constant death rate. Differences between the eclosion functions were discussed earlier in Methods.

Death rate

The assumption of a constant death rate made by M74, Z91 and POPFIT greatly simplifies the mathematics and is consistent with observations of montane Colias species in California (Watt et al. 1977). Here, we provide further evidence based on field measurements made in 1972–82 using emergence trap and mark-release-recapture (MRR) techniques on populations of Maculinea arion L. and M. nausithous Bergsträsser (Thomas, Simcox & Clarke 2009). The survival data for M. arion over five successive years (Fig. 6) are fitted well by log-linear lines (0·940 < R2 < 0·979, P < 0·01) corresponding to a Type II survivorship curve. No emigration was detected to any neighbouring site, so it is probable that the exponentially decaying survivorship can be attributed primarily to a constant death rate. The mean life spans T deduced from the slopes of the fitted lines (Table 2) are 3·0–4·9 days, and the corresponding 24-h death rates [1−exp(−1/T)] are 0·19–0·28 per day. The 5-year average of these life spans [3·90 ± 0·34 (SE) days] compares well with that for the statistical mean life spans [3·81 ± 0·51 days] (Table 2). Similar results were found for 1 year of M. nausithous observations (Table 2, Fig. S1a).

Figure 6.

 Adult population estimates for Maculinea arion showing Type II survivorship curves (constant loss rate): lines fitted by least-squares exponential regression.

Table 2.   Life spans of Maculinea arion and M. nausithous

Species (year)      ⟨τ⟩ (days)   T (days)   Death rate (day−1)   R2
arion (1973)        4·61         4·9        0·19                 0·979
arion (1974)        4·00         4·6        0·20                 0·940
arion (1975)        3·03         3·8        0·23                 0·957
arion (1976)        3·71         3·2        0·27                 0·977
arion (1977)        3·71         3·0        0·28                 0·959
nausithous (1983)   4·48         3·6        0·24                 0·980

⟨τ⟩ = arithmetic mean life span; T = time-scale by exponential regression to data; 24-h death rate (day−1) = 1−exp(−1/T).
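The relation between mean life span and 24-h death rate can be checked against the tabulated values with a few lines of Python. The life spans below are the rounded T values from Table 2, so agreement with the tabulated death rates is expected only to within rounding (about ±0·01). The function and variable names are hypothetical.

```python
import math

def daily_death_rate(mean_life_span_days):
    """24-h death rate for exponential survivorship: 1 - exp(-1/T)."""
    return 1.0 - math.exp(-1.0 / mean_life_span_days)

# (T in days, tabulated death rate) for each data set in Table 2.
table2 = {
    "arion (1973)": (4.9, 0.19),
    "arion (1974)": (4.6, 0.20),
    "arion (1975)": (3.8, 0.23),
    "arion (1976)": (3.2, 0.27),
    "arion (1977)": (3.0, 0.28),
    "nausithous (1983)": (3.6, 0.24),
}
for label, (life_span, tabulated) in table2.items():
    print(f"{label}: computed {daily_death_rate(life_span):.3f}, tabulated {tabulated}")

# Five-year mean of the regression life spans T for M. arion.
arion_T = [v[0] for k, v in table2.items() if k.startswith("arion")]
mean_T = sum(arion_T) / len(arion_T)
print(f"mean T for M. arion = {mean_T:.2f} days")
```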

To test the robustness of the constant death rate assumption, we tried fitting second-order polynomials of log(no. surviving) vs. age to the six Maculinea data sets. Deviations from linearity were small and inconsistent (Fig. S1a), and the mean coefficient of (age)2 was not significantly different from zero (P > 0·1, t-test 5df).

Haddad et al. (2008) questioned Z91’s assumption that butterfly death rates remain constant throughout the flight season. We tested this directly against the M. arion field data by fitting regression lines to the longevities of 257 individuals against the day on which each individual eclosed (Fig. S1b). No correlation was found between life span and eclosion-day for either a linear fit (R2 = 0·014) or a quadratic fit.

We conclude that an age-independent death rate, constant throughout the flight season, best represents the mortality of the species examined. In the absence of contrary evidence, this conclusion could provisionally be extended to other butterfly species, although tests for individual species would be desirable if applying POPFIT. If the death rate is not constant, the population curve can be obtained by stepwise numerical integration of eqn 1, but not generally as an algebraic expression.
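Stepwise numerical integration of eqn 1 can be sketched in Python as follows. The sketch assumes the form dn/dt = (N/TE)·E(t′/TE) − n/T with a sine-cubed eclosion function E(x) = (3π/4)·sin³(πx), and lets the life span vary with date, T(t), which covers a seasonally changing death rate only; an age-dependent death rate would require tracking cohorts separately. A fixed-step fourth-order Runge–Kutta scheme is our choice of integrator, and all names and parameter values are hypothetical.

```python
import math

def integrate_population(N, t0, TE, life_span_at, t_end, dt=0.05):
    """RK4 integration of dn/dt = (N/TE) E(t'/TE) - n/T(t), starting from n(t0) = 0."""
    def rate(t, n):
        x = (t - t0) / TE
        eclosing = 0.75 * math.pi * math.sin(math.pi * x) ** 3 if 0.0 <= x <= 1.0 else 0.0
        return (N / TE) * eclosing - n / life_span_at(t)

    steps = int(round((t_end - t0) / dt))
    times, values = [t0], [0.0]
    n = 0.0
    for i in range(steps):
        t = t0 + i * dt
        k1 = rate(t, n)
        k2 = rate(t + 0.5 * dt, n + 0.5 * dt * k1)
        k3 = rate(t + 0.5 * dt, n + 0.5 * dt * k2)
        k4 = rate(t + dt, n + dt * k3)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        times.append(t + dt)
        values.append(n)
    return times, values

def butterfly_days(times, values):
    """Trapezoidal integral of n(t) over the flight period."""
    return sum(0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

# Constant life span of 4 days: the integral of n(t) should approach N * T.
constant = integrate_population(500.0, 175.0, 40.0, lambda t: 4.0, 275.0)
# Hypothetical seasonal decline in life span from 5 days to a floor of 3 days.
declining = integrate_population(
    500.0, 175.0, 40.0, lambda t: max(3.0, 5.0 - 0.05 * (t - 175.0)), 275.0)
print(f"constant T: integral = {butterfly_days(*constant):.1f}")
print(f"declining T: integral = {butterfly_days(*declining):.1f}")
```

For a constant death rate the integral of n(t) equals N·T, which gives a convenient check on the integrator before a variable rate is introduced.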

Reliability

Reported tests of the INCA software on real and simulated butterfly counts data found that it failed to fit a curve for 15–40% of cases (Haddad et al. 2008; Marschalek & Deutschman 2008). In our comparative tests of the POPFIT and INCA models against the 33 years of M. galathea transect data, INCA failed in 36% of cases, whereas POPFIT always provided results (although in 1 year, the SE and correlations could not be calculated because the information matrix was nearly singular).

The greater reliability of POPFIT compared with INCA might be due either to the finite-length sine-cubed eclosion function, or to the grid-search optimisation method. We therefore tested whether the Z91 population curve (with logistic-based eclosion function) could be fitted more reliably by our grid-search optimisation than by the INCA method, again using the M. galathea data. This required numerical integration (by quadrature) of Z91’s logistic-based integral. Parameter estimates were obtained in all 33 cases, but with the penalty of a 37-fold increase in computing time compared with POPFIT because of the demands of the numerical integrations (6·8 h cf. 11 min for 33 cases on an Apple Mac). Furthermore, in 18 cases, standard errors and cross-correlations could not be obtained because the information matrix was nearly singular (cf. one case for POPFIT). A comparative test using numerical integration with our sine-cubed eclosion function gave numerically identical results to our analytical solution (eqn 3) but with a 30-fold increase in run-time.

Thus, it appears that both the finite-length eclosion function and the grid-search optimisation contribute to the greater reliability of POPFIT, and its analytical expression for the population curve reduces computational run-time by a factor of about 30.

Confounding of N* and T

As Z91 and Nowicki et al. (2008) noted, N* and T are strongly, but spuriously, inversely correlated through eqn 5, because IA is fixed for a particular data set. This limits the scope for optimising values of N* and T independently when fitting our (or any similar) model to data. Stochastic or weather-related irregularities in the data, especially near the end of the flight period, can unduly influence the value of T, which inversely affects N*. Nowicki et al. (2008) illustrated this problem with three very similar-looking population curves whose N* and T varied individually through a factor of two but whose products N*T were closely similar.

In the M. galathea fits, two of the 33 years have correlations between N* and T whose modulus exceeds 0·99. A correlation this high suggests that a four-parameter model over-parameterises the data for those years. An alternative three-parameter fit could be made, for example, by fixing T at a typical value for the species (INCA 2002). However, it may be more consistent to apply the same model to all sites and years in a series, while heeding the CVs obtained.
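The link between the information matrix, the standard errors and the correlation can be sketched for a reduced two-parameter case. This is an illustration under stated assumptions, not the POPFIT calculation: the full model has four parameters, and the function names, the singularity tolerance and the example matrix are hypothetical. The covariance matrix is taken as the inverse of the information matrix; when the determinant is negligible relative to the diagonal terms, the inverse is unreliable and no SE or correlation is reported.

```python
import math


def se_and_correlation(info, rel_tol=1e-10):
    """Return (se_a, se_b, corr) from a 2x2 information matrix,
    or None if the matrix is nearly singular (illustrative tolerance)."""
    (a, b), (c, d) = info
    det = a * d - b * c
    # Compare the determinant with the scale set by the diagonal terms.
    if a <= 0 or d <= 0 or det <= rel_tol * a * d:
        return None
    # Covariance matrix = inverse of the information matrix.
    var_a = d / det
    var_b = a / det
    cov_ab = -b / det
    corr = cov_ab / math.sqrt(var_a * var_b)
    return math.sqrt(var_a), math.sqrt(var_b), corr


def is_confounded(corr, threshold=0.99):
    """Flag a fit whose correlation modulus exceeds the threshold."""
    return abs(corr) > threshold


# Hypothetical information matrix for two strongly related parameters.
result = se_and_correlation([[4.0, 1.9], [1.9, 1.0]])
```

A positive off-diagonal term in the information matrix yields a negative correlation between the two estimates, and as the determinant approaches zero the correlation modulus approaches one while the standard errors grow without bound.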

Sampling frequency

The practical problem of devising optimal cost-effective monitoring and analysis schemes has been discussed extensively (Mattoni et al. 2001; Zonneveld, Longcore & Mulder 2003; Nowicki et al. 2005, 2008; Gross et al. 2007; Haddad et al. 2008). The frequency of repeat surveys is a major element of the cost, whether in money or in volunteer effort. Many studies advocated counts being made every one to three days, yet that frequency is impracticable for volunteer-based schemes (e.g. the UKBMS), which mainly adopt the Pollard standard of weekly walks. Models like ours generally perform better with closely spaced data: for example, Haddad et al. (2008) found that the failure rate of INCA increased from 15% with daily sampling to 25% with semi-daily sampling. Nevertheless, our tests of POPFIT generated satisfactory curves with weekly transect data.

Search efficiency

The search efficiency ε is a measure of how representative transect counts are of the whole site population. It can exceed one if the transect route over-represents the butterfly density on the site or if individual butterflies are counted more than once. Thomas (1983) verified that the relationship n*(t) = c.n(t) holds with a constant, but species-dependent, value of ε by comparing estimates of n obtained from MRR experiments with n* obtained from transect counts on the same day, for six species at several sites and at various times in their flight periods (Appendix S2). For these species and sites, ε lies in the range 0·3–3, so that the site population can be calculated from the transect population to within a factor of three by assuming ε = 1 in eqn 4, and more accurately if an estimate of ε is known for the species in question.

We assume that the factor c is constant, but in practice, the value of ε, and hence c, could vary with weather conditions and the date and time of day. Such variation could affect the interpretation of population curves fitted by POPFIT. However, the assumption of constant c is shared by other methods of analysing butterfly counts, for example MRR surveys, the Index of Abundance, and the models of M74 and Z91, all of which would suffer similar uncertainties of interpretation.

Applications

We have proposed a new approach (POPFIT) for obtaining practical estimates of the total transect population N* from standardised counts, such as those obtained from UKBMS walks. The model fits eqn 3 to the counts data to derive estimates (with SE) of N*, t0 (eclosion starting date), TE (length of eclosion period) and T (mean adult life span). This builds on the existing methods of M74, Z91 and INCA.

POPFIT is better suited than other models in this class to automated fitting to transect data en masse, because it is more reliable. The reliability and gap-filling properties of POPFIT compare favourably with the current approach employed by European butterfly monitoring schemes of using IA as a surrogate of N, and POPFIT additionally provides confidence limits. The applicability criteria must of course be respected, and checks made to identify cases where the fit is poor (large RRE or CVs).

Comparing the POPFIT and INCA methods: (a) INCA is freely available, user-friendly, well-documented software, whereas POPFIT is at present only a research-level code; (b) INCA runs about three times faster than POPFIT, but application of the more successful grid-search optimisation to the Z91 formula was much slower than POPFIT (more than 30 times); (c) INCA failed to provide estimates in 36% of cases, which is a disadvantage for automated fitting to multiple data sets, whereas POPFIT did not fail for any of the data sets tested, although it was unable to provide SE in 3% of cases; (d) comparable output parameters had similar estimates from the two models in most cases; and (e) the CVs were generally slightly smaller for POPFIT than for INCA.

Two simpler, but less informative, methods are also presented for estimating N*:

  1. Derive N* from the Index of Abundance IA, using eqn 5 with a typical value of T for the species;
  2. Derive N* from the peak population npk, using eqn 6 with a typical value of T/TE for the species.

Both methods are derived from easily obtainable data, but they rely on knowing values of T or T/TE which must be assumed constant, and they do not provide confidence intervals for N*.

The total detectable site population N can be derived from the transect population N* using eqn 4, with specified search efficiency ε if known, or otherwise a default value of ε = 1. If POPFIT is used, the values of t0, TE and T are representative of the site as well as the transect route. Equation 3 can also be used as a standard curve in theoretical studies of insect populations, for example, to aid the design of optimal monitoring schemes.

POPFIT can be applied to any scheme that monitors organisms whose adults (or another conspicuous life stage) have discrete, non-overwintering generations, including moths (Conrad, Fox & Woiwod 2007), dragonflies, bumblebees and hoverflies (Thomas 2005). The estimates, with confidence limits, that it can reliably generate of absolute, relative or changing population sizes are increasingly important in applied biology, for example in monitoring the densities of agricultural and forestry pests; changes in species, such as pollinators, that provide essential ecosystem services; and the attainment of national conservation targets and the results of management on individual sites. Other demographic parameters generated, such as the number of individuals alive at the peak of the flight period, are invaluable when assessing the resilience of a population to short-lived events, including its vulnerability to collectors or its suitability as a source for conservation translocations (Thomas 1983, 2005). The model also provides a tightly prescribed means of comparing variation in phenology within species, for example in populations over time in response to climate warming, or geographically when detecting sub-specific adaptations to the local environment. Finally, an objective fit is also essential when these databases are employed for more academic research, such as exploring population dynamic processes in species with different life-history traits.

Acknowledgements

We thank David Roy (CEH) for helpful comments and supplying data for M. galathea, and the dedicated butterfly recorders who gathered the original transect data. J.A.T. thanks the FP6 BiodivERsA Era-net project CLIMIT and the German FMER for funding.
