Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models


Summary

  1. Many publications documenting large-scale trends in the distribution of species make use of opportunistic citizen data, that is, observations of species collected without standardized field protocol and without explicit sampling design. It is a challenge to achieve reliable estimates of distribution trends from them, because opportunistic citizen science data may suffer from changes in field efforts over time (observation bias), from incomplete and selective recording by observers (reporting bias) and from geographical bias. These, in addition to detection bias, may lead to spurious trends.
  2. We investigated whether occupancy models can correct for the observation, reporting and detection biases in opportunistic data. Occupancy models use detection/nondetection data and yield estimates of the percentage of occupied sites (occupancy) per year. These models take the imperfect detection of species into account. By correcting for detection bias, they may simultaneously correct for observation and reporting bias as well. We compared trends in occupancy (or distribution) of butterfly and dragonfly species derived from opportunistic data with those derived from standardized monitoring data. All data came from the same grid squares and years, in order to avoid any geographical bias in this comparison.
  3. Distribution trends in opportunistic and monitoring data were well-matched. Strong trends observed in monitoring data were rarely missed in opportunistic data.
  4. Synthesis and applications. Opportunistic data can be used for monitoring purposes if occupancy models are used for analysis. Occupancy models are able to control for the common biases encountered with opportunistic data, enabling species trends to be monitored for species groups and regions where it is not feasible to collect standardized data on a large scale. Opportunistic data may thus become an important source of information to track distribution trends in many groups of species.

Introduction

Information on changes in species distribution is widely used to document changes in biodiversity. This information, along with other factors, is crucial for compiling the International Union for Conservation of Nature (IUCN) Red Lists (Van Swaay & Warren 1999; Maes et al. 2012). To measure changes in distribution, all observation records of a species are usually combined to provide a grid covering the area in question, after which the change in the number of occupied sites over time is quantified (Kuussaari et al. 2007; Maes et al. 2012).

Citizen science data are considered to be a potentially valuable source of information on changes in species distribution and biodiversity (Schmeller et al. 2009). Most researchers recognize the value of citizen science data collected with a standardized field protocol (Schmeller et al. 2009). But often citizen data are collected without standardized field protocol and without a design ensuring the geographical representativeness of the sampled sites. Recently, the number of such ‘opportunistic’ citizen science records has increased greatly through the online citizen science databases in many countries, with data entry facilitated through internet portals (e.g. waarneming.nl and observado.org). However, the value of these opportunistic citizen science data is less clear for several reasons.

First, because opportunistic data are rarely collected through a designed scheme, they suffer from uneven geographical distribution of surveyed sites (geographical bias; Dennis, Sparks & Hardy 1999; Dennis & Thomas 2000; Hassall 2012). Second, the lack of standardization of observer efforts within sites leads to variable search efforts, which may bring bias into the data (observation bias; e.g. Kuussaari et al. 2007). Third, many observers do not report all species observed, but only those they find interesting (reporting bias). Finally, observers are unable to detect all species occurring in a site (detection bias; MacKenzie et al. 2006). Detection bias is not unique to opportunistic data though and may equally apply to standardized data (MacKenzie et al. 2006). These imperfections in opportunistic data may lead to artificial trends or mask existing trends (Dennis & Thomas 2000; Kuussaari et al. 2007; Kéry et al. 2010; Szabo et al. 2010). For instance, Snäll et al. (2010) demonstrated a poor match between trends in opportunistic data and standardized monitoring data for birds in Sweden.

In an effort to make opportunistic data usable, many authors have developed methods to cope with the biases (Botts, Erasmus & Alexander 2012). Some authors compared only well-surveyed sites in two periods to estimate changes in species distribution (e.g. Kuussaari et al. 2007; Maes et al. 2012). Others used a statistical model with proxies for observation effort such as the duration of field visits (Kindberg, Ericsson & Swenson 2009) or the number of species observed (Szabo et al. 2010). A rather new method is the use of occupancy models. These models were created to account for detection bias (MacKenzie et al. 2006), but we believe they are particularly promising when it comes to handling unknown and varying observation efforts (Van Strien et al. 2010; Van Strien, van Swaay & Kéry 2011).

Occupancy models separate occupancy (the presence of a species in a site) from detection (the observation of the species in that site) when analysing field survey data (MacKenzie et al. 2006; Royle & Kéry 2007; Royle & Dorazio 2008). The models require detection/nondetection notes that are arranged in so-called detection histories per site per year, such as ‘01’ for a site where the species under focus is detected during the second visit, but not during the first visit. From the frequency of the different detection histories of the sites, occupancy and detection can be inferred. As an example, suppose an observer has surveyed 200 sites twice and found ‘00’ for 125 sites, and ‘01’, ‘10’ and ‘11’ for 25 sites each. This frequency of detection histories is most likely to be generated by occupancy and detection probability both being 0·5. Next year, the same observer surveys the 200 sites again, resulting in 100 times ‘00’, 10 times ‘01’, 10 times ‘10’ and 80 times ‘11’. The most likely estimates for occupancy and detection probability now are 0·5 and 0·9 respectively. Although the species has been observed in more sites than in the first year (100 vs. 75), the higher frequency of ‘11’ reveals that there is no change in occupancy, but a change in detection caused for instance by an increased observation effort. In this way, by correcting for detection bias, occupancy models may simultaneously correct for observation bias.
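The worked example above can be checked numerically. The following is a minimal sketch, assuming a single-season model with constant occupancy and detection probabilities and a simple grid search in place of a proper optimiser; the function names are illustrative and are not taken from the cited software.

```python
import math

def log_likelihood(psi, p, counts):
    """Log-likelihood of two-visit detection histories, assuming constant
    occupancy probability psi and constant detection probability p."""
    probs = {
        "11": psi * p * p,
        "10": psi * p * (1 - p),
        "01": psi * (1 - p) * p,
        # '00' arises from occupied-but-missed sites and from unoccupied sites
        "00": psi * (1 - p) ** 2 + (1 - psi),
    }
    return sum(n * math.log(probs[h]) for h, n in counts.items())

def fit_by_grid(counts, step=0.01):
    """Return the (psi, p) pair on a regular grid that maximises the likelihood."""
    n_steps = int(round(1 / step))
    grid = [k * step for k in range(1, n_steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pair: log_likelihood(pair[0], pair[1], counts))

# Frequencies of detection histories from the example in the text
year1 = {"00": 125, "01": 25, "10": 25, "11": 25}
year2 = {"00": 100, "01": 10, "10": 10, "11": 80}

psi1, p1 = fit_by_grid(year1)
psi2, p2 = fit_by_grid(year2)
print(psi1, p1, psi2, p2)
```

For the first year the grid maximum lies at occupancy and detection of 0·5 each. For the second year the exact maximum-likelihood values are close to 0·51 and 0·89, which agree with the rounded values of 0·5 and 0·9 quoted in the text: occupancy is essentially unchanged while detection has risen.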

Based on this theoretical consideration, Kéry et al. (2010) suggested that occupancy models are useful to derive unbiased trends in distribution from opportunistic data. Kéry et al. (2010) used day-lists of species as opportunistic data, and all observers involved were committed to reporting any species they observed from a fixed set of species, to prevent reporting bias. Van Strien et al. (2010) extended the approach of Kéry et al. (2010) by arguing that day-lists that suffer from reporting bias can also be used if an adapted model is applied (see below for explanation). By correcting for detection bias, these models then simultaneously correct for reporting bias as well.

Although correction methods for opportunistic data as mentioned above are widely used in the literature, evidence showing their validity is surprisingly scarce (Botts, Erasmus & Alexander 2012). Here, we describe the results of a large-scale validation study in which we compared trends produced by an occupancy model applied to opportunistic data with the gold standard, that is, trends obtained from independent and standardized monitoring data. In an exploratory study, Van Strien et al. (2010) compared trends of six dragonfly species using data which partly came from different sites. Here, we expand this comparison substantially to 37 butterfly and 40 dragonfly species. To further improve this comparison, we included only sites and years for which we had both opportunistic data and monitoring data available, thereby excluding any effect of geographical and temporal differences between the data sources. For butterflies, this resulted in data for 1990–2010 and for dragonflies in data for 1999–2011.

Material and methods

Study Species

We included in our study all butterfly and dragonfly species occurring in the Netherlands, with the exception of some species for the reasons provided below. Occupancy models restrict the analysis to data collected in a period of closure. In this period, a site is assumed to be either occupied or unoccupied, and must not become permanently abandoned or colonized during the period of surveys within the year (MacKenzie et al. 2006). Hence, we excluded migratory butterfly species from the analysis. In addition, occupancy models require data from at least some tens of sites (MacKenzie et al. 2006), so we also excluded very rare species, here defined as occupying fewer than 35 1 × 1 km sites in the study period. Finally, species that require targeted search efforts were excluded, such as Phengaris alcon, Apatura iris and Thecla betulae. For such species, it is impossible to derive information about their nondetection from the opportunistic data, while such information is required for occupancy models (see below for explanation). Nomenclature follows Fauna Europaea (www.faunaeur.org).

Data

The following three data sets were used.

National Database Flora and Fauna – This database comprised opportunistic sightings of many different species groups in the Netherlands. The recent facilities for easy data entry on the internet have led to a sharp rise in the number of records, mainly through the sites www.vlindernet.nl/landkaartje, www.telmee.nl and www.waarneming.nl. The recording database contained over 60 000 annual presence records of butterfly species in 1990–2000 and almost twice that number in 2001–2010 (Table 1). The number of records for butterflies was 3–4 times greater than the number of records for dragonflies. For both butterflies and dragonflies, the number of visits per site has also risen (Table 1). About 40–50% of all day visits produced single records; the rest were day-lists with records of multiple species (Table 1).

Table 1. Characteristics of opportunistic presence-only data collected per year for two species groups in two periods. Only a selection of all available opportunistic data was used in this study. The total number of 1 × 1 km sites in the Netherlands amounts to 37 065

                 Annual number   Annual number of   Annual number of   % visits producing
                 of records      sites surveyed     visits per site    Comprehensive   Short       Single
                                                                       day-lists       day-lists   records
Butterflies
  1990–2000      62 117          6509               3·6                25              29          45
  2001–2010      118 352         10 656             4·4                23              30          47
Dragonflies
  1990–2000      16 050          2284               2·1                33              29          38
  2001–2011      50 579          5409               3·1                28              28          44

Dutch Butterfly Monitoring Scheme – This scheme started in 1990 and applies the method developed for the British Butterfly Monitoring Scheme (Pollard & Yates 1993). Counts were conducted along fixed transects which were typically 1 km long. Observers recorded all butterflies within 2·5 m on either side and within 5 m ahead and above them. Weekly surveys were conducted between 1 April and 30 September when weather conditions met specified criteria (Van Swaay, Termaat & Plate 2011).

Dutch Dragonfly Monitoring Scheme – This scheme started in 1999. Counts were conducted along fixed transects which were typically 250 m long and located along water bodies (Van Swaay, Termaat & Plate 2011). Observers recorded all dragonflies within 5–7 m. Surveys were done once every 2 weeks between 1 April and 30 September when weather conditions were suitable.

We estimated annual occupancy, that is, the proportion of suitable 1 × 1 km grid squares occupied per year. A square is defined as suitable if the species had been recorded there at least once in 1990–2010 (butterflies) or 1999–2011 (dragonflies). Data from each of the three sources were aggregated to a 1 × 1 km resolution. Some 1 × 1 km sites have more than one monitoring transect. The number of monitoring transects has increased over time in both the butterfly and the dragonfly scheme. Consequently, in earlier years, there were no counts for those transects that entered the schemes in later years. For each species group, we selected only those site and year combinations from the opportunistic data for which we also had monitoring data available. This resulted in an average of almost 300 1 × 1 km sites per butterfly species and about 150 1 × 1 km sites per dragonfly species (Table 2). In these sites, about seven replicated visits per site were available in the monitoring data and about four in the opportunistic data (Table 2).
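The selection of suitable squares and of shared site and year combinations can be sketched as simple set operations. This is only an illustration: the record layout (site, year, species) and the example records are hypothetical, and it is assumed here that suitability is judged from the two sources combined.

```python
def suitable_sites(records, species):
    """Sites where `species` was recorded at least once in the study period."""
    return {site for site, year, sp in records if sp == species}

def shared_site_years(opportunistic, monitoring):
    """(site, year) combinations for which both data sources have records."""
    opp = {(site, year) for site, year, _ in opportunistic}
    mon = {(site, year) for site, year, _ in monitoring}
    return opp & mon

# Hypothetical records: (1 x 1 km site id, year, species)
opportunistic = [
    ("A", 2001, "Pieris napi"),
    ("A", 2002, "Maniola jurtina"),
    ("B", 2001, "Pieris napi"),
]
monitoring = [
    ("A", 2001, "Maniola jurtina"),
    ("C", 2001, "Pieris napi"),
]

shared = shared_site_years(opportunistic, monitoring)
suitable = suitable_sites(opportunistic + monitoring, "Maniola jurtina")
# Keep only shared site-years that fall in squares suitable for the species
selected = {(site, year) for site, year in shared if site in suitable}
print(selected)
```

In this toy example only site A in 2001 is present in both sources, and site A is suitable for Maniola jurtina, so that single site and year combination is retained.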

Table 2. Characteristics of monitoring data and opportunistic data used in this study. Data per species group were collected on the same sites and years in 1990–2010 (butterflies) and 1999–2011 (dragonflies). The number of sites and visits varies between species because species differ in rarity and in closure period

               Number of   Mean no. of sites    Mean no. of visits per site   Mean no. of visits per site
               species     per species (±SE)    in monitoring data (±SE)      in opportunistic data (±SE)
Butterflies    37          288 ± 32·4           7·2 ± 0·26                    4·2 ± 0·13
Dragonflies    40          148 ± 13·5           6·5 ± 0·25                    4·3 ± 0·09

Generating Nondetection Records

Occupancy models require detection/nondetection data collected during replicated visits. Valid replicated visits are only those visits made in a period of closure within the year (MacKenzie et al. 2006). To meet the closure assumption, we restricted the data to time windows of about 40–60 days (mean 52) for butterflies and 50–150 (mean 95) for dragonflies, which varied for different species according to their flight period. For butterfly species hibernating as adults, we used the period of the first summer generation. For butterflies with more than one generation, we used the period of the second generation. For the only dragonfly species with more than one generation in our study, Sympecma fusca, we used the period of the spring generation.

Nondetection data for each species were extracted from the monitoring scheme data and consisted of all visits made without any recorded sightings of the species under consideration. It was less straightforward to obtain nondetection data from the opportunistic data, because these were not based on a standardized field protocol. Nevertheless, we deduced nondetection records from the sightings of all other butterfly or dragonfly species in the opportunistic databases, following Kéry et al. (2010) and Van Strien et al. (2010). Any observation of the species under consideration within the closure period was taken as a 1 (detection), whereas we rated 0 (nondetection) if any other species but not the species under consideration had been reported by an observer at a particular 1 × 1 km site and on a particular date within the closure period. This procedure was repeated for all species in order to obtain detection histories for each species.
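The rule for deducing nondetections from opportunistic records can be sketched as follows. This is a minimal illustration under assumed conventions: each record is a tuple (observer, site, day of year, species), a visit is one observer at one site on one date, and the records and closure period shown are hypothetical.

```python
from collections import defaultdict

def detection_histories(records, target, closure):
    """Build per-site detection histories (lists of 0/1) for `target`.

    A visit scores 1 if the target species was reported, and 0 if the
    observer reported other species but not the target."""
    first_day, last_day = closure
    visits = defaultdict(set)  # (observer, site, day) -> species reported
    for observer, site, day, species in records:
        if first_day <= day <= last_day:  # keep only visits within closure
            visits[(observer, site, day)].add(species)

    per_site = defaultdict(list)
    for (observer, site, day), reported in visits.items():
        per_site[site].append((day, observer, 1 if target in reported else 0))

    # Order visits chronologically within each site
    return {site: [d for _, _, d in sorted(rows)]
            for site, rows in per_site.items()}

# Hypothetical opportunistic records
records = [
    ("obs1", "s1", 180, "Maniola jurtina"),
    ("obs1", "s1", 180, "Pieris napi"),
    ("obs2", "s1", 190, "Pieris napi"),
    ("obs1", "s2", 185, "Pieris napi"),
    ("obs3", "s2", 300, "Maniola jurtina"),  # outside the closure period
]
histories = detection_histories(records, "Maniola jurtina", closure=(170, 220))
print(histories)
```

Here site s1 receives the history 1, 0 and site s2 the history 0; the late record at s2 is ignored because it falls outside the assumed closure period.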

Statistical Analysis

We applied the same dynamic occupancy model (MacKenzie et al. 2006) as Van Strien et al. (2010) and Van Strien, van Swaay & Kéry (2011) to estimate annual occupancy probability, adjusted for detection probability for each species separately. A detailed description of this model is given by Royle & Kéry (2007) and Royle & Dorazio (2008). The occupancy model consists of two hierarchically coupled submodels, one for occupancy and one for detection, the latter being conditional on the occupancy submodel. The occupancy submodel contains two parameters, one pertaining to the probability of persistence φ and one to the probability of colonization γ and computes the annual occupancy probability ψ per site recursively through:

$$\psi_{i,t} = \psi_{i,t-1}\,\varphi_{t-1} + (1 - \psi_{i,t-1})\,\gamma_{t-1} \qquad \text{(eqn 1)}$$

Thus, whether site i occupied in year t-1 is still occupied in year t is determined by the persistence probability, and whether site i unoccupied in year t-1 is occupied in year t depends on the colonization probability. All occupancy probabilities per site together yield the annual proportion of occupied 1 × 1 km squares. In addition, we measured the trend in φ, γ and ψ as the slope of the linear regression line through the annual estimates of φ, γ and ψ, respectively. These slopes are descriptive statistics highlighting the overall change in φ, γ and ψ. To take into account the uncertainty in the annual estimates, all slopes were estimated as derived parameters (Kéry 2010).
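The recursion described above, in which occupied sites persist with probability φ and unoccupied sites are colonized with probability γ, can be illustrated with a short sketch. The parameter values are hypothetical, and the slope is computed here by ordinary least squares on a single series; in a Bayesian fit the same calculation could be applied to each posterior draw to obtain the slope as a derived parameter.

```python
def occupancy_series(psi_first, phi, gamma):
    """Annual occupancy probabilities from a starting value and yearly
    persistence (phi) and colonization (gamma) probabilities."""
    psi = [psi_first]
    for phi_t, gamma_t in zip(phi, gamma):
        # occupied sites persist, unoccupied sites may be colonized
        psi.append(psi[-1] * phi_t + (1 - psi[-1]) * gamma_t)
    return psi

def linear_slope(values):
    """Least-squares slope of `values` against year index 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical values: occupancy starts at 0.6 and drifts towards 0.5
psi = occupancy_series(0.6, phi=[0.9, 0.9, 0.9], gamma=[0.1, 0.1, 0.1])
trend = linear_slope(psi)
print(psi, trend)
```

With these assumed values, occupancy declines from 0·60 to 0·58 in the second year and keeps falling towards 0·5, so the fitted slope is negative.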

The detection submodel estimates the detection p per visit. Because we expected detection to vary between visits, we included covariates in the submodel for detection. We used Julian date as a covariate for p because the detection of the species may vary over the season, mainly due to a changing number of adult butterflies and dragonflies. We also included the length of species lists as a covariate for detection in the opportunistic data, because we expected lower detections in shorter lists. In almost all 1 × 1 km sites in the Netherlands, there are more than three species to be found and often considerably more. Consequently, any list with fewer than four species may be considered as the result of limited observation effort (as in Szabo et al. 2010) or of reduced readiness to report sightings or both. We distinguished as list length categories: (1) single records of species on one site and date without records of other species; (2) short day-lists, that is, records of two or three species made by a single observer on one site and date and (3) comprehensive day-lists, that is, records of more than three species per observer, site and date. Comprehensive day-lists contain many records of the most common and thereby ‘uninteresting’ butterfly and dragonfly species, suggesting that these day-lists do not suffer much from incomplete recording. These list categories do not reflect skill levels of observers; the same observer may collect a single record on one day and a comprehensive day-list on another day. Effects of both Julian date and list category were included in the detection submodel via a logit link:

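$$\operatorname{logit}(p_{ijt}) = \alpha_t + \beta_1\,\mathrm{date}_{ijt} + \beta_2\,\mathrm{date}_{ijt}^{2} + \delta_1\,(\text{short day-list})_{ijt} + \delta_2\,(\text{comprehensive day-list})_{ijt} \qquad \text{(eqn 2)}$$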

where pijt is the probability to detect the species at site i during visit j in year t, αt is the annual intercept, β1 and β2 are the linear and quadratic effects of the date of visit j, and δ1 and δ2 are the effects of short day-lists and comprehensive day-lists, relative to single records. A considerable number of butterfly and dragonfly species have advanced their flight period (Dingemans & Kalkman 2008; Van Strien et al. 2008), which could cause the date effects to differ between years. However, preliminary tests revealed that varying β1 and β2 between years had negligible effects on occupancy, so for simplicity's sake, we used the same β1 and β2 for all years. We computed the trend in annual detection probability αt in a similar way as the trend in φ, γ and ψ.
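The detection submodel can be illustrated with a small sketch that maps list length to the three list categories and applies the inverse logit. All coefficient values are hypothetical, and the date is assumed to be centred and scaled so that 0 corresponds to the middle of the closure period; neither assumption is taken from the fitted models.

```python
import math

def list_category(n_species):
    """Classify a visit by the number of species reported."""
    if n_species < 1:
        raise ValueError("a visit must report at least one species")
    if n_species == 1:
        return "single"
    if n_species <= 3:
        return "short"
    return "comprehensive"

def detection_probability(alpha_t, beta1, beta2, delta1, delta2,
                          date, n_species):
    """Per-visit detection probability via a logit link.

    `date` is assumed to be a centred and scaled day of year; delta1 and
    delta2 are effects relative to single records."""
    eta = alpha_t + beta1 * date + beta2 * date ** 2
    category = list_category(n_species)
    if category == "short":
        eta += delta1
    elif category == "comprehensive":
        eta += delta2
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients: detection peaks mid-season, longer lists detect more
coef = dict(alpha_t=0.0, beta1=0.0, beta2=-1.0, delta1=0.5, delta2=1.5)
p_single = detection_probability(**coef, date=0.0, n_species=1)
p_short = detection_probability(**coef, date=0.0, n_species=3)
p_comp = detection_probability(**coef, date=0.0, n_species=8)
p_late = detection_probability(**coef, date=1.5, n_species=8)
print(p_single, p_short, p_comp, p_late)
```

With a negative quadratic date effect, detection is highest in the middle of the season and drops towards its start and end, while positive δ1 and δ2 raise detection for short and comprehensive day-lists relative to single records.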

We fitted the models in a Bayesian mode of inference using JAGS (Plummer 2009) on the computer cluster LISA (https://www.surfsara.nl/systems/lisa), with essentially the same WinBUGS code (Spiegelhalter, Thomas & Best 2003) as given by Royle & Dorazio (2008; p. 309), but in addition we estimated αt as a random effect. We chose conventional vague priors for all parameters, using uniform distributions with values between 0 and 1 for all parameters except β1, β2, δ1 and δ2 (values between −10 and 10) and αt (values between 0 and 5 for the standard deviation of the normal distribution used as prior for the random year effect), and we used sufficient iterations to achieve convergence of the models. Although the number of surveyed sites in the data has increased, the increase does not affect the trend in distribution. The same sites were in the analysis for all years; the model estimated missing values for sites not surveyed during some years from sites with surveys in those years. We interpreted the standard deviation of the sample from the posterior distribution of each parameter computed by JAGS as the standard error of that parameter and used the 95% credible intervals to describe the precision of parameter estimates (Kéry 2010). Model fits were evaluated using Bayesian p-values. This value is near 0·5 for an adequately fitting model and values close to 0 or to 1 indicate inadequate fits (Kéry 2010). Most of our values (80%) were between 0·40 and 0·60, suggesting that model fits were generally adequate.
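The way a posterior sample is summarised into a standard error and a 95% credible interval can be sketched as below. The draws are simulated from a normal distribution purely for illustration; they are not output from the fitted models, and the function name is hypothetical.

```python
import random
import statistics

def summarise_posterior(draws):
    """Posterior mean, standard deviation (used as the standard error)
    and equal-tailed 95% credible interval of a sample of draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5% steps
    return {
        "mean": statistics.fmean(draws),
        "se": statistics.stdev(draws),
        "ci95": (cuts[0], cuts[-1]),  # 2.5% and 97.5% quantiles
    }

# Simulated stand-in for MCMC draws of a trend slope (illustrative only)
random.seed(1)
slope_draws = [random.gauss(0.010, 0.003) for _ in range(4000)]
summary = summarise_posterior(slope_draws)
print(summary)
```

One possible reading, offered here as an assumption rather than as the criterion used in this study, is to treat a trend as clearly different from zero when the credible interval excludes zero.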

Results

Detection

As expected, for most species, detection probability was lower at the start and the end of the closure period. A few species showed no relation with Julian date; these were mostly species with considerably longer flight periods than the closure period applied, for example Pieris napi. The monitoring data revealed that mean detections differed considerably between species. Some species, such as Favonius quercus, had a much lower detection probability because they are more difficult to find. In contrast, conspicuous and abundant species such as Maniola jurtina have high detection probabilities (see Table S1 in Supporting Information). Detections in monitoring data and comprehensive day-lists were highly correlated, both for butterflies and dragonflies (Fig. 1; Pearson's r = 0·81 and 0·80 for butterflies and dragonflies respectively). In addition, mean detections obtained for monitoring data were not (butterflies) or only slightly (dragonflies) different from those in comprehensive day-lists (Fig. 2; paired t-test; P > 0·05 and P < 0·01 for butterflies and dragonflies, respectively). But detections in short day-lists and single records data differed substantially from those in monitoring data, for both species groups (Fig. 2; paired t-test; P < 0·01). This confirms our expectation that incomplete recording is manifested as a lower detection probability of species.

Figure 1.

Relationship between detection probability of species in monitoring data and in comprehensive day-lists for butterflies (n = 37; closed circles, solid line) and dragonflies (n = 40; open circles, dashed line). Detection probabilities of species are average values per visit for the entire study period.

Figure 2.

Mean detection probability (±SE) for different day-list categories of butterflies (1990–2010; n = 37) and dragonflies (1990–2011; n = 40).

The lower detection rate did not apply evenly to all species in the single records data and short day-lists though. For rare butterfly species, such as Papilio machaon, Pyrgus malvae and Boloria selene, detections were less diminished than for common butterflies (Fig. 3a; Pearson's r = −0·80 and −0·84 for short day-lists and single records, respectively; P < 0·05). The same was true for dragonflies with, for example, the less common Aeshna viridis having a less diminished detection in short day-lists and single records as compared to more common dragonflies (Fig. 3b; Pearson's r = −0·56 and −0·62 for short day-lists and single records respectively; P < 0·05). This indicates that observers showed selective reporting behaviour.

Figure 3.

Quotient of detection probability in (a) butterflies and (b) dragonflies of short day-lists (closed circles, solid line) or single records (open triangles, dashed line) with detection in comprehensive day-lists in relation to species rarity. Rarity is measured as the total number of 1 × 1 km sites in which the species occurred at least once during the study period. Quotient with value 1 means no difference in detection between either short day-lists or single records and comprehensive day-lists.

In the majority of butterfly species (in 23 out of 37 species), we found a trend in detection in the monitoring data, mainly declines (see Table S1). In opportunistic butterfly data, we found trends in detection in 14 out of the 37 species. For dragonflies (see Table S2), we also found a considerable number of trends in detection in monitoring data (in 14 out of 40 species), but few such trends in opportunistic data (four out of 40 species).

Occupancy

We compared trends in opportunistic data with those in monitoring data, both analysed by the occupancy model. Mean distribution trend of all butterfly species based on opportunistic data (0·0006 ± 0·004) did not differ from the trend in monitoring data (−0·0002 ± 0·003; n = 37; paired t-test; P = 0·24). This was also true for dragonflies: mean trends were 0·011 ± 0·010 and 0·008 ± 0·008 for opportunistic and monitoring data, respectively (n = 40; paired t-test; P = 0·13). Moreover, the trends were highly correlated (Fig. 4a,b; Pearson's r = 0·90 and 0·78 for butterflies and dragonflies respectively; P < 0·05). Two examples of the concordance in trends are shown in Fig. 5. The butterfly Lasiommata megera declined according to both monitoring data and opportunistic data (Fig. 5a). The dragonfly Anax imperator increased according to both data sets (Fig. 5b).

Figure 4.

Relationship between distribution trends of species in opportunistic data and monitoring data for (a) butterflies and (b) dragonflies.

Figure 5.

Annual occupancy probability (±95% credible interval) for (a) the butterfly Lasiommata megera and (b) the dragonfly Anax imperator in monitoring data (closed circles, solid line) and in opportunistic data (open triangles, dashed line).

However, not all species showed a perfect match between monitoring and opportunistic data trends. We found 15 butterfly species with significant trends in monitoring data. Using the opportunistic data, 10 out of these 15 trends were also assessed as being significant and having the same direction and one species trend was found significant while this trend was not found in the monitoring data. For dragonflies, we found 11 species with significant trends in monitoring data. In the opportunistic data analysed by the occupancy model, five out of these 11 trends were considered significant and in the same direction. Four significant additional trends were assessed which were not found in the monitoring data, although the sign of the trend was similar for three out of these four.

Mean occupancy across all years per species did not differ between opportunistic data and monitoring data (butterflies n = 37; 0·64 ± 0·02 and 0·63 ± 0·02; paired t-test; P = 0·83; dragonflies n = 40; 0·67 ± 0·02 and 0·68 ± 0·02; paired t-test; P = 0·33).

Discussion

Match of Distribution Trends

Our main finding was that the distribution trends provided by opportunistic and monitoring data were well-matched, if both were analysed by occupancy modelling. Strong positive and negative trends, that is, above 0·01 or below −0·01, in monitoring data were rarely missed when using the opportunistic data (Fig. 4a,b). This finding confirms our earlier results obtained for a few dragonfly species (Van Strien et al. 2010). The match we found was much better than that obtained by Snäll et al. (2010), who compared changes in opportunistic data for birds with bird monitoring data. However, Snäll et al. (2010) did not adjust for observation bias, detection bias and reporting bias, whereas we applied an advanced procedure to correct for these biases in opportunistic data.

Detection

Surprisingly, many butterfly and dragonfly species had a trend in detection in standardized monitoring data (Table S1 & S2). This finding underlines the recommendations of MacKenzie et al. (2006) to account for detection even in standardized monitoring data, at least for inferences on distribution trends. The butterfly monitoring data revealed many declines in detection, presumably at least partly provoked by a decline in population abundance, because quite a few species with a declining trend in detection also declined in abundance (Table S1 & S2; Pearson's r = 0·49 between trend in detection and trend in population abundance; P < 0·05). There was no obvious association with abundance trends for the dragonfly data.

Trends in detection were also found in opportunistic data, but these were not associated with changes in abundance (Table S1 & S2). We believe this is because in opportunistic data detection is affected by many factors which are of little importance in standardized data, especially variation in field method and field efforts and in completeness of recording. Shares of single records and short lists are considerable in opportunistic data (Table 1), thus incomplete recording is extensive. We found no evidence of incomplete recording in comprehensive day-lists: generally, detections in these day-lists were not different from, or were even higher than, detections in monitoring data (Fig. 2). In addition, we found substantial selective recording behaviour: there were significant preferences for rarer species in short lists and single records data (Fig. 3a,b). Hardly any method suggested in the literature to analyse opportunistic data takes into account the selective recording of species. An easy treatment might be to simply remove all single records and short day-lists, but this would cause a substantial reduction in the data. In our case, for instance, removing all single records and short day-lists would mean that records collected in about 75% of all field visits would be wasted (Table 1). Instead, occupancy models are a good alternative for the analysis of opportunistic data. By estimating detection, these models enable researchers to take into account observation bias in opportunistic data, as well as incomplete and selective recording behaviour (reporting bias) and other factors influencing detection, such as abundance changes (detection bias).

Model Assumptions

The good match between trends based on monitoring and opportunistic data suggests that model assumptions were not violated to a critical degree in our case. However, some assumptions need to be addressed (see also MacKenzie et al. 2006; Van Strien, van Swaay & Kéry 2011). First, the closure assumption may not hold, particularly for dragonflies which are less sedentary than butterflies. A lack of closure may cause positive bias in the occupancy estimates (Rota et al. 2009), but only if occupancy is taken to mean ‘permanent presence’. If species randomly move in and out of the sampling units, then the occupancy parameter should be interpreted as the proportion of sites ‘used’ by the target species during the period over which closure is assumed (MacKenzie et al. 2006). Second, we ignored the fact that often not the entire 1 × 1 km site was visited but only a spatial subunit. Some species will not be available for detection during a visit to a subunit if they are permanently absent from that particular subunit. According to Kendall & White (2009), this may induce bias in occupancy estimation, but not if the spatial subunits within the site are resampled with replacement. We consider the collection of opportunistic data by many different observers comparable to sampling with replacement. Therefore, we do not expect bias in our occupancy estimates.

Perspectives

In conclusion, opportunistic butterfly and dragonfly data can be successfully used for inferences on distribution trends if analysed with occupancy models to treat observation bias and other biases affecting detection. Other methods can also deal with varying observation effort, at least to some extent (Botts, Erasmus & Alexander 2012), but there is little evidence that methods other than occupancy models are capable of coping with reporting and detection bias. We therefore recommend that opportunistic data are analysed using occupancy models where possible.

Occupancy models can, however, only be used if replicated visits in a period of closure are available. Because of this requirement, Botts, Erasmus & Alexander (2012) considered occupancy models to be not very useful in practice. This may be true for historic data, because historic data often lack replicated visits of sites within the same year. But this is not true for many present-day online databases. We used on average 288 sites for butterfly species and 150 sites for dragonfly species (Table 2) and had about four replicated visits per site at our disposal. The total amount of opportunistic data available for these two species groups, however, was 10–20 times greater (Table 1). So, even if far fewer observers were to contribute to opportunistic data collections than in the Netherlands, many opportunities may still be available for running occupancy models. Occupancy models are currently not easy to apply to big data sets because they are computer intensive, especially if Bayesian methods are applied, as in this study. However, we expect such computer-intensive methods to become much easier in the near future.

Opportunistic data produce more variable occupancy estimates than standardized monitoring data, so their annual confidence intervals are slightly larger (see Fig. 5a,b) and trend estimates are less precise. The smaller standard errors derived from monitoring data are a merit of the standardization of the field work. But this holds only if the same number of sites is used, whereas annual opportunistic records are available for many more sites (Table 1). When all available opportunistic data were used, the standard errors of trend estimates were considerably reduced. Mean standard errors of trends in monitoring data were 0·003 and 0·008 for butterflies and dragonflies, respectively (based on 37 and 40 species with mean numbers of 1 × 1 km squares of 288 and 154, respectively; Table 2). Using all available opportunistic data from the same periods, mean standard errors of trends were 0·001 and 0·002 for butterflies and dragonflies, respectively (based on 37 and 40 species with mean numbers of 1 × 1 km squares of 6206 and 4245, respectively), and thus considerably lower.

Although not relevant to this study, opportunistic data may suffer from geographical bias owing to nonrandom selection of recording locations. Opportunistic data are, however, numerous and widespread in the Netherlands, with currently 14% (dragonflies) and 29% (butterflies) of all 1 × 1 km sites surveyed annually (Table 1). In this situation, geographical bias will be of limited relevance, because for many species the data collection will sample nearly all 1 × 1 km sites in the Netherlands once every few years. With fewer data or when particular areas or habitat types remain under-sampled, poststratification and weighting of sites according to their share in the statistical population under study may be a useful option. Van Swaay, Plate & van Strien (2002) and Van Turnhout et al. (2008) applied such a procedure in monitoring schemes to treat under-sampling and over-sampling of areas and habitat types.
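The idea of poststratification can be sketched as follows. This is a minimal illustration, not the procedure of Van Swaay, Plate & van Strien (2002) or Van Turnhout et al. (2008): each stratum's occupancy estimate is weighted by the stratum's share of sites in the statistical population. The stratum names, site counts and occupancy values are hypothetical.

```python
def poststratified_occupancy(strata: dict[str, tuple[int, float]]) -> float:
    """Weight stratum-level occupancy by each stratum's population share.

    strata maps a stratum name to a pair:
      (number of sites of that stratum in the whole population,
       occupancy estimated from the sampled sites in that stratum).
    """
    total_sites = sum(n_sites for n_sites, _ in strata.values())
    if total_sites <= 0:
        raise ValueError("the population must contain at least one site")
    return sum(n_sites / total_sites * psi for n_sites, psi in strata.values())


if __name__ == "__main__":
    # Hypothetical example: a rare, heavily sampled habitat with high
    # occupancy and a common, under-sampled habitat with low occupancy.
    example = {
        "dunes": (1000, 0.60),
        "farmland": (9000, 0.10),
    }
    print(f"{poststratified_occupancy(example):.2f}")
```

Because the weights depend on the population rather than on where observers happened to go, over-sampling of a popular habitat no longer dominates the overall estimate.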

It is often stated that the monitoring of biodiversity requires well-designed schemes and standardized field work (e.g. Silvertown 2009; Robertson, Cumming & Erasmus 2010). Such schemes will indeed guarantee unbiased and precise trend information, if analysed adequately. But here we showed that nonstandardized data may also deliver unbiased and precise information on trends, if analysed adequately. This makes it possible to monitor species trends for species groups and regions where it is not feasible to collect standardized data on a large scale. We believe that this may work not only for butterflies and dragonflies, but also for other animal species groups for which data are collected using day-lists, such as grasshoppers, resident birds and day-active mammals. Opportunistic data may thus become an important source of information to track trends in distribution in multiple species groups, at both national and supranational levels.

Acknowledgements

We thank Marijn Prins, Marcel Straver, Calijn Plate and Wim Plantenga for data preparation, Willy van Strien for critically reading the draft and Rita Gircour for improving the English. The opportunistic data for the Netherlands were obtained from the National Database Flora and Fauna, maintained by the National Authority for Data concerning Nature. These data are owned by the Dutch Society for Dragonfly Studies, Dutch Butterfly Conservation, and the European Invertebrate Survey – the Netherlands. Most records are currently collected through the internet portals Waarneming.nl and Telmee.nl. The Dutch Butterfly Scheme is a joint scheme of Dutch Butterfly Conservation and Statistics Netherlands and is financed by the Ministry of Economic Affairs and the National Authority for Data concerning Nature in the framework of the Dutch Network Ecological Monitoring programme. The work of A.J.vS. is facilitated by the BiG Grid infrastructure for eScience (www.biggrid.nl). This work would not have been possible without the help of many voluntary field workers.
