Observer retention, site selection and population dynamics interact to bias abundance trends in bats

1. Many long-term wildlife population monitoring programmes rely on citizen scientists for data collection. This can offer several benefits over traditional monitoring practices as it is a cost-effective, large-scale approach capable of providing long time series data and raising public environmental awareness. Whilst there is a debate about the quality of citizen science data, a standardised sampling design can allow citizen science data to be of a similar quality to those collected by professionals. However, many programmes use subjective, opportunistic selection of monitoring sites and this introduces several types of bias, which are not well understood.
2. Using bat roost counts as a case study, we took a 'virtual ecologist' approach to simulate the effect of opportunistic site selection and uneven observer retention on our ability to accurately detect abundance trends. We simulated populations with different levels of temporal variability and site


| INTRODUCTION
A fundamental tool of conservation biology is the establishment of long-term monitoring programmes, which allow policy-makers to evaluate the extent of biodiversity changes and the effectiveness of conservation strategies (Buckland & Johnston, 2017; Magurran et al., 2010; Yoccoz, Nichols, & Boulinier, 2001). These programmes require a large amount of financial and human resources in order to produce robust data that can be used to assess the anthropogenic impact on biodiversity (Bird et al., 2014; Schmeller et al., 2009).
Monitoring biodiversity through data collected by volunteer observers has a long-standing history (e.g. the Audubon Society's Christmas Bird Count, the British Breeding Bird Survey, NABU Vogelzählung), and in recent decades has become the standard for long-term monitoring programmes (Burgess et al., 2017; Callaghan, Rowley, Cornwell, Poore, & Major, 2019; Schmeller et al., 2009). Such citizen science programmes offer many benefits compared with traditional monitoring approaches. They are cost-effective, large-scale, provide continuous time series data and also raise public environmental awareness (Conrad & Hilchey, 2011; Dickinson et al., 2012).
Despite the success of many citizen science monitoring programmes, discussion about the quality of data collected by observers has been ongoing (Brown & Williams, 2018; Conrad & Hilchey, 2011; Gardiner et al., 2012; Riesch & Potter, 2014). Commonly voiced concerns are data inaccuracies, such as omission and misidentification, as well as variation in sampling effort through time and space (Dickinson, Zuckerberg, & Bonter, 2010; Falk et al., 2019; Lewandowski & Specht, 2015; Maldonado et al., 2015). Both data inaccuracies and sampling effort can be related to the wide variation in observer skill and retention that is present in citizen science observers (Aceves-Bueno et al., 2017; Belt & Krausman, 2012; Jiguet, 2009; Moyer-Horner, Smith, & Belt, 2012). In response, there have been numerous studies highlighting the importance of collecting metadata, such as descriptors of the sampling process, as well as using suitable modelling approaches to achieve the full potential from data collected by observers (Bird et al., 2014; Isaac & Pocock, 2015; Johnston, Fink, Hochachka, & Kelling, 2017; Kéry et al., 2010; Ruiz-Gutierrez, Hooten, & Campbell Grant, 2016). Thus, a standardised sampling design in combination with appropriate analytical methods enables citizen science data to be of similar quality to those collected by professionals (Brown & Williams, 2018; Engel & Voshell, 2002; Van Strien, Van Swaay, & Termaat, 2013).
[...] (...; Fitzpatrick, Preisser, Ellison, & Elkinton, 2009), and if the population size is highly variable, these high-abundance sites will then likely revert back to their long-term mean population size, seemingly declining ('Regression to the mean', Palmer, 1993; Figure S1 in Appendix S1). Meanwhile, sites that were initially low abundance are likely to have increased on average, but if low abundance sites are not among the set of monitored sites then these increases will not be captured in the monitoring programme.
This issue of regression-to-the-mean has been cited as a potentially major source of bias in monitoring programmes with opportunistic site selection (Boyd, 2013; Buckland & Johnston, 2017; Fournier, White, & Heard, 2019) and is particularly likely to create problems when only one species is monitored per site.
Another potential bias in single-species surveys could arise if populations exhibit low site fidelity (e.g. the movement of wetland birds between lakes; Ruete, Pärt, Berg, & Knape, 2017). Migration of individuals from a monitored site to an unmonitored site would result in an apparent decline in numbers since the increase at the unmonitored sites would not be measured (Buckland & Johnston, 2017). Furthermore, in cases where the entire population moves to an unmonitored site, the observer may conclude that the population has abandoned the site permanently or become extinct, potentially causing them to stop monitoring. As a result, any subsequent reoccupation of the site will be missed. Repeated visits within seasons can mitigate the problems of low site fidelity, but this is not practical for all monitoring programmes. Given the popularity of opportunistic, single-species surveys for biodiversity monitoring, it is important to establish how subjective site selection, population dynamics and observer retention interact to bias data from single-species surveys.
KEYWORDS
abundance trends, bats, biased site selection, citizen science, observer retention, population monitoring, site fidelity, temporal variability

Counting individuals at opportunistically selected roost sites is a common approach to monitoring bat populations (Order: Chiroptera). Gaining large-scale, long-term information about bat populations is vital for conservation as they are one of the most diverse orders of mammals (Russo & Jones, 2015), with more than 20% of all mammal species being bats (Simmons, 2005). Bats provide valuable ecosystem services (Kunz, Braun de Torrez, Bauer, Lobova, & Fleming, 2011) and are important indicators of anthropogenic impact on climate and habitat quality (Jones, Jacobs, Kunz, Willig, & Racey, 2009). Here we use data from the UK's National Bat Monitoring Programme (NBMP) Roost Count, coordinated by the Bat Conservation Trust, to investigate the issues of biased site selection and observer retention. The Roost Count is one of four long-term monitoring surveys that form the NBMP (which also includes field surveys and counts at winter hibernacula, Table S1 in Appendix S2).
The NBMP Roost Count provides an ideal case study as it comprises single-species surveys with opportunistically selected monitoring sites, covers species with varying levels of site fidelity and is undertaken voluntarily by citizen scientists who report that the presence of bats is an important factor in their desire to continue monitoring. Between 2013 and 2017 a third of NBMP Roost Count participants who stopped monitoring did so due to bats no longer being present (Boughey & Langton, 2017). In addition, abundance trends derived from the Roost Counts differ substantially when compared with trends derived from other methods, notably for the common and soprano pipistrelles Pipistrellus pipistrellus and P. pygmaeus (Table S2 in Appendix S2). Trends from NBMP surveys are designated as Official Statistics by the UK Government and used to inform policy, aid bat conservation and contribute to the UK biodiversity indicators (Bat Conservation Trust, 2018). Given the importance of these data, a key concern is whether the discrepancy between roost count trends and those derived from other survey methods can be resolved by understanding how the data are affected by bias caused by the interaction of population dynamics, site selection and observer behaviour.
We use data from the NBMP Roost Count to parameterise a simulation study using a 'Virtual Ecologist' approach where both data and observation processes are simulated (Zurell et al., 2010). This method has been shown to be suitable for evaluating and improving the design of monitoring programmes (Rhodes & Jonzén, 2011; Weiser et al., 2019; Weiser, Diffendorfer, Lopez-Hoffman, Semmens, & Thogmartin, 2020; White, 2019). Here we use it to test the general hypothesis that biased site selection and variable observer retention influence our ability to accurately detect abundance trends.
We specifically investigate the effect of these biases on populations with different levels of temporal variability and site fidelity and ask whether these biases could explain the divergence of trends derived from NBMP roost counts and field surveys. We highlight circumstances that monitoring programmes need to be aware of, and approaches that could be employed, in order to avoid negatively biased abundance estimates.

| Bias evaluation
To parameterise our simulation, which explores the effect of systematic biases on observed abundance trends, we first evaluated potential sources of bias within the NBMP Roost Count. We used data from 1997 up to and including 2017 for five bat species (P. pipistrellus Roost Counts: n = 9,357, P. pygmaeus Roost Counts: n = 5,873, Eptesicus serotinus Roost Counts: n = 1,566, Myotis nattereri Roost Counts: n = 1,344, Rhinolophus hipposideros Roost Counts: n = 6,640).
Since 1997, Roost Count surveys have been carried out annually by observers who follow an established monitoring protocol. Counts are made at a self-selected summer roosting site inhabited by one of seven bat species (Table S1 in Appendix S2). Emerging bats are counted from 15 min prior to sunset on two different dates at least five days apart, one in each of two ten-day survey periods prior to parturition (R. hipposideros counts start in late May, R. ferrumequinum counts take place in July, all other species are counted in June).
We first investigated whether Roost Count observers surveyed larger colonies (an association of individuals sharing one or several roosts) more often than would be expected by chance. We assessed this in two ways. Firstly, for P. pipistrellus and P. pygmaeus, we created separate abundance trends with initial colony sizes below and above their long-term mean species abundance. We excluded other species due to insufficient sample sizes. We split the data into colonies that were larger or smaller (in the first year of the survey) than the average colony size for each species. Abundance trends were then created by fitting a generalised additive model (GAM) with a Poisson error distribution, a site term and a smoothing term (Barlow et al., 2015).
Degrees of freedom for the smoothing term were chosen according to the default suggestion of 0.3 times the number of survey years (following Fewster, Buckland, Siriwardena, & Stephen, 2000).
Population indices were derived from the fitted curve with 1999 as a baseline. Secondly, for five species, we compared the frequency distribution of colony sizes at the first and last year of the survey across all roost sites using two-sided Kolmogorov-Smirnov tests.
If colonies are entering the NBMP at unusually large size, then we would expect the distribution of colony sizes to be different in the first and last years.
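The comparison of first-year and last-year colony size distributions can be illustrated with a minimal sketch of the two-sample Kolmogorov-Smirnov statistic. The colony sizes below are hypothetical, the function name `ks_statistic` is ours, and the sketch computes only the test statistic D (the largest gap between the two empirical distribution functions), not the p-value reported by a full two-sided test.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    gap between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical colony sizes in the first and last survey year of each roost.
first_year = [120, 85, 240, 60, 150, 95]
last_year = [110, 90, 200, 55, 160, 100]
d_stat = ks_statistic(first_year, last_year)
```

A small D, as in this made-up example, would be consistent with colonies not entering the programme at unusually large sizes; a D close to 1 would indicate almost non-overlapping distributions.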
Roost abandonments (where the colony does not return to their summer roosting site after hibernation) may have a significant impact on an observer's interest to continue monitoring. We quantified the yearly probability of roost abandonment within the NBMP Roost Count dataset, for the five species with a sufficient number of roost abandonments recorded (Table 1). Abandonments are also not necessarily permanent as the bat colony may return in subsequent years (i.e. they represent temporary emigration of the population, rather than extinction events; Simon, Hüttenbügel, & Smit-Viergutz, 2004).
For each roost, the yearly probability of roost abandonment P_a was defined as:

$$P_a = \frac{\text{Occupancy change}}{\text{Length of survey}}$$

Occupancy change was calculated as the number of times the status of a roost changed from occupied to unoccupied for at least a year during the monitoring period (Table S3 in Appendix S2). Length of survey was defined as the number of years the roost was monitored (excluding roosts that were only monitored for 1 year). A value for each species was calculated by averaging the probability across all roost sites. Our estimate may also include a small number of true population extinctions as well as roost abandonments. For two species with sufficient data (P. pipistrellus: n = 26, P. pygmaeus: n = 15), we also calculated the average length of abandonment and the likelihood of reoccupation of a roost. The probability of reoccupation is likely an underestimate because observers often stop monitoring soon after a roost abandonment, thus potentially missing a reoccupation. We also calculated the average number of years that observers continued monitoring after a roost abandonment across all species (n = 111). Some roosts were not occupied when monitoring first started and were excluded from these calculations as it was uncertain how long the roost had been abandoned for prior to monitoring.
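The definition of the yearly abandonment probability (occupancy changes divided by length of survey, averaged across roosts) can be sketched as follows. The occupancy records and function names are hypothetical, and the sketch assumes one occupancy record per monitored year with no gaps.

```python
def abandonment_probability(occupied_by_year):
    """Yearly abandonment probability for one roost: the number of
    occupied -> unoccupied transitions divided by the years monitored."""
    n_years = len(occupied_by_year)
    if n_years < 2:
        raise ValueError("roosts monitored for only 1 year are excluded")
    changes = sum(
        1
        for prev, curr in zip(occupied_by_year, occupied_by_year[1:])
        if prev and not curr
    )
    return changes / n_years


def species_mean(histories):
    """Average the per-roost probabilities across all roosts of a species."""
    probs = [abandonment_probability(h) for h in histories]
    return sum(probs) / len(probs)


# Hypothetical occupancy records (True = colony present in that year).
roosts = [
    [True, True, False, True, True],    # one abandonment in 5 years -> 0.2
    [True, True, True, True],           # never abandoned -> 0.0
    [True, False, False, True, False],  # two abandonments in 5 years -> 0.4
]
p_a = species_mean(roosts)  # approximately 0.2
```

Consecutive unoccupied years count as a single abandonment here, because only the transition from occupied to unoccupied is counted.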

| Simulation framework
FIGURE 1 Diagram of the 'Virtual Ecologist' approach where both data and observation process are simulated. The approach contains four main components: (a) a population module that simulates the growth and movement of virtual bat colonies, creating the actual state of the virtual system; (b) an observation module that simulates the process of roost count surveys, creating the observed state of the virtual system; (c) a statistical model that creates abundance trends for both system states; and (d) a comparison of the differences between the actual and observed abundances of the virtual system.

TABLE 1 An estimate of the mean yearly probability of a roost being abandoned by its colony. Observers record the status of the roost's occupancy each year that a roost is monitored. We counted the number of times that the status of an individual roost changed from occupied to unoccupied for at least a year and divided this by the total number of years that the roost was monitored. This was then averaged across the species. Column heading: Mean yearly probability of roost abandonment.

We simulated observers monitoring bat roosts using a 'Virtual Ecologist' approach where both data and observation processes are simulated [...] (Table S2 in Appendix S2). We assume that the population dynamics of the 400 simulated colonies are independent from one another because, like the roosts monitored by the NBMP Roost Count, they comprise a small and independently selected subsample of the total bat population (see Mathews et al., 2018). We simulated initial colony sizes using a [...]
(b) Observation module: We also simulated the observation process by subsampling the data created in the population module. We [...] (Figure S2 in Appendix S1). These datasets provided the 'observed' state of the virtual system where knowledge about population dynamics was subject to abundance-dependent site selection (ADSS) and observer retention.
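As an illustration only, the idea of an observation module that subsamples the population data could be sketched as below. This is not the authors' implementation: the weighting used for abundance-dependent site selection (`adss_strength`), the quitting rule tied to `retention`, and all names and numbers are our own assumptions.

```python
import random


def observe(colonies, n_sites, adss_strength, retention, rng):
    """Subsample simulated colony time series as a volunteer scheme might.

    colonies: list of yearly count lists (the 'actual' state).
    adss_strength: 0 gives random site choice; larger values favour colonies
        that are large in their first year (hypothetical weighting).
    retention: probability that an observer keeps monitoring after
        recording an empty roost (hypothetical mechanism).
    Returns {colony index: observed counts, cut short if the observer quits}.
    """
    weights = [(c[0] + 1) ** adss_strength for c in colonies]
    chosen = set()
    while len(chosen) < n_sites:  # requires n_sites <= len(colonies)
        chosen.add(rng.choices(range(len(colonies)), weights=weights)[0])
    observed = {}
    for i in sorted(chosen):
        series = []
        for count in colonies[i]:
            series.append(count)
            if count == 0 and rng.random() > retention:
                break  # observer stops after an apparent abandonment
        observed[i] = series
    return observed


# Hypothetical 'actual' counts for four colonies over four years.
colonies = [[50, 0, 40, 45], [10, 12, 11, 9], [200, 180, 0, 0], [80, 85, 90, 70]]
full = observe(colonies, 4, 0.0, 1.0, random.Random(0))  # perfect retention
cut = observe(colonies, 4, 0.0, 0.0, random.Random(0))   # observers always quit
```

With perfect retention the observed series equal the actual ones; with zero retention the reoccupation of the first colony in year three is never recorded, which is the mechanism the simulation is designed to explore.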
(c and d) Trend analysis and assessment: Data for four population scenarios were simulated, covering all combinations of population variation (high and low) and roost abandonment (high and low). Each of the datasets created by the population module was subsampled with high, medium and low ADSS. Each of these subsets was then subsampled with high, medium and low observer retention to create the observation module datasets (Data flow: Figure S3 in Appendix S1). This was repeated 1,000 times for each of the four population modules. Following Barlow et al. (2015), abundance trends for all datasets were produced by fitting a GAM to each dataset with a Poisson error distribution, a site term to allow for differences in relative abundance between sites and a smoothing term to model the trend over time. Degrees of freedom for the smoothing term were chosen according to the default suggestion of 0.3 times the number of survey years (following Fewster et al., 2000). Population indices were derived from the fitted curve, taking the base year to be 1. To measure the difference between the actual and observed trends, the root mean square error (RMSE) was calculated. All analyses and simulations were carried out using R 3.5.2 (R Core Team, 2018).
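The last two steps, rescaling a fitted trend to an index of 1 in the base year and comparing actual and observed indices by RMSE, can be sketched as follows. The trend values are hypothetical and stand in for the fitted GAM curves, which this sketch does not attempt to reproduce.

```python
import math


def to_index(trend, base=0):
    """Rescale a fitted trend so that the base year equals 1."""
    return [value / trend[base] for value in trend]


def rmse(actual, observed):
    """Root mean square error between two equally long index series."""
    if len(actual) != len(observed):
        raise ValueError("series must cover the same years")
    squared = [(a - o) ** 2 for a, o in zip(actual, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical fitted trends (for example, predicted colony size per year).
actual_index = to_index([100, 100, 100, 100])   # stable population
observed_index = to_index([120, 108, 96, 84])   # apparent decline
error = rmse(actual_index, observed_index)
```

An RMSE of 0 would mean the observed index tracks the actual index exactly; larger values indicate a larger average departure over the time series.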

| Can biases in the NBMP Roost Count be detected?
We find no evidence that NBMP Roost Count observers tend to select roosts that are above their long-term mean abundance. The abundance trends of colonies that were initially either below or above their species' long-term mean abundance did not differ in trend direction for either pipistrelle species (Figure 2), and there was no significant difference in the frequency distribution of colony size at the start and end of their respective time series (Table S4 in Appendix S2). We find that there is a high risk that observers will cease monitoring in response to roost abandonment and therefore miss subsequent reoccupations. The highest mean yearly probability P_a of a roost being abandoned for at least 1 year was found in E. serotinus, while R. hipposideros had the lowest probability (Table 1). [...] Figure S4 in Appendix S1 for full results).

Roost reoccupation in
The RMSE provides a more detailed insight into these simulated biases ( Figure 4).

| DISCUSSION
Our findings suggest that trends derived from NBMP Roost Count surveys (the survey we use to inform our simulation) are negatively biased as a result of the interaction between low site fidelity in some species and observer retention. The magnitude of this negative bias varies between species depending on their degree of site fidelity; species with low site fidelity are more likely to be affected.
We found that four of our five investigated species (P. pipistrellus, P. pygmaeus, E. serotinus, M. nattereri) displayed relatively low levels of site fidelity. Thus, trends for these four species are likely to be negatively biased to a greater degree. Rhinolophus hipposideros showed a much higher level of site fidelity, so for this species, the impact of this negative bias will be reduced. Rhinolophus ferrumequinum and Plecotus auritus are also monitored by the NBMP Roost Count, but the site fidelity of these species could not be assessed due to small sample sizes. Further research is needed to understand the most appropriate way to correct this bias, for example regarding our ability to statistically distinguish between a true site extinction and a temporary roost abandonment. Overall, given the simplifications of population dynamics and species behaviour in our simulation, future studies should also focus on increasing realism.

FIGURE 3 The mean population trends of 1,000 simulations for each of the four population modules and one observation module. Population modules are 'Low variation + Low roost abandonment', 'Low variation + High roost abandonment', 'High variation + Low roost abandonment', 'High variation + High roost abandonment'. Graphs show the estimated trend from GAMs based on an index of 1 at base year 1. Green line indicates the actual population trend, while the other colours show the observed, biased population trends. The trends are biased through medium abundance-dependent site selection (ADSS) and medium observer retention.
A priority for NBMP would be to communicate clearly to observers the value of continued monitoring at a seemingly abandoned roost site, especially for species with low site fidelity. Unlike in the simulations, abundance-dependent site selection and the issue of 'regression to the mean' appear unlikely to be a major source of bias in NBMP Roost Count trends, since we found no difference in the frequency distributions of the populations of any of the species at the start and end of their respective time series, and that trends did not differ between high and low abundance sites. While this does not prove that sites were selected in an unbiased way, it does highlight that abundance dependent site selection may not be problematic if inter-annual population fluctuations are small (Fournier et al., 2019). We recorded instances of a bat colony reoccupying an abandoned roost for all species studied here. Reoccupation was most frequently observed in the Pipistrellus species; for example, in over a third of the recorded instances of abandonment P. pipistrellus subsequently reoccupied the roost, and this is likely an underestimate due to the tendency of observers to cease monitoring after an abandonment. This emphasizes the importance of the roost to the colony, even if it is not currently occupied.
It is important to note that roost abandonment and reoccupation behaviour is highly variable, depending on species and study area. A colony's site fidelity is influenced by a number of factors such as the relative abundance and longevity of roost sites (Kunz, 1982; Lewis, 1995). Species that commonly roost in permanent structures such as buildings tend to be more loyal to their roosting site, meaning they are more likely to reoccupy a site after hibernation (Simon et al., 2004; Thompson, 1992). Species roosting in ephemeral structures such as trees tend to be less loyal and may form 'fission-fusion' societies (Kerth, 2008; Kerth & Konig, 1999; but see August, Nunn, Fensome, Linton, & Mathews, 2014) and may not return to a roosting site after hibernation (Kunz & Lumsden, 2003; Lewis, 1995), which is likely a response to loss of roosting sites (Bondo et al., 2019). Some bat species also move temporarily between roost sites as individuals or small groups. When monitoring a known roost, these short-term movements (days, weeks) of individuals may lead to under- or overestimation of colony sizes. Roost monitoring schemes therefore often employ repeated counts, of which the peak count is used in analysis.

While it is important to take account of the error associated with individuals or small groups of bats moving between roosts, it is not clear that this behaviour leads to systematic bias within roost monitoring schemes, and it was therefore not included in this study.

FIGURE 4 Root mean square errors (RMSE) measuring the differences between the actual and observed trends of 1,000 repeated simulations for all module combinations. Population modules are 'Low variation + Low roost abandonment', 'Low variation + High roost abandonment', 'High variation + Low roost abandonment', 'High variation + High roost abandonment'. The observation modules could be biased by three levels of abundance-dependent site selection (ADSS) and three levels of observer retention.
Our simulation of a virtual citizen science monitoring programme has shown that biased site selection and observer retention influence the ability to accurately detect abundance trends in populations with various degrees of temporal variability and site fidelity.
Biased site selection, when sites enter the programme at high abundance, caused the biggest differences between actual and observed trends in populations where inter-annual variation in abundance was high, due to the population reverting back to its long-term mean (regression-to-the-mean problem; Buckland & Johnston, 2017; Palmer, 1993). Observer retention, that is, the likelihood of continuing to monitor after a site has been temporarily abandoned by the population, mostly affected the differences between actual and observed trends in populations with low site fidelity. These results show that both biased site selection and observer retention can negatively bias the observed population trends, but that the magnitude of bias depends on temporal variability in abundance and the site fidelity of the species.
Overall, our simulation has revealed several synergistic effects that monitoring programmes should address to improve the reliability of trend estimates. Firstly, there is a strong interaction between roost abandonment and observer retention: the combination of high roost abandonment with low observer retention produced the most biased trends in our simulation. This bias occurs because our virtual observers ceased monitoring following a temporary emigration event, assuming that the population had gone extinct. This extreme bias is likely to be restricted to single-species monitoring programmes since the probability that multiple species will temporarily emigrate at the same time is likely to decrease exponentially with the number of species monitored. A similar effect on observer retention may arise if detectability of a species is low, which is determined by a number of factors like detection method and survey effort (Guillera-Arroita, 2017). Our result means that single-species programmes should pay particular attention to observer retention, and explore ways to avoid this bias. One possibility would be to periodically revisit apparently abandoned sites to record potential reoccupations.
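The statement that simultaneous emigration becomes rapidly less likely as more species are monitored can be made concrete with a short worked example. It rests on two simplifying assumptions of ours that are not made in the study: each of $k$ monitored species leaves a site independently of the others, and each does so with the same yearly probability $p$. The value $p = 0.1$ is purely illustrative.

$$P(\text{all } k \text{ species absent in the same year}) = p^{k}$$

$$p = 0.1: \quad k = 1 \Rightarrow 0.1, \qquad k = 2 \Rightarrow 0.01, \qquad k = 3 \Rightarrow 0.001$$

Under these assumptions an observer at a three-species site would face an apparently empty site about a hundred times less often than an observer at a single-species site. If species respond to the same disturbance, the events are correlated and the reduction would be smaller.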
Secondly, programmes that allow participants to select monitoring sites should assess the temporal variability in the local abundance of their target species, to evaluate the potential for regression-to-the-mean problems. We found that regression-to-the-mean can be an important source of negative bias when two conditions are fulfilled. On its own, abundance-dependent site selection is not a sufficient condition for this bias to occur. Negative biases emerge only if the inter-annual variability in population size is also large compared with the variation between populations. Thus, monitoring programmes that use non-random site selection have a potential for their data to be biased, depending on the temporal variability of their monitored populations. Consequently, there is a great need for the development and testing of statistical methods to address this type of bias (Fournier et al., 2019). In the meantime, programmes should be encouraged to collect and report metadata (Bird et al., 2014; Tulloch & Szabo, 2012) that record the observers' site selection process and reasons for stopping monitoring.
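The two conditions for regression-to-the-mean bias can be demonstrated with a toy simulation, offered as a sketch rather than a reproduction of the study's population module. It assumes stationary populations with normally distributed, continuous abundances, and every parameter value and name is hypothetical. Sites are selected for being large in year 1, and the apparent change from year 1 to year 2 is compared between a scenario dominated by inter-annual variation and one dominated by between-site variation.

```python
import random
import statistics


def simulate_selected_mean(n_sites, n_years, site_sd, year_sd, top_fraction, rng):
    """Mean yearly abundance at sites chosen for being large in year 1.

    Each site has a stable long-term mean (spread between sites: site_sd)
    and fluctuates around it independently each year (spread: year_sd),
    so the true overall trend is flat. All values are hypothetical.
    """
    sites = []
    for _ in range(n_sites):
        long_term_mean = rng.gauss(100, site_sd)
        sites.append([rng.gauss(long_term_mean, year_sd) for _ in range(n_years)])
    # Abundance-dependent site selection: keep the sites largest in year 1.
    sites.sort(key=lambda s: s[0], reverse=True)
    selected = sites[: int(n_sites * top_fraction)]
    return [statistics.mean(s[t] for s in selected) for t in range(n_years)]


rng = random.Random(42)
# Inter-annual variation dominates: selected sites fall back towards the mean.
high_var = simulate_selected_mean(2000, 5, site_sd=5, year_sd=30,
                                  top_fraction=0.25, rng=rng)
# Between-site variation dominates: selected sites are genuinely large.
low_var = simulate_selected_mean(2000, 5, site_sd=30, year_sd=5,
                                 top_fraction=0.25, rng=rng)
drop_high = high_var[0] - high_var[1]  # apparent decline after year 1
drop_low = low_var[0] - low_var[1]
```

Although neither set of populations changes on average, the selected sites in the high-variation scenario show a large apparent decline after the first year, whereas the same selection rule produces almost none when most of the variation lies between sites.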
Assessing the accuracy and reliability of data collected in citizen science monitoring programmes is of great importance for ecologists, conservationists and policy-makers alike. It is now well known that citizen science monitoring programmes benefit from standardised sampling designs and adequate observer training. Didham et al. (2020) have identified seven potential challenges for time series trends derived from citizen science data, which include several of the biases we mention here (such as regression-to-the-mean and abundance-dependent site selection). They note shortcomings in our ability to quantify the collective impact of these biases. Our simulations demonstrate the possibility of measuring the synergistic effects of several of these biases. Overall, our study has highlighted the need for monitoring programmes to also be aware of their study species' temporal variability and site fidelity in order to assess and account for the effects of biased site selection and observer retention.

ACKNOWLEDGEMENTS
We acknowledge financial support from NERC (PhD studentship NE/P010539/1 to L.I.D.). We thank Charlotte Hawkins and Ken Norris for valuable comments on previous versions of the manuscript. We also thank the reviewers whose comments helped improve and clarify

DATA AVAILABILITY STATEMENT
The roost count data from the National Bat Monitoring Programme that were used in this study are available via the Dryad Digital