Assessing the usefulness of citizen science data for habitat suitability modelling: Opportunistic reporting versus sampling based on a systematic protocol

To evaluate the potential of models based on opportunistic reporting (OR) compared to models based on data from a systematic protocol (SP) for modelling species distributions. We compared model performance for eight forest bird species with contrasting spatial distributions, habitat requirements and rarity. Differences in the reporting of species were also assessed. Finally, we tested potential improvement of models when inferring high‐quality absences from OR based on questionnaires sent to observers.


| INTRODUC TI ON
Habitat suitability models are important tools to predict species' distributions (Elith & Leathwick, 2009), and for conservation and management (Franklin, 2013;Lawler, Wiersma, & Huettmann, 2011). These models require occurrence data collected across a variety of habitat types and covering a broad spatial extent . Such large biodiversity surveys are costly, and volunteer "citizen science" recording can constitute a promising alternative (Devictor, Whittaker, & Beltrame, 2010) capable of producing large datasets with value for a wide range of ecological applications . Citizen science data have been successfully used to improve knowledge in many research areas such as mapping the distribution of invasive species (Delaney, Sperling, Adams, & Leung, 2008), predicting seasonal dynamics of pathogens (Altizer, Hochachka, & Dhondt, 2004) or assessing the effect of the environment on breeding success (Rosenberg, Lowe, & Dhondt, 1999). These data can also be used to model species' habitat suitability or to describe population trends and range change (Kery et al., 2010;Mair, Harrison, Räty, et al., 2017;Snäll, Kindvall, Nilsson, & Pärt, 2011). Moreover, citizen science allows the development of a science-society interface, raising awareness of nature conservation issues and allowing the dissemination of knowledge in society (Johnson et al., 2014;Price & Lee, 2013).
Citizen science is however a broad concept that can take different forms (Brown & Williams, 2018;Pocock, Tweddle, Savage, Robinson, & Roy, 2017) from highly systematic protocols (SP) to systems only based on opportunistic reporting (OR) with no sampling design (Pocock et al., 2017;Tulloch, Possingham, Joseph, Szabo, & Martin, 2013). A trade-off often arises between data quantity and quality (Devictor et al., 2010). At one end of the gradient are data from SP based on a network of sites, which are regularly surveyed, usually during specific time periods (Pocock et al., 2017). Representative sampling is ensured through a regular, stratified or random spatial distribution of sites and a precise protocol. A complete species list is recorded for each sampled site. Such monitoring programmes can produce high-quality data with little or no bias. At the other end in OR, volunteers are free to select the sampling period and location (Pocock et al., 2017). They may record only some of the species they found. Due to its greater flexibility, this type of citizen science creates large amounts of data, but with limitations (e.g. the data are "presence-only") and biases, for example uneven sampling effort across space or time, or an oversampling of rare or emblematic species (Bird et al., 2014). Consequently, care is required to analyse OR data to avoid false inferences about species distributions.
While justifiable concerns exist around the quality of OR data, the potential biases may be counter-balanced by the much larger sample size of data collected by volunteers (Hochachka, Martin, Doyle, & Krebs, 2000;Schmeller et al., 2009). Moreover, various solutions exist to analyse this kind of data, such as presence-background methods (e.g. Maxent) or inferring non-detections from records of other species, thus making the data suitable for occupancy-detection models (Bradter et al., 2018;Isaac, van Strien, August, de Zeeuw, & Roy, 2014;van Strien, van Swaay, & Termaat, 2013).
However, Bradter et al. (2018) found that logistic regression with high-quality inferred absences performed at least as well as more specialized methods.
Despite increasing amounts and availability of data from OR, there is a lack of evaluation of their reliability for habitat suitability modelling of species with different ecology, geographical distributions and observation biases. Particularly, it is important to compare results from OR against data from SP to evaluate the reliability of OR (Mair, Harrison, Jönsson, et al., 2017). Such comparative evaluation is needed to provide guidance on the best use of existing data and to direct future data collection. Indeed, although data from OR have drawbacks they may nevertheless fill a gap for some species or habitat types poorly covered by SP. Rather than seeing these OR data as a poor, low cost alternative, we advocate to investigate their potential advantages and uniqueness, testing their performance in relation to species characteristics and observation processes. A reason is the enormous amounts available, for example currently >1.3 billion reports on www.gbif.org. This further motivates validating data from OR, defining their limitations and assets, and providing recommendations on how to deal with biases and improve model quality.
The overall aim of this study is to evaluate models based on OR compared to models based on SP concerning overall predictive performance for multiple species that are reported differently by observers. We studied eight forest bird species and applied habitat suitability models to two independent citizen science datasets, one OR and one SP covering the same large area (the whole of Sweden) and time period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013). We evaluated the predictive performance of all models, and at different spatial scales. Additionally, we evaluated the congruence of the spatial predictions between OR and SP. We then investigated the relative importance of species characteristics, ecological requirements, geographical distribution, rarity, detectability, observer identifiability and interest, on the predictive performance of the habitat suitability models. Finally, we investigated whether the observed differences between models based on OR and SP can be explained by potential sampling bias between habitat types.

| Study species
We selected eight forest bird species according to the following criteria: (1) they are of particular interest to bird watchers, and hence, K E Y W O R D S biodiversity monitoring, citizen science, forest birds, habitat suitability models, opportunistic reporting, presence-only, pseudo-absences, standardized protocol, volunteer recording citizen science data have a high potential to fill data gaps; (2) they are relatively well-studied, allowing assessment of whether the specieshabitat relationships suggested by models are realistic; (3) they vary in rarity, but are not so common as not to be consistently reported by some active reporters, a pre-requisite for our method of obtaining inferred absences; and (4)  The occurrence data covered the whole country, but two species occur only in the north (SJ, ST), three only in the south (RBF, LTT, LSW) and three have a more widespread distribution (HG, TTW, GHW). Most of these species are more or less specialists of old-growth forest (e.g., cavity nesters) and are negatively affected by intensive forest management (but to a lesser extent for HG and LTT) (Jansson & Angelstam, 1999;Swenson & Angelstam, 1993).
See also Appendix S1 for further details on habitat requirement for each species. Their forest preferences range from coniferous forest (SJ, TTW, ST) to mixed (HG, GHW, LTT) and deciduous forest (LSW, RBF). Apart from the long-distant migrant RBF, all species are basically resident, but eruptive movements or nomadism occurs in some species.

| Data from the Systematic Protocol (SP)
The Swedish Bird Survey is a national monitoring programme, where the data are collected by volunteer ornithologists along 2 × 2 km quadratic routes on fixed location, regularly distributed along a grid with 25-km resolution across Sweden. A total of 716 squares are distributed across the country, and a portion of them is surveyed once per year between May and July. Each square is composed of eight transects of 1 km and eight points counts, but only transects data were used in this study. We obtained a total of 13,244 transects, see Appendix S2 for details. For each transect, bird observations were converted to presences or absences per species. Appendix S3 for a detailed description of our procedure for opportunistic reporting (OR) data extraction and cleaning, which is here summarized: as some Swedish Bird Survey records are sometimes reported in the OR database, we removed these observations in order to keep data from OR independent from data from SP. We also excluded uncertain observations (e.g. uncertain location) and those with a spatial inaccuracy greater than 500 m. Following Bradter et al. (2018), we choose logistic regression for our modelling (see Section 2.8). As these models require presence-absence data, we used a specific method to infer absences based on questionnaires sent to observers. We first identified a subset of 20 observers for each species that had reported the highest number of unique locations in Sweden between 2000 and 2013, to ensure using records from many different locations, rather than many observations. To ensure good geographical coverage, we further divided Sweden into four geographical regions and identified the 20 reporters that had reported the most locations of the eight species in each region.

| Data from opportunistic reporting
Thereafter, we sent a questionnaire to the 94 observers identified asking about their skills and habits for identifying and reporting the eight focal species. We received 60 responses. To infer the absences for a species, we used the reports from observers that stated they always reported the focal species when found and were able to identify the species by sight and sound. We then inferred absences when and where other species but not the focal species were reported by these observers, keeping only inferred absences with a minimum search effort corresponding to more than five bird species recorded. Note: The number of presences, absences and inferred absences in data from the systematic protocol (SP) and from opportunistic reporting (OR), for eight forest bird species. henceforth); 2003-2007 ("2005"); 2008-2013 ("2010"). The three periods were then analysed together in a single model.

| Environmental data
We chose a set of environmental predictors based on existing knowledge about the species ecology (Appendix S1) that can be grouped into three categories: climate, forest and landscape.

| Climate predictors
We used mean monthly temperature and precipitation from the EURO4M Mesan data, on a grid with a mesh size of around 5 km 2 (Landelius, Dahlgren, Gollvik, Jansson, & Olsson, 2016). We used mean monthly temperature and precipitation across the whole period 1989-2010 in spring (April to June) and winter (December to February) as predictors in our analyses.

| Forest predictors
Depending on the species ecology, we included the most relevant predictors among forest age, forest volume, % of spruce, % pine, % coniferous and % deciduous trees (Appendix S1). We used a spatially explicit description of the forest composition and structure based on estimates combining satellite imagery with data from the National Forest Inventory of Sweden (i.e., kNN data, Reese et al., 2003). This dataset is available for the years 2000, 2005 and 2010 with a pixel size of 25 by 25 m.

| Grain size
For each species, the environmental predictors were computed for a

| Landscape predictors
We included the mean elevation of the site (1 × 1 km square) and the percentage of forest in 1 × 1 or 3 × 3 km square buffers, as some species avoid open areas and prefer large or well-connected forest patches and have minimum habitat size requirements (Appendix S1).
To compute this last predictor, we considered only forest 25 × 25 m pixels with a total standing tree volume larger than 25 m 3 /ha from the raster of forest volume (kNN data), which corresponds to the minimum volume for the classification as young forest in Brotons, Mönkkönen, Huhta, Nikula, and Rajasärkkä (2003). We also included as predictor the Euclidean distance to the nearest city to estimate site accessibility, as this may explain the sampling bias in OR (more reports from more accessible areas). Finally, we included the total number of bird observations reported in Artportalen for each sampling period and the total numbers of bird records around each observation in OR (1 km 2 ), as a proxy of the sampling effort (see Appendix S4). As not all transects in SP are necessarily sampled every year and because the three periods vary in duration, we also tested to include the number of times a transect was surveyed as a predictor in SP models.

| Species characteristics
To investigate how predictive performance of the habitat suitability models varied depending on species characteristics (i.e., ecological requirements, geographical distribution, rarity or detectability), we described and classified species based on six main characteristics, see Table 2 (see also Appendix S1 for other characteristics).

| Observer's behaviour and data characteristics
Based on the 60 questionnaires, we assessed how observers' behaviour differed for each of the targeted species. We calculated the percentage that declared to always report observations of a focal species and that are confident in their own ability to recognize the species (by sight and sound). We further estimated this percentage among the observers that had reported the species in its main distribution area (as the recognition rate is expected to decrease when the species become rarer, but the reporting rate may increase). We also assessed the impact of prevalence (ratio presence/absence) and sample size in data from OR, as it may impact the predictive perfor-

| Statistical analysis
For each species, we proceeded as follows: 1. National-level presence-absence (SP and OR) data were modelled using logistic regression, a standard method for binary data (Faraway, 2006), using a Binomial distribution and a logit link.
For each species, predictors were first chosen according to our knowledge of the species ecology (Appendix S1). Then, we performed a model selection retaining the model with lowest AIC (Akaike's information criterion [Akaike, 1974]), one model per species and per data type. This included testing all possible combinations of predictors and biologically reasonable quadratic effects and interactions between predictors.
2. The best model per species and dataset selected in step 1 was then evaluated by cross-validation. We evaluated the ability of the model based on OR to predict the observations from SP. We also evaluated the predictive performance of the models based on SP using a leave-one-out cross-validation, excluding successively each transect from the training data and using the excluded transect for validation. We produced a ROC curve and computed an AUC score (area under the receiver-operating curve) for each model. For further understanding of the predictive performance of the national model at finer spatial scales, we also present AUC for the main distribution area, specifically a convex polygon containing 90% of the presences in data from SP.
3. As the absolute predictions from SP and OR were not directly comparable, because the non-systematic sampling changes the prevalence of presences in OR data compared to SP data, we ranked the predictions (from the lowest to the highest score) to compare the relative habitat suitability. We then quantified the congruence in spatial predictions between models from OR and SP data by calculating the Spearman rank correlation coefficient between the predictions. Predictions were computed for the whole of Sweden in a grid of 1 km 2 cell size, using forest predictors of 2010.
4. We investigated how the model performances and differences of congruence between models from OR and SP data were related to the specific species characteristics and reporting behaviour of observers (the observation process). We tested if the differences in AUC between models based on OR and SP varied depending on these predictors (see Sections 2.6 and 2.7) using linear regression.
5. We investigated the gain in predictive performance (AUC) of models based on OR by progressively increasing the number of inferred absences in the data. We randomly drew a subset of inferred absences from 0.05% to 100% of the amount of absences in the data from SP. At each step, we performed a cross-validation and used the model fitted on OR data with this subset of inferred absences to predict presences and absences in SP. We repeated this procedure 100 times to define a confidence interval, and We also compared these distributions with the distribution of these predictors at national scale. The observed distribution for the whole of Sweden is obtained based on grid with cell size of 1 × 1 km covering the whole of Sweden. Other distributions are based on an 1 km 2 square around each observation (e.g. % of forest), for both presences and absences and both protocol (OR and SP). We then compared the histograms for OR, SP and the whole of Sweden per class of percentage.
All analyses were performed using R software 3.6.0 (R Core Team, 2019). Model selection was performed based on AIC using the dredge function of the package MuMIn (Barton, 2009). Cross-validation was performed using the package AUC (Ballings & Van den Poel, 2013).

| Spatial congruence
A high congruence in habitat suitability maps from OR and SP models was observed for all the species at national scale ( Figure 1). Spearman's rank correlation coefficient ranged between 0.78 (three-toed woodpecker) and 0.99 (Siberian tit). Overall, models from SP predicted a slightly higher habitat suitability in the northwest for four species, but the pattern was opposite for Siberian tit. Otherwise, differences between predictions from OR and SP models varied between species.

| Model predictors
Although there were some differences, the main selected predictors were the same in models based on data from OR and SP (Appendix S5). Furthermore, predictors and their associated coefficient estimates were mainly in accordance with our expectations based on available knowledge about the species ecology (Appendix S1). However, some unexpected effects were also observed for some predictors in models based on data from OR, as for example a small negative effect of forest age on grey-headed woodpecker occurrence (Appendix S5).

| Model evaluation
For almost all species, models built on OR data can discriminate between presences and absences in SP almost as well as models built on SP data themselves ( Figure S6.1 and Table S6.1). For one species (red-breasted flycatcher), the model built on OR data was even better at predicting SP data than the model based on SP. The largest difference between the predictions of the models from OR and SP was observed for the three-toed woodpecker (delta AUC SP-OR = 0.04, Figure S6.1). Models from OR and SP both had a satisfactory (>0.7) or good (>0.8) predictive performance. The AUC scores for the final models ranged from 0.75 (red-breasted flycatcher) to 0.93 (Siberian tit) for SP, and from 0.74 (three-toed woodpecker) to 0.93 (Siberian tit) for OR ( Figure S6.1,

| Effects of sampling biases
OR data contained much more observations in the most densely populated areas. Distance to cities was a better proxy for sampling effort than total number of observations per time period in 1 km 2 , which did not improve the model (Appendix S4). Besides the geographical bias towards populated areas, OR data were also biased by habitat type, but only for open habitats (Appendix S7). The most densely forested landscapes, with higher percentages of forest in 1 km 2 squares, were under-sampled in the OR data.

| Differences explained by species ecological characteristics
The predictive performance of models based on data from OR became significantly closer to those based on data from SP with increasing species rarity (Figure 2a, R 2 = .64, p = .018). For the rarest species (red-breasted flycatcher), the model based on data from OR was even better than the one based on SP at predicting presences and absences in the SP data ( Figure S6.1). Species further tend to be more consistently reported (percentage of observers that always report the species) in OR with increasing rarity (Figure 2b, R 2 = .49, F I G U R E 1 Spatial predictions (and difference between them) using models based on data from opportunistic reporting (OR) and systematic protocol (SP) for the eight bird species. The maps on the left and middle correspond to ranked predictions for habitat suitability from OR and SP, respectively. Spearman' rank correlations between these two maps are presented for each species. The map on the right shows the difference between these two maps. Green areas correspond to low difference, while blue indicates that OR predicts a higher habitat suitability than SP, and yellow indicates the opposite p = .055). We found no support for other associations between the predictive performance of the OR model and species characteristics (Appendix S8).

| Impact of observer's behaviour and data characteristics
Contrary to our expectations, the relative predictive performance of models based on data from OR in comparison with SP (difference in AUC) could not be explained by differences in observer's skills or consistency in reporting, so the agreement between models was not higher for well reported or more easily recognizable species.
The global index of reporting quality did not explain the differences in AUC between models based on data from OR and SP (R 2 = −.24, p = .22, Figure S9.1). Similarly, the prevalence and sample size did not significantly impact the predictive performance of the models.
Although no significant relationship was detected, it can be noticed that our best model based on data from OR, that exceeded the predictive performance of the model based on SP, was obtained for the red-breasted flycatcher that was both among the most easily recognizable and consistently reported species (Figure 3). The model F I G U R E 2 Predictive performance of the models in relation to the rarity of the species. (a) Difference in predictive performance between models based on data from opportunistic reporting (OR) and systematic protocol (SP) in relation to the rarity of the species. The predictive performance is estimated based on the AUC (area under the curve) of the models. The rarity of the species is defined as the inverse of the number of presences in SP. The species are designated by their acronyms: Siberian jay (SJ), Siberian tit (ST), hazel grouse (HG), three-toed woodpecker (TTW), grey-headed woodpecker (GHW), lesser spotted woodpecker (LSW), long-tailed tit (LTT), red-breasted flycatcher (RBF). (b) Reporting rate in relation to the rarity of the species. The reporting rate is defined as the percentage of observers in the OR questionnaire that always report the species at the national scale. The rarity of the species is defined as the inverse of the number of presences in SP with the poorest performance was obtained for three-toed woodpecker, a species considered to be among the most difficult to identify by observers (poorly recognized by call) (Figure 3).

| How many inferred absences do we need?
The predictive performance of models based on data from OR increased with increasing number of inferred absences (Figure 4).
However, very few inferred absences were required to reach the maximum AUC. For all species, the plateau in AUC was reached around the 1% level of the total number of absences in data from SP (around 10 inferred absences). However, the relatively high variability in AUC at the 1% level for most species suggests that a higher number of inferred absences (up to 5% or 10% depending on species) will lead to more stable results (Figure 4). Note that the inferred absences are of high quality in that each inferred absence corresponds on average to 23 to 83 observations depending on species (minimum six; Figure 4a). The increase in predictive performance of the model varied among species, ranging between 1% (Siberian tit) and 20% (hazel grouse) to reaching performance asymptote at around 1% inferred absences. The AUC increased more for models with an initial low AUC at a low percentage of inferred absences (Figure 4b).

| D ISCUSS I ON
Our study provides evidence that habitat suitability models from OR can provide similar predictions of habitat suitability as models from SP, for multiple species with varied characteristics, ecological requirements and observation biases. These findings widens the inferences compared to studies of single bird (Bradter et al., 2018) and fungal (Mair, Harrison, Jönsson, et al., 2017) species. Indeed, models from both OR and SP data were highly congruent, and in accordance with the available knowledge on species ecology. Furthermore, models from OR predicted data from SP almost as accurately as the models from SP themselves. Some of the local mismatches (e.g. three-toed woodpecker in the north) may be explained by the lack of northern OR data, or by the poor quality of some forest predictors in this area, especially where forest age is high (Bradter et al., 2018). For one species (red-breasted flycatcher), the model based on data from OR provided even better predictions than those based on SP. This result is especially robust given the fact that our cross-validation of the OR models used an independent dataset constituting a more difficult test than the test used to assess the predictive performance of the model based on data from SP. Indeed, our thorough data cleaning procedure and our protocol to infer absences based on questionnaires should have made the quality of our OR dataset higher than many earlier evaluated datasets (Barbet-Massin et al., 2012).
However, this work was straightforward and we thus show that with some relatively simple analyses and inferred absences, OR can provide reliable spatial predictions of habitat suitability at large spatial scales or in areas where no systematic protocols exist. In accordance with Bradter et al. (2018) and Huang and Frimpong (2015), we demonstrate that logistic regression using inferred absences produce reliable results that may outperform other methods based on presences paired with background or pseudo-absence data.
The high congruence between OR and SP models is striking considering the sampling biases in data from OR. OR data contain much more data from densely populated areas and more open habitats.
This bias towards urban area or more accessible sites seems frequent in studies using citizen sciences data (Callaghan et al., 2020; Millar,

F I G U R E 3
Reporting scores for the most comprehensively (red-breasted flycatcher) and the most restrictively (three-toed woodpecker) reported species in comparison with the mean for the eight species for different reporter's behaviour and data characteristics: the percentage of observers that always report the species (at national scale), the percentage of observers that can identify the species (at national scale), the percentage of observers that always report the species (in the main distribution area of the species, defined by a convex polygon containing 90% of the presences in systematic protocol), the percentage of observers that can identify the species (in the main distribution area of the species), the sample size in opportunistic reporting (OR) and the prevalence (ratio between the number of presence and the number or absences) in OR  Isaac, 2019). We included the distance to cities in our models, as a proxy for accessibility, which better accounted for this bias than the number of reports per spatial unit. As only presences were reported in our dataset, this last variable may fail to represent well the real sampling effort. In accordance with our results, other studies show that large amounts of data in OR can compensate for bias and produce reliable results when sampling thresholds are met (Callaghan et al., 2020;Horns, Adler, & Şekercioğlu, 2018). Accessibility also probably explains the bias towards open habitat in OR data, although also other factors can be involved, for example higher visibility for bird watching, or higher interest for forest edges that combine various habitat types. This known bias (e.g., Kallimanis, Panitsa, & Dimopoulos, 2017) may therefore induce better reporting of edge species than strict forest interior species.
Models based on OR produce predictions congruent to those based on SP for all eight species. More specifically, we detected an increase in relative predictive performance of the OR models (compared with SP) among rare species, in accordance to Sardà-Palomera et al. (2012), possibly because the SP sites typically rarely match the habitat requirement of these species (Ottvall et al., 2009). The low predictive performance of SP models for rare species can also be due to the low prevalence of these species in SP data. Such impact of prevalence on predictive performance has previously been shown (Cumming, 2000;Huang & Frimpong, 2015;McPherson et al., 2004).
This is unlikely the case in OR but may be a problem in SP. Indeed, we did not find any effect of prevalence on the predictive performance of OR models, probably due to the high sampling effort for all focal species. On the contrary for SP, for red-breasted flycatcher our dataset contained only 65 presences for 13,179 absences. This may also explain the lower predictive power of the model based on SP for this species compared to the OR model.
We did not detect any relationship between difference in predictive performance of OR and SP models and the reporting quality between species. However, the rarest species were the most consistently reported in OR, in agreement with studies on butterfly and dragonfly species (van Strien et al., 2013). We therefore agree with previous work, which includes organism groups other than birds, that OR data may be of particular value for species where systematic reports do not exist or require high sampling effort, for example longhorn beetles (Snäll, Forslund, Jeppsson, Lindhe, & O'Hara, 2014) or lynx (Louvrier et al., 2019). The reporting rate of rare species is also likely to increase with observers' expertise (Johnston, Fink, Hochachka, and Kelling (2018). Other study observed an interaction effect between the prevalence/ubiquity of the species and the sampling effort that impact on the predictive performance of models based on OR (Steen, Elphick, & Tingley, 2019). Note though that in amphibians and reptiles, common rather than rare species seem to be better reported (Tiago, Pereira, and Capinha (2017).
Accounting for the impact of observer expertise on detection probability can improve the model fit and the predictive performance of models (Johnston et al. (2018), but including it in models does not necessarily improve estimates of population trends (Eglington, Davis, Joys, Chamberlain, & Noble, 2010). We agree with Tulloch and Szabo, (2012) that even if an observer effect is not included in the model, collecting information on observer behaviour helps identifying OR data issues and controlling them.
Also the risk of misidentification varies depending on both rarity and observer skills, with more false-positive reports of rare species by skilled observers but more false-positive reports of common species by less experienced observers (Farmer, Leonard, & Horn, 2012). The impact of misidentifications on model performance is difficult to assess and vary between studies (Cruickshank, Bühler, & Schmidt, 2019;Ruiz-Gutierrez, Hooten, & Campbell Grant, 2016).
We assume that the risk of misidentification of rare species is quite low in our dataset due to high self-validation control carried out by the bird watching community.

F I G U R E 4
How much absences are enough? (a) Increase in predictive performance with the percentage of absences included in the models of each bird species (upper graphs). The percentage of absences on the x-axis is expressed in relation to the number of absences in systematic protocol (SP). 100% is the equivalent of including the same number of inferred absences as the number of absences in the SP, see Table 1 for the total number of inferred absences per species. As the maximum AUC (area under the curve) is rapidly reached, we scaled the x-axis from 0% to 10% to improve visibility. As each inferred absence correspond in average to many bird observations, we also present for each species the number of observations per inferred absences to have an estimation of the sampling effort that each inferred absence represents (lower graphs). (b) Gain in predictive performance with inferred absences among species. For each species, we estimated the gain in predictive performance by subtracting the maximum AUC (see threshold in the curve Figure 4a) with the minimum AUC. As the threshold for maximum AUC is obtained with very few absences, we defined the maximum AUC based on the model with 10% of absences and the minimum AUC based on the model with 0.05% of absences. The AUC gain of adding inferred absence decrease when the AUC of the model with 0.05% of absences increase. This means that adding inferred absences will have more impact on improving model quality if the model without inferred absences has a low quality (low or medium AUC) We did not find any other relationship between the predictive performance of the OR models and species ecological characteristics. However, we focused on a selection of forest specialists, thus a small number of species resulting in low statistical power.
Another study on 195 birds species showed a significant impact of body size, diet and habitat specialization on detectability (Johnston et al., 2014). Moreover, lower population density, small body size or less charismatic species may also result in lower detection (Fitzpatrick, Preisser, Ellison, & Elkinton, 2009;Steger, Butt, & Hooten, 2017 Lobo and Tognelli (2011) suggest that the number of pseudo-absences may be more important than their location, covering a large environmental gradient is nevertheless preferable to limit biases (Bradter et al., 2018;Lobo & Tognelli, 2011;Thuiller, Brotons, Araújo, & Lavorel, 2004). Here, we selected the observers to which we sent the questionnaire not only based on the number of records they had reported, but also based on the spatial distribution of their observations, to achieve a wide spatial spread. This simple method focuses more on quality than quantity, with quite few inferred absences but covering a wide area, increasing the chances that the environmental space is well covered. This method can be easily applied to other contexts or taxa, and requires only a questionnaire answered by a small subset of the most active observers. However, it may not be applicable to very common species, as few reporters may consistently report such species. It may then be impossible to obtain enough inferred absences or to achieve a wide enough spatial distribution of absences. Alternatively, inferred absences based on historical presences of other species in atlas data may be applicable (Frimpong, Huang, & Liang, 2016;Huang & Frimpong, 2015). We however encourage observers to report non-detections or use check-lists, as this additional information would likely increase the robustness of models based on OR data (Isaac & Pocock, 2015).
In conclusion, although OR data contain biases, models based on data from OR can provide reliable predictions of habitat suitability. This is especially true for rare species difficult to detect without a high observation effort in time and space. We think that OR data should be seen as a complement to SP, rather than an alternative, as the weakness of one is the strength of the other (Miller, Pacifici, Sanderlin, & Reich, 2019). Different methods also exist to combine these data (Fletcher et al., 2019;Isaac et al., 2020;Miller et al., 2019b). Moreover, OR often promotes the establishment of long-term networks of observers and provide an information exchange platform between scientists, society and conservationists (Sullivan et al., 2009). However, it is clear that OR programmes cannot fully substitute long-term SP programmes, the latter often designed to answer specific ecological questions dependent on replicated sampling (Bayraktarov et al., 2019).

ACK N OWLED G M ENTS
We thank all the volunteer observers for reporting species in the Presence-only data and inferred absences from opportunistic reporting for the eight species are also accessible from dryad.