Birds that are more commonly encountered in the wild attract higher public interest online

Large body size, the defining characteristic of “charismatic megafauna,” is often viewed as the most significant correlate of higher public interest in species. However, common, local species (many of which are not large) can also generate public interest. We explored the relative importance of body size versus local occurrence in patterns of online interest in birds using a large sample of digital human‐wildlife interactions (367 million Wikipedia pageviews) that included more than 10,000 bird species and a range of cultural and geographic contexts (represented by 25 Wikipedia language editions). We compared interest in Wikipedia, as measured by pageviews, with a bird's body size and its regional observation frequency (using data from eBird.org). We found that local species (i.e., those that occur in the wild in the country responsible for the majority of a Wikipedia language edition's pageviews) attract more pageviews than global species. Both body size and observation frequency had a positive correlation with Wikipedia pageviews across languages, but eBird observation frequency explained more of the variance in pageviews on average. In a model that included both observation frequency and body size, observation frequency was a significantly better predictor of pageviews than body size in 24 of 25 languages. Our results demonstrate that the opportunity to encounter birds in the wild is a significant correlate of increased online interest in birds across multiple linguistic and geographic contexts. This relationship provides insight into why some species attract greater interest than others and emphasizes the overlooked potential of common species in conservation marketing.


| INTRODUCTION
Understanding why some species command greater public interest than others has long been of interest to conservationists (Kellert, 1982;Lorimer, 2007;Macdonald et al., 2015). Identifying the biological and ecological traits of species that correlate with increased public interest can benefit conservation marketing and be used to identify potential flagship species (Smith, Veríssimo, Isaac, & Jones, 2012;Verissimo et al., 2013). Previous studies have evaluated the role of biological traits such as coloration (Lišková & Frynta, 2013), venomousness (Roll et al., 2016), and the perceived attractiveness of species (Gunnthorsdottir, 2001) in determining public interest. While a variety of traits have been recognized as correlating with increased interest, large body size is frequently identified as the most important (Berti, Monsarrat, Munk, Jarvie, & Svenning, 2020;Clucas, McHugh, & Caro, 2008;Macdonald et al., 2015). Indeed, the widespread use of the term "charismatic megafauna" to describe species that attract high public interest suggests that large body size is often viewed as synonymous with increased public interest in the conservation community.
The importance of large body size suggests that direct interactions with animals in the wild may be relatively unimportant to determining interest in species. While large species are strongly represented in media (Clucas et al., 2008) and captive environments (Martin, Lurbiecki, Joy, & Mooers, 2014), they are often rare and infrequently encountered in the wild. In extreme examples, species such as tigers (Panthera tigris) and giant pandas (Ailuropoda melanoleuca) attract substantial public interest despite only a tiny minority of people ever observing them in the wild. Conversely, direct encounters with species in the wild do seem to drive public interest in some situations. The commonness of butterfly and bird species in the UK and Poland correlates with the attention these species receive online (Zmihorski, Dziarska-Palac, Sparks, & Tryjanowski, 2013). In Brazil, species that overlap with areas of higher human population density receive greater interest online, as measured by Google search frequency. This relationship between people's ability to directly encounter a species in the wild and their interest in it has relevance for conservation. If direct encounters are relatively unimportant, then emphasizing large and charismatic species as well as virtual encounters through various forms of media should be the top priority for engaging greater public interest. In this scenario, public interest in a species is unlikely to decline if the species becomes rarer or even goes extinct in the wild. Conversely, if direct encounters with wild animals are important for generating public interest, then approaches that emphasize direct encounters with species in the wild may be more effective for increasing public engagement in conservation. Furthermore, if public interest is closely linked to wild encounter rates then the declining numbers of wild animals (Dirzo et al., 2014) and trends of increasing urbanization and detachment from nature (Miller, 2005;Soga & Gaston, 2016) could lead to decreased interest in some species.
Conservation culturomics is a new research area that uses online digital data to investigate questions around human-nature interactions that are relevant to conservation. Previous studies have used conservation culturomic methods to assess people's interest in reptiles (Roll et al., 2016) and Brazilian birds, and to compare seasonal patterns in biodiversity awareness (Mittermeier, Roll, Matthews, & Grenyer, 2019). Here we used these methods to assess the relationship between online interest, body size and the frequency with which people have direct encounters with species in the wild. Using birds as a case study, we assessed patterns of public interest (as measured by Wikipedia pageviews) across a large number of digital interactions and a range of cultural contexts (25 different Wikipedia languages). We compared Wikipedia pageview data to information on the frequency with which birds are observed in the wild derived from eBird, an online database of avian distributions. With more than 700 million records, eBird is the world's largest biodiversity-related community science project and provides occurrence data for birds around the world (Levatich & Ligocki, 2020;Sullivan et al., 2009;Wood, Sullivan, Iliff, Fink, & Kelling, 2011). Multiple studies have used eBird data to monitor patterns in avian distribution and abundance at a variety of scales (eBird, 2020).
We tested three hypotheses around the importance of direct encounters in generating online interest in birds. First, we assessed whether the geographic distribution of birds impacted their Wikipedia pageviews. If wild encounters are unimportant, we predicted that the local occurrence of bird species would have a minimal impact on their pageviews; if direct encounters are important, we predicted pageviews would be skewed toward local species (i.e., those that occurred in the wild in a given region). Second, we tested for correlations between pageviews and body size/eBird frequency. We predicted a positive correlation between pageviews and body size if indirect encounters are important and a positive correlation between pageviews and eBird frequency if direct encounters with species are important. Third, in instances where we had both body size and eBird frequency data, we tested the relative importance of both variables in predicting Wikipedia pageviews in a combined model. Here, we predicted that body size would explain more variation in pageviews if direct encounters are less important and eBird frequency would explain more variation if direct encounters are more important.
Previously, Mittermeier et al. (2019) identified temporal correspondence between eBird sightings and Wikipedia pageviews for a sample of migratory birds in Italy, Germany, Sweden, and the United States. Their study demonstrated that, for some migratory birds, the physical presence of the species in a region correlated with increased public interest. This study builds on these findings by evaluating the relationship between the observation frequency and online interest across a much larger range of languages (25 language editions), geographic regions (25 regions including both temperate and tropical countries) and species (more than 10,000 bird species, nearly the entire global diversity). Thus, we provide a generalized understanding of the relationship between wild encounter frequency and public interest in birds irrespective of the phenomenon of migration and test the importance of the relationship across a range of linguistic and geographic contexts.

| Data selection and extraction
We used the number of pageviews that a page receives in Wikipedia as a measure of online interest (e.g., Mittermeier et al., 2019;Roll et al., 2016). Wikipedia does not include geographic information with its pageviews, but Wikipedia editions are constructed in different languages and summary data are available that list the proportion of pageviews each language edition receives by country (Zachte, 2020). Following previous studies, we used these country-level summaries as a coarse proxy for geography by pairing Wikipedia language editions with the geographic region that accounts for the majority of a language edition's pageviews (Generous, Fairchild, Deshpande, Del Valle, & Priedhorsky, 2014;Mittermeier et al., 2019).
We selected 25 Wikipedia editions for non-artificial languages that, as of June 22, 2019, had over 100,000 articles, a Wikipedia editing depth higher than 10 (a measure of the language edition's quality; Wikimedia 2019), and more than 50% of the language's pageviews originating from a single country (Appendix S1; Wikipedia, 2020;Zachte, 2020). Selecting language editions with more than 50% of their pageviews originating from a single country provided higher confidence in the geographic origin of the pageviews in our data set. It also resulted in several widely spoken languages with large Wikipedia editions not being included in our data set (e.g., Spanish, English).
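The three selection criteria can be expressed as a simple filter. The following is a minimal sketch, not the authors' code: the function name `select_editions`, the dictionary layout, and every number in the toy input are hypothetical and chosen only to illustrate how an edition passes or fails each threshold.

```python
def select_editions(editions, min_articles=100_000, min_depth=10, min_share=0.5):
    """Return {language: dominant country} for editions meeting all three criteria.

    `editions` maps a language name to a dict with keys "articles", "depth",
    and "country_shares" (country -> proportion of that edition's pageviews).
    This layout is an assumption made for the example.
    """
    selected = {}
    for language, info in editions.items():
        # Country contributing the largest share of this edition's pageviews.
        country, share = max(info["country_shares"].items(), key=lambda kv: kv[1])
        if (info["articles"] > min_articles
                and info["depth"] > min_depth
                and share > min_share):
            selected[language] = country
    return selected


# Toy input: all values below are invented placeholders, not data from the study.
toy_editions = {
    "Dutch": {"articles": 1_900_000, "depth": 15,
              "country_shares": {"Netherlands": 0.70, "Belgium": 0.20}},
    "English": {"articles": 5_900_000, "depth": 900,
                "country_shares": {"United States": 0.40, "United Kingdom": 0.11}},
    "SmallEdition": {"articles": 40_000, "depth": 30,
                     "country_shares": {"Somewhere": 0.90}},
}

selected = select_editions(toy_editions)
```

In this toy input, the placeholder "English" entry is dropped because no single country exceeds half of its pageviews, which mirrors the exclusion described above.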
We identified pages for bird species in Wikipedia using the Wikidata Query Service (https://query.wikidata.org/) to extract a list of entities tagged with an eBird taxon ID (Wikidata property: P3444) on June 23, 2019. Wikidata is a secondary database that collects structured data for Wikimedia projects, including all Wikipedia language editions (https://www.wikidata.org). We cross-referenced our list of Wikidata entities with the eBird/Clements global avian taxonomy (Clements et al., 2018) to ensure that non-bird pages were not included. We obtained page links for Wikipedia pages in languages that met our criteria and downloaded pageviews for each page for the period between July 1, 2015 and June 22, 2019 (1,453 days) using "pageviews" in R (Keyes & Lewis, 2016).
To obtain eBird data, we downloaded the eBird Basic Data set (version April 2019) with records for all species and all years (eBird, 2019) for the regions associated with each of our 25 Wikipedia language editions. In most cases, we defined a region as a single country; however, in rare instances where the distribution of a language corresponded strongly to a specific subnational region, we downloaded data for that region rather than for the entire country (e.g., Tamil Wikipedia was paired with eBird records from Tamil Nadu rather than all of India). We limited our analyses to regions with a minimum of 10,000 unique sampling events in eBird, and to species that appeared in more than 10 sampling events in the region's eBird data set. These thresholds helped minimize biases present in smaller eBird data sets and reduce instances where incorrectly identified species had not yet been removed via eBird's review process (Wood et al., 2011). Bird pages in each Wikipedia language edition were assigned as either "local" if they occurred on the eBird list for the associated region or "global" if they did not (eBird, 2019). For local species, we calculated the observation frequency of a species as the total reports of that species in the region divided by the total unique sampling events in the region (Sullivan et al., 2009). This approach provided a rough overall measure of observation frequency for all local species in a region. It did not account for seasonal variations (such as for migratory species) or for how species were distributed within the region. Since global species by definition had no sampling events in a region, they did not have an eBird frequency for that region in our data set. We obtained bird body mass data from Dunning (2008).
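The observation frequency and the local/global assignment described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: each sampling event is represented as a set of species names, the function names and the `min_reports` parameter are hypothetical, and the toy checklists are invented. The default `min_reports=10` mirrors the "more than 10 sampling events" threshold in the text.

```python
from collections import Counter


def observation_frequency(checklists, min_reports=10):
    """Frequency = reports of a species / total unique sampling events.

    `checklists` is a list of sets of species names, one set per sampling
    event (an assumed representation). Species reported in `min_reports`
    events or fewer are dropped.
    """
    n_events = len(checklists)
    reports = Counter(species for checklist in checklists for species in checklist)
    return {species: count / n_events
            for species, count in reports.items()
            if count > min_reports}


def classify_pages(page_species, regional_species):
    """Label each Wikipedia page's species as 'local' or 'global'."""
    return {species: ("local" if species in regional_species else "global")
            for species in page_species}


# Toy checklists (invented). The threshold is lowered to 0 so the tiny
# example keeps every species.
toy_checklists = [
    {"Parus major", "Turdus merula"},
    {"Parus major"},
    {"Turdus merula", "Pica pica"},
    {"Parus major", "Pica pica"},
]
freq = observation_frequency(toy_checklists, min_reports=0)
labels = classify_pages(["Parus major", "Raphus cucullatus"], set(freq))
```

Because the frequency is pooled over all events in a region, it carries the same limitation noted above: it says nothing about seasonality or about where within the region a species occurs.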

| Data analysis
We explored the relationship between Wikipedia pageviews, eBird sightings, and avian body mass across each of the 25 language-region pairs in our data set. Since language editions varied substantially in their views, we assessed patterns in each language separately. Furthermore, since our data were sparse (we only had eBird frequency data for local species, and we did not have body mass data for all species) we analyzed data for each language in four separate analyses. First, for all of the bird Wikipedia pages in a language, we looked at the overall pattern between pageviews for global as opposed to local species and compared the median pageviews for local and global species using a Wilcoxon rank-sum test. Second, for all the birds with Wikipedia pages in a language for which we had eBird sighting frequency (i.e., the local species that occurred in the region associated with the language), we tested the relationship between eBird sighting frequency as a predictor and Wikipedia pageviews as a response variable using a simple linear regression model. Third, for all the birds with Wikipedia pages in a language for which we had body mass data (this included both local and global species), we modeled the relationship between body mass (predictor) and Wikipedia pageviews (response) using a simple linear regression model. Since we had body mass data for both global and local species in each language, we tested whether a species being classified as local or global influenced the relationship between body mass and pageviews using an analysis of covariance (ANCOVA). Fourth, for the birds with Wikipedia pages in a language for which we had both eBird frequency data and body mass data (i.e., local species with body mass data) we used a combined linear regression model with eBird frequency and body mass as predictors of Wikipedia pageviews. 
In the combined model, we compared the relative influence of eBird frequency and body mass as predictors of Wikipedia pageviews using variation partitioning (Oksanen et al., 2017). For all analyses, we normalized variables using a logit transformation for proportions (eBird frequency) and a log transformation for counts and continuous variables (Wikipedia pageviews, body mass). Final models were tested for assumptions of normality and homoscedasticity of residuals.
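The transformations and the simple regression step can be illustrated as follows. This is a minimal sketch rather than the authors' analysis (which was carried out in R): natural logarithms are assumed because the text does not state a base, the function names are hypothetical, and the five data points are invented.

```python
import math


def logit(p):
    """Logit transform for a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (slope, intercept, adjusted R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 with one predictor: penalize by (n - 1) / (n - 2).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2


# Invented example values: eBird frequency (proportion) and pageviews (count).
toy_frequency = [0.01, 0.05, 0.20, 0.50, 0.80]
toy_pageviews = [120, 900, 4000, 30000, 90000]

x = [logit(p) for p in toy_frequency]
y = [math.log(v) for v in toy_pageviews]
slope, intercept, adj_r2 = simple_regression(x, y)
```

Note that the logit is undefined at exactly 0 or 1, so a species reported on every sampling event would need special handling; the text does not say how such a case was treated.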

| Data selection and extraction
Our data set contained 78,415 Wikipedia pages for birds across 25 languages (Appendix S2). In total, 10,174 bird species had at least one Wikipedia page in our data set (96.1% of the total bird species; Clements et al., 2018). Numbers of bird pages per language varied from 103 (Hindi) to 10,103 in Dutch (mean pages per language 3,137, SD 3,034). Wikipedia pages in our data set received 367 million views over the sampling period (views per language 417,000-70.9 million; mean 14.7 million, SD 18.5 million). With the exception of Brazil (which accounted for 79.7% of the pageviews for Portuguese Wikipedia), all of our geographic regions were in Europe (15 regions) and Asia (nine regions).
Our eBird data set contained 2.3 million unique sampling events with records of 4,340 bird species. Unique sampling events per region ranged from 10,600 (South Korea) to 817,000 (India; mean 91,900, SD 169,000). eBird species richness per region ranged from 313 species (Czech Republic) to 1,716 (Brazil; mean overall 615, SD 374). Not all species that occurred in a region had a Wikipedia page in the associated language. The proportion of eBird species in a region with Wikipedia pages varied from 7.62% of species (Hindi-India) to >98% of species in eleven language-region pairs (mean overall 82.5%, SD 25.6%). We obtained body mass data for 9,406 species.

| Wikipedia pageviews for local vs. global species
Our data set contained many more Wikipedia pages classified as global than as local (70,250 global vs. 8,165 local). Despite this, pages for local species received more views than those for global species (218 million vs. 149 million pageviews; mean pageviews for local species 26,700, SD 74,900; mean pageviews for global species 2,120, SD 14,300). The mean pageviews for local species was higher than the mean pageviews for global species in all 25 languages, with the difference between the means being statistically significant in all but two languages (Wilcoxon rank-sum p < .01; mean pageviews local 723-131,000; mean pageviews global 106-16,600; Figure 1, Appendix S3). The effect size of the difference in mean pageviews for local as opposed to global species was either large or moderate in 20 languages (large defined as >0.5, moderate as 0.3-0.5).
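A rank-sum comparison of this kind can be sketched as below. Several points are assumptions rather than details from the text: the test uses the normal approximation without tie or continuity corrections, and the effect size is taken to be r = |z| / sqrt(N), a common convention that matches the 0.3 and 0.5 thresholds but is not confirmed by the source. Function names and the toy pageview values are invented.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum test via the normal approximation.

    Returns (z, two-sided p, effect size r). No tie or continuity correction
    is applied, so this is a simplification of what statistical packages do.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    r = abs(z) / math.sqrt(n1 + n2)  # assumed effect-size definition
    return z, p, r


# Invented pageview totals for one hypothetical language edition.
toy_local = [5000, 12000, 800, 30000, 2500, 9000]
toy_global = [100, 450, 60, 900, 300, 150, 1200, 75]
z, p, r = rank_sum_test(toy_local, toy_global)
```

With real data, where many pages share identical low view counts, a tie-corrected implementation from an established statistics package would be the safer choice.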

| eBird frequency as a predictor of Wikipedia pageviews
For local birds in our data set (i.e., those for which we had eBird frequency data) we tested the relationship between eBird frequency as a predictor and Wikipedia pageviews as a response variable using a linear regression model. The number of species on a region's eBird list with Wikipedia pages ranged from 82 to 1,138 (mean 399, SD 216; total local pages across all languages 8,165). eBird frequency showed a significant positive correlation with Wikipedia pageviews in all 25 languages (p < .01). Across languages, a linear regression model with eBird frequency as a predictor explained between 6.9 and 49.0% of the variance in Wikipedia pageviews for bird pages (adj. R² = .07-.49, mean 0.25, SD .13; Figure 2, Table 1).
The most frequently reported birds in eBird in each language received a high proportion of the Wikipedia pageviews for birds. Among local species, the 20 most frequently recorded species in eBird accounted for 20.2 to 49.8% of the pageviews for local bird species, while making up 3.14 to 20.1% of the local bird species pages (mean proportion of views 40.0%, SD 8.24%; mean proportion of species 13.1%, SD 5.09%). This pattern was also evident when considering the total bird Wikipedia pages in a language (i.e., both local and global). Across languages, the 20 species with the highest eBird frequency accounted for 3.16 to 19.6% of the total Wikipedia pageviews for bird species, while consisting of only 0.20 to 4.20% of the total bird species pages (mean proportion of views 13.36%, SD 4.66%; mean proportion of species 1.29%, SD 1.14%; Appendix S4).
FIGURE 1 Bird species that occur in the region responsible for the majority of a Wikipedia language edition's pageviews ("local" species) attract more pageviews than bird species that do not occur in the region ("global" species). Points indicate outliers occurring > 1.5x the interquartile range beyond the median. Statistical significance of the difference between means calculated using a Wilcoxon rank-sum test with p < 0.01; two language editions where difference between means was not statistically significant are marked with an asterisk (Malay and Hindi)
FIGURE 2 eBird observation frequency correlates positively with Wikipedia pageviews for local bird species across multiple Wikipedia language editions. In a linear model with eBird observation frequency (logit transformed) as a predictor of pageviews (log transformed), observation frequency predicted between 49% and 6.9% of the variance in pageviews (top three rows show 9 language editions with the highest adj. R² in the linear model; bottom row shows 3 languages with the lowest adj. R² in the linear model; see Table 1 for full results)
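The share of pageviews attributable to the most frequently observed species is a straightforward calculation, sketched here. The function name `top_n_share`, the tuple layout, and the toy numbers are hypothetical; the example uses the top 2 of 5 species instead of the top 20 to keep the arithmetic easy to follow.

```python
def top_n_share(species, n=20):
    """Share of pageviews and of species pages held by the n most frequent species.

    `species` is a list of (ebird_frequency, pageviews) tuples, an assumed
    layout. Returns (proportion of views, proportion of species).
    """
    ranked = sorted(species, key=lambda item: item[0], reverse=True)
    top = ranked[:n]
    total_views = sum(views for _, views in species)
    top_views = sum(views for _, views in top)
    return top_views / total_views, len(top) / len(species)


# Invented values: (eBird frequency, Wikipedia pageviews).
toy_species = [(0.80, 5000), (0.60, 3000), (0.10, 1000), (0.05, 600), (0.01, 400)]
view_share, species_share = top_n_share(toy_species, n=2)
```

In this toy case the two most frequently observed species hold 80% of the views while making up 40% of the pages, the same kind of contrast reported for the top 20 species above.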

| Body mass as a predictor of Wikipedia pageviews
For species in our data set for which we had body mass data, we tested the relationship between body mass (as a predictor) and Wikipedia pageviews (as a response) using a linear regression model. The number of species per language for which we had body mass data varied from 99 to 9,403 (mean 2,993, SD 2,875; 74,830 pages total). Body mass showed a significant positive correlation with Wikipedia pageviews in 21 of 25 languages (p < .01; Table 2). In languages where the relationship was significant, body mass described between 2.1 and 24.5% of the variance in pageviews (p < .01, adj. R² = .02-.25, mean 0.11, SD 0.08). The interaction between local-global and body mass was significant in 12 of the 25 languages (p < .01). In instances where the relationship was significant, body mass had a stronger positive relationship with pageviews for global species than for local ones in all but one language (local mean coefficient 0.18, SD 0.11, global mean coefficient 0.34, SD 0.05; Appendix S5).

| eBird frequency versus body mass as predictors of Wikipedia pageviews for local species
For all species for which we had both body mass and eBird frequency data (i.e., local species with body mass data; 2,563 species with 8,014 pages), we fitted a linear regression model that included both eBird frequency and body mass as predictors of Wikipedia pageviews. This joint model was significant in all 25 languages (p < .01) and explained between 10.5 and 54.9% of the variance in Wikipedia pageviews for birds across languages (adj. R² = .10-.55, mean 0.33, SD 0.13). We used partition of variance using partial linear regression to assess the relative contributions of body mass and eBird frequency as explanatory variables. For all languages except one (Portuguese), eBird frequency explained more of the variance in pageviews than body mass (adj. R² frequency | mass 0.07 to 0.52, mean 0.28, SD 0.13; adj. R² mass | frequency −0.01 to 0.26, mean 0.08, SD 0.06; Table 3). Often the difference between the two was substantial; in 14 languages eBird frequency explained over three times more variance than body mass.
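The partitioning step can be illustrated with a small pure-Python sketch. It assumes the usual definition of the unique fractions as differences of adjusted R² values (full model minus the model containing only the other predictor), which is how variation partitioning is commonly computed; it is not the authors' R code, and the helper names and the eight data points are invented.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for j in range(c, m + 1):
                aug[r][j] -= factor * aug[c][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x


def adjusted_r2(predictors, y):
    """Adjusted R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n, k = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    beta = solve_linear(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Invented, already-transformed values for eight hypothetical local species.
logit_freq = [-4.0, -3.0, -2.5, -1.0, -0.5, 0.5, 1.0, 2.0]
log_mass = [3.0, 5.5, 2.5, 6.0, 3.5, 4.0, 6.5, 4.5]
log_views = [5.1, 6.4, 6.0, 8.2, 7.6, 8.9, 10.1, 10.4]

full = adjusted_r2([logit_freq, log_mass], log_views)
unique_freq = full - adjusted_r2([log_mass], log_views)    # frequency | mass
unique_mass = full - adjusted_r2([logit_freq], log_views)  # mass | frequency
```

Because the fractions are differences of adjusted values, a unique fraction can come out slightly negative when a predictor adds nothing beyond the other, which is consistent with the lower bound of −0.01 reported for body mass.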

| DISCUSSION
We outlined three hypotheses for assessing the relative importance of a physical trait (body size) versus the potential to encounter a species in the wild in determining public interest in birds. For all three hypotheses, our results strongly supported the importance of encounter frequency over body size in determining public interest online. First, we observed a clear geographical pattern in the distribution of Wikipedia pageviews for birds, with pages for local species attracting more interest than pages for global species across languages (Figure 1). Second, while both body size and eBird frequency (Figure 2; Tables 1 and 2) correlated positively with increased pageviews, eBird frequency explained more of the variance in pageviews across languages (mean adj. R² eBird frequency 0.25, SD 0.13; mean adj. R² body size 0.11, SD 0.08). Third, when eBird frequency and body size were compared directly in a combined model, the former explained significantly more variance in pageviews in 24 of 25 languages in our data set (mean adj. R² frequency | mass 0.28, SD 0.13; mean adj. R² mass | frequency 0.08, SD 0.06; Table 3). In several language editions, the difference between eBird frequency and body size as predictors of Wikipedia pageviews in this combined model was substantial: in German Wikipedia eBird frequency explained 52% of the variance in Wikipedia pageviews for local species when used as the sole predictor in the combined linear model, while mass explained only 6% of the variance.
Consistent with established wisdom in conservation (e.g., Berti et al., 2020), our results showed that body size correlated positively with online interest. Notably, some of the most-viewed pages in several languages were for rarely encountered, global species. In a striking example, the Dodo (Raphus cucullatus), a relatively large-bodied, extinct species, received the most pageviews of any bird page in both French and Portuguese Wikipedias. Thus, indirect representations such as appearances in media and popular culture clearly drive interest in some species and can even determine the most viewed species pages overall. However, when assessed across larger numbers of species, our results clearly showed that body size was secondary to the potential for direct wild encounters in predicting Wikipedia pageviews for birds. While high public interest in locally common species has been shown in previous studies using online data (e.g., Correia et al., 2016;Zmihorski et al., 2013), we demonstrated that this pattern is present across a wide range of cultural and geographic contexts.
The positive correlation between wild encounter rates and online interest could prove relevant for conservation policy in two ways. First, from a methodological standpoint it demonstrates the importance of taking encounter frequency into account when measuring public interest in species. Second, it highlights the importance of local and frequently encountered species in attracting people's attention to the natural world. Few if any of the 20 most frequently observed birds in the regions in our data set would qualify as "charismatic megafauna" under most criteria, and yet they accounted for between 10 and 20% of the total bird pageviews in most languages. As a result of the interest they attract, these commonly observed species could act as entry points for people's interactions with biodiversity. Conservation initiatives that improve opportunities for people to interact with common, local species (such as the development of public parks and greenspaces) could help increase public support for conservation.
There are several caveats to consider when interpreting our results. Our method of matching Wikipedia languages to regions provided only rough geographic resolution. For each language, a percentage of pageviews came from outside of the associated region. Furthermore, species that are frequently encountered in one part of a region may be absent from another, particularly in large regions such as Brazil. Using culturomic resources with more fine-scale geographic resolution could be an approach to investigating these patterns at finer scales (e.g., Correia et al., 2016). eBird data also have important biases (Sullivan et al., 2009). eBird users may be more likely to visit certain locations and to report some species over others, and the coverage and types of users may differ between regions. It is also important to consider the ways in which people use Wikipedia as opposed to other online resources (Correia et al., 2021). Wikipedia pageviews reflect people's attention and their searches for additional knowledge, but not necessarily their preferences. High interest in a species could result as readily from people liking the species as from people considering it a pest and wanting to remove it. Thus, additional context is required to understand the drivers behind high pageviews for species. Finally, it is possible that interactions with wild animals are more important for birds than for other taxonomic groups. Birds are more easily observed and identified in the wild than many other organisms and generate unique forms of human-nature interaction through activities such as birdwatching and bird-feeding (Cocker, Tipling, Elphick, & Fanshawe, 2013). It is possible that observation frequency may have a stronger positive correlation with online interest for birds than it does for other groups of organisms. Future studies may be able to address these caveats and expand upon our results to investigate the mechanisms underlying the high online interest in local bird species.
Our results highlighted the novel insights that are possible using the large analytical scales enabled by conservation culturomics methods. The relationship between sighting frequency and increased public interest may be obscured when public interest in species is assessed using small or unrepresentative samples. In smaller samples, traits of extreme outliers (such as the Dodo) may outweigh broadscale patterns in the data. Thus, our findings highlighted the value of large-scale analyses across many species and over broad cultural and geographic contexts. Access to large digital sources such as Wikipedia, together with powerful new analytical tools, holds much promise for further novel research insights using conservation culturomics (Correia et al., 2021).
Future studies can build on our results by using more focused studies to investigate the causal relationships between public interest and people's wild encounters with species. If there is a causal relationship between wild encounters with species and increased interest, as the correlations we identified suggest, this relationship may hint at an upcoming challenge for conservation: populations of many species are declining at alarming rates, resulting in fewer opportunities for people to encounter species in the wild (Dirzo et al., 2014;Inger et al., 2015;Rosenberg et al., 2019). These trends are exacerbated by urbanization and the "extinction of experience" that results from increasingly limited interactions with wild nature (Miller, 2005;Soga & Gaston, 2016). These trends could result in an "extinction of interest" in some species as opportunities for direct interactions with them decline. Reinforcing the value of common species in conservation could help to combat this (Gaston, 2010). Creating chances for people to engage with local and common species through activities such as bird feeding is often feasible and inexpensive (Cox & Gaston, 2016;Miller, 2005). By promoting these wild interactions, the relationship between wild encounters and public interest could offer opportunities to increase people's awareness of nature and engage broader support for conservation initiatives.

ACKNOWLEDGMENTS
We are grateful to Wikipedia and eBird for allowing open access to their databases and to the many volunteers whose contributions to Wikipedia and eBird made this study possible.

CONFLICT OF INTEREST
There are no competing interests.

AUTHOR CONTRIBUTIONS
John C. Mittermeier, Uri Roll, and Richard Grenyer designed the study and research questions. John C. Mittermeier and Thomas J. Matthews designed the methods and statistical analyses. John C. Mittermeier compiled and analyzed the data and prepared the figures. John C. Mittermeier, Uri Roll, Thomas J. Matthews, and Ricardo Correia wrote the manuscript.

DATA AVAILABILITY STATEMENT
All data used in this study is open access.

ETHICS STATEMENT
No ethics approval was required for this research.