The Species Awareness Index (SAI): a Wikipedia-derived conservation culturomics metric for public biodiversity awareness

Threats to global biodiversity are well known, but slowing current rates of biodiversity loss remains an ongoing challenge. The Aichi Targets set out 20 goals on which the international community should act to alleviate biodiversity decline, one of which (Target 1) aimed to raise public awareness of the importance of biodiversity. Whilst conventional indicators for Target 1 are of low spatial and temporal coverage, conservation culturomics has demonstrated how biodiversity awareness can be quantified at the global scale. Following the Living Planet Index methodology, here we introduce the Species Awareness Index (SAI), an index of changing species awareness on Wikipedia. We calculated this index at the page level for 41,197 IUCN species across 10 Wikipedia languages, incorporating over 2 billion views. Bootstrapped indices for the page-level SAI show that overall awareness of biodiversity is marginally increasing, although there are differences among taxonomic classes and languages. Among taxonomic classes, overall awareness of reptiles is increasing fastest, and amphibians slowest. Among languages, overall species awareness for the Japanese Wikipedia is increasing fastest, and the Chinese and German Wikipedias slowest. Although awareness of species on Wikipedia as a whole is increasing, and is significantly higher in traded species, over the period 2016-2020 change in interest appears not to be strongly related to the trade of species or animal pollinators. As a data source for public biodiversity awareness, Wikipedia could be integrated into the Biodiversity Engagement Indicator, thereby incorporating a more direct link to biodiversity itself.


For each species page returned, we calculated the daily average views for each month, and then kept only those species pages for which the series was represented for all months (see Figure S1 for the number of complete series). We used daily average views rather than total views since the Wikipedia pageview API does not always return views for all days in a given month.
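The aggregation and filtering step described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the input layout (a mapping from ISO dates to daily views) and the example values are all assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean

def monthly_daily_average(daily_views):
    """Map {'YYYY-MM-DD': views} to {'YYYY-MM': mean views per returned day}."""
    by_month = defaultdict(list)
    for day, views in daily_views.items():
        by_month[day[:7]].append(views)
    # Average over the days actually returned, so that days missing from
    # the API response do not deflate the monthly value as a total would.
    return {month: mean(values) for month, values in by_month.items()}

def complete_series(pages, months):
    """Keep only pages that have a value for every month in `months`."""
    return {
        page: series
        for page, series in pages.items()
        if all(month in series for month in months)
    }

# Hypothetical input: page "B" has no views returned for February.
raw = {
    "A": {"2016-01-01": 10, "2016-01-03": 20, "2016-02-01": 30},
    "B": {"2016-01-01": 5, "2016-01-02": 7},
}
monthly = {page: monthly_daily_average(days) for page, days in raw.items()}
kept = complete_series(monthly, ["2016-01", "2016-02"])
```

In this example page "A" is retained with a January average of 15 views per day, even though only two January days were returned, while page "B" is dropped because its series is incomplete.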

To account for the overall change in Wikipedia's popularity and use, we also downloaded the daily user views for a random set of 11,000 pages in each language, using the Wikipedia Random API to request random pages. We then aggregated these views in the same manner as the daily average species views, and again kept only pages represented across the whole time series. From this random set of views, we then removed any page also appearing in the set of species pages for that language. We initially sampled 11,000 pages to maximise the number of remaining pages after removing incomplete series and species pages.

Pollinator and wildlife trade datasets
To explore how species awareness varied with pollination contribution, we built a list of reptiles, and harvested ray-finned fish. We then retrieved the Wikidata ID for each of these traded species using the Wikipedia API, which we merged onto each species page. In the remainder of this paper we refer to any species that pollinates as providing a "pollination contribution", and any species in either Schefers et al. (2019) or the FAO statistics as "traded".

Calculating absolute awareness of biodiversity
Before calculating the SAI we briefly explored absolute awareness of biodiversity among taxonomic classes, pollination contribution, and trade status. We defined "absolute awareness" as the total views for a species page on Wikipedia in the period 1st July 2015 to 31st March 2020. We merged total views for each species page with the taxonomic class, trade status, and pollination contribution of that species, and then built two generalised linear mixed-effects models: 1) modelling log10 total article views as a function of taxonomic class, trade status (Y/N), the interaction of class and trade, and a random effect for language; and 2) modelling log10 total article views as a function of taxonomic class, pollination contribution (Y/N), the interaction of class and pollination, and a random effect for language. Rather than attempting to find the most parsimonious model, we present full model predicted values, with AIC values for these and a set of candidate null models included in the Supplementary Information (Tables S6 and S7). In the Supplementary Information (Figures S11 and S12) we also present boxplots for the distribution of total views among taxa for each language.
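For readers who prefer notation, the first model can be written out as below. This is an illustrative formulation only: the symbols are introduced here, and the Gaussian error term and random intercept are assumptions, since the text does not state the error family or the form of the random effect.

```latex
% V_p      : total views for species page p
% c(p)     : taxonomic class of the species on page p
% T_p      : 1 if the species is traded, 0 otherwise
% \ell(p)  : Wikipedia language of page p
\log_{10} V_p = \beta_0 + \alpha_{c(p)} + \gamma\, T_p + \delta_{c(p)}\, T_p
                + u_{\ell(p)} + \varepsilon_p,
\qquad u_{\ell} \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_p \sim \mathcal{N}(0, \sigma^2)
```

Under the same assumptions, the second model has the same structure with the trade indicator replaced by a pollination indicator (1 if the species provides a pollination contribution, 0 otherwise).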

Deriving the Species Awareness Index (SAI)
The Species Awareness Index (SAI) is a new measurement of change in species awareness, calculated at the species page level from the rate of change in daily average Wikipedia views per month. Since the SAI measures the rate of change in views within a species page, species are weighted equally irrespective of their popularity, meaning highly viewed species do not dominate the SAI. In the remainder of this paper we use the term 'SAI' or 'Species Awareness Index' to refer to the overall change in awareness for a given species page, species, or group of species on Wikipedia. Specifically, we use the term "species page SAI" to refer to rate of change at the page level, the term "species SAI" to refer to the average of all species page SAIs for a unique species among languages, and "overall SAI" to refer to a bootstrapped group of species SAIs (see Figure 1). We also use the term "average monthly rate of change in the species page SAI" to refer to the average rate of change for a single species page across a given time period. All of the above are distinct from absolute interest in a given species or group of species (i.e. the total Wikipedia views over the whole time series).
Figure 1. A schematic describing how the species page, species, and overall SAI were derived using Wikipedia views. The species page SAI represents the random-adjusted trend for a given species in a given language, the species SAI is the average of species page SAIs for a single species across languages, and the overall SAI is a group of bootstrapped species SAIs.
change over time for each species in 6 taxonomic groups (amphibians, birds, insects, mammals, ray-finned fish, and reptiles) on 10 Wikipedia languages (Arabic, Chinese, English, French, German, Italian, Japanese, Portuguese, Russian, Spanish). The "rlpi" package applies a generalised additive model (GAM) to smooth the daily average species page view trends, using k = N/2 for the degrees of freedom parameter, following (Collen,  where Iat = the species page SAI at time t.
To account for differences in the tortuosity of trends among Wikipedia languages (see Supplementary Information, Figure S7), we also smoothed the species page SAI in each Wikipedia language using a loess regression (span = 0.3), before transforming the smoothed species page SAI back into a rate of change.
After smoothing the species page SAI as above, we then calculated a species SAI for each species (across languages) by averaging rates of change at each time step across all languages. For example, the species Panthera tigris has the unique Wikidata ID 'Q19939', meaning the average rate of change in SAI for all species pages (irrespective of language) identified as 'Q19939' provides the overall rate of change for the species Panthera tigris.
We then calculated an overall SAI combining all species across 10 Wikipedia languages by averaging rates of change across all species SAIs. Bootstrap confidence intervals were calculated by taking the 2.5th and 97.5th percentiles of 1000 bootstrapped indices at each time step. To check the extent to which single languages influence the overall SAI, we then jack-knifed the overall SAI by language, and removed any languages with a marked effect on the overall trend (see Supplementary Information, Figure S6).
Using the same approach as above, we also calculated an overall SAI for each taxonomic class for all languages combined, and each taxonomic class in each language. For each taxonomic class we again averaged the loess-smoothed rate of change in species page SAI among languages, and then bootstrapped the species rate of change in SAI at each time step for each taxonomic class, as above. To check the extent to which single languages influence class-level trends, we again jack-knifed the overall SAI by language, and removed any languages with a marked effect on the overall trend (see Supplementary Information, Figure S8). To calculate an overall SAI in each taxonomic class in each language, we bootstrapped the rate of change in species page SAI for the set of species pages in a given class-language combination.
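The averaging, bootstrapping, and jack-knifing steps described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the "rlpi" implementation: the rate of change is assumed to follow the Living Planet Index convention (log10 ratio of consecutive values, chained into an index starting at 1), the bootstrap is assumed to resample species with replacement, percentiles are taken with a simple rank rule, and the GAM smoothing, random-page adjustment, and loess smoothing are omitted. All function names, the identifier "Q_example", and the view counts are hypothetical.

```python
import math
import random

def log_rates(views):
    """Rate of change between consecutive months: d_t = log10(v_t / v_{t-1})."""
    return [math.log10(b / a) for a, b in zip(views, views[1:])]

def mean_rates(rate_lists):
    """Average rates of change at each time step across several series."""
    return [sum(step) / len(step) for step in zip(*rate_lists)]

def index_from_rates(rates, baseline=1.0):
    """Cumulative index: I_t = I_{t-1} * 10 ** d_t, with I_0 = baseline."""
    index = [baseline]
    for d in rates:
        index.append(index[-1] * 10 ** d)
    return index

def species_rates(page_views, drop_language=None):
    """Average page-level rates across languages for each Wikidata ID.

    Passing drop_language gives one leave-one-out (jack-knife) replicate.
    """
    out = {}
    for qid, by_language in page_views.items():
        kept = [log_rates(v) for lang, v in by_language.items()
                if lang != drop_language]
        if kept:
            out[qid] = mean_rates(kept)
    return out

def bootstrap_ci(rates_by_species, n_boot=1000, seed=0):
    """2.5th and 97.5th percentiles of bootstrapped indices at each time step."""
    rng = random.Random(seed)
    pool = list(rates_by_species.values())
    boots = []
    for _ in range(n_boot):
        # Assumption: resample whole species series with replacement.
        sample = [rng.choice(pool) for _ in pool]
        boots.append(index_from_rates(mean_rates(sample)))
    lower, upper = [], []
    for step in zip(*boots):
        ordered = sorted(step)
        lower.append(ordered[int(0.025 * (n_boot - 1))])
        upper.append(ordered[int(0.975 * (n_boot - 1))])
    return lower, upper

# Hypothetical daily average views per month, keyed by Wikidata ID and language.
page_views = {
    "Q19939": {"en": [100.0, 110.0, 121.0], "fr": [50.0, 50.0, 50.0]},
    "Q_example": {"en": [200.0, 180.0, 162.0]},
}
per_species = species_rates(page_views)
overall_sai = index_from_rates(mean_rates(list(per_species.values())))
lower, upper = bootstrap_ci(per_species, n_boot=200)
without_en = species_rates(page_views, drop_language="en")
```

Because each page contributes a rate of change rather than raw views, the heavily viewed English page in this example carries the same weight as the French page when the species SAI is averaged. Comparing an index built from `without_en` against `overall_sai` is one way to see how strongly a single language drives the trend.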

Predicting average monthly rate of change in the SAI
After calculating the SAI for all species pages on Wikipedia, we then calculated an average  (Tables S8-S10).  Figure S1 for a full language breakdown). After subsetting for series represented for every month, the proportion of complete series was lowest in the Arabic Wikipedia, specifically the ray-finned fishes (~35%) and the reptiles (~38%). Most taxonomic classes for most languages had complete series in at least 80% of the species in that grouping (Supplementary Information, Figure S1).
After removing pages also present in the species set, our set of random views consisted of ~2.82 billion views across 113,622 random pages (Supplementary Information, Figure S3), again for the same 1735-day period. The total number of random views was highest for the English Wikipedia at ~629.85 million views, and lowest in the Arabic Wikipedia at ~87.94 million views (Supplementary Information, Figure S3). After subsetting for only random pages represented for all months, total random pages varied from 3486 in the Arabic Wikipedia to 9174 in the Japanese Wikipedia (Supplementary Information, Figure S4).

Absolute awareness of biodiversity
Among taxonomic classes, reptiles have consistently higher absolute awareness, appearing in the top 2 classes for 7/10 languages (Supplementary Information, Figure S11). Amphibians on the other hand have consistently lower awareness, appearing in the bottom 2 classes for 8/10 languages. Some languages appear to have uniquely high absolute awareness for specific classes. For example, the ray-finned fish have the highest absolute awareness in the Japanese Wikipedia (Supplementary Information, Figure S11). Across all languages, absolute awareness (total views) is significantly higher in traded species (Figure 2; F = 15206.44, p < 0.001, Table S2), but not significantly different in pollinating species (F = 0.3869, p = 0.5339, Table S1).

Predicting average monthly rate of change in species page SAI
Average monthly rate of change in species page SAI for the period January 2016-January 2020 differed significantly for all of taxonomic class, language, and their interaction (Figure 7, Table S5). At the level of taxonomic class, the reptiles and ray-finned fishes are increasing in awareness the fastest, and the insects and amphibians are either increasing slowly or declining (with the exception of the Japanese Wikipedia). Among languages, rate of change in species page SAI is highest in the Japanese and Portuguese Wikipedias, and lowest in the German and Chinese Wikipedias. Although absolute interest is significantly greater in traded species (see Figure 3, Table S2), over the period January 2016-January 2020 average monthly rate of change in the species page SAI appears not to be related to either trade status (Figure S14, Table S4) or pollination contribution (Figure S14, Table S3).