Online wildlife trade in species of conservation concern

Online wildlife trade is widespread and affects thousands of species. Yet, attempts to quantify online wildlife trade have mainly focused on a few platforms and taxonomic groups. Here, we study the prevalence of wildlife trade using automated data collection and filtering methods. We analyze trade across five digital platforms and 156 animal and plant species of conservation concern from a global biodiversity hotspot, the Philippines. We identified approximately 5000 highly relevant instances of trade in 1.47 million posts, focusing on 108 species, 79 of which are classified as threatened. Trade mainly occurred on webpages indexed in Google and on Twitter. We found that manual validation is essential, as animals obtained a higher proportion of hits prior to validation. Following manual validation, we observed a shift toward plant‐related trade hits. Scaling up these approaches to a global level is key to understanding the extent of digital wildlife trade across the globe.


INTRODUCTION
Wildlife trade affects thousands of animal and plant species, with current estimates indicating that it involves one in every four terrestrial vertebrate species on Earth, and one in every five threatened plant species evaluated by the International Union for Conservation of Nature (IUCN) (Fukushima et al., 2020; Maxwell et al., 2016; Scheffers et al., 2019). The trade of live specimens and derived products, such as hunting trophies, food, clothing, ornaments, pets, or traditional remedies, generates huge revenues, with legal and illegal trade worth billions of dollars annually (Haken, 2011; Hughes, 2021). While the trade of some wildlife species is sustainable and supports livelihoods in many regions of the world (Hughes, 2021), unsustainable harvesting is also common, and it has been linked to decreases in species occurrence and density, which can ultimately lead to population extirpations (Cardoso et al., 2021; Wilcove et al., 2013; Wittemyer et al., 2014). Over recent decades, wildlife trade has surged, and its nature has rapidly changed (Harfoot et al., 2018). The Internet has greatly increased and globalized the opportunities for wildlife trade by facilitating communication between dealers and buyers (Lavorgna, 2014; Yu & Jia, 2015). E-commerce websites and social media sites have become the main markets for wildlife products (Harrison et al., 2016; Hinsley et al., 2016; Sung & Fong, 2018). Wildlife dealers use digital platforms to post information on wildlife products, interact with customers, and create trade networks (Lavorgna, 2014). Digital platforms can provide anonymity, which reduces the risk of liability for individuals who illegally trade protected species, such as most cases of international trade of species included in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) (Xiao et al., 2017; Yu & Jia, 2015).
There is ample evidence of ongoing wildlife trade on social media sites and e-commerce platforms (Di Minin et al., 2018, 2019; Marshall et al., 2020). For example, over 70,000 individuals from 95 protected species of parrots and turtles were found on sale in just 150 days on a Chinese e-commerce platform (Ye et al., 2020). Similarly, numerous posts selling turtles, hawksbill shell, elephant ivory, and rhino horn were identified on Facebook and Internet forums (Sung et al., 2021; Xu et al., 2020). Besides animals, trade in plants is also widespread and often unsustainable. For example, 2625 plant species were being offered for sale on eBay over a period of 50 days (Humair et al., 2015), while online trade in products derived from orchids likely resulted in an estimated 90,000-180,000 wild orchids being destructively harvested (Masters et al., 2022). Most studies have focused on a small number of digital platforms, and few studies have investigated the prevalence of wildlife trade across digital platforms, languages, and taxonomic groups using automated methods (but see Hughes et al., 2021; Marshall et al., 2020, 2022).
Here, we explore the characteristics and nuances of wildlife trade in the digital realm, implement pipelines for data extraction, ranking, and filtering, and discuss some of the limits of automating this process. We focus on a group of 156 animal and plant species known to be affected by wildlife trade from a global biodiversity hotspot, the Philippines (Figure 1). The Philippines has a disproportionate number of species threatened with extinction, accompanied by exceptionally high rates of endemicity (Myers et al., 2000). As a result, the Philippines has become a prime target for wildlife trade (Krishnasamy & Zavagli, 2020). We monitor the trade for these species across four digital platforms, Twitter, YouTube, Flickr, and TikTok, and a search engine, Google, leveraging advanced automated data collection and filtering methods. We use a broad spectrum of keywords provided by national conservation and law enforcement authorities, ranging from species scientific, English, and local names to specific cryptic codewords, combined with trade-related keywords, to search and retrieve data from these platforms. Posts usually include text along with image or video data, where relevant information related to the product and the seller is provided. These posts were subsequently filtered by automatically scanning their text content to find and match together the species and trade keywords and remove spurious results. We then used manual validation to robustly assess which identified posts were real cases of wildlife trade.

METHODS
We focus our search on 156 animal and plant species that were identified by the Department of Environment and Natural Resources-Biodiversity Management Bureau (DENR-BMB), the CITES Management Authority of the Philippines, as affected by wildlife trade (see Table S1 for a complete list). This list was built using the DENR-BMB's national database of confiscations and seizures related to wildlife trafficking, and therefore reflects the vulnerability of these species to wildlife trade. It included 62 animal species (35 species of mammals and birds, and 27 species of reptiles) and 94 plant species (58 species of orchids, and 36 other plant species). Overall, 51% of the animal and 15% of the plant species were classified as threatened (i.e., Vulnerable, Endangered, and Critically Endangered) by the IUCN, while the rest were classified as nonthreatened (i.e., Near Threatened and Least Concern) or Data Deficient.
FIGURE 1 Proportion of species in each group that were searched in Google, Twitter, YouTube, Flickr, and TikTok, and their conservation status.
To retrieve information from Google, we used the Google Custom Search API (https://developers.google.com). The Google API returns only the top 100 relevant results for each search query. This can be problematic, as Google is a widely used search engine that not only indexes webpages from the entire Internet, but also indexes other platforms, such as Facebook, Lazada, Shopee, Amazon, and many others. Thus, to expand our search and double the number of relevant results (200 instead of 100), we performed a search first for all webpages, and then limited it to webpages located in the Philippines. Twitter (https://developer.twitter.com), YouTube (https://developers.google.com/youtube), Flickr (https://www.flickr.com/services/api/), and TikTok (https://developers.tiktok.com) were queried similarly, using their respective APIs. Each search query yielded a maximum of 10,000 posts for Twitter, 500 posts for YouTube and Flickr, and 5000 posts for TikTok. All data extraction was performed in May-July 2021.
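The two-pass Google search described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it only builds the request parameters (no network calls), assumes the Custom Search JSON API conventions of 10 results per request with a `start` offset and a `cr` country restriction, and uses a hypothetical query format and example keywords. The `key` and `cx` credentials that a real request needs are omitted.

```python
from itertools import product

API_URL = "https://www.googleapis.com/customsearch/v1"
PAGE_SIZE = 10      # the JSON API returns at most 10 results per request
MAX_RESULTS = 100   # per-query cap described in the text


def build_requests(species_names, trade_terms, country=None):
    """Return one parameter dict per result page of every species x trade query."""
    requests = []
    for name, term in product(species_names, trade_terms):
        # start offsets 1, 11, ..., 91 cover the top 100 results
        for start in range(1, MAX_RESULTS + 1, PAGE_SIZE):
            # Hypothetical query format: quoted species name plus a trade term.
            params = {"q": f'"{name}" {term}', "start": start, "num": PAGE_SIZE}
            if country:
                params["cr"] = country  # e.g. "countryPH" restricts results by country
            requests.append(params)
    return requests


# Example keywords are illustrative only; the study's keyword lists are not reproduced here.
global_pass = build_requests(["Cuora amboinensis"], ["sale"])
local_pass = build_requests(["Cuora amboinensis"], ["sale"], country="countryPH")
max_hits = (len(global_pass) + len(local_pass)) * PAGE_SIZE
```

Running the unrestricted pass and the country-restricted pass for the same query is what raises the ceiling from 100 to 200 results, although the two result sets may overlap and still need deduplication.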

Relevance score ranking, data filtering, and manual validation
After extracting all text and data from the posts, we automatically removed blank entries and duplicates by comparing each post's Uniform Resource Locator (URL) against every other post's URL (the URL is a unique identifier used to locate a resource on the Internet). Next, we implemented a relevance scoring and filtering algorithm to categorize the posts automatically into four categories: potentially relevant (score > 0.5), species names only (score = 0.5), trade names only (0.5 > score > 0), and irrelevant (score = 0). When the species name was present in the text, the post was assigned 0.5 points, and each of the 19 trade-related keywords present in the text increased the score by an additional 0.5/19. The maximum score that any post could obtain was 1, reached when all the trade terms and one of the species names were mentioned. To improve the accuracy of our filtering algorithm and reduce the number of false negatives, we expanded the multi-worded species names and trade-related terms to include all permutations with underscores and hyphens, as well as variations with some or all of the spaces between words removed, so that all possible combinations were included. This allowed us to more accurately identify posts that contained these species names. To further refine the results obtained from Google searches, we sorted the domains of all the webpages by frequency in descending order and manually reviewed the most common domains. We removed irrelevant domains, such as those of scientific repositories and publishers (e.g., https://springer.com, https://researchgate.net, https://plos.org), news media (e.g., https://nytimes.com, https://news.mongabay.com), and stock photography (e.g., https://canstockphoto.com, https://sciencephoto.com).
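The scoring rule and the name expansion described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, matching is assumed to be a plain case-insensitive substring test, and the example keywords are placeholders rather than the 19 trade terms used in the study. Categories are assigned from the two components of the score rather than from the floating-point total.

```python
from itertools import product

SEPARATORS = (" ", "-", "_", "")  # space, hyphen, underscore, or merged words


def name_variants(name):
    """Expand a multi-word name over every separator choice between its words."""
    words = name.lower().split()
    if not words:
        return set()
    variants = set()
    for seps in product(SEPARATORS, repeat=len(words) - 1):
        variant = words[0]
        for sep, word in zip(seps, words[1:]):
            variant += sep + word
        variants.add(variant)
    return variants


def mentions(text, term):
    """True if any variant of the term occurs in the (lowercased) text."""
    return any(variant in text for variant in name_variants(term))


def score_post(text, species_names, trade_terms):
    """Return (score, category): 0.5 for a species name, 0.5/n per trade term found."""
    text = text.lower()
    has_species = any(mentions(text, name) for name in species_names)
    trade_hits = sum(mentions(text, term) for term in trade_terms)
    score = (0.5 if has_species else 0.0) + 0.5 * trade_hits / len(trade_terms)
    if has_species and trade_hits:
        category = "potentially relevant"
    elif has_species:
        category = "species name only"
    elif trade_hits:
        category = "trade terms only"
    else:
        category = "irrelevant"
    return score, category


# Placeholder keyword lists for illustration only.
species = ["box turtle"]
trade = ["for sale", "buy", "price"]
example = score_post("Box-turtle forsale, DM me", species, trade)
```

Substring matching is simple but permissive (a short keyword can match inside an unrelated word), which is one reason the potentially relevant category still requires manual validation.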
Potentially relevant results were retained for manual validation. For Twitter, YouTube, Flickr, and TikTok, all potentially relevant results were manually validated by accessing the relevant links to the posts. For Google, we manually validated a random 15% sample of the posts and extrapolated the number of verified posts. To manually validate the posts, we read all the text associated with each post and screened all videos and images to determine whether the target species was being advertised for sale. Manually annotating each individual record could take from 1-5 s to 1-2 min, depending on the researcher's expertise, the digital media platform employed, the content of the post, and the species depicted. The lead author classified > 90% of the posts (the other coauthors classified the remaining posts, and the lead author confirmed their relevance). The primary causes for the lack of relevance in posts containing both species-related and trade-related keywords were the species name coinciding with a brand name, geographic location, or product; the keywords having different meanings in other languages (e.g., "sale" means "exits" in Spanish); or posts discussing species trade from a conservation perspective. We also extracted additional relevant information, such as the type of product that was traded.
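The extrapolation from a validated random sample can be written as a simple proportion. The sketch below assumes that the verified share of the sample is scaled linearly to all potentially relevant posts; the function name and the numbers in the example are hypothetical and are not the study's sample counts.

```python
def extrapolate_verified(total_candidates, sample_size, sample_verified):
    """Scale the verified share of a random sample up to all candidate posts."""
    if not 0 < sample_size <= total_candidates:
        raise ValueError("sample_size must be between 1 and total_candidates")
    if not 0 <= sample_verified <= sample_size:
        raise ValueError("sample_verified must be between 0 and sample_size")
    return round(total_candidates * sample_verified / sample_size)


# Hypothetical numbers: 1000 candidates, a 15% sample, 30 verified in the sample.
estimate = extrapolate_verified(total_candidates=1000, sample_size=150, sample_verified=30)
```

Because the estimate depends on a sample, it carries sampling uncertainty that grows as the verified share gets smaller.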
Finally, we applied data pseudonymization to comply with the European Union General Data Protection Regulation (EU GDPR) (Di Minin et al., 2020). To protect the privacy of the users who wrote the posts or webpages, we hashed all information that could potentially reveal their identity. This included usernames, links to the posts or profiles, post or user IDs, and geolocation. By hashing this information, we minimized the risk that it could be used to directly identify the individuals involved.
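A minimal sketch of this pseudonymization step is given below. The hashing scheme used in the study is not specified, so the keyed SHA-256 (HMAC) construction, the field names, and the record layout here are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical field names for the identifying information listed above.
IDENTIFYING_FIELDS = ("username", "post_url", "post_id", "geolocation")


def pseudonymize(record, secret_key):
    """Return a copy of the record with identifying fields replaced by keyed hashes."""
    cleaned = dict(record)
    for field in IDENTIFYING_FIELDS:
        value = cleaned.get(field)
        if value is not None:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
    return cleaned


post = {"username": "example_user", "post_id": 12345, "text": "orchid for sale"}
safe_post = pseudonymize(post, secret_key=b"replace-with-a-private-key")
```

Using a secret key means the same user always maps to the same pseudonym within the dataset, while someone without the key cannot rebuild the mapping by hashing candidate usernames.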

RESULTS
Overall, we identified 1,470,287 posts (253,940 from webpages indexed in Google, 794,784 from Twitter, 95,182 from YouTube, 55,225 from Flickr, and 271,156 from TikTok; Figure S1) containing the targeted keywords (Figure 2). After implementing relevance score ranking and filtering algorithms (see details in the Methods section), on average 7.2% of the posts were classified as potentially relevant (i.e., the text contained both the name of the species and at least one trade keyword). This corresponded to 97,773 posts from Google, 5587 posts from Twitter, 320 posts from YouTube, 1050 posts from Flickr, and 1207 posts from TikTok (Figure 2). There was a wide variation in the proportion of potentially relevant posts across platforms, ranging from 38.5% for Google to 0.3% for YouTube.
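The proportions above follow directly from the reported counts, as the short check below shows. It uses only the figures given in this paragraph; the variable names are illustrative. Pooling all platforms gives the 7.2% share quoted above.

```python
# Counts reported in the text, per platform.
extracted = {"Google": 253_940, "Twitter": 794_784, "YouTube": 95_182,
             "Flickr": 55_225, "TikTok": 271_156}
relevant = {"Google": 97_773, "Twitter": 5_587, "YouTube": 320,
            "Flickr": 1_050, "TikTok": 1_207}

total_extracted = sum(extracted.values())
total_relevant = sum(relevant.values())

# Percentage of extracted posts classified as potentially relevant.
share_by_platform = {p: 100 * relevant[p] / extracted[p] for p in extracted}
pooled_share = 100 * total_relevant / total_extracted

for platform, share in share_by_platform.items():
    print(f"{platform}: {share:.1f}%")
print(f"All platforms pooled: {pooled_share:.1f}%")
```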
After manual validation of the potentially relevant posts, we found that on average 4.7% of the posts contained real wildlife trade advertisements: 4546 for Google (extrapolated from the 15% of posts that were manually validated), 288 for Twitter, 42 for YouTube, 9 for Flickr, and 13 for TikTok (Figure 2). Thus, the relative importance of each digital platform substantially changed from data extraction to data validation, with Twitter and TikTok yielding the most data, but Google proving the most reliable source of information (Figure 2). Among the websites indexed in Google where trade was verified, the most frequent domains were eBay (10.3%), Carousell (5.8%), Amazon (4.7%), Lazada (4.3%), Shopee (3.7%), and Facebook (3.1%). A similar pattern was observed when comparing taxonomic groups. Across digital platforms, animal species obtained a higher proportion of hits than plant species, and this pattern remained after implementing the ranking and filtering algorithms. However, the proportions reversed after manually validating the posts, which revealed that trade in plants is more common than trade in animals (Figure 3). Indeed, after manual validation, we found 1440 positive posts for orchids, 2692 for other plants, 318 for mammals and birds, and 446 for reptiles (see the breakdown for each platform in Figure 3).
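The reversal toward plants can be checked from the validated counts per group reported above. The snippet below is a simple arithmetic illustration with hypothetical variable names.

```python
# Manually validated trade posts per taxonomic group, as reported in the text.
validated = {"orchids": 1440, "other plants": 2692,
             "mammals and birds": 318, "reptiles": 446}

plant_posts = validated["orchids"] + validated["other plants"]
animal_posts = validated["mammals and birds"] + validated["reptiles"]
all_posts = plant_posts + animal_posts

plant_share = 100 * plant_posts / all_posts
animal_share = 100 * animal_posts / all_posts
print(f"Plants: {plant_share:.0f}%, animals: {animal_share:.0f}%")
```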
At the species level, we identified posts for all species, but after manually validating the data, we verified positive instances of trade for 108 out of the 156 (69%) species. The dissociation between the identified posts and the validated data also occurred at the species level (Figure 4). Animal species recorded the most hits. However, the most traded species were plants: agarwood (Aquilaria malaccensis), the staghorn fern (Platycerium coronarium), Alocasia zebrina, Hoya wayetii, and the tropical pitcher plant Nepenthes ventricosa. Only one animal species, the reticulated python (Malayopython reticulatus), was within the top 15 most traded species in our dataset (Figure 4). Other traded animal species were the spiny waterside skink (Tropidophorus grayi), the saltwater crocodile (Crocodylus porosus), the southeast Asian box turtle (Cuora amboinensis), the Asian leaf turtle (Cyclemys dentata), and the common hill myna (Gracula religiosa).
Of the 108 species for which we found valid trade examples, 79 species feature in CITES Appendices (20 species in Appendix I, for which international trade is only authorized in exceptional circumstances, and 59 in Appendix II, for which trade is subject to strict regulations). Overall, we found 77 instances of trade for Appendix I species, and 469 for Appendix II species. Among Appendix I species, those involved in the most trade included the saltwater crocodile, the Palawan peacock-pheasant (Polyplectron napoleonis), and the red-vented cockatoo (Cacatua haematuropygia). We found that most trade occurred on webpages indexed in Google (92.8% of the total trade posts) and on Twitter (5.9%). Trade posts featured mostly orchids (29.4%) and other plant species (55%), and the most traded items were live specimens (77.4%), followed by seeds, spores, and eggs (12.3%). Other derived products included essential oils, medicinal herbs, jewelry beads, wood chips, and more. Moreover, trade advertisements were mainly in English and Spanish, followed by Malaysian and Indonesian (Figure 5).

DISCUSSION
In this study, we performed a wide-scale automated search of wildlife trade across taxa, platforms, and languages for a focal list of species. Overall, we extracted data from 1.47 million posts across five digital platforms and detected approximately 5000 instances of trade from 108 species, mainly on webpages indexed by Google and on the social media platform Twitter. We found evidence of online trade for 69% of the searched species, highlighting the broad nature of trade across taxonomic groups. Plant species were the most represented taxa, accounting for over 4000 of the posts, with the remaining posts referring to reptiles, birds, and mammals. Importantly, we found that manual validation is essential for obtaining an accurate representation of the trade. Of all extracted posts, 80% represented animal species and 20% plant species, and representation remained similar after applying a filtering algorithm (76% animals and 24% plants). However, the proportions reversed after manual validation, with plants representing 84% of relevant posts and animals 16%. For relatively small datasets, manually validating the data is possible. For instance, data collected from Twitter included ∼800,000 posts, but after estimating the relevance scores and keeping only the potentially relevant posts, the dataset size decreased to approximately 5000 posts, making manual verification feasible. However, these digital datasets tend to rapidly increase in size. For example, we obtained approximately 100,000 potentially relevant posts for Google, which makes manual validation resource-intensive and time-prohibitive. Enhancing and further automating the data filtering process is necessary to fully understand the degree of trade on digital platforms. Natural language processing methods, such as neural network classifiers, provide the opportunity to automatically filter text (Kulkarni & Di Minin, 2021). Machine learning vision models can also be trained to identify images pertaining to the wildlife trade (Cardoso et al., 2023; Kulkarni & Di Minin, 2023). However, both approaches require training examples to distinguish relevant from irrelevant information.
FIGURE 5 Relationship among species, digital platforms, product types, and languages.
The characterization of online trade presents multiple challenges, as information provided by traders is limited. Even basic information, such as the identification of the traded species, might be difficult to obtain accurately. For example, some common species names might group multiple species. In our study, we found that the most traded items were agarwood products (wood chips, powder, oil, etc.), and while this term most commonly refers to A. malaccensis, it can also refer to other species of the same genus. Moreover, other key information, such as the number of specimens or items available, their price, or their origin, was often missing from the original post. In addition, in plants, especially orchids, species can be cross-bred, leading to hybrid specimens, and while we found multiple instances where this was reported, in other cases it was not. Another factor essential to evaluating the impact of wildlife trade on biodiversity is the origin of the traded item, that is, whether it is a product of captive breeding or cultivation or has been collected from the wild, since trade in individuals originating from captive sources might be legal. None of the posts that were manually validated explicitly stated the origin of the specimen. However, a study that examined trade occurring within orchid-themed interest groups on a social media website found that 22%-46% of trade posts featured wild-collected plants (Hinsley et al., 2016).
This array of challenges complicates teasing apart legal from illegal instances of trade. This is especially so because trade legislation for each species tends to be country-specific, and online trade blurs geographic boundaries. In our study, contacting traders to request further information would be necessary to further delve into the legality of trade for each advertisement. However, this approach could raise ethical concerns and would demand a substantial investment of time and resources. Nevertheless, our results indicate that illegal trade might be occurring, as we found instances of trade for 20 species included in CITES Appendix I, for which international trade is prohibited or extremely restricted. We found particularly concerning sales, such as advertisements of Critically Endangered red-vented cockatoos, with only 430-750 individuals remaining in the wild (BirdLife International, 2017); the Philippine forest turtle, one of the world's 25 most endangered tortoises, only found in the Palawan group of islands and considered a collector's item (Krishnasamy & Zavagli, 2020); and the Endangered orchid Low's paphiopedilum, whose population is severely fragmented (Rankou, 2015). We also found instances of trade for Alocasia sanderiana, Amesiella monticola, and Ceratocentron fesselii, all Critically Endangered plant species. Tackling illegal wildlife trade is complex and requires multiple and complementary responses, such as the implementation of bans, quotas, protected areas, certifications, captive-breeding programs, and most importantly, education and awareness (Fukushima et al., 2021). In the Philippines, recommendations include updating legislation to add all CITES-listed species to national protection lists, and more effective law enforcement (Krishnasamy & Zavagli, 2020).
Digital media companies should expand automatic data extraction possibilities for academics and practitioners, to fully capture the extent of online wildlife trade. Currently, data extraction for each search query is capped at a few hundred posts per request. Our study reveals the differences in data accessibility and accuracy between digital platforms. While 38.5% of the posts retrieved from Google contained the search string, on YouTube only 0.3% of the posts did. Moreover, the Terms of Service of other social media platforms, such as Facebook, preclude automated data retrieval, which further complicates gaining deep insights into online wildlife trade. However, through the Google Search Engine API, we were able to detect and extract data from multiple sites (e.g., eBay, Carousell, Amazon, Lazada, Shopee, and Facebook), highlighting the potential of search engine APIs, such as Google, Baidu, or others, to scan the Internet and extract data from multiple digital platforms without breaching their terms of use.
Upscaling these studies to a global level is essential to accurately understand the extent of digital trade across the globe. Doing so will require increased data extraction quotas from digital platforms, the development of reliable and automatic filtering algorithms, and the creation of large training datasets. We hope that this study paves the way for researchers to develop automated data collection systems for the study of online wildlife trade.

ACKNOWLEDGMENTS
The authors thank F. Ricciardi, M.J. Caleda, and C. Fischer, and the Asian Development Bank for assistance and funding during the project. A.S.-R. was supported by a Marie Skłodowska-Curie Actions Postdoctoral Fellowship (grant agreement 101022521). H.A., R.K., and E.D.M. thank the European Research Council (ERC) for funding under the European Union's Horizon 2020 research and innovation programme (grant agreement 802933). R.K. was supported by the KONE Foundation research grant (grant 202103830). R.A.C. acknowledges personal funding from the Academy of Finland (grant 348352) and from the KONE Foundation (grant 202101976). Aggregated data will be made available in an open-access repository upon publication of the manuscript. ERC-2018-STG-802933 - WILDTRADE has been reviewed by an ethics panel composed of independent experts who approved its ethics compliance.

FIGURE 2 Proportion of posts, potentially relevant posts, and manually validated posts in each digital platform. Each grid cell represents 18,378 posts in the left panel, 1324 posts in the middle panel, and 61 posts in the right panel.
FIGURE 3 Proportion and total number of posts, potentially relevant posts, and manually validated posts in each digital platform and for each group of species.
Among all searched species, the five with the most hits were animals: the hawksbill turtle (Eretmochelys imbricata, n = 80,407), the green turtle (Chelonia mydas, n = 59,461), the Philippine tarsier (Carlito syrichta, n = 58,723), the leatherback turtle (Dermochelys coriacea, n = 56,955), and the Philippine duck (Anas luzonica, n = 52,715).

FIGURE 4 (a) Number of posts and potentially relevant posts, and (b) number of trade advertisements for each searched species across all platforms. 1. Aquilaria malaccensis (listed as Critically Endangered in the International Union for Conservation of Nature, IUCN, Red List and in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora, CITES); 2. Platycerium coronarium; 3. Alocasia zebrina; 4. Hoya wayetii; 5. Nepenthes ventricosa (listed as Least Concern by the IUCN Red List and in Appendix II of CITES). All images are under CC BY-SA license and were obtained from Wikipedia; image 1 is attributed to Hafizmuar, 2 to Bernard Dupont, 3 to David J. Stang, 4 to Altocumuli, and 5 to Alastair Robinson. Note that the x-axes of plots (a) and (b) are at different scales.
Among Appendix I species, those with the most trade also included the orchids Low's paphiopedilum (Paphiopedilum lowii) and Philippine paphiopedilum (Paphiopedilum philippinense).