Harnessing artificial intelligence technology and social media data to support Cultural Ecosystem Service assessments

1. Cultural Ecosystem Services (CESs), such as aesthetic and recreational enjoyment, as well as sense of place and cultural heritage, play an outstanding role in the contribution of landscapes to human well-being. 2. Scientists, however, still often struggle to understand how landscape characteristics contribute to delivering these intangible benefits, largely because it is hard to navigate how people value nature, and because there is a lack of methods that accommodate both comprehensive and time-efficient evaluations. 3. Recent advances in technology and the proliferation of new data sources, such as social media data, open promising alternatives to traditional, resource-intensive methods, facilitating


| INTRODUCTION
Today, we know more than ever about human activities because social media has revolutionized our everyday communication. Social media comes in many forms, including blogs, chat apps, photograph-sharing portals and social networks. We are used to sharing information about the people we meet, the places we visit, the products we like or the emotions we feel (Welbers & Opgenhaffen, 2018). The impact of social media is immense: in 2018 the number of active users was estimated to be some 3 billion people world-wide (Kemp, 2018), around 40% of Earth's population. Big online players like Google, Amazon or Facebook have already identified this power and are increasingly collecting the details of our everyday life to connect publicly shared information with their specific needs (Esteve, 2017). Yet, there is much to be gained from systematically using this powerful data source to better understand the multiple characteristics of our natural environment, how we relate to it and how it contributes to our health and well-being (Joppa, 2017; Pascual et al., 2017).
The way people value nature and the importance that people place on the environment largely results from diverse disciplinary, theoretical, sociocultural and political contexts and has been identified as a crucial dimension of sustainable environmental management and development (Arias-Arévalo et al., 2017; Brondizio et al., 2009; Ostrom, 2009). Personal choices often do not only depend on the inherent worth of things or the way they satisfy certain preferences (intrinsic or instrumental values respectively), but are also tied to how people relate to nature and to others for a good quality of life (relational values; Chan et al., 2012, 2016; Tallis & Lubchenco, 2014).
Over recent years, conceptual advances have been made in how these social values are addressed and managed, mainly using the notions of ecosystem services (ESs) and nature's contribution to people (NCP) to refer to the various benefits that nature provides to us all (Daily, 1997; Díaz et al., 2018). Lately, particular attention has been given to the assessment of cultural ecosystem services (CESs) or non-material NCP, which relate to the intangible, life-fulfilling functions that ecosystems and nature provide to people (Daniel et al., 2012; Schröter et al., 2020). A beautiful landscape, for example, inspires us, lets us relax or contributes to defining our identities and the way we relate to nature. CESs are often not objective but are tied to the cultural context and to how a visual-sensory landscape is perceived (Van Berkel et al., 2018). As such, CESs are currently at the core of a lively discussion within the socio-ecological research community on value pluralism and holistic nature valuation (Díaz et al., 2018; Kadykalo et al., 2019; Peterson et al., 2018).
While CESs are hard to navigate, addressing them is necessary to achieve sustainable and participative decision-making (Daniel et al., 2012) and to understand how management actions might affect the delivery of benefits and values. Different quantification methods have been developed to assess different sets of CESs, using quantitative or qualitative data, spatial and deliberative approaches, and stated or revealed preferences, such as interviews and surveys with interested parties (Hirons et al., 2016; Komossa et al., 2020). Assessing CESs is, however, by no means a trivial task, especially given that the plurality of values is not well captured by bio-physical methodologies alone, and that collecting data that fully reflect multiple perspectives is difficult and time-consuming (Kenter et al., 2015). Moreover, CESs are frequently not consumed or used in the locations where they are generated, thus calling for approaches that distinguish between service providing and consuming units, referred to in the literature as ES supply and ES flow respectively (Burkhard et al., 2012; Egarter Vigl et al., 2017; Villamagna et al., 2013).
Studies exploring social media as a new platform for addressing values pertaining to CESs have experienced a global increase, mostly relying on location-based photograph content sharing portals such as Flickr and Panoramio, or on social networks like Instagram, Twitter or Weibo (Casalegno et al., 2013; Levin et al., 2017; Tenerelli et al., 2016). While social media do not directly grasp the plurality of users' values and are often limited to a single or a few CES categories, they have allowed access to a great quantity of data that reveal human preferences from a variety of locations (Muñoz et al., 2020).
With the advent of application programming interfaces (APIs), these data sources became systematically available, and novel ways to estimate CES provision emerged, mainly using the photographs' spatial distribution and densities as proxies to attribute landscape qualities to specific locations (Mancini et al., 2018; Richards & Friess, 2015; Zanten et al., 2016). Keeler et al. (2015), for example, showed how to use geolocated photographs to understand how socio-environmental conditions may account for different visitation rates to lakes.
6. We conclude that online available AI technology and social media data can effectively be used to support rapid, flexible and transferrable CES assessments. Our work can provide a reference for innovative adaptive management approaches that can harness emerging technologies to gain insights into human-nature relationships and to sustainably manage our environment.

KEYWORDS
crowdsourced data, ecosystem services, mountain socio-ecological systems, text mining, image recognition

Only recently, studies have also proposed to integrate the analysis of social media content into CES assessment to gain new insights into the preferences of users towards specific landscapes (Guerrero et al., 2016; Hausmann et al., 2020; Langemeyer et al., 2018). Such studies, however, were largely based on manual visual image content or sentiment analysis and were thus subject to annotator interpretations, relatively time-consuming and generally limited to a small data sample size (Muñoz et al., 2020; Oteros-Rozas et al., 2018; Willcock et al., 2018).
Over the course of just a few years, artificial intelligence (AI) has permeated many aspects of our lives. From major advancement in medicine to transforming social and business environments, this technology is being introduced to reduce human effort and to give accurate and fast results (Christin et al., 2019; Sun & Scanlon, 2019). Moreover, new user-friendly online applications have facilitated access to and use of these promising computational methods by the broader public and different scientific disciplines. Within the broader AI family, the field of deep learning is highly successful in classifying images, identifying objects and labelling them with natural language tags for further use (Gebru et al., 2017; Karasov et al., 2020; Norouzzadeh et al., 2018). Recently, both Richards and Tunçer (2018) and Lee et al. (2019) quantified recreational opportunities using online machine learning on publicly available images from social media channels. They presented a method that translates photograph image content into natural language tags and uses the embedded geographic data to automatically derive spatial patterns. Our work extends these studies by integrating them with two new elements: firstly, we analysed the tags generated by the image recognition algorithm for their semantic meaning using an innovative text mining algorithm based on the full 'knowledge base' of Wikipedia. Although semantic analyses on image content were also performed by Gosal et al. (2019), here we use a concept matrix which was created by processing the entire Wikipedia ontology. This allowed us to annotate the content of the photograph and to gain new insights into the photographers' perception of landscapes, scenes and patterns. Secondly, we used this information to train a topic model that automatically filtered only the CES-related content in images. This avoided labour and resource-intensive manual selection and classification, allowing us to analyse a large set of images, to differentiate single CES categories, and to create a robust spatial dataset for subsequent CES pattern prediction (Figure 1).

FIGURE 1
Conceptual framework for deriving CES hotspots from social media data. We (a) collected approximately 32,000 images, (b) translated the content of these images into natural language for further analysis (producing ~640,000 tags), (c) validated tag quality based on a dissimilarity analysis, (d) automatically classified tags into four CES groups based on the semantic associations of tags using Wikipedia's knowledge, (e) performed an expert classification on a subset of the crowdsourced images (n = 150), (f) compared the automatic classification with visual expert classification and (g) geostatistically predicted area-wide CES distribution and hotspots using maximum entropy modelling. Grey boxes (a), (b), (d) and (g) represent the steps required for CES hotspot prediction. Boxes (c), (e) and (f) represent optional steps useful for model validation.

In this study, we used publicly available images from the photograph-sharing platform Flickr as data source to exemplify the applicability and performance of online AI-based systems to estimate baseline CES flows. In particular, we show (a) how a text mining algorithm can be used to semantically group artificially generated tags into distinct CES categories, (b) how the geographic information embedded in social media photographs can be used by Maximum Entropy modelling to identify variables that best explain users' preference patterns and (c) how AI and social media data can be combined and applied to support CES assessment and environmental management.

| Study area and CES selection
The research area comprises 95 municipalities located in northern Italy, including the UNESCO world heritage Dolomites, and extends over approximately 6,000 km² of land (Figure S1). The mountainous region is characterized by a mix of traditional cultural Alpine landscapes, mainly dominated by grassland livestock farming systems and forests, and intensively visited year-round tourist destinations.
The growing number of tourists threatens many of the natural areas in the region, leading to increased pollution, potential habitat loss and increased pressure on endangered species (Morandini et al., 2015).
For this work, we selected four CESs that are important for the UNESCO world heritage status and that are sensitive to the different pressures exerted by human use in the area, namely aesthetic value, outdoor recreation, cultural heritage and symbolic species (Locatelli et al., 2017).

| Image crowdsourcing
We used Flickr data from 2005 to 2018 that were accessed and collected via the publicly available Flickr API. We chose this social media platform because, although Flickr shares some of the bias typical of social media, it has historically been less susceptible to changes in privacy and access, peaks and troughs of popularity or closure, compared to other types of social media platforms. Moreover, the extended temporal coverage (~15 years) makes these images more robust against environmental effects (i.e. weather and seasonal effects). For these reasons, Flickr has been one of the most used platforms in nature-based tourism research and is thus suitable for comparison across case studies (Mota & Pickering, 2020). All data were organized in a static URL table for further processing, including metadata information (i.e. time/date information), and the locational information (i.e. coordinates) of each image. In total 106,190 images were downloaded, and, after data pre-processing (i.e. removing duplicates or non-geolocated images) and randomly selecting only one photograph per user per day (PUD; Wood et al., 2013) to limit the bias of overly prolific users, approximately 32,000 records remained for further use (see Supporting Information).
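The pre-processing described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields (`id`, `user`, `taken`, `lat`, `lon`) and the function name are hypothetical, and the example only shows the general idea of dropping duplicate and non-geolocated records and then keeping one randomly chosen photograph per user per day.

```python
import random
from collections import defaultdict
from datetime import datetime


def one_photo_per_user_day(records, seed=0):
    """Drop non-geolocated and duplicate records, then keep one random photo per user per day."""
    rng = random.Random(seed)  # fixed seed so the random pick is repeatable
    seen_ids = set()
    groups = defaultdict(list)
    for rec in records:
        if rec["lat"] is None or rec["lon"] is None:
            continue  # not geolocated
        if rec["id"] in seen_ids:
            continue  # duplicate record
        seen_ids.add(rec["id"])
        # Group by user and calendar day (the PUD unit)
        groups[(rec["user"], rec["taken"].date())].append(rec)
    return [rng.choice(group) for group in groups.values()]


# Hypothetical toy records, not real Flickr data
records = [
    {"id": 1, "user": "a", "taken": datetime(2015, 7, 1, 9, 0), "lat": 46.5, "lon": 11.9},
    {"id": 2, "user": "a", "taken": datetime(2015, 7, 1, 17, 30), "lat": 46.5, "lon": 11.9},
    {"id": 2, "user": "a", "taken": datetime(2015, 7, 1, 17, 30), "lat": 46.5, "lon": 11.9},
    {"id": 3, "user": "a", "taken": datetime(2015, 7, 2, 8, 15), "lat": 46.6, "lon": 12.0},
    {"id": 4, "user": "b", "taken": datetime(2015, 7, 1, 12, 0), "lat": 46.4, "lon": 11.8},
    {"id": 5, "user": "b", "taken": datetime(2015, 7, 3, 12, 0), "lat": None, "lon": None},
]
filtered = one_photo_per_user_day(records)
```

In this toy input, user "a" contributes two user-days and user "b" one, so three records remain regardless of which photograph is drawn on the first day.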

| Image recognition
Similar to Karasov et al. (2020) and Lee et al. (2019), we used the image annotation engine Clarifai to automatically analyse and translate image content into natural language tags. We applied their default pretrained general model (version 1.3), which is based on edge, curve and pattern recognition, to get a list of 20 tags for each image, along with a confidence score on a scale between 0 and 1. To estimate the sensitivity of the algorithm to changes in lighting, colour, weather and season within the same consistent photograph object, we applied a statistical similarity and dissimilarity analysis to tag probabilities using the Euclidean metric (Borcard et al., 2018). This compared each image's tag with all other images' tags and their respective likelihood score.
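As a rough illustration of the dissimilarity analysis, the sketch below computes pairwise Euclidean distances between tag-confidence vectors. It assumes that each image is represented as a mapping from tag to confidence score and that a tag missing from an image counts as 0; both the representation and the example tags are assumptions made for illustration, not details taken from the study.

```python
import math


def tag_distance(tags_a, tags_b):
    """Euclidean distance between two tag-confidence vectors (absent tags count as 0)."""
    vocabulary = set(tags_a) | set(tags_b)
    return math.sqrt(
        sum((tags_a.get(t, 0.0) - tags_b.get(t, 0.0)) ** 2 for t in vocabulary)
    )


def dissimilarity_matrix(images):
    """Symmetric matrix of pairwise distances between all images."""
    n = len(images)
    return [[tag_distance(images[i], images[j]) for j in range(n)] for i in range(n)]


# Hypothetical tag lists: the same peak in summer and winter, and a church
peak_summer = {"mountain": 0.99, "landscape": 0.97, "snow": 0.10}
peak_winter = {"mountain": 0.98, "landscape": 0.95, "snow": 0.90}
church = {"church": 0.99, "architecture": 0.96}

matrix = dissimilarity_matrix([peak_summer, peak_winter, church])
```

Under these assumptions the two views of the same peak end up closer to each other than either is to the church image, which is the kind of robustness to season and weather that the analysis is meant to check.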

| Text mining and semantic CES grouping
We used a machine learning text analysis engine developed by Lexalytics (version 6.0.181) to semantically analyse and classify the tags generated by the image recognition algorithm. The text mining engine uses a concept matrix based on the contents of Wikipedia.
Each concept belongs to a greater topic and contains dozens of links and semantic associations to other articles related to the same topic.
For example, the ESs article on Wikipedia contains a link to an article on Recreation; this article in turn contains different links to articles such as Leisure, Outdoor recreation and Tourism that all discuss activities ascribable to human well-being. For our study, we first defined four new 'user concept topics', each representing one of our CES groups. Second, we assigned to each topic a definition syntax that we thought was best related to the CES categories of our study.
For example, for the aesthetic value CESs, we assigned keywords such as 'scenery', 'panorama' and 'view' (see Supporting Information for details). Thus, the machine learning engine determined the contribution of each single tag to each one of our user topics and assigned a confidence score that ranged between 0 (no confidence) and 1 (completely confident). The closer a tag is in the chain to the original topic, the stronger the association, and the more likely it is related to that topic. Four topic strength metrics were therefore generated for each image, indicating the classification confidence for each CES group based on the text mining algorithm. The image was classified according to the predominant CES.
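The final classification step can be sketched as follows, assuming the four topic strength metrics of an image are already available as numbers between 0 and 1. The function name is hypothetical, and the 0.4 cut-off is borrowed from the confidence threshold reported in the results; how the text mining engine derives the strengths themselves is not reproduced here.

```python
CES_GROUPS = (
    "aesthetic value",
    "outdoor recreation",
    "cultural heritage",
    "symbolic species",
)


def classify_image(topic_strengths, min_confidence=0.4):
    """Return the predominant CES group, or None if no group reaches the threshold."""
    best = max(CES_GROUPS, key=lambda group: topic_strengths.get(group, 0.0))
    if topic_strengths.get(best, 0.0) < min_confidence:
        return None  # left unclassified because confidence is too low
    return best


# Hypothetical topic strengths for two images
scenic = {
    "aesthetic value": 0.82,
    "outdoor recreation": 0.35,
    "cultural heritage": 0.05,
    "symbolic species": 0.02,
}
unclear = {
    "aesthetic value": 0.21,
    "outdoor recreation": 0.18,
    "cultural heritage": 0.30,
    "symbolic species": 0.11,
}

scenic_label = classify_image(scenic)
unclear_label = classify_image(unclear)
```

Here the first image is assigned to aesthetic value, while the second stays unassigned because its strongest topic falls below the threshold.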

| Expert validation
For assessing the accuracy of the automated CES classification, we extracted a random sample (n = 150) of the available images. Then we validated the performance of the combined image recognition and semantic tag classification against a manual classification by a group of instructed experts (n = 9) living and working in the study region, where each expert was asked to annotate and group each of the images into one or more CES classes. We considered the majority vote among the experts (>5) of this classification as the ground truth for validation (Versi, 1992). Using a confusion matrix, we then compared the automated CES classification to the ground truth results, and computed the performance measures accuracy, precision, recall and F-measure, which are commonly used in text mining and indicate the overall quality of the classification (Feldman & Sanger, 2006).
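The validation logic can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: it assumes each expert gives exactly one label per image (the experts could in fact assign several classes), reads the majority rule (>5 of 9) as at least six matching votes, and computes one-vs-rest precision, recall and F-measure for a single class. All names and the toy labels are hypothetical.

```python
from collections import Counter


def majority_label(votes, min_votes=6):
    """Ground-truth label if at least `min_votes` experts agree, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None


def precision_recall_f1(predicted, truth, label):
    """One-vs-rest precision, recall and F-measure for a single class."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == label and t == label for p, t in pairs)  # true positives
    fp = sum(p == label and t != label for p, t in pairs)  # false positives
    fn = sum(p != label and t == label for p, t in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: A = aesthetic value, H = cultural heritage, R = recreation
votes_for_one_image = ["A", "A", "A", "A", "A", "A", "H", "R", "A"]
consensus = majority_label(votes_for_one_image)

predicted = ["A", "A", "H", "R", "A"]
truth = ["A", "H", "H", "R", "A"]
precision, recall, f1 = precision_recall_f1(predicted, truth, "A")
```

In the toy data, class A has two true positives and one false positive, giving a precision of 2/3, a recall of 1 and an F-measure of 0.8.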

FIGURE 2
Graphical representation of a dissimilarity matrix to measure the differences between image tags as produced by the Clarifai engine. For each image pair, the tags are compared, and the dissimilarity is calculated as a Euclidean distance (a). To visualize the Euclidean distance metric, we arranged the images according to their dissimilarity tag value, so that similar images are placed close together (red connection arrows) and different images are placed far apart (blue connection arrows) (b).

| Spatial analysis
To estimate the spatial distribution of our four target CESs, we relied on the Maxent model (Phillips et al., 2019), which uses the relationships between a set of environmental grids (predictor variables) and the occurrence of given observation points (presence data) to calculate the presence probability at unknown locations (Phillips & Dudík, 2008). Maxent uses presence data only, recognizing that absence data are frequently unavailable or difficult to define such as in the case of CES distribution modelling. CESs may occur although they have not yet been observed and hence these locations should not be considered as absences (Muñoz et al., 2020). As predictors we used both natural geographic variables (i.e. terrain ruggedness index and land cover data) and human variables (i.e. distance to paths, villages and points of interest). As presence data we used a sample of 1,000 image locations for each CES group and a 10-fold cross-validation for training and testing the model results. To measure the model performance, we analysed the receiver operating characteristic (ROC) curve and the area under the curve (AUC) value (Supporting Information). Finally, we applied the Getis-Ord Gi* statistic to explore the spatial pattern of each CES and identify their respective hot and coldspots. Given that CESs are frequently provided in bundles (Klain et al., 2014), we then proceeded to identify also statistically significant hot and coldspots of the aggregated CESs, weighting all four indicator maps equally (Getis & Ord, 1992).
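To make the hotspot step more concrete, the following sketch computes the Getis-Ord Gi* z-score for every location from a list of values and coordinates. It is a minimal, pure-Python illustration under stated assumptions: binary fixed-distance weights that include the focal cell, a toy 5 x 5 grid, and per-cell values standing in for a predicted CES surface. The neighbourhood definition and software actually used in the study are not specified here.

```python
import math


def getis_ord_gi_star(values, coords, radius):
    """Gi* z-score per location, using binary weights within `radius` (self included)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        weights = [
            1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
            for xj, yj in coords
        ]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


# Toy 5 x 5 grid with a cluster of high values in one corner
coords = [(x, y) for y in range(5) for x in range(5)]
values = [0.9 if x < 2 and y < 2 else 0.1 for x, y in coords]
z_scores = getis_ord_gi_star(values, coords, radius=1.0)
```

Large positive z-scores mark candidate hotspots and large negative ones candidate coldspots; in the toy grid the corner cluster produces a z-score above 1.96, while the opposite corner is negative.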

| RESULTS
The maximum number of photographs uploaded by an individual user was 2,202, whereas 1,834 users uploaded only a single photograph over the entire 15-year study period. The Clarifai software assigned a total of 3,678 unique tags to the ~32,000 images analysed. The most frequently assigned tags were 'no person' (28,526), followed by 'outdoors' (26,728) and 'landscape' (26,050). The similarity analysis between tags of randomly selected images revealed that images with high inter-tag agreement also showed similar motifs, although pictures were taken by different users, under different light/weather conditions and seasons. Combined with the semantic analyses of photograph tags based on the trained text mining algorithm, we identified key visual-sensory landscape attributes that we could link to our four CES groups with reasonable confidence (Figure 3). Most images were grouped into the aesthetic value concept topic (66%), followed by cultural heritage (13%), outdoor recreation (11%) and symbolic species (10%). A total of 3,321 images (9.4%) could not be assigned to any semantic concept due to low classification confidence scores (<0.4).
Analysing the correspondence between machine learning results and expert classification indicated that our approach to classifying CESs produced valid and reliable outputs, and that it was in high agreement with how the CESs in the images were perceived by the experts in the context of this study. Table 1 summarizes our validation, separately for each CES category. The numbers were derived from the evaluation of nine independent experts. Their Fleiss' Kappa value (indicating the inter-annotator agreement) was 0.44, which can be considered fair (Feldman & Sanger, 2006). The overall precision was 0.78, which was surprisingly good, although values ranged from 0.42 (for symbolic species) to 0.85 (for aesthetic value). Similarly, we achieved high values for recall (0.82) and F-measure (0.80). High statistical measure values in this context mean that images that were automatically classified to a specific CES group were actually related to this CES group.
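As a quick consistency check, and assuming the F-measure is the balanced harmonic mean of the overall precision $P$ and recall $R$ (the exact averaging used in the study is not stated here), the reported values fit together:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.78 \times 0.82}{0.78 + 0.82}
  = \frac{1.2792}{1.60}
  \approx 0.80
```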

FIGURE 3
Examples of randomly selected photographs with a high inter-tag similarity metric along with part of the tag list produced by the Clarifai engine. A high topic strength metric indicates the classification confidence to a specific CES group based on the text mining algorithm. All images shown here were uploaded on the Flickr database under the Creative Commons licence for further noncommercial use (CC NC).

We estimated CES spatial distribution separately for each group using Maxent modelling. The probability of occurrence of a specific CES is based on the geolocated posts of Flickr and a set of natural geographic and human variables. The AUC was higher than 0.85 for all four models, suggesting accurate model predictions (Swets, 1988). The variables that mostly contributed to predicting CESs were 'points of interest' and 'trail distance' for the aesthetic value and outdoor recreation ES, while, for the ES cultural heritage and symbolic species, a mixture of variables (i.e. terrain ruggedness, land cover and distance to settlements) contributed equally to model predictions (Supporting Information). Overall, our results mainly identified hotspots of CES flows in areas of lower altitude or with high tourism development, characterized by managed farmland, good road infrastructure and easy accessibility (Figure 3 and Supporting Information). In contrast, regions that were mainly characterized by large, protected areas, and influenced by forest regrowth or abandoned land as a result of rural depopulation, especially in the southern and eastern part of the study site, generally yielded significantly lower values and resulted in coldspots.

| DISCUSSION
The quantification of CESs is one of the most delicate and complex assessments within the ES and NCP frameworks (Hirons et al., 2016). Grasping the subjectivity and the plurality of the be-

| Using AI for CES assessments
While AI is limited in which values it can grasp, big data and AI technology present many advantages and their rise in application has dramatically changed our ability to both study and conserve the characteristics of our natural environment (Sun & Scanlon, 2019).
The use of AI technology opens new opportunities to obtain not only information about users' movements and frequency hotspots, but also about the activities performed by these user groups. Image recognition and text mining algorithms allow for the systematic translation of high-volume and near real-time data into meaningful content.

TABLE 1
Statistical validation of expert valuation (n = 9), separately for each CES group and for the overall sample dataset

Flickr images, making the value-identification procedure quite subjective and difficult to replicate. By using AI, we do not rely on the subjectivity of one or more people, but on a model, which can be used in multiple case studies and which is able to analyse a much higher number of images. The use of AI for processing social media data has the potential to mitigate the risk of researchers interpreting data in 'a one-directional way' or finding patterns that are in fact not there (Calcagni et al., 2019).
Applying these novel methods to complex contexts such as CES assessments, however, highlights the most crucial underlying bio-physical components of a given socio-ecological system contribute to deliver these values. When using AI for addressing CESs, an understanding of which is partly mediated by the cultural background of the users, models can be adapted in some parts to reflect the cultural context of the case study and tested to see whether they are aligned with it (Díaz et al., 2018). For example, in this study we provided a syntax definition of the CESs that we believed made sense for our cultural and local context, and tested the classification results against the views of local experts and stakeholders.

| Advantages of using a combined AI and social media approach
The example we presented here greatly benefitted from the com-
The involvement of local stakeholders for the initial identification and prioritization of benefits and values, combined with the analysis of a great number of social media data using AI, makes for a sound methodology which can be useful for managers and conservationists to support their decisions in front of different target groups (Wäldchen & Mäder, 2018).
Fourth, the objectivity and replicability of the methodology using online available engines can provide a rapid baseline CES assessment that can be repeated over time to monitor changes in how people experience and benefit from the environment. This method, for example, can be used to monitor changes in visitor numbers and activities over time, and to evaluate the consequences of management decisions or policy changes for the adaptive management of natural capital (Hausmann et al., 2018). This method also allows researchers and practitioners to check whether hotspots of CES consumption overlap with the presence of species threatened by human activity. Moreover, identifying hotspots and coldspots of CES flow can guide the management of high tourist numbers, the development of new infrastructure or the identification of conservation intervention areas (Hausmann et al., 2018, 2020; Rossi et al., 2019).

| Limits to the application of this methodology
This method therefore allows researchers to map and identify hotspots of CES consumption and does not necessarily identify sources of CES. This needs to be taken into consideration when interpreting our results.
Also, the costs that may arise from using online available AI products have to be kept in mind. Although these costs can easily hamper the applicability of the approach to very large datasets, most of the commercial AI suppliers offer extended academic licences for research purposes. Furthermore, as proprietary software does not always make outdated versions available, there is the risk that studies using older versions cannot easily be replicated. The risk of updates significantly changing the core characteristics of the algorithms without communication is, however, usually low. Moreover, relying on commercial AI products such as those used in this paper allows benefiting from already established tools which are generally more user-friendly and accessible than open-source script packages (Nederbragt, 2014). Developing one's own models and workflows is indeed still not a trivial task, and requires advanced programming knowledge, training datasets and a considerable amount of time and computing power (Christin et al., 2019).

| Considerations on the use of AI and social media for socio-ecological research
AI allows using data derived from a high number of users, making the findings more robust. There are, however, many studies that question the representativeness of crowdsourced information and social media (Li et al., 2013; Liu et al., 2016). Although Flickr's popularity has been more consistent over time compared to that of other platforms (Pickering et al., 2020), it has been shown that the distribution of Flickr users is skewed over educational levels, wealth, geographical provenance, age and gender groups (Lee et al., 2019).
Flickr users might therefore not be representative of all the people benefiting from the CES of the study area. Moreover, the CESs captured by these users may not represent the full array of services provided by the specific landscapes but may depend on the users' personal interests and preferences (Van Berkel et al., 2018).
Interviews or focus groups with local experts and stakeholders can help verify whether the CESs identified using social media data mirror those perceived as important in the local context, and whether complementary methodologies are needed. To ensure that the data used are representative of all user groups, social media data can for example be integrated by complementary data sources, such as those deriving from participatory approaches, outdoor activity logging platforms or methods targeted to less prolific social media users (Muñoz et al., 2020).
Although it could help address social media's biased representation of society, taking into account and analysing the demographic profiles of social media users in social-ecological research is somewhat problematic from an ethical standpoint (Calcagni et al., 2019).
The progress in the use of social media, especially combined with AI technology, indeed raises ethical questions on how to use public information and emerging technologies (Liu et al., 2016). While it is clear that public data must not be used to compromise privacy, anonymity and trust expectations of individuals and user groups (Gebru et al., 2017), much work still has to be done to ensure that these guidelines are translated to practice. Given the exponential increase of studies using social media and AI (Ghermandi & Sinclair, 2019), there is also an urgent need to formulate best practice on the use of AI in environmental analysis and to advance in the public's understanding of the capabilities and limitations of these technologies, especially with regard to the generally low transparency of AI systems or the related risks of autonomous decision-making (Herweijer et al., 2018).

| CONCLUSION
Our work demonstrates that innovative technologies such as AI systems in combination with social media data can be applied to pressing societal and ecological questions. Our study complements previous lines of research by providing a flexible, replicable and transferable approach that expands the ability of stakeholders to make sound land management decisions. The location-specific information provided by social media data is crucial in this context as it offers an improved spatial and temporal understanding of the relationship between people and nature. Future studies may provide new insights on how to further improve the performance and quality of AI-supported studies and be the basis for even more complex and integrative assessments. Both researchers and decision-makers may then benefit from the combined use of social media data and emerging technologies and contribute to changing the way we understand and manage the environment.

ACKNOWLEDGEMENTS
This research was co-financed by the European Regional

CONFLICT OF INTEREST
The authors declare no conflict of interest.

to the semantic analyses of image tags and supervised the expert validation process. All the authors discussed the results, implications and contributed critically to previous drafts of the manuscript and to the review process.

DATA AVAILABILITY STATEMENT
A static URL list of all considered Flickr photographs, a summary