Sentiment analysis of the Twitter response to Netflix's Our Planet documentary

The role of nature documentaries in shaping public attitudes and behavior toward conservation and wildlife issues is unclear. We analyzed the emotional content of over 2 million tweets related to Our Planet, a major nature documentary released on Netflix, with dictionary and rule‐based automatic sentiment analysis. We also compared the sentiment associated with species mentioned in Our Planet and a set of control species with similar features but not mentioned in the documentary. Tweets were largely negative in sentiment at the time of release of the series. This effect was primarily linked to the highly skewed distributions of retweets and, in particular, to a single negatively valenced and massively retweeted tweet (>150,000 retweets). Species mentioned in Our Planet were associated with more negative sentiment than the control species, and this effect coincided with a short period following the airing of the series. Our results are consistent with a general negativity bias in cultural transmission and document the difficulty of evoking positive sentiment, on social media and elsewhere, in response to environmental problems.


INTRODUCTION
Public perception and public opinion play important roles in wildlife conservation. Public pressure on politicians can instigate policy change (Phillis et al., 2013), and consumer choice can favor environmentally friendly products or services. In turn, public perception and public opinion may be shaped by the media portrayal of threats to species and the environment, particularly among urbanites with little direct access to nature (Aitchison et al., 2021; Dunn et al., 2020; Fernández-Bellon & Kane, 2020; Nolan, 2010; Silk et al., 2018), an idea formalized in conservation science as the "extinction of experience" (Gaston & Soga, 2020; Soga & Gaston, 2016).
However, the role of traditional media, such as television documentaries, in shaping people's perception, opinion, and behavior is far from clear (Aitchison et al., 2021;Jones et al., 2019). For example, there is little evidence for popularly assumed effects, such as the reduction in plastic straw use in response to the television documentary Blue Planet 2 (Dunn et al., 2020). The Al Gore documentary film An Inconvenient Truth increased knowledge of global warming and intention to take action, but this intention did not reliably translate into action 1 month later (Nolan, 2010). Television documentaries and films featuring wildlife may even undermine conservation messages, such as by portraying some species as dangerous (e.g., sharks) or abundant (e.g., wildebeest) and failing to show any human impact on the species and their habitats (Aitchison et al., 2021;Bradshaw et al., 2007). Furthermore, the reach of traditional media is often limited to specific audiences, not all of which may be able to enact the relevant change (Wright, 2010).
New broadcast media, such as subscription services Netflix, Amazon Prime, and Disney+, might overcome some of these limitations. They are multinational, allowing simultaneous broadcast in multiple countries. Given their subscription model, they are under less commercial pressure to sensationalize content to maximize viewership of specific programs. They are also less restricted by impartiality rules compared with traditional broadcasters, such as the BBC. This greater freedom from ratings chasing and impartiality restrictions could lead to more accurate portrayals of the negative human impact on wildlife and the environment (Aitchison et al., 2021).
Another recent development is the use of social media in conservation campaigns (Kidd et al., 2018;Wu et al., 2018). Traditional media is one-way, broadcasting to a passive audience. Social media allows the audience to provide immediate feedback to program makers, to share salient content (e.g., film clips), and to discuss issues raised by the program among themselves. This interactivity might increase engagement and more effectively shape viewers' opinions and behavior. Social media can also be an effective method to measure public responses to nature documentaries, wildlife campaigns, and environmental issues in general (Burivalova et al., 2018;Kidd et al., 2018;Nanni et al., 2020), albeit with limitations, such as disparities in internet access or documented gaps between online and offline behavior (Wright et al., 2020). More broadly, the emerging field of conservation culturomics (Correia et al., 2021;Ladle et al., 2016) uses quantitative analyses of digital texts, including social media, to assess public interest in conservation issues (Di Minin et al., 2015;Toivonen et al., 2019). Such methods have been applied specifically to the effects of nature documentaries, such as the BBC's Planet Earth 2 (Fernández-Bellon & Kane, 2020).
We examined the social media response to Netflix's 2019 documentary series Our Planet, produced by Silverback Films in collaboration with the World Wildlife Fund. Historically, nature documentaries fall into one of 2 categories: hard-hitting documentaries with explicit environmental messages that typically reach a small audience (e.g., An Inconvenient Truth) or mass audience documentaries with little or no environmental message (e.g., Blue Planet). Our Planet aimed to bridge this gap by being a mass audience documentary with explicit environmental messaging throughout. This included explicit portrayal of the impact of humans on the environment, such as the detrimental effect of climate change on species' habitats, and calls to action, providing the public with constructive ways to change their behavior to aid conservation.
Our Planet was narrated by Sir David Attenborough and supported by extensive and carefully planned Twitter and other social media campaigns. The première was held in the Natural History Museum in London and was attended by public figures such as Prince (now King) Charles and Prince William and the ex-footballer David Beckham. The documentary was accompanied by online material specifically dedicated to conservation issues, with pages called "What Can I Do?" and "Take Action" and several additional short movies intended to raise conservation awareness. By March 2021, Netflix reported that 100 million viewers had watched the series (Moore, 2021).
Producers at Silverback Films provided us with information on the documentary before broadcast (e.g., topics and species featured in each episode) and were interested in gauging reaction to the series on social media. We applied sentiment analysis to a large data set of tweets related to Our Planet to test whether viewers responded with positive or negative sentiment and whether any observed change lasted beyond the immediate release of the documentary. Sentiment analysis uses a dictionary of words and symbols, such as emoticons, that have positive (e.g., love, good, happy, smiling face emoticon) or negative (e.g., angry, frustrated, sad, frowning face emoticon) valence to automatically score each tweet from −1 (completely negative) to +1 (completely positive) (see "METHODS").
We made no specific prediction regarding whether sentiment would be positive or negative. On the one hand, several previous studies have shown a preference for negative sentiment in social media sharing (Bellovary et al., 2021; Schöne et al., 2021) and fake news (Acerbi, 2019a), and lab experiments have shown that people preferentially acquire and transmit negative information from and to others (Bebbington et al., 2017). Moreover, Our Planet contained explicitly negative content designed to elicit shock and anger. On the other hand, Our Planet also aimed to elicit positive emotions, such as awe for the natural world, as do other mass audience documentaries without an explicit environmental message.
We collected tweets that included the #ourplanet hashtag. However, a limitation of only examining tweets that explicitly mention Our Planet was that we did not have a baseline or comparison group. Perhaps all tweets, or all animal-related tweets, happened to become more positive in sentiment during this period, and the release of Our Planet was entirely incidental. To address this limitation, we also compared 3 sets of tweets from the same period: tweets mentioning control species not featured in Our Planet (e.g., porpoise), but, where possible, that matched various characteristics with species that were featured in Our Planet (e.g., dolphin); tweets mentioning species featured in Our Planet but that did not include the #ourplanet hashtag and were likely unrelated to the Netflix show; and tweets mentioning species featured in Our Planet that also included the #ourplanet hashtag. Only the last group should show the effect of Our Planet on tweet sentiment; the first 2 should show the sentiment of tweets covering similar topics (animals, conservation).
We predicted the following: sentiment of tweets containing the #ourplanet hashtag becomes more extreme (more positive or more negative) after the release of the series (H1); sentiment of tweets that feature species mentioned in Our Planet and contain the #ourplanet hashtag is more extreme than the sentiment of tweets featuring control species not mentioned in Our Planet and than the sentiment of tweets mentioning Our Planet species without the #ourplanet hashtag (H2); and both these effects last beyond the immediate release date of Our Planet (H3).

METHODS

Data overview
All 8 episodes of Our Planet were released simultaneously on Netflix on 5 April 2019. Automated tweet collection lasted 9 weeks, from 15 March 2019 to 17 May 2019. This allowed us to divide the data into 3 consecutive periods of 3 weeks each: prerelease, release, and postrelease. Ethical approval for data collection was obtained beforehand from the University of Exeter College of Life and Environmental Sciences Penryn Research Ethics Committee (application eCORN001657, 13/12/2018). All tweets are publicly available, and no personal information was collected beyond the Twitter username (which is often anonymous). Using the official Twitter API through the R library rtweet (Kearney, 2019), we collected in real time all tweets that contained the character string "Our Planet" (case insensitive); #ourplanet (case insensitive); the names of 9 species prominently featured in Our Planet (dolphin, flamingo, wild dog, caribou, wolf, polar bear, wildebeest, elephant seal, and walrus); or the names of 9 control species (porpoise, macaw, dingo, mule deer, coyote, panda, waterbuck, snow leopard, and lynx).
The 9 species featured in Our Planet were chosen in advance of data collection following discussion with producers from Silverback Films. The 9 control species were chosen to represent species not appearing prominently in Our Planet. Where possible, we chose control species that had characteristics broadly similar to species featured in Our Planet (e.g., polar bear similar to panda, wild dog similar to dingo), although this was not possible in 2 cases (see Appendix S1 for full matches and explanations). Mentions of species were detected in the collected tweets by searching for the common name character string and slight variations of this string (e.g., plural forms).
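Detection of this kind can be sketched with simple pattern matching. The original pipeline ran in R and its exact matching rules are not reported, so the word-boundary regex and the naive plural handling below are illustrative assumptions, shown here in Python.

```python
import re

# Illustrative species matcher: matches the common-name string plus simple
# plural variants ("walrus"/"walruses", "wolf"/"wolves"), case insensitively.
# The variant rules are assumptions for this sketch, not the study's code.
SPECIES = ["dolphin", "flamingo", "wild dog", "caribou", "wolf",
           "polar bear", "wildebeest", "elephant seal", "walrus"]

def variants(name: str) -> list:
    """Return the name plus naive plural forms."""
    forms = [name, name + "s"]
    if name.endswith("f"):        # wolf -> wolves
        forms.append(name[:-1] + "ves")
    if name.endswith("s"):        # walrus -> walruses
        forms.append(name + "es")
    return forms

# One compiled pattern per species, anchored on word boundaries so that
# names are not matched inside longer words.
PATTERNS = {
    name: re.compile(r"\b(" + "|".join(map(re.escape, variants(name))) + r")\b",
                     re.IGNORECASE)
    for name in SPECIES
}

def mentioned_species(tweet: str) -> list:
    """List the species whose common name (or a variant) appears in the tweet."""
    return [name for name, pat in PATTERNS.items() if pat.search(tweet)]
```

For example, `mentioned_species("Two walruses and a Polar Bear cub")` returns `["polar bear", "walrus"]`; matching on word boundaries avoids false hits inside longer words.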
After filtering out non-English tweets, we had a data set of 3,504,254 tweets, including retweets. For each tweet, we collected the full text; the date and time it was created; the number of followers of the tweet's author; and, for retweets, the number of times the original tweet had been retweeted at the time of collection.
The full data set contained all mentions of the words Our Planet (and case insensitive variants thereof) and #ourplanet. However, upon inspection of the tweet content, it was apparent that many mentions of Our Planet did not refer to the Netflix documentary. We therefore narrowed the data to tweets containing the #ourplanet hashtag, which excluded these irrelevant tweets.
The data set we analyzed was composed of 224,895 tweets with #ourplanet or case insensitive variants thereof (e.g., #Our-Planet or #ourplanet); 1,158,704 tweets mentioning a species featured in Our Planet; and 934,435 tweets mentioning a control species not featured in Our Planet (total 2,137,635). The sum of these 3 samples does not equal the total sample size because these categories were not mutually exclusive (e.g., 169,240 tweets containing the #ourplanet hashtag also mentioned Our Planet species). We obtained 573,820 unique tweets by removing all retweets of the same tweet.
We used the R package vader (Roehrick, 2020) to perform a sentiment analysis of the tweets. Vader, short for Valence Aware Dictionary and sEntiment Reasoner, was chosen because it is especially suited to analyzing short social media texts: it handles emoticons (such that emoticons contribute to the final sentiment score), slang and acronyms, and the punctuation and capitalization typical of social media posts (Hutto & Gilbert, 2014). We used the Vader compound score, which, for each tweet, sums the valence of each word and provides a normalized score from −1 (extreme negative) to +1 (extreme positive). For examples of tweets classified as positive and negative, see Appendices S2 and S3. We excluded 635 tweets that could not be processed in the sentiment analysis.
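As an illustration of the compound-score mechanics (the study itself used the R vader package), the sketch below applies VADER's normalization, x/√(x² + α) with α = 15, to a tiny invented lexicon; it omits VADER's heuristics for negation, punctuation, capitalization, and emoticons.

```python
import math

# Tiny illustrative valence lexicon (invented for this sketch; the real VADER
# lexicon has thousands of entries rated on a roughly -4..+4 valence scale).
LEXICON = {"love": 3.2, "good": 1.9, "happy": 2.7,
           "sad": -2.1, "angry": -2.3, "frustrated": -2.1}

def compound_score(text: str, alpha: float = 15.0) -> float:
    """VADER-style compound score: sum the word valences, then normalize
    into (-1, 1) with x / sqrt(x^2 + alpha)."""
    total = sum(LEXICON.get(word, 0.0) for word in text.lower().split())
    return total / math.sqrt(total**2 + alpha)
```

With this toy lexicon, "so happy and good" sums to 4.6 and normalizes to about 0.77, whereas a tweet with no lexicon words scores exactly 0 (neutral).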

Distribution of tweets
The pattern of retweets was highly skewed. For all tweets (containing #ourplanet or any of the Our Planet or control species or both), the most retweeted tweet was retweeted 157,068 times (tweet text "rt XXXX: seal accidentally scares baby polar bear" categorized as containing an Our Planet species [polar bear] but not containing the #ourplanet hashtag; sentiment score = −0.59; tweeter and retweeter usernames here and in our full data set were anonymized) and the second-most retweeted tweet was retweeted 155,062 times (tweet text "the sad reality of climate change. the walrus with no ice or place to go. #walrus #ourplanet #climatechange #climate" categorized as containing an Our Planet species [walrus] and also containing #ourplanet; sentiment score = −0.65). These 2 tweets combined make up 14.6% of the data and were each retweeted more than twice as many times as the third-most retweeted tweet.
For those tweets that contained #ourplanet, the skew was much higher: the most retweeted tweet was retweeted 155,062 times (the second-most retweeted in the full data set, see above), accounting for 68.9% of the data. The next-most retweeted tweet was retweeted 2370 times. Figure 1 shows this skew for both all tweets and hashtag tweets.
This skewed distribution means that any analysis of the full data set will be dominated by a small number of highly retweeted tweets. Consequently, we ran analyses on both the full data set, including retweets, and the unique tweet data set, excluding retweets.
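The two data sets can be derived from one tweet stream as in the sketch below; the record fields (`id`, `retweet_of`) are hypothetical stand-ins for the API output, not the fields the authors used.

```python
from collections import Counter

def skew_summary(tweets):
    """tweets: list of dicts with hypothetical fields 'id' and 'retweet_of'
    ('retweet_of' is None for an original tweet). Returns
    (full_n, unique_n, top_share), where full_n is the size of the full data
    set, unique_n the number of unique tweets after collapsing retweets, and
    top_share the fraction of the full data set contributed by the single
    most common original tweet."""
    full_n = len(tweets)
    # Map every tweet (original or retweet) to the original it descends from.
    origins = [t["retweet_of"] or t["id"] for t in tweets]
    counts = Counter(origins)
    unique_n = len(counts)
    top_share = counts.most_common(1)[0][1] / full_n
    return full_n, unique_n, top_share
```

A single massively retweeted tweet shows up as a `top_share` near 1, flagging exactly the kind of dominance reported above (68.9% for the hashtag data set).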

Analyses
We first checked the overall sentiment of the data, presenting basic descriptive statistics (mean, median, and standard deviation) of the Vader compound score. We used intercept-only Bayesian regression models to detect deviations of the outcomes from zero (neutral sentiment). Following the model equation format of McElreath (2020b), the intercept-only regression model was

S_i ~ Normal(μ, σ),

where S_i is the sentiment score of tweet i, and μ and σ are the mean and standard deviation of the sentiment scores, with a normally distributed prior on μ and an exponentially distributed prior on σ.
To test H1 and H3, we ran Bayesian regression models with time as a predictor and emotion score as the outcome for tweets containing #ourplanet. We analyzed time in 2 ways, discrete and continuous. For the discrete time analysis, we divided the data set into 3 consecutive periods of 3 weeks each: prerelease, release, and postrelease. This was used as an index variable in a linear Bayesian regression model with normally distributed priors (McElreath, 2020b). For the continuous time measure, we used days since data collection began (15 March 2019), scaled to start at zero. This was used as a continuous predictor in a Bayesian regression model, and we compared linear, quadratic, and cubic models with the Watanabe-Akaike information criterion (WAIC) (McElreath, 2020b). The discrete time model was

S_i ~ Normal(μ_time[i], σ),

where time[i] is an index variable specifying in which of the 3 periods tweet i was tweeted. The continuous (linear) time model was

S_i ~ Normal(μ_i, σ), μ_i = α + β T_i,

where T_i is the continuous time measure for tweet i (the quadratic and cubic models add β_2 T_i² and β_3 T_i³ terms, respectively).
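Given the collection window (15 March to 17 May 2019) and the 3-week periods described above, the two time codings can be sketched as follows; this is an illustrative reconstruction in Python, not the authors' R code.

```python
from datetime import date

START = date(2019, 3, 15)   # data collection began; release was 5 April 2019

def time_codings(tweet_date: date):
    """Return (continuous, discrete) time codings for a tweet:
    continuous = days since collection began (scaled to start at zero);
    discrete   = which consecutive 3-week (21-day) period the tweet falls in."""
    days = (tweet_date - START).days
    period = ("prerelease", "release", "postrelease")[min(days // 21, 2)]
    return days, period
```

With this coding, the release date itself (5 April, day 21) falls on the first day of the "release" period, and the final collection day (17 May, day 63) falls in "postrelease".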
To test H2 and H3, we used the same discrete and continuous time measures, but we considered 3 different data sets: tweets mentioning species featured in Our Planet that also contain #ourplanet; tweets mentioning control species not featured in Our Planet; and tweets mentioning Our Planet species that do not contain #ourplanet. As above, we used a Bayesian regression model with time (discrete or continuous) as a predictor and emotion score as the outcome.
As an unplanned extension of our main analysis, we tested the general effect of sentiment on retweet probability. A Poisson regression model was run with unique tweets as data points, the count of the number of retweets for that tweet as the outcome measure, and the Vader compound score and user follower count as predictors. This model was

R_i ~ Poisson(λ_i), log λ_i = α + β_follower F_i + β_emotion S_i,

where R_i is the retweet count, F_i is the follower count, and S_i is the sentiment score for tweet i. All analyses were run with the rethinking package 2.13 (McElreath, 2020a) and cmdstanr (Gabry & Češnovar, 2022) in R 4.1.3 (R Core Team, 2022). We report 89% compatibility intervals and compared models with WAIC rather than reporting p values (McElreath, 2020b). The data (with Twitter usernames removed or anonymized) and analysis code are available from https://osf.io/rv8ek/.

RESULTS

Overall sentiment

The full data including retweets are heavily influenced by the most retweeted tweet with an emotion score of −0.65, as can be seen in Figure 2. The data including only unique tweets show a hump at zero (neutral sentiment), a small hump around −0.5 (negative sentiment), and a larger hump around +0.6 (positive sentiment).

Sentiment over time
The discrete time analysis showed that at prerelease, sentiment was largely positive. At release, sentiment became strongly negative, although this was skewed by the highly retweeted outlier tweet with a sentiment score of −0.65, and at postrelease, sentiment became slightly positive but not as positive as prerelease (Figure 3a).
For unique tweets excluding retweets, the pattern was similar but less extreme: prerelease tweets were slightly positive; at release, tweets became less positive (but not negative); and postrelease tweets were more positive than at release (Figure 3b). Regression analyses supported these patterns for all tweets (Appendix S4) and unique tweets (Appendix S5). In both cases, prerelease was most positive, release was most negative, and postrelease was more positive than release.
For the continuous time analysis, model comparison showed that the cubic model best fit the data for both all hashtag tweets and unique hashtag tweets (Figure 3c,d). These results confirm the positive sentiment at the start of the time period, the increasingly negative sentiment reaching a minimum after release, and the less negative sentiment at the end of the period. The relationship for all tweets (Figure 3c) was more extreme than that for the unique tweets (Figure 3d) due to the highly retweeted outlier in the former data set.

Species comparison
For all tweets including retweets, control species showed little change in sentiment over time, if anything becoming marginally more positive around the release of Our Planet (Figure 4a,d). Our Planet species without #ourplanet became negative around the time of release and then marginally positive after release (Figure 4b,e). Our Planet species with #ourplanet showed a more extreme pattern, becoming strongly negative at release (Figure 4c,f). This is likely due to the highly retweeted outlier with an emotion score of −0.65. Unlike for the hashtag tweets shown in Figure 3b, this negativity remained after release, albeit slightly attenuated relative to release.
Perhaps a more accurate picture, unaffected by the outlier, can be seen for unique tweets excluding retweets (Figure 5). For both discrete and continuous time, control species showed no effect of the Our Planet release date on sentiment, as we expected (Figure 5a,d). Tweets were consistently neutral or very slightly positive. Our Planet species without #ourplanet showed a similar pattern but with a slight decrease in sentiment at release (Figure 5b,e). This may be due to tweets about Our Planet species that referred to the documentary without using #ourplanet. Our Planet species with #ourplanet, however, showed a marked decline at release and became clearly negative overall (Figure 5c,f). As for all tweets (Figure 4c), this negativity persisted into the postrelease period, becoming only slightly more positive than at release. The patterns shown in Figures 4 and 5 were confirmed by Bayesian regression models (Appendices S6 & S7, respectively).

Retweets
A further unplanned analysis was conducted on retweet count. Retweet count is a measure of tweet popularity, or of the extent to which people wish to transmit a tweet to others. The full model (Appendix S8), with both tweeter follower count and tweet emotion score, fit the data better than models with just 1 or neither predictor. Follower count had a reliably positive effect on retweet count (β follower = 1.26, 89% CI [1.26 to 1.26]). As one would expect, tweets from users with more followers were retweeted more. Emotion had a negative effect, with more negative sentiment tweets being retweeted more, consistent with the analyses above (β emotion = −1.34, 89% CI [−1.35 to −1.34]).
Further analyses, however, showed that the effect of emotion was driven by the highly retweeted outlier (Figure 1b). Removing the most retweeted tweet resulted in a small positive effect of emotion (β emotion = 0.06, 89% CI[0.05 to 0.06]). The effect of follower count remained positive and larger than emotion (β follower = 0.75, 89% CI[0.75 to 0.76]). This indicated that any effect of emotion on retweet count was largely driven by the outlier (Figure 1b).

DISCUSSION
Netflix's Our Planet was one of the first wildlife documentary series produced by an international subscription service rather than a traditional television broadcaster. The producers, Silverback Films, in conjunction with the World Wildlife Fund, aimed to bridge the gap between mass audience but environmentally neutral natural history documentaries and limited audience films with explicit and hard-hitting environmental messaging.
Whether tweets associated with Our Planet were positive or negative in sentiment differed depending on the type of tweet data used (Figure 2). All tweets including retweets were clearly negative. However, this was driven by a massively retweeted negative outlier. Removing retweets and only considering unique tweets, sentiment was marginally positive.
Over time, however, both all tweets and unique tweets increased in negativity during the Our Planet release period, compared with before and after release (Figure 3). This supports our first prediction (H1) that tweets associated with Our Planet become more extreme in their sentiment following the release of the series. Furthermore, tweets containing both species featured in Our Planet and #ourplanet showed clear negative sentiment at the time of release, declining from positive sentiment prerelease (Figures 4 & 5). Control species not featured in Our Planet showed no change over time, suggesting this increase in negativity was not a general change in sentiment during this period or caused by some other wildlife-or conservation-related event.
Our third prediction (H3) that these effects are long lasting was not well supported. The discrete time analyses showed that by the third 3-week period, sentiment was already returning to its more positive prerelease levels. The continuous time analyses typically showed a U-shaped relationship between sentiment and time, with the minimum sentiment just after release returning to positive at the end of the recording period.
Overall, therefore, we conclude that the release of Our Planet coincided with more negative tweet sentiment. This is clear when comparing species mentioned in the series with control species. The overall sentiment of tweets with #ourplanet depended on the analytical choice: the full data set, which included the single heavily retweeted negative tweet, was negative overall, whereas the unique tweets were slightly positive. A relevant feature of our data was the extremely high skew due to a single massively retweeted tweet (Figure 1). In the data set containing only the tweets with #ourplanet, retweets of this tweet accounted for 68.9% of all tweets. Because this outlier tweet was strongly negative, with an emotion score of −0.65, it skewed the results toward negative sentiment. Given that the distribution in Figure 1 is likely to be typical of many social-media-generated big data sets like ours, this is a note of caution for analyses of big data. We therefore repeated all analyses with unique tweets excluding retweets. This yielded some differences; for example, the unique tweets had slightly positive sentiment following release compared with the full data set (Figure 2). However, the general trend of becoming more negative at release was found for both the full data set and unique tweets.
There is no straightforward way to decide which of these data sets is best to use. Conceptually, from a cultural evolution perspective (Acerbi, 2019b;Mesoudi, 2011), the unique tweets data can perhaps be seen as a measure of cultural innovation, with each unique tweet representing novel, newly created information. The full data set incorporating retweets, meanwhile, additionally contains information about cultural transmission, assuming that retweeting can be seen as a form of transmission to others ("choose-to-transmit" in the terminology of cultural evolution [Eriksson & Coultas, 2014]). If fitness is a measure of replication success, then the latter might be seen as a more appropriate measure of cultural fitness. It may not be a coincidence therefore that the massively retweeted tweet was strongly negative in sentiment, if a negativity bias exists in human cultural evolution (see below). However, a tweet that has been retweeted also becomes more available and so more likely to be observed and retweeted further, in an example of an informational cascade (Bikhchandani et al., 1992). This effect may also have been enhanced by the Twitter algorithm producing users' timelines. Our retweet analysis showed that when this outlier was removed, on average more positive tweets were retweeted more. Whether excluding this outlier was justified is, however, debatable. It is an outlier in the statistical sense, but it is valid information that so many people chose to retweet this (negative) tweet in particular.
Our study has several limitations common to analyses of social media big data. First, the Twitter sample was biased in characteristics such as age and socioeconomic status; Twitter users are younger and more educated than the general population (Sloan et al., 2015). We also restricted our sample to English-language tweets, so our results are specific to English speakers and English-language countries. Second, outputs of the Twitter API do not represent an unbiased reflection of activity on social media (Correia et al., 2021), and the exact biases are unknown. The timeline algorithm used by Twitter is also unknown and likely to influence the results. Third, sentiment analysis is a crude tool. In aggregate, sentiment analysis produces reliable results, but it is especially challenging for short texts like tweets, where sentiment must be inferred from just a few words and contextual effects are more easily lost (Hutto & Gilbert, 2014). More importantly, Twitter activity may not accurately represent actual attitudes or predict behavior change. Similarly, one cannot determine whether negative sentiment, such as fear or anger, is potentially being used for positive or negative ends. Anger at global inaction over climate change would be classed as negative by an automated sentiment analysis but might be seen by some as an appropriate and positive response to a crisis in need of urgent action.
Overall, our findings fit with a general negativity bias previously demonstrated in human cultural transmission. Experiments have shown that people preferentially acquire and transmit negative information from and to other people (Bebbington et al., 2017), and analyses of real-world data sets show trends toward more negative pop music (Brand et al., 2019) and literature (Morin & Acerbi, 2017). The same effect is present in online communication: negative information is disproportionately common in fake news (Acerbi, 2019a) and spreads advantageously on social media (Bellovary et al., 2021; Schöne et al., 2021). This negativity bias is thought to be due to the asymmetric costs of false positives and false negatives (Fessler et al., 2014): it is more costly to mistakenly ignore a negative stimulus, such as a predator, than to mistakenly ignore a positive stimulus, such as food. The former gets you eaten; the latter just leaves you hungry. Human cognition has therefore evolved to pay more attention to negative stimuli than to positive stimuli (Baumeister et al., 2001).
Our results suggest that this general negativity bias needs to be taken into account when planning environmental campaigns. Comparable studies show that social media interest in iconic species, such as rhinoceros, although generally slightly positive in sentiment, is triggered by negatively valenced events (Fink et al., 2020). Framing messages positively could result in less engagement or in the target audience preferentially picking up the negative aspects. How then should one frame campaigns when the goal is to convey a positive message? Although generally robust, the effects of negativity bias are context specific. First, there is individual variability in the extent to which people preferentially attend to negative information, with some individuals more interested in or attracted to negative information than others (Bachleda et al., 2020). Second, there is variability over time. Even though medium- and long-term trends in the sentiment of, for example, news stories tend to be negative, they are interspersed with cycles in which positive sentiment prevails (Leetaru, 2011; Rozado et al., 2022). This can be seen even at the level of single broadcasts, where short bouts of positive news break up longer negative coverage. At a minimum, if negative sentiment is the norm, positive sentiment can represent an attractive change of pace (Soroka & Krupnikov, 2021). Finally, the diversification of platforms can facilitate the creation of niches where more positive news stories are disseminated (e.g., Upworthy) and where users can actively search for them (Soroka & Krupnikov, 2021). In sum, the existence of a negativity bias does not imply that spreading positively valenced information is always more difficult. A better understanding of this psychological and cultural process could help in planning successful campaigns and may allow negativity bias to be used to conservationists' advantage rather than working against them.