Mike Thelwall is a member of the Statistical Cybermetrics Research Group at the University of Wolverhampton, U.K. He develops methods for extracting large-scale data from the Web to support social science research goals.
Address: School of Computing & Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK
Statistical Cybermetrics Research Group University of Wolverhampton
David Stuart is a member of the Statistical Cybermetrics Research Group at the University of Wolverhampton, U.K. He is currently researching the ability of Web links to provide indicators of relationships between organizations but is broadly interested in all areas where information science meets the Web.
Address: School of Computing & Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK
This article compares communication technologies within and across crises, using evidence from contemporary postings in 68,022 blogs and news feeds and a semi-automatic method to detect words that increase in usage during a crisis. Three case studies from 2005 are used: the July 7 London attacks, the New Orleans hurricane, and the Pakistan-Kashmir earthquake. The results highlight the importance to bloggers, as sources of information, of Web 2.0 resources such as Wikinews, Wikipedia, and the Flickr picture-sharing site, although these still play a minor role in comparison to the mass media. Some personal communication methods, including SMS and cellphones, were also mentioned significantly often, but the newest technologies of those mentioned were all Web 2.0. The importance of electronic communication for bloggers was found to depend on the nature of the crisis: For example, despite the heavy Pakistan-Kashmir earthquake death toll, there was relatively little interest in related communication issues from English language bloggers and news sources.
When news of a significant crisis such as the Asian Tsunami of 2004 emerges, an immense communication need is created as people investigate whether family and loved ones are safe, if there is any further risk, and whether they need to take any action (e.g., BBC News, 2005c). In addition, there is often a high level of public interest in the emotional trauma (Seaton, 2005), the extent of the crisis, future prospects, and checking that responsible parties are performing their roles properly (e.g., government aid, rescue services). Crises may thus precipitate a surge in personal communication as well as heightened consumption of mass media sources ranging from radio and TV to news websites and blogs. This phenomenon has been described as “global crisis communication” (Bucher, 2002), to distinguish it from management crisis communication, which concerns crises within corporations.
Although previous crises have highlighted new communications technologies, no research has developed a method to identify systematically which technologies are used or to compare their use across different crises. These tasks are important for two reasons. First, communication is critical during a crisis, and more knowledge may help future planning (e.g., arranging emergency bandwidth allocation). Second, crises may precipitate or hasten the adoption of a new technology. In the social sciences, there are many cases in which social processes combine long periods of stability with short bursts of rapid change (Baumgartner, 2006; Gould & Eldredge, 1977), and it is possible that the same will happen to new communications technologies such as blogs.
The Internet affords increased opportunities to gain information about media consumers. For example, the owners of news websites can count the number of times each story is accessed, giving their journalists and editors direct feedback about the interests of readers and the popularity of different approaches to news coverage (Gillmor, 2006). Another convenient source for informal knowledge of public activities on a large scale is blogspace (Fukuhara, 2005; Glance, Hurst, & Tomokiyo, 2004; Gruhl, Guha, Liben-Nowell, & Tomkins, 2004). Blogs are often regarded as a new form of media in competition with traditional forms and sometimes as a potentially dissident/democratic communication channel (Gorgura, 2004; Rodzvilla, 2002; Smith, 2006).
In this article, the focus is on what blogs may reveal about the attitudes of the bloggers themselves, rather than the role of blogs per se in communication. Blogs are typically quite personal and individual (Herring, Scheidt, Bonus, & Wright, 2004), and hence can be used for insights into bloggers’ activities and perceptions. Bloggers probably represent a wider section of the community than the authors of any other large-scale Web publishing genre, although they are clearly not representative of the world’s population. For instance, blogs are popular in the United States, and those without access to the Internet will not maintain a blog (BBC News, 2005a). In general, bloggers are probably heavily overrepresented in the populations of richer nations, and the U.S. in particular, and unevenly distributed within nations, for example with relatively high numbers of suburban and student bloggers (Herring, et al., 2004; Lin & Halavais, 2004). Bloggers also seem to have their own international culture to a certain extent (Su, Wang, Mark, Aiyelokun, & Nakano, 2005).
In this article we use a combination of blogs and news sources to investigate communication during crises. The first general research question is:
RQ1: Which communication technologies are important during a crisis and do these differ across crises?
We are particularly interested in the emergence of technologies during a crisis, and hence the second research question is:
RQ2: Are there new communications technologies that emerge as important for the public in crisis communication, and if so, what type of communication need do they fulfill?
Finally, because we are interested in analyzing blogs rather than any other source of information, we ask the methodological question:
RQ3: Do crises precipitate discussions or mentions of new technologies in blogspace?
We define a crisis as a sudden and uncontrollable event that threatens the lives of a number of people. We are particularly concerned with crises that attract significant media attention. When a crisis occurs there may be a need for rapid communication of various types.
1. General information need: the need to find out about the crisis event. This need could be satisfied, for example, by newspapers and television news.
2. Personal information need: the need to ensure that certain individuals (e.g., friends, co-workers, relatives) are safe. Phone calls and emails could be used to discover this kind of information.
3. Information usage: people may wish to communicate about the crisis itself, rather than to find out about it. This includes:
   a. Discussing the event as a conversation topic (e.g., talking)
   b. Informing or warning others (e.g., via phone calls or talking; on a larger scale, organizations may use public service announcements, posters, or television advertising campaigns)
   c. Advocating efficient solutions to the crisis (e.g., emergency service actions; prevention of similar future problems)
   d. Various other small-scale uses, such as using the event as part of a political argument
As the above list suggests, there are many reasons for crisis-related communication, and a different set of technologies may be most appropriate for each one.
While there is no systematic body of research about crisis communication, there are many relevant individual studies. Yet, other than research confirming increased interest in traditional mass media news sources, insufficient direct information seems to be available about the information seeking behavior of individuals during a crisis (Seaton, 2005). Bucher (2002) focuses on the related issue of the impact of the Internet on news consumption during a crisis, forming several hypotheses related to the potential for the Internet to deliver news of the crisis and to increase the possible number of sources that the public can use to gain news of events. Bucher sees journalists as “guides for global information space” rather than just gatekeepers. This more active role in news making seems appropriate for bloggers, too, given previous cases of the influence of blogs on the news. However, because we are interested in communication technologies rather than news related to a crisis, we are not primarily concerned with the ways in which bloggers influence the news (Gill, 2004) through alternative “personal journalism” (Allan, 2004), eyewitness reporting (e.g., the Baghdad Blogger, Thompson, 2003) or A-list blog commentary (Trammell & Keshelashvili, 2005). Nor are we primarily concerned with the way in which debate in blogspace can form a democratic “public sphere” (Gorgura, 2004; Habermas, 1991; Honeycutt, 2005; cf. Sunstein, 2004), although both of these uses may impact the communications technologies generally used.
The field of public health includes a body of research into “crisis communication” that is primarily concerned with ensuring that effective health information is conveyed to the public in periods of crisis (Covello, 2003; Mebane, Temin, & Parvanta, 2003; Wray, Kreuter, Jacobsen, Clements, & Evans, 2004). More specifically, it focuses on situations where there is an ongoing or impending risk to the public (3b above). In any country, this kind of information would normally only be relevant to domestic crises. Government organizations also produce many official publications on this topic (e.g., Steib, 2002). The goal of crisis communication in this literature is to transmit information effectively to the public in time to maximize the possibility that people take appropriate action to ensure their safety (Keselman, Slaughter, & Patel, 2005).
Also relevant is the public relations crisis: a situation in which an organization suddenly must engage in communication to offset a surge in negative publicity. Five practical guidelines have been proposed to guide public relations strategies to minimize the risk of negative press: (1) prompt response, (2) truth/avoidance of absolutes, (3) constant flow of information, (4) concern for victims and their families, and (5) choice of appropriate spokesperson(s) (Martin & Boynton, 2005). These guidelines advise organizations in dealing with the press rather than communicating directly with the public, but a recent trend is to use the Internet for unmediated and interactive communication with the public, although this is an extra component of a traditional media-centered strategy, not a dominating piece (Taylor & Perry, 2005). Public relations exercises in crises also operate at a national level, perhaps in a similar way, with governments competing with each other to make a favorable impression through their reaction to a crisis (Zhang, 2005) or seeking to persuade the public to accept a given course of action (Hiebert, 2003).
Communications Technologies, Blogs, and the News
We include here a brief discussion of technological change in news reporting, because the news is relevant to crises, and news reporting provides an interesting case study of the interrelationship between technology and communication. Moreover, news media research illustrates the potential that crises can have in accelerating or highlighting technological change. In terms of practicalities, electronic devices, such as hand-held cameras and satellite phones, have made it much easier for reporters to gain access to real time live footage (Bennett, 2003; Higgins, 2000). This change has altered the type of news that is reported (e.g., more event-driven news, Livingston & Bennett, 2003) and the reporting frame (Entman, 1993; Livingston & Bennett, 2003), and it has helped to introduce new reporting program styles such as “action news” (Bennett, 2003) and organizations such as CNN (Livingston & Bennett, 2003; Volkmer, 1999). Embedded journalism during wars (Walsh & Barbara, 2006) is a subject of recent controversy, and the ethics of embedded journalism has generated a political debate (Thompson, 2003; Zelizer, 2005).
New technology does not only impact the reporting of “traditional” news events; it can also influence what becomes news. For example, hand-held cameras in the hands of the public can create news, such as in the case of the videotaped police beating of Rodney King (Lawrence, 2000), and can make more routine events, such as individual murders, more newsworthy (Seaton, 2005; Zelizer, 2005).
In summary, new communication technologies can influence (1) what is reported, (2) how it is reported, (3) the type of organization that reports it, (4) the program format in which it is presented, and (5) the politics of the events reported. Clearly the issue of how new technology affects the news is far from straightforward, and it would be reasonable to expect a lesser, but significant, level of complexity in the impact of new technologies upon crisis communication.
Automatic Methods for Identifying New Crisis Communication Technologies
What methods are available to identify communications technologies that people use during a crisis? Traditional social science techniques such as interviews and questionnaires could be employed and might give useful answers. In this article, however, we use automatic methods in order to assess the extent to which they can provide useful information. The advantage of automatic methods is that, if developed, they can potentially provide easier and faster access to relevant information. Moreover, the methods described here are non-intrusive.
The Web now hosts a range of sources of information that can be used to make inferences about the activities of the general public to some extent. For example, major websites often carry a list of the most read/emailed/blogged news stories, which can give insights into which news stories seem to be the most interesting. An alternative and a more general indicator of public concerns, because of its wider coverage, is Google Trends (http://www.google.com/trends). This tool shows the relative popularity of searches over the past few years and could be used to assess the popularity of known communications technologies, although the results would not be reliable. For example, entering a search term of “SMS” (Short Message Service, used mainly on small mobile devices) produces a graph of the relative popularity of this search term over the past two years. From this graph (http://www.google.com/trends?q=sms) it can be seen whether the number of SMS searches was higher than average during any given crisis. The results are fragile for two reasons, however: The number of SMS searches may not match the use of SMS (which is also true for SMS mentions in blogs); and, more importantly, there is no way of being sure whether the searches were crisis related, as opposed to being coincident with something else like the introduction of a new online free text messaging service, a news story, or a high profile advertising campaign.
A better method would be to search blogspace via a tool such as BlogPulse (http://www.blogpulse.com), which is a search engine for blogs. It operates like Google except that it is restricted to blogs and can produce a graph of the relative frequency of the search term(s) in blogspace over a period of time. The advantages over Google Trends are: There is more reason to connect a blog’s mention of activities to the activities themselves, since many bloggers blog their lives in personal journals (Herring, et al., 2004); and, most importantly, the individual blog postings can be read to assess whether they are relevant to a crisis. Nevertheless, the link between the activities of bloggers and what they blog is still problematic, and bloggers are not representative of the population in general, so making inferences about the volume of use of communication technologies via blog searching is still unreliable. This is an unavoidable limitation of the method used in this study.
A limitation shared by most of the above methods is that they can only confirm trends for known technologies rather than identify unknown, new technologies. The exceptions are interviews and open-ended survey questions (e.g., “which communications technologies did you use during crisis x?”). A method has been developed that is similar to BlogPulse-style blog searching but that is not dependent on prior knowledge of the names of the communication technologies searched for: RSS scanning (Thelwall & Prabowo, 2007, to appear; Thelwall, Prabowo, & Fairclough, 2006). RSS (Rich Site Summary/Really Simple Syndication) is a set of formats for presenting information concisely. It is used on the Internet by many blogs and news websites to allow interested site visitors to check automatically for updates via an RSS aggregator program (Gill, 2005; Hammersley, 2005; Notess, 2002). Users with aggregators only need to click on the RSS or Atom icon on a site that they are interested in, and the aggregator will then inform them whenever the site is updated. This is a useful way to keep track of a number of news websites and/or blogs without having to visit them repeatedly to check if they have been updated. For a blog, the RSS “feed” is in fact a list of the most recent posts on the site. RSS feeds are useful for analyzing large numbers of blogs for purely technical reasons: It is far easier and more ethical to gather data on a large collection of blogs via their RSS feeds than directly from the blogs themselves, because the simple and compact RSS format reduces Web traffic and the load on the servers hosting the blogs.
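To illustrate why the RSS format is convenient for automatic processing, the following minimal Python sketch parses an RSS 2.0 document with the standard library and lists its items. The sample feed and the function name `parse_rss_items` are invented for illustration; this is not the software used in this study, and real feeds vary in format (e.g., Atom) and would need more robust handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Checked in by SMS</title>
      <description>Everyone in London is safe.</description>
      <pubDate>Thu, 07 Jul 2005 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Weekend plans</title>
      <description>Nothing much happening.</description>
      <pubDate>Wed, 06 Jul 2005 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(feed_xml):
    """Return one dict (title, description, pubDate) per RSS <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


posts = parse_rss_items(SAMPLE_FEED)
for post in posts:
    print(post["pubDate"], "|", post["title"])
```

Each item carries its own title, summary, and date, so a monitoring program can record new postings without downloading or interpreting the full blog page.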
The RSS scanning method is based on a large collection of RSS feeds that are continually monitored over a period of time in order to build a database of daily lists of postings. These databases could then be used in the same way as the BlogPulse site: to generate time series graphs of the relative frequency of any given search term, such as SMS. The RSS scanning method is not based on direct searching, however, but scans every possible search (i.e., every word in the database) and reports only the search terms that show a significant increase in usage during the monitoring period. This automatic scanning bypasses the need to know the search terms in advance but does not necessarily produce relevant words. This problem is dealt with in two ways. First, only relevant postings are scanned. This is achieved by using some key words (or a Boolean statement) relevant to the topic investigated and ignoring the non-matching posts. Second, the results are manually scanned to sift out the relevant from the irrelevant searches. In our case, this means sifting out words that do not relate to communication. Thelwall and Hellsten (2006), in a demonstration that RSS scanning can be used to extract useful information about a crisis, utilized this method to compare media discussions of the London Attacks of 2005 with media timelines of the events. The findings suggest that retrospective media timelines of crises ignored several events that were significant at the time and also downplayed the role of communication during crises.
Why would the RSS scanning method be likely to identify communication forms emerging during a crisis? The assumption, based on the example of the news, is that crises can precipitate rapid changes in the use of technology, and that any such rapid change is likely to be reflected to some extent in the personal journals of blogspace by a sudden surge in the use of the name of the technology. To give a simple example, if SMS messages were used extensively by people during a crisis to check on the safety of relatives, then we would expect some of these people to blog that they had checked by SMS that everyone was safe, and that this would lead to an overall increase in the term “SMS” in blogspace during the crisis. The RSS scanning method would then automatically identify SMS as a word with a sudden increase in usage in posts relevant to the crisis.
Note that the RSS scanning method has the same drawbacks as the BlogPulse graphs: It only reports on bloggers and relies on what bloggers decide to blog. Moreover, it is more sensitive to unusual names, and thus new technologies with common names (e.g., orange) may fail to be identified. An additional drawback of the RSS format for monitoring blogs is that it is general purpose and not exclusive to blogs: Both news feeds and blogs were central to its widespread adoption (Gill, 2005). In summary, BlogPulse and RSS scanning both have advantages and disadvantages. We do not claim here that one is clearly better than the other; only that RSS scanning is a reasonable method to use in the sense that there is not another method that is clearly superior. Moreover, we suspect that most researchers, especially casual investigators, will use BlogPulse or similar online tools; it is therefore valuable to develop alternative approaches that can shed light on the strengths and weaknesses of existing tools.
In order to gain quantitative evidence from blogs about methods of communication during a crisis event, large numbers must be monitored, because fewer than 5% of the blogs are likely to mention the crisis (Thelwall, 2006), and only a fraction of blogs mentioning the crisis will explicitly mention communication.
We generated a collection of 19,587 RSS feeds from search engines and RSS databases using the purpose-built RSS collection and processing software Mozdeh (Thelwall, Prabowo, & Fairclough, 2006), beginning ongoing daily monitoring from the date that the software was fully operational: January 31, 2005. Additional feeds were added on August 30, 2005 to bring the total to 68,022. Blog monitoring continued until mid-November 2005 for this study. The feeds were collected using an ad-hoc combination of browsing and searches in Google and RSS feed databases. This is thus a convenience sample, which is unavoidable when collecting a large sample of feeds, because there is no single systematic register from which to select randomly. This is an issue that also plagues blog research and which it is difficult to avoid unless focusing on a specific blogger site that can be exhaustively scanned (e.g., Lin & Halavais, 2004). The problem with the latter approach, however, is that it is difficult to generalize results to the rest of blogspace. Note also that since only a subset of blogs supports RSS feeds, our sampling method is biased from the start against some blogs, probably including personal blogs owned by less technologically-oriented individuals.
The browsing was conducted primarily for news sites but also included some environmental sites (included for another purpose that is not relevant to this study). The main RSS feed database used was the now defunct RSS search engine CompleteRSS.com (see http://web.archive.org/web/20040720031625/www.completerss.com/company/AboutUs.aspx), which claimed at one time to be the largest RSS database on the Web and allowed anyone to submit their RSS feeds to their database. For the Google searches, we used the filetype:rss command to search for feed URLs, combining this with random mid-frequency English language words (e.g., expect filetype:rss) in order to increase the number of hits returned by the searches. From this sample we filtered out about 10% of feeds that contained mainly commercial, marketing, or sports information. This was a manual process based on feed URLs alone and hence was error prone. The remainder of sites analyzed for this article were mainly blogs and news feeds.
Specifically, the data produced by this method are daily lists of the postings of each blog and the news story summaries from each news feed. About 80% of the selected postings were from blogs and the remaining were from news media sources, including news aggregation services (Chowdhury & Landoni, 2006). Ideally, the corpus would be restricted to blogs, but filtering out all of the non-blog sources was not a practical option in our case. Blogs are known to vary in style from teenage diaries to photoblogs, journalists’ and activists’ comments, and academics’ pronouncements (Bar-Ilan, 2005; Cohen, 2005; Herring, et al., 2004; Huffaker & Calvert, 2005; Kim, 2005; Schaap, 2004). Our manual inspection of the blogs in the corpus suggests that they are also quite heterogeneous.
Given the ad-hoc nature of these data, the results of our analysis can only be suggestive; it would not be possible to make inferences from it with any confidence about the English-speaking world population or even English-speaking bloggers. Note that this is a general purpose collection of blog data that is available from the first author for other researchers to use.
Crisis Postings (Sub-Corpora)
We chose three case studies of crises that occurred after the corpus monitoring started, and for each case study we produced terms that we thought would be effective at selecting most postings related to the event. These events were: the London attacks of July 2005 (search term: London); Hurricane Katrina hitting New Orleans in August 2005 (search term: Orleans); and the earthquake in October 2005 that devastated Kashmir and parts of Pakistan (search terms: Kashmir, Pakistan, earthquake, quake; used with the Boolean OR operator). In each case the term selection was problematic: Several unrelated significant events occurred in London; many commercial posts discussed New Orleans; and several posts discussed the game Quake in the Pakistan corpus. These terms also only retrieved blogs that were either in English or a language that utilized the English spelling of the words. This stage produced 34,883 feed-day postings for London, 60,119 for New Orleans, and 33,628 for Pakistan. Here, a feed-day posting is at least one posting on a day by a single feed. For each crisis we created a sub-corpus of just the postings (i.e., RSS items) containing the chosen word(s).
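The selection step can be sketched in a few lines of Python. This is a rough illustration only: the postings, the function names `select_sub_corpus` and `count_feed_days`, and the choice of case-insensitive whole-word matching are all assumptions made for the example, not a description of the software actually used.

```python
import re

# Hypothetical postings: (feed_id, date, text). Invented for illustration.
POSTINGS = [
    ("feed-a", "2005-10-08", "Terrible earthquake in Kashmir this morning."),
    ("feed-a", "2005-10-08", "More on the quake: roads are blocked."),
    ("feed-b", "2005-10-08", "My cat did something funny."),
    ("feed-b", "2005-10-09", "Donations for Pakistan are being collected."),
]


def select_sub_corpus(postings, terms):
    """Keep postings containing at least one term as a whole word (Boolean OR)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,  # assumption: matching ignores case
    )
    return [p for p in postings if pattern.search(p[2])]


def count_feed_days(postings):
    """Count distinct (feed, day) pairs, i.e., feed-day postings."""
    return len({(feed, day) for feed, day, _ in postings})


sub_corpus = select_sub_corpus(
    POSTINGS, ["Kashmir", "Pakistan", "earthquake", "quake"]
)
print(len(sub_corpus), "postings;", count_feed_days(sub_corpus), "feed-day postings")
```

In this toy data, two matching postings by the same feed on the same day count as a single feed-day posting. The sketch also shows why term selection is problematic: a posting about the game Quake would match just as readily as one about the earthquake.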
The three crises were chosen because they were the crisis events most covered by the media during the data collection period. Substantial media coverage was needed because the analysis method would not be effective without a significant number of English language blog postings. It was frankly a bonus that the three events took place in three continents so that interesting comparisons could be made with the results. Nevertheless, each crisis is a unique event with its own special characteristics, and the choice of only high profile crises means that the events are not necessarily representative of all crises. Note also that the casualties in the London events occurred mainly within a few minutes of each other, although chasing those responsible and a second failed attack produced extended media coverage. The earthquake was also a short-lived event, but it generated a prolonged human disaster due to the injuries caused and the devastation of the region. The hurricane was a more prolonged initial event, which was known about beforehand and caused damage for days after arriving in Louisiana. It also generated a prolonged human disaster due to the injuries, homelessness, and the need for reconstruction that followed.
The Volume of Discussion of Each Crisis
Figure 1 illustrates the relative sizes of the sub-corpora with respect to the original corpus of 19,587 feeds, i.e., excluding the additional data for the second half of the collection period (for this graph only) so that direct comparisons of volumes could be made. The increases in size of the sub-corpora at the time of their crises support the legitimacy of investigating blogs and news feeds as a source of crisis-related information. The lower English language Internet interest in the third event, despite its enormously higher death toll, is consistent with the BBC’s findings that it was not one of their top-read Web news stories for October, 2005, even though the other crises both appeared for two months each (BBC News, 2005c). One of the reviewers of this article suggested that “event-blogging fatigue” may have taken place: “With the earthquake coming so soon after Katrina, bloggers may have been weary of crisis blogging (and audiences weary of crises).” This may well be one factor (e.g., “Katrina” is a crisis word for the earthquake, see Appendix) together with other factors including geographic distance and poorer news coverage due to the much smaller number of western journalists available to cover Pakistan, difficulties in gaining access to some of the remote areas most affected, and the lower quantity of first-hand eyewitness digital images, resulting in a lower “glitz” factor (Livingston & Bennett, 2003).
This section describes the method developed, using the data described above, to identify and compare communication forms during crises. It is an adaptation of the RSS scanning method reviewed above. In the results section, we follow the initial results with more detailed analysis of specific cases. We therefore use a mixed methodology approach (Tashakkori & Teddlie, 1998) within a pragmatic paradigm (Creswell, 2003).
Word Usage Time Series
The RSS program Mozdeh extracted all the words used in each of the sub-corpora; a word was operationalized as any sequence of letters, numbers, and hyphens that was preceded and followed by a space or new line character, punctuation, or an HTML/XML tag. Every other sequence of characters was rejected, in addition to URLs and parts of URLs. This produced a large vocabulary for each sub corpus (Pakistan 187,977 words; London 153,781 words; New Orleans 239,601 words). For each word and each day in each sub-corpus, Mozdeh calculated the proportion of postings in the sub-corpus that contained the word. See Figures 2–5 for examples of the time series generated from these statistics.
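The word extraction and time series steps can be approximated as follows. This is a hedged sketch rather than Mozdeh's code: the regular expressions, the lower-casing of words, the restriction of hyphens to the inside of a word, and the function names `extract_words` and `daily_proportions` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")                  # HTML/XML tags
URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # crude URL matcher (assumption)
# Letters, numbers, and hyphens; hyphens only inside a word (a simplification).
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_words(text):
    """Return the set of lower-cased words in one posting."""
    text = TAG_RE.sub(" ", text)   # strip tags first, then URLs
    text = URL_RE.sub(" ", text)
    return {w.lower() for w in WORD_RE.findall(text)}


def daily_proportions(postings):
    """Map word -> {day: proportion of that day's postings containing it}."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for day, text in postings:
        totals[day] += 1
        for word in extract_words(text):
            counts[word][day] += 1
    return {
        word: {day: n / totals[day] for day, n in by_day.items()}
        for word, by_day in counts.items()
    }


# Hypothetical postings: (date, text). Invented for illustration.
POSTINGS = [
    ("2005-07-06", "Quiet day in <b>London</b>."),
    ("2005-07-07", "Sent an SMS to check everyone in London is safe."),
    ("2005-07-07", "London news at http://example.com/story is grim."),
]
series = daily_proportions(POSTINGS)
print(series["london"])
print(series["sms"])
```

Days on which a word does not occur are simply absent from its series and can be treated as zero. Because each posting contributes a set of words, a word repeated within one posting is counted once for that posting.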
In each sub-corpus Mozdeh calculated the ‘crisis value’ of each word, defined here as the largest increase in percentage usage on any day during the period of the crisis, compared to the average over all previous days and counting only postings relevant to the crisis. Usage is defined as the number of feeds employing the word. For example, if a word occurred in 30% of all crisis-relevant active feeds on July 7 but occurred in an average of 10% of crisis-relevant active feeds on all previous days, then its crisis value would be 20% (30% minus 10%), or larger if it had a higher jump in usage on any other day of the crisis. Mozdeh then ranked the words in decreasing order of crisis value and selected the top 1,600 (a practical limit) for each sub-corpus. These are the ‘crisis words’ because they are particularly associated with crisis-related postings. Some of these words were related to communication, but most were not (see Appendix).
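The crisis value calculation and ranking can be expressed compactly. The sketch below follows the worked example in the text (30% against a previous average of 10% gives 20%), but the function names, the data, and the representation of the crisis period as a starting index are assumptions made for illustration, not Mozdeh's implementation.

```python
def crisis_value(daily_usage, crisis_start):
    """Largest rise of a day's usage over the mean of all earlier days.

    daily_usage: usage percentages, one per day, in date order.
    crisis_start: index of the first day of the crisis period (assumption:
    the crisis period runs from this day to the end of the series).
    """
    first = max(crisis_start, 1)  # a day needs at least one earlier day
    if first >= len(daily_usage):
        raise ValueError("no crisis day with earlier days to compare against")
    best = float("-inf")
    for i in range(first, len(daily_usage)):
        baseline = sum(daily_usage[:i]) / i
        best = max(best, daily_usage[i] - baseline)
    return best


def top_crisis_words(usage_by_word, crisis_start, limit=1600):
    """Rank words by decreasing crisis value and keep the top `limit`."""
    scored = {w: crisis_value(u, crisis_start) for w, u in usage_by_word.items()}
    return sorted(scored, key=scored.get, reverse=True)[:limit]


# Hypothetical daily usage percentages; the crisis starts on day index 3.
USAGE = {
    "sms": [1, 1, 1, 9, 4],
    "news": [40, 42, 41, 43, 42],
    "flickr": [0, 0, 1, 6, 3],
}
print(crisis_value([10, 10, 10, 30, 12], 3))   # the worked example in the text
print(top_crisis_words(USAGE, 3, limit=2))
```

In this toy data, "news" is by far the most frequent word but ranks last, because its usage barely changes when the crisis begins, whereas "sms" and "flickr" are rare words that jump sharply.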
The rationale behind selecting words with large increases in usage during the crisis rather than just the most popular words during the crisis was to select words strongly associated with the crisis itself. Many words would be common in blogs for the crisis only because they are common words (e.g., news, people), but those increasing in usage during the crisis and in crisis-related postings may have a particular association with the event.
Identifying Crisis Communication Words
The crisis words in each sub-corpus were manually scanned to extract words referring to known communication forms. All unknown words were also checked by reading the postings containing them to see whether they referred to a previously unknown technology.
For the New Orleans corpus we ignored any word with a usage increase that was caused by repeated cross-blog reposting from a single source (about 20% of the words in this corpus; spam is a problem in blogs [Han, Ahn, Moon, & Jeong, 2006]). All crisis words that were potentially communication-related were manually inspected (in the context of the postings in which they occurred) and rejected if they were not used in a context related to the crisis. For example, for Pakistan 15 words out of the top 1,600 were excluded in this way (e.g., MSN, ipod, Linux, palm, outlook). The manual checking stage is not perfect because it is likely to miss communication forms described with common words or with words having an alternative well-known meaning (e.g., Sky). These limitations could have been avoided at the expense of additional work scanning more of the postings, but this would have been impractical.
Results and Further Analysis
Table 1 reports the complete results of the manual identification of communication-related terms from the top 1,600 in each crisis produced by the RSS Scanning method. This is the primary result of our method. See the Appendix for an idea of the kinds of crisis words present in the corpus but judged unrelated to communication.
Table 1. Communication-related crisis words in the top 1,600 for each corpus
Table 1 includes three potential emerging crisis media: Wikipedia, Wikinews, and Flickr. These were classified as emergent because they are new in comparison to the other sources and do not seem to have been previously discussed in the context of crisis communication. They appear in Table 1 because there was a significant increase in crisis-related postings containing these words during the London attacks (and during the hurricane crisis for Flickr).
How significant are the above three identified sources for crisis communication, and how do they compare to other forms? It would be interesting to know whether Wikinews surpassed traditional media, given that it can be updated in real time and is not subject to the traditional constraints (e.g., checking of sources) of established media organizations. Figures 2a-c present a comparison with the top single mass media source, the BBC. Figure 2a shows that all of these sources experienced a burst in mentions (in London-related postings) on the day of the bombings (July 7, 2005) and attempted bombings (July 21, 2005), with both Wikipedia and Flickr experiencing significant usage, although less than half of the usage of the BBC almost all of the time. It seems that the new communication forms were particularly useful for sharing information and finding out facts concerning the initial event, but after this the mainstream sources were able to deal most effectively with news coverage of its aftermath. Figure 2b shows a similar pattern for the hurricane: a strong emergence of Flickr and Wikipedia during the initial few days of the crisis. In contrast, Figure 2c suggests a small and ongoing part for new media for the Pakistan discussions (about 1%, a similar level to the other two graphs), but no special role for the period of the earthquake and its immediate aftermath.
Note that the graph is mainly from the perspective of bloggers and so is likely to exaggerate the importance of online information and particularly blog-like information sources. The omission of URLs (as described in the Word Usage Time Series section above) may also contribute to the higher relative usage of the term BBC, which is primarily a television and radio broadcaster. Online sources will not always be referred to by name but may have their URL or an embedded link given instead, while a traditional broadcast would probably be referred to exclusively by name.
The established mass media are represented by international news broadcasters (BBC, CNN, Sky), international news sources (Associated Press, Reuters), American mass media (NBC, Newsweek), and three New Orleans regional media (WWL, WWL-TV, Clarion-Ledger). The high profile of established news sources is not surprising, since a significant minority of blogs (about 13%) take the form of information filters (Herring et al., 2004; Herring, Kouper, Paolilla, Scheidt, Tyworth, Welsch, et al., 2005), and a higher proportion of our sample is news feeds. Nonetheless, the results suggest that mainstream media sources increase significantly in importance during a crisis, corroborating previous claims (Seaton, 2005).
For London and Pakistan the top news source was the BBC, whereas for New Orleans it was CNN. Figure 3c illustrates the relative strengths of these two, in addition to Sky, for the Pakistan events. It is logical for the BBC to be a main source for London, and CNN for U.S.-based events, and perhaps the historical U.K.-Pakistan connection means that the BBC is better placed in Pakistan than CNN. It may also be that the BBC is simply the top news source in terms of blog links, as suggested by the links in a different blog data set that only considered the days before the London attacks (Intelliseek, 2005; Thelwall, 2006).
Note that the BBC, CNN, and Sky were monitored as part of the corpus, and thus up to five feeds relating to them could be self-references in each case (i.e., there were up to five feeds for each of these news sources), which could exaggerate their relative impact in a minor way, by up to 1%. Many blog and news postings quoted Reuters or Associated Press (AP) news wires. These are not shown, as we consider AP to be a service to the mass media rather than a mass medium in its own right, in the sense of having direct contact with the public (although the Internet is blurring this boundary).
Figures 3a and 3b demonstrate the dominance of one mass media source over the others, although it is a different source that dominates in each figure.
It was interesting that several local news sources were mentioned extensively in the case of New Orleans, but not for the other two events. New Orleans local newspapers, a TV channel, and a radio station all were mentioned by name. As Figure 4 shows, the local Times-Picayune newspaper was an important information source throughout the event. This probably reflects the ability of that newspaper’s reporters to get relevant and timely information. Interestingly, the TV and radio sources only seemed to be important at the very start of the crisis. This perhaps reflects the impact of the dramatic scenes during the hurricane or the faster-moving nature of the crisis during the hurricane, making real-time information more critical.
One individual blog (AmericaBlog) occurred by name in the corpora. In addition, several other blog-related words occurred, including generic blogspace names like lj (LiveJournal.com) and metroblogging. These two sites host large numbers of personal blogs, so their occurrence probably reflects mentions for many different bloggers. Although only one individual blog appeared in the top 1,600 terms, and most blogs are probably read by a few people, the collective power of blogs as a communications medium may be more significant than this suggests because of the cumulative effect of millions of bloggers (see Pollard, 2005), and because bloggers may link to or comment on other blogs without mentioning them by name.
Some forms of communication may not have been identified by the RSS scanning method above because they are in constant use and are not specifically relevant to crises. To investigate common forms of communication, we therefore generated word usage time series for a set of words that we deemed relevant to communication. The purpose of this was exploratory: to compare the extent of mentioning of these terms between and within crises to identify any patterns or unusual differences.
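A word usage time series of this kind could be produced along the following lines. This is a minimal sketch rather than the procedure actually used: it assumes that each posting is available as a (date, text) pair, that usage on a day is the percentage of that day's postings containing the term, and that a simple regular-expression tokenizer is adequate. The function name and the sample postings are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Lower-case word tokens; internal hyphens are kept so "e-mail" stays one token.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def usage_time_series(postings, terms):
    """Return {term: {date: percentage of that day's postings containing it}}.

    `postings` is an iterable of (date_string, text) pairs (assumed layout).
    """
    terms = {t.lower() for t in terms}
    totals = Counter()            # postings per day
    hits = defaultdict(Counter)   # term -> day -> postings containing term
    for day, text in postings:
        totals[day] += 1
        words = set(TOKEN.findall(text.lower()))
        for term in terms & words:
            hits[term][day] += 1
    return {
        term: {day: 100.0 * hits[term][day] / totals[day] for day in sorted(totals)}
        for term in sorted(terms)
    }


# Invented sample postings, for illustration only.
postings = [
    ("2005-08-29", "Phone lines are down"),
    ("2005-08-29", "Sent an e-mail to family"),
    ("2005-08-30", "No phone, no email"),
    ("2005-08-30", "Still no phone signal"),
]
series = usage_time_series(postings, ["phone", "email", "e-mail"])
print(series["phone"])  # {'2005-08-29': 50.0, '2005-08-30': 100.0}
```

Spelling variants such as "email" and "e-mail" are deliberately kept as separate series in this sketch, so that each can be plotted and compared individually.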
Figure 5a illustrates a variety of communication-related words in the New Orleans sub-corpus. Hurricane Katrina hit the coast of Louisiana on August 29, 2005, and a prolonged period of uncertainty followed as the extent of the damage became clear and a slow attempt was made to rescue people from the affected areas. Although “phone” was not selected as a crisis word for this sub-corpus, it still demonstrated a noticeable jump in usage, as did “email” and “e-mail” (both well after the initial storm hit New Orleans). The graph should be interpreted carefully. It should not, for instance, be inferred that blogs were more important than SMS (mobile phone/cellular phone text messages), since the source corpus is dominated by blogs. It is likely that SMS was in fact more popular, but that a small proportion of SMS users blogged that they used SMS during the crisis. Perhaps surprisingly, blogs did not seem to be useful for crisis communication in this case, as the usage of the term decreased during the crisis period.
Figure 5b shows a similar pattern to Figure 5a and highlights the importance of phones during the time around the attacks. Figure 5c suggests that communication methods had about half of the importance for the Pakistan event, in comparison to New Orleans (Figure 5a), at least for bloggers (compare the y-axis scales). It is difficult to see why communication would be less important in Pakistan than elsewhere: Many more people were killed in Pakistan and the crisis affected a much larger area, so there must have been many more people trying to use any means possible to find out if their loved ones were safe. One possible answer is that bloggers in the corpus typically were further removed from the events in Pakistan than in the other two cases. Assuming a U.S. dominance of the overall corpus, due to the English language bias of the corpus and extensive blogging in the U.S., it is conceivable that the view of Pakistan was a more dispassionate one that was generally made by people who did not have direct connections, especially given the low Internet penetration rate of Pakistan and its different languages. The top words in the Pakistan corpus, as in the other corpora (see Appendix), were typically fact-based, with many proper nouns and descriptions of actions (e.g., relief, toll, survivor). Moreover, in the Pakistan corpus, no emergent forms of communication were mentioned in more than 1% of postings on any day of the crisis.
The first research question concerned the importance of communication technologies during crises. Which communication technologies are important during a crisis and do these differ across crises? A wide variety of communication technologies seem to be important, at least to the bloggers in our corpus, including national and regional mass media and personal communications technologies. The precise mix of technologies seems to depend on the nature of the crisis. Local news sources were only significantly present for New Orleans, and the Wikipedia and Wikinews were only significantly present for London. No communication forms seemed to be important for Pakistan.
The second research question addressed emergent communications technologies: Are there new communications technologies that emerge as important for the public in crisis communication, and if so, what types of communication need do they fulfill? Three new technologies emerged as important to our corpus bloggers: Flickr (created in 2004), Wikipedia (2001), and Wikinews (2004). In addition, other relatively new technologies were also present: SMS (1992), cellphone (1980s), webcam (1991), and blogs (1999), including both individual blogs and blog sites. Cellphone cameras were also mentioned frequently but not picked up by our single word search method. Interestingly, the three new technologies and blogs could all be characterized as Web 2.0 phenomena, in the sense of being online resources benefiting from (Flickr) or made possible by (Wikinews) Internet-based collaborative working (O’Reilly, 2006). It is likely that the new technologies were fulfilling a general rather than a personal information need: helping people to find out about the crisis rather than checking the safety of individual people.
The third and last research question was: Do crises precipitate discussions or mentions of new technologies in blogspace? The answer is yes, but only Web 2.0 collaborative information sharing technologies were new, at least during the crises of 2005. Of course, in the future different technologies will be new. Surprisingly, we found no mentions of new offline technologies, such as types of mobile communication devices. Mobile phone pictures were important for London, but the mobile phone-related innovation derived from posting images on Flickr rather than from a hardware innovation like videophones or wireless connectivity.
The main limitation of our method, in terms of the first two research questions, is its bias toward blogs with RSS feeds and news stories and also toward communities containing bloggers. This undermines the significance of the findings related to Web 2.0, in particular, since bloggers are more likely than other people to be early adopters of Web 2.0 technologies. In addition, the corpus is dominated by English language sources and hence is biased towards English speaking peoples and the U.S. Moreover, since the sub-corpora were defined by the inclusion of English terms, these are likely to be more biased in this respect than the overall corpus. It would be interesting to apply similar methods in languages other than English with a widespread blogger community, such as Farsi, perhaps using a special-purpose RSS or blog corpus. Such an approach might also make it possible to compare discussions of events that were not reported in the West but which were regarded as important elsewhere. Finally, the method is limited to crises for which a large volume of data can be collected; we suspect that, in practical terms, this means crises with extensive coverage in the English-language news media.
The word frequency technique is particularly sensitive to new words, and hence technologies with common existing names (e.g., Orange phones in the U.K.) would be unlikely to be found. Similarly, multiple-word names would appear as separate single words. Future research might consider how these problems could be overcome and how the inclusion of further automatic analyses for the terms (e.g., consideration of only those not in a dictionary) could reduce the labor effort in manually scanning the word lists.
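As a rough illustration of the two suggestions above, the sketch below filters a crisis word list against a dictionary and counts adjacent word pairs as candidate multiple-word names. It was not part of this study: the function names, the tiny dictionary, and the sample data are all hypothetical, and a real application would need a full word list and better tokenization.

```python
from collections import Counter


def unfamiliar_words(crisis_words, dictionary):
    """Keep only crisis words absent from the dictionary, preserving rank order.

    These are the candidates most likely to be new names and so the first
    to inspect manually.
    """
    known = {w.lower() for w in dictionary}
    return [w for w in crisis_words if w.lower() not in known]


def count_bigrams(postings):
    """Count adjacent word pairs so that two-word names can be ranked too."""
    counts = Counter()
    for text in postings:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented data, for illustration only.
crisis_words = ["relief", "flickr", "wikinews", "toll", "orange"]
dictionary = {"relief", "toll", "orange"}
print(unfamiliar_words(crisis_words, dictionary))  # ['flickr', 'wikinews']

pairs = count_bigrams(["associated press reports", "the associated press"])
print(pairs[("associated", "press")])  # 2
```

A dictionary filter would reduce the manual workload but would not solve the first problem noted above: a technology with a common existing name, such as Orange, is itself a dictionary word and would be discarded.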
The methods adopted in this study have looked at only one technique for the identification of emerging communication forms. It would be interesting to compare this with other methods, such as interviews or questionnaires aimed at a wider public than bloggers alone, or through qualitative (e.g., content analysis) and quantitative (e.g., blogger interviews) blog-based research. In addition, comparisons with qualitative techniques such as judgment or voting by media experts or bloggers would be useful to assess the extent of overlap between the two approaches and whether one would be better.
Finally, the lack of demographic information about the creators of the data is a drawback. The corpus is probably dominated by U.S. sources, and thus the results could reflect a national perspective. Ideally, researchers would have access to precise demographic information and would compare the results for different communities, but this seems far beyond current (academic) technical capabilities.
Despite the limitations of the methods, which make it difficult to make strong claims about the relative importance of media from the results presented here, we found three candidates for emerging media. It cannot be claimed that this is new information, however, since some blog posts at the time of the London attacks explicitly cited Flickr, Wikinews, and Wikipedia as useful sources of information, and Google describes Wikipedia as one of the top gaining search terms of 2005 (Google, 2005). The quantitative method employed here therefore serves to take this information into an academic setting, to give it numerical backing, and to confirm the existence of a rise associated with a crisis, at least for the bloggers in our corpus. It also serves to differentiate crisis communication types from more general communication forms: For instance, Google also listed two other communication tools (orkut and MySpace) as top gaining terms for 2005 (Google, 2005), but our results suggest that these were not particularly crisis-relevant. Orkut is a friendship community, and a rise in mentions during a crisis might have suggested its use for personal information gathering, such as checking that friends were safe. Logically, this probably did happen for London and New Orleans to some extent, but this was not reflected in blogspace, at least not in our corpus. Also, and applicable to MySpace as well, the absence of mentions may relate to the fact that both are in some sense competing technologies with the pre-existing blog software and blog sites. In addition, the comparison across crises has shown that the view of communication from our presumably U.S.-centered blog-dominated corpus is varied, with communication not seeming to be a major issue for bloggers blogging about the Pakistan-Kashmir earthquake.
The research presented here suggests several promising avenues for crisis-related research:
1. The use of Wikinews and Flickr, and Web 2.0 technologies generally
2. The role of local news sources
3. Comparative analyses of different crises to investigate and explain differences in the roles of communication forms.
The limitations of the blog-based word frequency approach reported here notwithstanding, we believe that it has sufficient advantages to be useful in future research. Specifically, it is capable of identifying emerging trends through a retrospective analysis of crisis-related postings, particularly for user-generated new media such as many Web 2.0 applications. In addition, it is able to provide a preliminary assessment of media sources for a crisis, including a comparative assessment, which could be valuable as a starting point for more detailed exploration of phenomena that appear particularly significant or unusual.
This work was supported by a European Union grant for activity code NEST-2003-Path-1. It is part of the CREEN project (Critical Events in Evolving Networks, contract 012684). Susan Herring and two anonymous reviewers are warmly thanked for valuable extensive suggestions and contributions.
Table A. The top 50 crisis words in each corpus (including words unrelated to communication). Notes: The words have been automatically depluralized by removing any terminating -s. Terms in italics derived mainly from a single, much copied news story.
About the Authors
Mike Thelwall is a member of the Statistical Cybermetrics Research Group at the University of Wolverhampton, U.K. He develops methods for extracting large-scale data from the Web to support social science research goals.
Address: School of Computing & Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK
David Stuart is a member of the Statistical Cybermetrics Research Group at the University of Wolverhampton, U.K. He is currently researching the ability of Web links to provide indicators of relationships between organizations but is broadly interested in all areas where information science meets the Web.
Address: School of Computing & Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB,