Identifying factors of online news comments

Authors


Abstract

The purpose of this project was to identify which factors of online news comments should be studied, given the available components of the comments and the available data analysis tools. The factors examined were the commenters' geographic locations, the number of readers' recommendations as a sentiment indicator, and the top phrases of the comments. The selected news story was the execution of Troy Davis, reported by The New York Times on September 21, 2011 (Severson, 2011). Troy Davis, an African American, was convicted of murdering an off-duty police officer in Savannah, Georgia in 1989. The U.S. Supreme Court rejected the appeal despite worldwide opposition to Davis's death sentence. This story was selected because of the significant international attention it received and its topic, the death penalty. A total of 1,527 comments were collected on December 22, 2011, 60 days after the story was published online. The comments were classified into 2-tier geographic categories based on the 4 regions and 9 divisions defined by the U.S. Census Bureau. The preliminary results indicated that geographic location, the number of readers' recommendations as a sentiment indicator, and the top phrases of the comments were appropriate research factors for studying online news comments.

INTRODUCTION

The rapid development of online/virtual communities has drawn research interest from scholars, practitioners, and policymakers in their social, economic, and technological impact on society (Daniel, 2011). News providers worldwide have been providing virtual space in which readers of online news can give interactive feedback on news stories (Gibbs & McKendrick, 2011). Such virtual space and interactive feedback present interdisciplinary research challenges to news organizations, practitioners, and researchers studying readers' motivations and behaviors when reading online news (D'Hertefelt, 2000).

LITERATURE REVIEW

Thelwall and Wouters (2005, p. 187) suggested that “information scientists can play a valuable role by evaluating new information sources in a meta-disciplinary context, developing tools and methods to analyse the data and, crucially, contributing to the predication of the kinds of research questions that the data may usually help address.” Tsagkias, Weerkamp, and de Rijke (2010) analyzed news comment volume to generate useful predictions for identifying news stories as well as for supporting front page optimization for news sites. In the e-commerce sector, Singer (2012) reported that Fab.com, an e-commerce site, started monitoring how its shoppers' online comments and traffic affected the company's revenue. Similarly, Facebook appropriates users' online comments about specific products in order to create marketing tools for its advertising clients (Sengupta, 2012). This apparent trend of organizations building on consumer comments illustrates the need for researchers to study users' online comments as a form of human communication and to examine how that communication informs various processes in context. Ali-Hasan and Adamic (2007) included geographic location as a study factor when analyzing bloggers' online and offline relationships. Additionally, Pang and Lee (2008) called for research on opinion mining and sentiment analysis within opinion-rich resources such as online news sites.

In sum, online news comments have become new research territories for information scientists. Several previous studies have identified geographic locations and the sentiment of the online comments as study factors.

This poster focused on the social aspect of the online news analysis. The selected news story was the execution of Troy Davis reported by The New York Times on September 21, 2011 (Severson, 2011). Troy Davis, an African American, was convicted of murdering an off-duty police officer in Savannah, Georgia in 1989. The U.S. Supreme Court rejected the appeal despite worldwide opposition to Davis's death sentence. This story was selected due to the significant international level of attention it received, and the topic of “death penalty.”

RESEARCH QUESTIONS AND METHODOLOGY

The purpose of this project was to identify which factors of online news comments should be studied, given the available components of the comments and the available data analysis tools. The factors examined were the commenters' geographic locations, the number of readers' recommendations as a sentiment indicator, and the top phrases of the comments. This poster therefore aimed to answer the following questions:

  1. Is there any significant relationship between the readers' “recommended” votes and their geographic locations?
  2. What were the common phrases used in all news comments?
  3. Is there any significant relationship between the common words and the readers' geographic locations?

Figure 1 shows an example of a reader's comment from The New York Times. A spreadsheet program was used to record each reader's online name, comment, and number of “Recommended by readers” votes. The “Recommended” count was used as an indicator of readers' sentiment toward an online comment.

Figure 1. A reader's comment

There were 1527 comments collected on December 22, 2011, 60 days after the story was published online. The comments were classified into 2-tier geographic categories based on the 4 regions and 9 divisions defined by the U.S. Census Bureau (U.S. Census Bureau, 2011). All comments from other countries or those that were unidentifiable were classified as region 5 and division 10 (Table 1).

Table 1. Categories of geographic classification
Regions         Divisions
1 Northeast     1 New England; 2 Middle Atlantic
2 Midwest       3 East North Central; 4 West North Central
3 South         5 South Atlantic; 6 East South Central; 7 West South Central
4 West          8 Mountain; 9 Pacific
5 Non-U.S.A.    10 Non-U.S.A.
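
The classification in Table 1 can be sketched as a simple lookup. The snippet below is only an illustration, not the procedure used in this study: it assumes each commenter's location has already been reduced to a two-letter U.S. state code, and the names `DIVISIONS`, `REGION_OF_DIVISION`, and `classify` are hypothetical. State membership follows the U.S. Census Bureau's published regions and divisions.

```python
# Illustrative sketch only: maps a state code to the (region, division)
# numbers of Table 1. All names here are hypothetical.

DIVISIONS = {
    1: ("New England", "CT ME MA NH RI VT"),
    2: ("Middle Atlantic", "NJ NY PA"),
    3: ("East North Central", "IL IN MI OH WI"),
    4: ("West North Central", "IA KS MN MO NE ND SD"),
    5: ("South Atlantic", "DE DC FL GA MD NC SC VA WV"),
    6: ("East South Central", "AL KY MS TN"),
    7: ("West South Central", "AR LA OK TX"),
    8: ("Mountain", "AZ CO ID MT NV NM UT WY"),
    9: ("Pacific", "AK CA HI OR WA"),
}

# Division number -> region number; division 10 is the Non-U.S.A. category.
REGION_OF_DIVISION = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3, 8: 4, 9: 4, 10: 5}

STATE_TO_DIVISION = {
    state: division
    for division, (_, states) in DIVISIONS.items()
    for state in states.split()
}


def classify(state_code):
    """Return (region, division); unknown or non-U.S. values map to (5, 10)."""
    key = (state_code or "").strip().upper()
    division = STATE_TO_DIVISION.get(key, 10)
    return REGION_OF_DIVISION[division], division
```

For example, `classify("GA")` falls in the South region and the South Atlantic division, while any value that is not a recognized state code falls into region 5 and division 10.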

Wordstat, a content-analysis software program, was used to analyze the phrase frequency of the 1,527 comments. The purpose of the content analysis was to identify recurring patterns in the phrases. Stop words (e.g., of, the, and) were excluded from the data sets. The results reported were based on the frequency of the top phrases, the number of comments containing them, and the percentage of comments. SPSS was used to test for statistically significant relationships between the number of “recommended” votes, readers' geographic locations, and top words of the comments.
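
As a rough illustration of phrase-frequency counting, the sketch below tallies word n-grams across comments. It is not Wordstat's algorithm: the stop-word list is a small illustrative subset, the rule of skipping phrases that begin or end with a stop word is an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative subset of stop words; a real list would be much longer.
STOP_WORDS = {"of", "the", "and", "a", "an", "in", "to", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phrases(tokens, min_len=2, max_len=5):
    """Yield n-grams that neither start nor end with a stop word."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            yield " ".join(gram)


def phrase_table(comments, top=10):
    """Return (phrase, frequency, n_comments, pct_comments) for the top phrases."""
    frequency = Counter()   # total occurrences of each phrase
    cases = Counter()       # number of comments containing each phrase
    for comment in comments:
        found = list(phrases(tokenize(comment)))
        frequency.update(found)
        cases.update(set(found))
    total = len(comments)
    return [
        (phrase, count, cases[phrase], round(100.0 * cases[phrase] / total, 1))
        for phrase, count in frequency.most_common(top)
    ]
```

Calling `phrase_table(comments)` on a list of comment strings yields rows with the same three measures reported here: phrase frequency, number of comments, and percentage of comments.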

RESULTS

Tables 2 and 3 present the distribution of the online comments' geographic origins and the number of “recommended” votes. The largest share of the comments (34.8%) was posted by readers from the northeastern region of the United States. Over 74% of the comments (1,131 out of 1,527) received between 1 and 50 “Recommended” votes. In contrast, over 20% of the comments did not receive any “Recommended” votes.

Table 2. Geographic distributions of comments

Table 3. Distributions of the “Recommended” votes
# of “Recommended” votes        N        %
All comments                1,527    100
Over 17                       163     10.7
16                             11      0.7
15                             13      0.9
14                             14      0.9
13                             15      1.0
12                             25      1.6
11                             33      2.2
10                             40      2.6
9                              35      2.3
8                              55      3.6
7                              69      4.5
6                              84      5.5
5                              96      6.3
4                             140      9.2
3                             149      9.8
2                             141      9.2
1                             122      8.0
0                             322     21.1
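
The percentages in Table 3 follow directly from the counts. The short check below recomputes them. The counts are copied from Table 3, the key 17 is a stand-in for the “Over 17” row, and the names `TABLE_3` and `percentages` are hypothetical.

```python
# Counts as read from Table 3; the key 17 stands in for the "Over 17" row.
TABLE_3 = {
    17: 163, 16: 11, 15: 13, 14: 14, 13: 15, 12: 25, 11: 33, 10: 40, 9: 35,
    8: 55, 7: 69, 6: 84, 5: 96, 4: 140, 3: 149, 2: 141, 1: 122, 0: 322,
}


def percentages(counts):
    """Return each category's share of all comments, rounded to one decimal."""
    total = sum(counts.values())
    return {votes: round(100.0 * n / total, 1) for votes, n in counts.items()}


total = sum(TABLE_3.values())      # should equal the number of comments
shares = percentages(TABLE_3)      # per-category percentages
```

The category counts sum to the 1,527 collected comments, and the recomputed shares for the zero-vote and “Over 17” rows match the values in the table.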

Is there any significant relationship between the readers' “recommended” votes and their geographic location? The results from the chi-square analysis indicated significant relationships between the readers' “recommended” votes and their geographic locations:

  • Recommended votes and Regions: χ²(68) = 1209.35, p < 0.001, and

  • Recommended votes and Divisions: χ²(153) = 1300.08, p < 0.001.
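
For readers unfamiliar with the test, the sketch below computes the textbook Pearson chi-square statistic and its degrees of freedom for a contingency table. It is not the SPSS procedure used in this study, the function name `chi_square` is hypothetical, and it assumes every row and column total is nonzero. The reported degrees of freedom appear consistent with 18 vote categories (the rows of Table 3) crossed with 5 regions, (18 − 1)(5 − 1) = 68, and with 10 divisions, (18 − 1)(10 − 1) = 153.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts (rows x columns).
    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Toy 2 x 2 example with made-up counts (not data from this study).
toy_stat, toy_df = chi_square([[10, 20], [20, 10]])

# Degrees of freedom for 18 vote categories by 5 regions and by 10 divisions.
_, df_regions = chi_square([[1] * 5 for _ in range(18)])
_, df_divisions = chi_square([[1] * 10 for _ in range(18)])
```

A p-value would additionally require the chi-square distribution, for example `scipy.stats.chi2.sf(statistic, df)` if SciPy is available.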

What were the common phrases used in all news comments?

Table 4 presents the top 10 phrases from the 1,527 comments. Each of the top 4 phrases occurred in about 1% or more of the comments.

Table 4. Top 10 common phrases from all comments (N=1,527)
Phrase                               Frequency    % of comments
United States of America                    50              3.3
Eye for an eye                              25              1.6
Support the death penalty                   15              1.0
Family of the slain officer                 15              1.0
Rest of the world                           13              0.9
Guilty beyond a reasonable doubt            12              0.8
Cruel and unusual punishment                12              0.8
Abolish the death penalty                   11              0.7
Execution of Troy Davis                     10              0.7
Live in a country                            7              0.5

Is there any significant relationship between the common words and the readers' geographic locations? There were significant relationships found between 17 words from the comments and the regions of the comments (Table 5). Ten of the 17 words were the same as or similar to words in the top ten phrases.

Table 5. Significant words within Regions (N=1,527)
Top words     Region 1  Region 2  Region 3  Region 4  Region 5      χ²      p
America             41        12         7        16         1   16.98   0.00
Executed            43         8        29        39         1   14.69   0.01
Case                86        22        55        27         0   11.71   0.04
System              74        31        32        26         0   11.66   0.04
Punishment          55        24        31        17         0   10.27   0.07
Justice            122        48        52        70         0    9.92   0.07
Capital             42        18        15        13         0    9.67   0.09
Family              48        26        28        24         0    8.79   0.12
Penalty            116        51        73        65         0    8.04   0.15
Evidence            52        16        45        27         0    7.89   0.16
Doubt               60        18        46        30         1    7.55   0.18
Guilty              64        25        51        34         0    6.52   0.26
Guilt               53         6        27        24         0    6.47   0.26
State              104        31        62        41         0    5.76   0.33
People             113        44        68        77         0    5.37   0.37
Black               40         8        30        27         0    4.96   0.42
Death              171        60       109       103         0    4.36   0.50

DISCUSSION AND CONCLUSIONS

The preliminary results indicated that geographic location, the number of readers' recommendations as a sentiment indicator, and the top phrases of the comments were appropriate research factors for studying online news comments. Based on these results, more in-depth analysis is needed to extract common words and phrases by geographic location and by readers' sentiment feedback. Further development of data analysis tools for mining these factors is also desirable.
