Understanding information and knowledge sharing in online communities: Emerging research approaches


Hsin-liang Chen

School of Library and Information Science, Indiana University–Indianapolis


  • Hsin-liang Chen

    Associate Professor, School of Library and Information Science, Indiana University–Indianapolis, chenhsin@iupui.edu

  • Anatoliy Gruzd

    Assistant Professor, School of Information Management, Director of Social Media Lab, Dalhousie University, gruzd@dal.ca

  • Xiaozhong Liu

    Assistant Professor, School of Library and Information Science, Indiana University, Bloomington, liu237@indiana.edu

  • Eric Meyers

    Assistant Professor, School of Library, Archival & Information Studies, The University of British Columbia, eric.meyers@ubc.ca


Social media have become an important component of contemporary information ecosystems. People use social media systems, such as Twitter, Facebook, YouTube, and Tumblr, to communicate ideas and information needs, seek advice and solve problems, and show appreciation for or disagreement with a person or issue. These tools facilitate the emergence of communities, often resembling the communities of practice that arise in workplaces and educational institutions, where a common interest, identity, and set of norms and structures for communicating develop through interaction. But while it seems easy to “suck in” data streams from social media to understand online communities, making sense of the vast data sets has been challenging. The issues include not just the tools and methods for extracting and synthesizing large data sets like the Twitter Firehose; they also extend to the ethical and responsible use and reporting of these data for academic and commercial purposes.

This panel will focus on methodological approaches and research strategies for the study of social media communities, in particular Web 2.0 tools that play an important role in the North American cultural landscape. In exploring the challenges of working with social media data and online communities, this panel will pose and address a number of questions, including:

  • 1. What philosophical approaches do we take to understanding online communities?
  • 2. How can we thoroughly address the ethical issues surrounding the use of “open” social media data?
  • 3. What methods do we currently use for gathering and analyzing social media data?
  • 4. What challenges do social media present to researchers and scholars, and how can we overcome these challenges?
  • 5. What is the appropriate role for researchers to play in interacting with online communities (active participant or dispassionate spectator)?
  • 6. What technical tools do we need for better analyzing the use of social media by online communities?


The panel speakers will present recent empirical or conceptual work focused on social media, discussing in particular the methodological issues posed by data collection, analysis, and reporting. Each speaker will limit his remarks to 15 minutes. This will allow approximately 30 minutes for the panelists to engage in open discussion with the audience and with each other regarding the intersections of their theoretical or empirical work.

Each presenter will pose a question at the end of his talk to engage the audience in reflection; these questions will be re-posed at the conclusion of the presentations to ground and stimulate discussion. The speakers on this panel have organized and participated in large, diverse panels at ASIS&T and other Information Science-related conferences, and have found this confluence of perspectives to be an engaging and exciting approach to fostering discussion in this domain.

All four panel speakers will discuss the first two questions together, and then the speakers will take turns leading discussion on Questions 3–6.

What factors should be studied in online news comments?/Hsin-liang Chen

News providers worldwide have been providing readers of online news with virtual space for interactive feedback on news stories (Gibbs & McKendrick, 2011). Such virtual space and interactive feedback present interdisciplinary research challenges to news organizations, practitioners, and researchers in studying readers' motivations and their behaviors when reading online news (D'Hertefelt, 2000).

This presentation will focus on the social aspect of online news analysis. The selected news story is the execution of Troy Davis, reported by The New York Times on September 21, 2011 (Severson, 2011). Troy Davis, an African American, was convicted of murdering an off-duty police officer in Savannah, Georgia in 1989. The U.S. Supreme Court rejected the appeal despite worldwide opposition to Davis's death sentence. This story was selected because of the significant international attention it received and because of its topic, the death penalty. A total of 1,527 comments were collected on December 22, 2011, 60 days after the story was published online. The comments were classified into two-tier geographic categories based on the 4 regions and 9 divisions defined by the U.S. Census Bureau (U.S. Census Bureau, 2011). All comments that came from other countries or whose origin was unidentifiable were classified as region 5 and division 10. The comments were also classified by number of popularity votes into 6 categories in increments of 50 votes. The majority of the comments (1,131 out of 1,527) received between 1 and 50 popularity votes.
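The two classification steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the study: the state table is a small illustrative subset, the function names are hypothetical, and the exact bin edges are assumed (category 1 covers 0 to 50 votes and the sixth category is open-ended), since the abstract gives only the bin width and the number of categories.

```python
# U.S. Census Bureau divisions (1-9) mapped to regions (1-4).
DIVISION_TO_REGION = {
    1: 1, 2: 1,        # Northeast: New England, Middle Atlantic
    3: 2, 4: 2,        # Midwest: East North Central, West North Central
    5: 3, 6: 3, 7: 3,  # South: South Atlantic, East South Central, West South Central
    8: 4, 9: 4,        # West: Mountain, Pacific
}

# Illustrative subset only; a full table would list every state.
STATE_TO_DIVISION = {
    "MA": 1, "NY": 2, "IN": 3, "MO": 4, "GA": 5,
    "TN": 6, "TX": 7, "CO": 8, "CA": 9,
}

# Non-U.S. or unidentifiable locations, as described in the text.
OTHER_REGION, OTHER_DIVISION = 5, 10


def geographic_category(state_code):
    """Return (region, division) for a two-letter state code, else (5, 10)."""
    division = STATE_TO_DIVISION.get((state_code or "").strip().upper())
    if division is None:
        return OTHER_REGION, OTHER_DIVISION
    return DIVISION_TO_REGION[division], division


def vote_category(votes, width=50, n_categories=6):
    """Bin a vote count into categories 1..n_categories in steps of `width`.

    Assumed edges: category 1 is 0-50, category 2 is 51-100, and so on;
    the last category is open-ended.
    """
    if votes < 0:
        raise ValueError("votes must be non-negative")
    if votes == 0:
        return 1
    return min((votes - 1) // width + 1, n_categories)
```

With these assumptions, a comment from Georgia with 37 votes falls in region 3, division 5, vote category 1.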

The purpose of this project is to identify the factors that should be studied, given the available components of the online comments and the available data analysis tools. Dr. Chen will discuss the research procedure and preliminary results based on the geographic location, the number of popularity votes, and the most common phrases used in the comments. Two software programs, SPSS and WordStat, were used for data analysis. Dr. Chen will also discuss the development of data analysis tools for mining desirable factors.

Automated discovery and analysis of online communities/Anatoliy Gruzd

Dr. Anatoliy Gruzd will present on his work in the area of automated discovery and analysis of online communities. In particular, he will discuss and demo a web-based system for automated discovery, analysis and visualization of information about online communities called Netlytic (formerly known as ICTA - Internet Community Text Analyzer) (Gruzd, 2011).

Netlytic (http://Netlytic.org) is designed to analyze online interactions within text-based online communities such as health support groups, fan forums, customer review blogs, and Twitter communities. Specifically, Netlytic can help researchers automatically discover and visualize who is talking to whom within an online community, how often they are communicating, what they are talking about, and the nature and relative strength of their relationships or interactions with each other. To demonstrate some of the main aspects of the system and the types of analysis that it allows researchers to perform, the presentation will use a case study of how Netlytic was used to study the use of Twitter to communicate and disseminate information during the 2011 Canadian Federal Election.
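As a rough illustration of the "who is talking to whom" idea, the sketch below counts directed ties from @mentions in a list of posts. It is not Netlytic's algorithm, which the abstract does not describe; the function names, the `(author, text)` input format, and the sample posts are all assumptions made for the example.

```python
import re
from collections import Counter

# Matches Twitter-style mentions such as "@bob"; it would also match the
# domain part of an e-mail address, which a real pipeline should filter out.
MENTION = re.compile(r"@(\w+)")


def mention_edges(posts):
    """Count directed (author -> mentioned user) ties in (author, text) pairs."""
    edges = Counter()
    for author, text in posts:
        for name in MENTION.findall(text):
            if name.lower() != author.lower():  # ignore self-mentions
                edges[(author.lower(), name.lower())] += 1
    return edges


def strongest_ties(edges, top=3):
    """Return the `top` most frequent ties as ((source, target), count)."""
    return edges.most_common(top)


# Hypothetical sample data.
posts = [
    ("alice", "@bob did you see the debate?"),
    ("bob", "@alice yes, and @carol posted a summary"),
    ("alice", "thanks @bob"),
]
edges = mention_edges(posts)
```

The edge counts serve as a simple proxy for tie strength: here the tie from `alice` to `bob` has weight 2, while the other ties have weight 1.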

Netlytic is being developed by Dr. Gruzd and his team at the Social Media Lab at Dalhousie University, Canada. Since 2006, it has proven to be useful in studies of various online communities, including learning communities (Haythornthwaite & Gruzd, 2008; Gruzd, 2009a), communities of bloggers and blog readers (Gruzd, 2009b; Chung et al., 2010), communities emerging on the i-Neighbors website (Hampton, 2010), a scholarly community on Twitter (Gruzd et al., 2011), and, most recently, an online community around a popular fan website, TheOneRing.net, dedicated to discussions of Tolkien's text and Jackson's film versions of The Lord of the Rings (Martin, 2011).

Computational challenge and opportunity/Xiaozhong Liu

With the exponential growth of social media sites and various online communities in the past couple of decades, we now face exciting research opportunities along with some challenges. It is now possible to improve existing information systems and processes (e.g., information retrieval, text mining, and digital libraries) by leveraging data from social media. But how can we effectively extract real-time user interests from massive social media data and, at the same time, separate useful information from noise?

In the area of Information Retrieval (IR), most modern search engines rely on short keyword queries (instead of long ones) to represent users' information needs (Jansen, Spink, Bateman, & Saracevic, 1998), which often makes it difficult to gauge an individual's actual information needs. For instance, when a user types “Obama,” it is not easy (even for humans) to figure out whether she is interested in “Obama's policy on education” or “Obama and Iraq war.” To address this problem, a promising direction has emerged: expanding short user queries with information from users' social media data (Liu & von Brzeski, 2009; Liu, 2009). It is based on the premise that information about users' online communities and their interactions within those communities may help search engines disambiguate particular facets of their information needs. However, this is quite a challenging task; for example, most Twitter messages are very short and noisy, and most of the existing methods in Natural Language Processing (NLP) and text mining are not equipped to deal with such data.
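To make the query expansion idea concrete, the following is a deliberately naive sketch: it appends the terms that most often co-occur with the query in a user's own posts. It is not the method of the cited work; the stopword list, the function names, and the sample posts are assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; "s" drops the residue of possessives.
STOPWORDS = {"the", "a", "an", "on", "and", "of", "to", "in", "is", "for", "from", "s"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expand_query(query, user_posts, k=2):
    """Append the k terms that most often co-occur with the query in user_posts."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for post in user_posts:
        tokens = tokenize(post)
        if query_terms & set(tokens):  # only posts that mention a query term
            counts.update(t for t in tokens if t not in query_terms)
    # Sort by count, then alphabetically, so ties resolve deterministically.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return query.split() + ranked[:k]


# Hypothetical posts by one user.
posts = [
    "Obama's speech on education reform",
    "New education budget from Obama",
    "Watching the game tonight",
]
expanded = expand_query("Obama", posts, k=1)  # ["Obama", "education"]
```

For this user, the ambiguous query "Obama" is expanded toward education rather than foreign policy. Real tweets are far noisier, which is exactly the difficulty the paragraph above points out.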

To address the challenges of handling social media data, researchers in IR and related disciplines have been working on producing annotated social media datasets, which in turn can be used to train NLP and text mining methods to “understand” social media data better. For example, in 2006, TREC (Text REtrieval Conference) introduced the Blog track (Ounis et al., 2007; Macdonald, Ounis, & Soboroff, 2008). In this track, new IR-related tasks, such as sentiment analysis, were added; these tasks were based on (then-new) web elements of blogs (e.g., comments or permalinks). And since social media is a sub-class of cyberspace, most web-structure-based ranking algorithms, such as PageRank and HITS, can be directly applied to blog ranking. Dr. Liu will discuss how to address some of the challenges in this area with the help of new algorithms tailored to analyze data from blogs and Twitter.
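As an example of a web-structure-based ranking algorithm applied to blogs, the sketch below runs a basic power-iteration PageRank over a small link graph. The graph, the blog names, and the parameter values (damping factor 0.85, 50 iterations) are assumptions for illustration; this is the textbook algorithm, not one of the tailored algorithms the talk refers to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical blog link graph: blogs "a" and "b" link to "c", "c" links
# back to "a", and "d" has no outgoing links.
blogs = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
ranks = pagerank(blogs)
top_blog = max(ranks, key=ranks.get)
```

In this toy graph the blog that receives the most incoming links, `c`, ends up with the highest rank, and the ranks sum to 1.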

Analyzing learner discourse in YouTube/Eric Meyers

YouTube is one of the largest publicly accessible databases in the world, providing informative and entertaining videos to viewers across the globe. Each day over 8 years of video content are uploaded to this system, and users, 70% of whom reside outside the US, view 3 billion videos (YouTube, 2012). This research project examines a niche segment of YouTube's vast database, namely informational video that may be used by middle and high school students to support their academic needs. In this project, we focus on young people not only as consumers of these videos but also in their roles as contributors to the conversation around learning content. The comments learners contribute to the system provide important information about users' information needs, their context of use, and their individual interpretations of the learning content. These comments are also highly social, and reveal significant community engagement in the active construction of meaning (Meyers, 2012).

Dr. Meyers will discuss his method of developing and implementing a system of coding user comments in YouTube, with a specific focus on how these comments reveal social learning practices. He employs computer-mediated discourse analysis, or CMDA (Herring, 2001), as well as several cloud-based extraction and analysis tools, including InfoExtractor (http://InfoExtractor.org) and DiscoverText® (http://discovertext.com). His research team is analyzing a corpus of over 6,000 user comments extracted from 4 playlists related to different content domains. In analyzing these comments, his scheme emphasizes the relationships among users of video content, problems in the learning domain, and knowledge construction practices. This work will help educators and information professionals understand the evolving relationship between users and social media in e-learning contexts.