Helpful to you is useful to me: The use and interpretation of social voting

Authors


Abstract

Social voting plays a key role in the organization of user-contributed content; readers are asked to indicate what they “like” or find “helpful,” and the collected votes are then used to prioritize valued content. Despite the popularity of these mechanisms, little is known about how users employ and interpret this feedback. We conducted a study in which participants researched items at two review communities, keeping lists of reviews they found helpful and reviews they did not. We observed their behaviors and asked them to rate reviews on several dimensions. We found consensus that helpful contributions are clearly written, relevant to users' needs, and express an appropriate amount of information. We also observed that users relied on others' judgments, attending to the most helpful content. We discuss the implications of users behaving as though what is helpful to others is helpful to them.

INTRODUCTION

As the amount of user-contributed content (UCC) on the Web increases, so does the need to organize this information. Much UCC takes the form of text (e.g., postings to a forum, comments in response to another contribution, reviews of a product or service). Large communities in particular tend to have lower signal-to-noise ratios, making the task of finding useful information challenging (Gu et al., 2007). Because they are likely to facilitate interactions between participants through textual postings, online communities can be particularly susceptible to information overload (Jones et al., 2004). While users rely on heuristics such as appearance or grammaticality when judging online sources generally (Fogg et al., 2001), in an online community environment, social feedback, including voting mechanisms and cues such as participant profiles, might help users judge the credibility of postings by sources unknown to them (Rieh, 2002).

Social voting mechanisms are now widely used to organize UCC and involve a variety of constructs. For instance, many mainstream news sites (e.g., cnn.com) that invite readers to respond to articles allow users to vote for comments they “like.” Another approach, often used for consumer reviews (e.g., drug reviews at WebMD.com, product reviews at Amazon.com), is to ask users whether or not UCC is “helpful.” The feedback collected is then used to organize UCC, aiming to help others navigate a large collection of texts by highlighting what the community as a whole considers its most valued content.

Previous studies have addressed issues surrounding the use of social voting to organize UCC, such as the characteristics of highly rated content (e.g., (Danescu-Niculescu-Mizil et al., 2009; Liu et al., 2007; Otterbacher, 2009)). However, little is known with respect to how users employ voting and other social cues while engaged in an information search. Researchers (e.g., (Kostakos, 2009; Li & Hitt, 2008; Liu et al., 2007)) have also uncovered potential biases in voting mechanisms. Therefore, it is important to study not only what contributions end up being highly valued by a community, but also what users do with this information.

We focus on information seekers' interpretation and use of social voting at two review communities, Amazon.com and the Internet Movie Database (IMDb.com). We found that users interpret the constructs of “helpfulness” and “usefulness” to mean that the judged review is clearly written, is relevant to their needs, and contains the right amount of information. We also found that users rely on the social votes to a large extent, and use other cues of information quality and source credibility to a lesser extent. Though others have found that users report the routine use of group-based tools and judgments for evaluating information online (e.g., (Metzger et al., 2010)), our findings are valuable because they rely on direct observation of our participants engaged in information-seeking tasks rather than users' self-reports.

RELATED WORK

In addition to combating information overload, the provision of social feedback on UCC serves many purposes in a community. Receiving feedback encourages users' continued participation in the creation and sharing of quality content (Moon & Sproull, 2008; Rashid et al., 2006). The use of profiles, or other means through which users can express identities, further enhances such benefits. In particular, when users feel others can uniquely identify them, they contribute more and report being more satisfied by their experiences with the community (Ma & Agarwal, 2007). Feedback via voting can be less personal in nature, with many sites simply displaying the number or proportion of voters who were positive toward the contribution. Nonetheless, some contributors are motivated by the prospects of their contributions receiving high ratings from others (Gilbert & Karahalios, 2010).

Social Voting

Although the use of social voting has proliferated in recent years, the concept is not new. For example, Hill and Terveen (1996) described the use of voting in recommending interesting URLs to participants in Usenet newsgroups. In their work, the votes were not explicitly solicited from users, but were gathered through social processes. Each time a URL was mentioned in a newsgroup, it was considered a vote, with the most frequently mentioned URLs considered the most interesting. Similar approaches have been taken in recent work. For instance, Chen and colleagues (2010) used the social voting concept to recommend interesting Twitter posts to users. Their work took the use of voting a bit further by customizing recommendations for individuals, with the weights of votes being dependent on how close voters were to the respective user in her Twitter network.
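To make the mechanism concrete, the sketch below implements the implicit-vote idea in miniature: each post that mentions a URL counts as one vote, and votes can optionally be weighted by how close the poster is to the target user, in the spirit of the network-weighted approach described above. The function, data, and closeness scores are illustrative assumptions, not the cited authors' implementations.

```python
from collections import defaultdict

def rank_urls_by_mentions(posts, closeness=None):
    """Rank URLs by implicit votes (mentions in posts).

    posts: iterable of (author, urls_mentioned) pairs.
    closeness: optional dict mapping author -> weight; when omitted,
               every mention counts equally (the unweighted newsgroup case).
    """
    scores = defaultdict(float)
    for author, urls in posts:
        weight = 1.0 if closeness is None else closeness.get(author, 0.0)
        for url in set(urls):              # one vote per post per URL
            scores[url] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

posts = [("ann", ["http://a.example"]),
         ("bob", ["http://a.example", "http://b.example"]),
         ("cat", ["http://b.example"])]

# Unweighted: each mention is a vote.
print(rank_urls_by_mentions(posts))

# Weighted: votes from closer network neighbors count more.
print(rank_urls_by_mentions(posts, closeness={"ann": 1.0, "bob": 0.5, "cat": 0.1}))
```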

In a study of users' explicit voting on UCC, Lampe and colleagues (2007) examined the Slashdot community, where users post textual comments (e.g., in response to articles), which others can rate on a scale from −1 to 5. The ratings are used to determine the display of comments. By default, a user will not see comments with low ratings. Of particular relevance to our work is their report that almost half of Slashdot's registered users do not change this default filter. Thus, many users only see what the community has deemed to be of interest.

Amazon's voting mechanism has received attention from researchers across a number of fields (e.g., economics, social computing, computational linguistics). This mechanism collects binary feedback from readers as to whether or not reviews are “helpful”; reviews are then displayed in rank order by helpfulness. The mechanism has become ubiquitous, with a wide range of e-commerce and informational sites emulating Amazon's community. Researchers have examined the benefits and drawbacks of this approach. For instance, Ghose and Ipeirotis (2010) noted that it takes substantial time to accumulate enough votes for review rankings to be meaningful. This is similar to the problem of “cold starts” in recommendation systems (Schein et al., 2002). However, Otterbacher (2009), who computed review quality scores based on a user-centric framework (Wang & Strong, 1996), reported that once a sufficient number of votes has been collected, helpfulness ratings correlate well with quality scores.
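Amazon does not disclose how helpfulness votes translate into display order, so the sketch below is only a naive approximation: reviews are ordered by their fraction of “helpful” votes, with a minimum-vote threshold standing in for the cold-start concern noted above. The function name and threshold are assumptions for illustration.

```python
def rank_by_helpfulness(reviews, min_votes=5):
    """Naively order reviews by the fraction of 'helpful' votes.

    reviews: list of dicts with 'id', 'helpful', and 'total' vote counts.
    Reviews with fewer than min_votes total votes are pushed to the end,
    since their ratios are not yet meaningful (the cold-start problem).
    """
    def key(review):
        ratio = review["helpful"] / review["total"] if review["total"] else 0.0
        return (review["total"] >= min_votes, ratio, review["total"])
    return sorted(reviews, key=key, reverse=True)

reviews = [
    {"id": "r1", "helpful": 120, "total": 130},
    {"id": "r2", "helpful": 3,   "total": 3},    # too few votes to trust
    {"id": "r3", "helpful": 40,  "total": 80},
]
print([r["id"] for r in rank_by_helpfulness(reviews)])   # ['r1', 'r3', 'r2']
```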

Researchers have also reported potential biases in the mechanism. Liu and colleagues (2007) warned of an “early bird” bias; reviews posted earlier have more time to collect votes. They also noted a “winners' circle” bias. Since highly rated reviews are displayed first, and since users are unlikely to read (and vote on) many reviews, what is helpful remains so over time. Other researchers found that there is a penalty for reviews that deviate from the average opinion of the product (Danescu-Niculescu-Mizil et al., 2009). More specifically, when a reviewer's rating of a product is below the average over all users, the associated review is less likely to be seen as helpful by others.

While researchers have examined voting as a means to organize UCC, what exactly users do with a set of socially ranked contributions has not received much attention. Studies of search engines (e.g., (Joachims et al., 2007)) have shown that when users are presented with a ranked list of items, they rarely look beyond the top of the list. Similarly, studies of consumers seeking product information have shown that they perform surprisingly shallow searches (Johnson et al., 2004), considering information sources in decreasing order of expected marginal benefit (Hauser & Urban, 1993).

If these findings also apply to the online community context, then we would expect users to simply read the most helpful/useful reviews in the order presented to them. Following earlier findings that users rarely change default social voting filters (Lampe et al., 2007), we are also interested to see how often users change the default displays when searching for information at Amazon and IMDb, or if they tend to rely on others' judgments of which reviews they should read.

Helpfulness, Quality and Credibility

Another concern researchers have expressed is the interpretation of the “helpfulness” construct. In an experiment during which four subjects rated a set of Amazon reviews for helpfulness, researchers found that between-subject agreement was very high (Liu et al., 2007). However, when they compared their subjects' ratings to those of Amazon users, there was very low agreement. An important difference between that study and ours is that the researchers provided explicit instructions defining what constituted a helpful review. By contrast, most sites do not provide such guidelines, leaving users to interpret “helpful” and “useful” for themselves. In fact, in the Amazon customer community, there are many active discussions surrounding the meaning of the “helpfulness” construct.¹

Rieh (2002) suggests that providing clues about information's usefulness could benefit users because they often use usefulness as a way to judge information quality; without clear indications of usefulness, users are left to guess which content they should access. The same study also found that users relied more heavily on characteristics of information objects (e.g., content) than on characteristics of information sources (e.g., type of source, source reputation) when making evaluative judgments.

Other research on how users make credibility judgments suggests that decisions about credibility should not be left to the user because a sizeable set of Internet users fails to verify online information or information sources even though they are skeptical of both content and sources (Flanagin & Metzger, 2007; Metzger, 2007). For instance, even when information about a source, such as the author's name, qualifications, and credentials, is available, users rarely access it (Flanagin & Metzger, 2000). Users exert the least effort toward verification and credibility evaluation when seeking commercial and entertainment information (Flanagin & Metzger, 2000; Rieh & Hilligoss, 2007).

Our research framework is discussed in detail below. Generally, our questions fall into three categories:

  • Interpretation of social voting: What do users consider to be “helpful” or “useful” reviews?

  • Consensus: Do users agree which reviews are “helpful” or “useful” and which are not?

  • Credibility: What information do users leverage in determining the credibility of reviews?

USER STUDY

Study Context

During the spring of 2010, we conducted an observational study in which users from a private university in the Midwest researched items at Amazon and IMDb. Figure 1 shows an example review for the movie Vertigo at IMDb. Figure 2 shows a review at Amazon for the Panasonic Lumix DMC-ZS3 digital camera. Table 1 summarizes the sorting options and information provided at each site.

Figure 1.

IMDb review page for Vertigo

Figure 2.

Amazon review page for Panasonic Lumix DMC-ZS3

Participants

Participants were recruited via the university's online newspaper. A total of 32 students participated; the first four in our pilot, and the remaining 28 in the test study. All received a $20 gift card. We used a pre-task questionnaire to gather general information about participants, summarized in Table 2. The statements in the lower portion of the table were presented as five-point Likert items. Since the tasks involved searching for information about a Panasonic camera and a Hitchcock movie, Vertigo, we asked participants if they had ever owned a Panasonic product (16 of 27) and if they had seen Vertigo (1 of 27). In addition, because participants needed to interpret textual reviews and not all were native English speakers, we asked them about their comfort in reading and writing English. Overall, they were highly proficient, with only one participant answering less than 4 on these items. In general, participants had rather technical backgrounds and were experienced in using online communities and consumer reviews. They were also relatively neutral toward Panasonic products and Hitchcock films.

Table 1. Information provided at each review site
                                                        Amazon   IMDb
Sorting Options
  Best/Most Helpful                                        X       X
  Chronological                                            X       X
  Prolific Authors                                                 X
  Loved it/Most favorable                                  X       X
  Hated it/Critical                                        X       X
Main Review Pages: Content
  Full text of user-contributed reviews                    X       X
  Average user rating of product/movie                     X
  Number of users who found each review helpful/useful     X       X
  Date review was submitted                                X       X
  Comments on reviews                                      X
  Number of reviews displayed per page                    10      10
Main Review Pages: Source
  Username                                                 X       X
  Location                                                 X       X
Profile Pages
  Helpful votes received                                   X       X
  Reviewer rank                                            X
  Contact info                                             X       X
  Birthday                                                 X
  List of contributions                                    X       X
  Photo                                                    X
  Personal intro                                           X
  Last active date                                                 X

Instructions and tasks

While there are benefits to observing users engaged in an information-gathering problem of their own choosing, it is difficult to compare across users when few factors are controlled (Tombros et al., 2005). Since we wanted to form insights about what social cues are used in online communities in general, we conducted our study in the lab and presented participants with two simulated search tasks, scenarios that users would be likely to encounter in their own lives (Borlund, 2000). The instructions participants received for each task are shown in Table 3.

After reading the description, participants were directed to the relevant review forum at Amazon or IMDb. To prepare the requested report, they used an online form that requested a textual description of “what people are saying in general” about the camera/movie. They were also asked to keep a list of five reviews that they thought would be “helpful” (Amazon) or “useful” (IMDb) for their friend, as well as a list of up to five reviews they encountered that would not be helpful/useful.

Table 2. Summary of pre-task questionnaire responses
Gender: 16 men; 11 women
Age: Mean: 23.4; Median: 23
Field of study: 11 engineering; 5 CS/IT; 3 natural sciences; 2 finance; 2 architecture; 2 social sciences; 1 math; 1 undecided

Five-point Likert items (mean / median)
It's easy to read in English.  4.6 / 5
It's easy to write in English.  4.63 / 5
It's easy to find information I need on the Web.  4.52 / 5
I am knowledgeable about information search and retrieval.  4.19 / 4
I frequently visit online communities, where people may read information posted by others and respond.  4.07 / 4
I frequently read reviews of products and services written by consumers.  4.11 / 4
I frequently read reviews of products and services written by experts.  3.88 / 4
I often write reviews of products and services that I have experienced.  2.55 / 2
I am knowledgeable about digital cameras.  3.81 / 4
I am knowledgeable about movies and cinema.  3.70 / 4
I consider Panasonic to be a good brand.  3.63 / 4
I like Hitchcock movies.  3.15 / 3
Table 3. Simulated search task instructions
Amazon
A friend of yours, who is not so confident at searching for information online, has asked for your help in collecting information about a product of interest to him. Specifically, he is considering purchasing a digital camera with an optical zoom. As he received a gift certificate for Amazon.com, he plans to purchase a camera available there. One candidate model in his price range is the Panasonic DMC ZS3. He has asked you to help him by learning what other consumers think in general about this camera, using the relevant product review forum at Amazon.com.
IMDb
A friend of yours, who is not so confident at searching for information online, is a movie buff and in particular, loves Alfred Hitchcock thrillers. She has asked for your help in organizing a Hitchcock film festival at your university. As a first step, she has put together a list of candidate films to be considered. Vertigo is on her list and she has asked you to find out, in general, what people have said about it on the Internet Movie Database (IMDb).

Once they created and submitted their reports, the participants were directed to a page that redisplayed the reviews from both lists. They were then asked to rate each review on the eight attributes detailed in Table 4. Finally, participants were invited to provide us with other thoughts they had about social voting mechanisms, and what characteristics make reviews more or less helpful/useful.

Data collection and coding

We collected users' responses to the pre-study questionnaire, screen and audio captures of their sessions, their reports, and their ratings of each of the reviews.

We encouraged participants to “think aloud” as they worked, so that we could get insight as to their thoughts as they considered the reviews and as to why they viewed what they did (Lewis & Rieman, 2004). Our web-based forms captured the participants' reports, lists of reviews and their respective ratings.

Two authors reviewed the screen captures from the four pilot participants in order to develop a coding strategy. We reviewed each screen capture, noting what the participant viewed at each point in time (e.g., a review, profile, comment, item description) and the moves that she made (e.g., advance to next page, change review filter). Next, both authors coded the screen captures for the first five test participants. We agreed on the number of pages viewed as well as the number of sorting mechanisms used. However, there were discrepancies in the number of reviews participants considered. We therefore refined our criteria. Specifically, a review was recorded if the participant 1) traced over it with the mouse, 2) verbalized an interest in it, or 3) lingered on the review for three or more seconds. With the new criteria, we achieved agreement on 8/10 screen captures; the remaining discrepancies between the review counts were not more than two reviews. Using these new criteria, the remaining screen captures were coded by one researcher each.

In addition to data from our participants, we also recorded information about the two review forums. At Amazon, there were 653 customer reviews displayed across 66 pages for the Panasonic Lumix DMC-ZS3 camera. For Vertigo, there were 515 reviews across 52 pages at IMDb. For each review, we recorded its ranking by social vote (i.e., under the default helpful/useful ordering mechanism) and the page number on which it was displayed. Finally, we recorded the raw number of helpful/useful votes and the total number of votes. Our study was conducted over a time span of one month; therefore, the vote counts likely changed slightly. However, as mentioned, in a mature review forum, the default ranking is unlikely to change, particularly at the top of the list (Liu et al., 2007). Indeed, we confirmed that the same reviews remained in the top 30 for both sites during the span of our study.

Table 4. Attributes for rating reviews
Attribute    Question
Clarity      Was the review easy to read?
Opinion      Did the review express a clear opinion?
Quantity     Did the review contain a desirable amount of information?
Uniqueness   Was the review different from the others?
Relevance    Did the review contain information relevant to your need?
             Explanation: attributes associated with intrinsic, contextual and representational quality of information; previously used by Otterbacher (2009) to quantify quality attributes in Amazon reviews.
Bias         Did the review agree with your own biases about the movie/camera?
Conformity   Did the review agree with what others were saying about the movie/camera in general?
             Explanation: Danescu-Niculescu-Mizil and colleagues (2009) found that how reviews are received at Amazon depends on how much the opinion expressed deviates from the average; with these attributes we also attempt to measure deviation from the reader's expectation.
Fun          Was the review amusing to read?
             Explanation: many sites (e.g., Slashdot, Yelp) ask users to rate UCC on whether it is fun or funny.

RESULTS

We reviewed the reports created by participants for each task and their recorded sessions to ensure that they had followed our instructions and had remained on task. One participant's data was excluded because it was obvious that she had not understood our instructions. Therefore, the analysis is based on 27 test participants. First, we describe the behaviors we observed and then discuss how those behaviors address our research questions.

User behavior

For both sites, users spent about the same amount of time reading reviews² (t = 0.7097, p = 0.481, n.s.), read nearly the same number of reviews (z = 0.026, p = 0.979, n.s.), and viewed the same number of pages (z = 1.251, p = 0.211, n.s.). However, a Wilcoxon test revealed a tendency for participants to use fewer sorting mechanisms at IMDb (z = 1.828, p = 0.068, n.s.). We also computed the proportion of users who changed the default sort order. While 70% changed the “helpful” setting at Amazon (n = 19), less than half changed the “best” setting at IMDb (n = 12). Finally, although users visited three to four pages of reviews displaying 10 reviews each, the median number of reviews read was only 22.
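As a minimal sketch of how paired comparisons of this kind can be computed (here with SciPy), consider the snippet below; the arrays are hypothetical placeholders, not our study data.

```python
from scipy import stats

# Hypothetical per-participant measurements, one value per site.
time_amazon  = [14.2, 10.5, 12.1, 9.8, 15.0]   # minutes spent reading reviews
time_imdb    = [13.1, 11.0, 12.8, 10.2, 14.1]
sorts_amazon = [4, 3, 5, 2, 3]                 # sorting mechanisms used
sorts_imdb   = [1, 2, 1, 2, 3]

# Paired t test for the (approximately normal) time variable.
t_stat, p_time = stats.ttest_rel(time_amazon, time_imdb)
print(f"t = {t_stat:.3f}, p = {p_time:.3f}")

# Wilcoxon signed-rank test for the count variable.
w_stat, p_sorts = stats.wilcoxon(sorts_amazon, sorts_imdb)
print(f"W = {w_stat:.1f}, p = {p_sorts:.3f}")
```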

Interpretation of Helpfulness/Usefulness

We now analyze participants' ratings of their selected reviews. First, we note that there were no differences between Amazon and IMDb reviews as rated on the eight attributes. Specifically, a Wilcoxon test comparing Amazon and IMDb reviews on participants' ratings was not significant for any of the eight dimensions. This was true both for helpful/useful reviews and for those viewed as not helpful/useful. Therefore, we have no reason to believe that participants interpret the “helpful” and “useful” constructs differently, and so we pool the Amazon and IMDb reviews into two datasets: all “helpful” reviews and all “not helpful” reviews.

Following Jamieson (2004), we treat responses on Likert items as ordinal rather than interval data and thus analyze them using non-parametric methods. However, we report the mean and median responses on each dimension in Table 5 to show that there were no cases of very skewed distributions. To determine whether participants agreed on the eight dimensions, we used the Average Deviation Index (ADI) with respect to median responses (Burke & Dunlap, 2002). As shown in Table 5, nearly all dimensions showed significant agreement; for a sample of 25 raters and 5 response categories, an ADI below 0.92 indicates significant agreement at the 0.05 level (Burke & Dunlap, 2002).
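The ADI is simply the average absolute deviation of the raters' responses from their median; a minimal sketch, with hypothetical ratings rather than our data, is shown below.

```python
import statistics

def average_deviation_index(ratings):
    """Average absolute deviation of ratings from their median (AD around the median)."""
    med = statistics.median(ratings)
    return sum(abs(r - med) for r in ratings) / len(ratings)

# Hypothetical ratings of one review on a five-point item.
ratings = [4, 4, 5, 4, 3, 4, 5, 4, 4, 3]
adi = average_deviation_index(ratings)
print(f"ADI = {adi:.2f}")
# 0.92 is the critical value cited above for ~25 raters, 5 categories, alpha = .05.
print("significant agreement" if adi < 0.92 else "no significant agreement")
```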

Table 5. Users' ratings of their selected reviews
                     Helpful              Not helpful
             Mean/Med     ADI       Mean/Med     ADI
Clarity      4.16 / 4     0.51*     3.39 / 4     0.98
Relevance    4.22 / 4     0.50*     2.29 / 2     0.77*
Quantity     4.12 / 4     0.52*     2.20 / 2     0.74*
Fun          3.37 / 3     0.65*     2.30 / 2     0.69*
Opinion      4.17 / 4     0.53*     3.21 / 3     1.03
Bias         3.48 / 3     0.68*     2.68 / 3     0.72*
Conformity   3.99 / 4     0.47*     2.84 / 3     1.02
Uniqueness   3.23 / 3     2.99      2.97 / 3     0.92
Note. *p < .05

Helpful Reviews

Participants agreed that helpful reviews were clearly written, were relevant to their information needs, contained an appropriate amount of information, and conformed to the general opinion of the item reviewed. Participants were neutral as to whether or not helpful reviews were fun to read, or if helpful reviews agreed with their own biases about the item of interest.

Unhelpful Reviews

Participants disagreed that unhelpful reviews were relevant to their needs, contained an appropriate amount of information, or were fun to read. They were neutral as to whether unhelpful reviews agreed with their own biases.

Agreement between participants

Figure 3 and Figure 4 show histograms of the number of participants listing a review as helpful/useful or not. In both cases, the majority of reviews were listed by only one participant. One helpful Amazon review was chosen by 22 participants, and one IMDb review was chosen by 16. It comes as no surprise that the former was the “most helpful” Amazon review, and the latter the “most useful” IMDb review. Each appeared first in its respective forum and was therefore readily visible to readers.

In contrast, among the not helpful/useful reviews, two IMDb reviews were listed by eight participants. These were also front-page reviews (the 4th and 7th most useful according to the IMDb user population). In general, we observed more overlap on participants' “helpful/useful” lists than on their “not helpful/useful” lists.

To examine agreement between our participants and the Amazon/IMDb communities, we computed the correlation between the number of participants listing a review as helpful/useful and the number of people who had voted favorably for it at Amazon or IMDb. The Pearson correlation coefficient is 0.8293 for Amazon (n = 39 unique reviews listed) and 0.7938 for IMDb (n = 44), with p < 0.001 in both cases.
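A sketch of this agreement check appears below; the counts are made up for illustration and are not the study data.

```python
from scipy.stats import pearsonr

# For each unique review: how many of our participants listed it as helpful,
# and how many community members voted it helpful on the site (hypothetical).
participants_listing    = [22, 9, 6, 4, 3, 2, 1, 1, 1, 1]
community_helpful_votes = [4100, 950, 600, 310, 250, 120, 40, 35, 20, 8]

r, p = pearsonr(participants_listing, community_helpful_votes)
print(f"Pearson r = {r:.3f} (p = {p:.4f})")
```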

Figure 3.

# Reviews listed as helpful by x participants.

Figure 4.

# Reviews listed as not helpful by x participants.

To measure the strength of correlation between helpful/useful status and the default display order, we ranked reviews by the number of participants who had listed them as helpful/useful. We then compared this to the ranking given by the default ordering mechanism at the respective site. We also ranked reviews by page number (i.e., considering all reviews displayed on a given page as being of the same rank). We compared the rankings using Kendall's Tau (Siegel & Castellan, 1988), which ranges from −1 (perfect inverse correlation) to +1 (perfect correlation), adjusting appropriately for ties.
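One way to compute such tie-adjusted rank correlations is with SciPy's kendalltau, which implements the tau-b variant; the sketch below uses hypothetical counts and page assignments rather than our data.

```python
from scipy.stats import kendalltau, rankdata

# Hypothetical: number of participants listing each of twelve reviews as helpful.
participant_counts = [22, 9, 6, 4, 3, 2, 2, 1, 1, 1, 1, 1]
participant_rank = rankdata([-c for c in participant_counts])  # 1 = most listed

default_order = list(range(1, 13))        # site's default display rank
page_rank = [1] * 10 + [2] * 2            # 10 reviews per page -> tied page ranks

tau_order, p_order = kendalltau(participant_rank, default_order)
tau_page, p_page = kendalltau(participant_rank, page_rank)
print(f"tau vs. display order = {tau_order:.3f} (p = {p_order:.3f})")
print(f"tau vs. page number   = {tau_page:.3f} (p = {p_page:.3f})")
```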

As shown in Table 6, the rankings of the most and least helpful/useful reviews are positively correlated with both the default ordering and the ranking based on display page (p < 0.001 in all cases). While the positive correlation between the least helpful/useful reviews and display order and page might initially seem counterintuitive, it gives us insight into what participants attended to.

Table 6. Rank correlation between our participants' and Amazon/IMDb rankings
                  Display order   Page #
Amazon Helpful    0.4669          0.4332
IMDb Useful       0.4968          0.6015
Amazon Not        0.3184          0.3233
IMDb Not          0.3520          0.3179

Verification of reviewer credibility

Only three of 27 users viewed authors' profiles. Combined, these three users viewed a total of only five profiles. Even those who read profiles did not attempt to verify the authors' qualifications or credentials (e.g., the votes a reviewer had collected across all contributions, or a match between the interests expressed in a reviewer's self-description and the item reviewed).

During the Amazon task, 44% of participants viewed the textual comments on reviews at least once. In addition to voting as to whether or not a review is helpful, textual comments give users the chance to express their agreement or dissent with a reviewer's opinion, or simply to discuss the subject further. The behavior of viewing such comments can be characterized as an attempt to validate the content of the reviews – commenters' agreement, for instance, would indicate support for the authors and content.

Summary

In summary, we observed that users did not often adjust the default filters set by IMDb and Amazon. In addition, users did not take advantage of additional information about review authors in evaluating reviews. Instead, they relied to a great extent on the helpfulness judgments provided via the social filtering mechanisms of the respective community.

Users generally agreed on which reviews, and which properties of reviews, were helpful. By contrast, they showed little agreement about which reviews were not helpful. At the same time, they did agree on three properties of unhelpful reviews: they are not fun to read, do not provide an appropriate amount of information, and do not provide relevant information.

DISCUSSION

A growing literature considers issues related to the “wisdom of the crowds” expressed online through UCC. Many have expressed optimism at the potential of “Web 2.0” to transform the way we communicate, giving ordinary citizens a voice (e.g., (Reich, 2008)). However, when signal-to-noise ratios become small, it is difficult for users to put UCC to good use (Gu et al., 2007). Using social voting to organize UCC is an attractive solution for many reasons. In particular, it is easy and fast to collect users' feedback as a simple vote, and clues about usefulness are likely to benefit users.

Our examination of the use of voting to organize reviews at Amazon and IMDb suggests that there is consensus as to what makes a review “helpful” or “useful,” even when these terms are not defined. In addition, we see evidence that people find many of the same reviews helpful, particularly those prominently displayed at their respective forums. This is encouraging, given concerns as to the validity of the “helpfulness” construct.

However, we also see negative implications from the use of social voting. Even before Web 2.0, researchers observed that the use of information sources followed a power law (Adamic & Huberman, 2001); a few sources received most of users' attention, while most sources received little attention. This is natural, given that users have limited cognitive resources, and often satisfice rather than satisfy their information needs (Warwick et al., 2009), flocking to popular sources, which are easy to find.

In our study, users primarily selected front-page reviews in preparing their reports. In particular, of the reviews selected by participants as being helpful for a friend interested in the item, 66.4% were displayed on the first page of their respective forums, and 82.4% were displayed on the first three pages. This behavior is not surprising, as users mainly attend to items on the first page of a ranked list (Joachims et al., 2007). However, this behavior has implications for which voices are heard – it contributes to the persistence of biases such as the “winners' circle” (Liu et al., 2007). When users stay on the first page, they are less likely to find interesting reviews serendipitously (André et al., 2009). They will also be exposed only to the majority opinion, since reviews deviating from average are not often voted as helpful (Danescu-Niculescu-Mizil et al., 2009).

Our findings relate to those of Pan and colleagues (2007), who found that users place great trust in the relevance rankings of the Google search engine. They presented users with lists of manipulated search results and asked them to rate the documents as to their relevance to the query. They found that users not only read results in the order presented to them, they also rated them in the presentation order.

We also found evidence that users trust the judgments of others. For example, we found it surprising that participants did not often change the default ordering at IMDb or use reviewer profiles. Movies are hedonic goods, and factors such as a consumer's age and gender are known to affect preferences for such items (Holbrook & Schindler, 1994). Therefore, we would expect reviewer identity to matter. In fact, IMDb provides a gender filter, but it was not used by any of our participants. Despite the availability of information about whether reviews were current (e.g., the date submitted) and whether the community found a reviewer's aggregate contributions helpful (i.e., the reviewer's helpfulness rating), users did not appear to use this information when evaluating reviewers' contributions. The only source-related information we observed users leveraging was the comments on Amazon reviews, and even then only 44% of participants did so.

CONCLUSION AND FUTURE WORK

One limitation of our current approach is that our data about where users were looking or what they were reading had limited precision. Since multiple reviews are listed at the same URL at both Amazon and IMDb, there was no easy way for us to automatically capture the number of reviews read. Therefore, it was necessary for us to develop very specific coding criteria in order to achieve good agreement. Data from an eye-tracking device would be useful in helping us make more precise measurements of what content users consume on each page.

Finally, there is much information in our data that has not yet been explored. It would be interesting to quantify the extent to which participants read UCC in a serial manner, and to identify what causes them to break this pattern. Likewise, we could also consider time spent searching using each mechanism, and in what order. This might provide insight as to what specific content proves most fruitful to users (Hauser & Urban, 1993).

This study generated many questions as to the role, benefits and actual use of social navigation features in large online communities. For instance, while the use of participant profiles has been shown to promote the contribution of quality information (Ma & Agarwal, 2007), our study revealed that users don't often access this information when reading reviews. Open questions about the role of profiles in communities include (1) does profile use depend on the extent to which users have one-on-one interaction with one another, and (2) do such features promote trust between users, just by virtue of being there, even if they are not often used?

We also demonstrated that users rarely adjust the default sorting mechanisms of reviews, and future research should explore whether they could be persuaded to use multiple displays of the UCC in a large community. For example, if they knew of the possible biases that often plague social voting mechanisms (e.g., early bird, winners' circle), would users be more likely to experiment with the available sorting mechanisms?

Many questions remain about the incentives for users to seek and access different voices in the community, as well as for the communities to encourage users to be exposed to a diverse set of views. Are there ways to reap the benefits of social voting, but at the same time encourage users to look beyond the first pages of the most helpful UCC? One possibility is to consider a “follow the reader” approach (Lampe et al., 2007), developing profiles based on users who change default settings, in order to help those who are less skilled searchers to discover and use different sorting mechanisms. Another idea is to educate users about how UCC is organized. In contrast to information-oriented sites such as Slashdot, commerce-related communities such as Amazon and IMDb do not typically disclose their algorithms. Therefore, just as Pan and colleagues (2007) asked of Google, we also argue that such companies have a responsibility to explain their methods and possible biases.

Site designers put forth considerable effort in developing features that promote trust in communities (e.g., user profiles), and that reduce information overload (e.g., filters using review and reviewer characteristics). However, rather than fully exploiting such features, users often rely on judgments of others as to which UCC to read. Participants also generally behave as though what is helpful to others is also helpful to them. Additional research will help identify ways to more effectively leverage UCC's properties to ensure users can easily access credible, useful, and diverse information.

Acknowledgements

We thank the students who participated in our pilot and test studies. We also acknowledge the assistance of Tim Hayes with data collection.

Footnotes

  1. See for example: http://www.amazon.com/gp/forum/cd/discussion.html/ref=cm_cd_search_res_ti?ie=UTF8&cdMsgNo=1&cdPage=1&cdSort=oldest&cdThread=TxEAUR6WMD4EG0&cdMsgID=Mx2QF7YPNZ81CAD#Mx2QF7YPNZ81CAD

  2. Throughout, we use the t test to evaluate differences for variables that are approximately normally distributed, and the non-parametric Wilcoxon test in all other cases.
