Predicting Continued Participation in Newsgroups


  • Elisabeth Joyce

    1. Edinboro University of Pennsylvania
    • Elisabeth Joyce is an associate professor in the Department of English and Theatre Arts at Edinboro University of Pennsylvania. She received her Ph.D. in modern American poetry from Temple University in 1991. She has a book from Bucknell University Press, Cultural Critique and Abstraction: Marianne Moore and the Avant-Garde (1999), and she is currently working on a book-length project on poetry and space and Susan Howe’s work. She is also working on developing ways to use machine learning to carry out conversation analysis and assess group behaviors in online communities.

      Address: Department of English and Theatre Arts, Edinboro University of Pennsylvania, 500 Meadville St., Edinboro, PA 16444 USA

  • Robert E. Kraut

    1. Human-Computer Interaction Institute
      Carnegie Mellon University
    • Robert Kraut is Herbert A. Simon Professor of Human-Computer Interaction at Carnegie Mellon University. He received his Ph.D. in Social Psychology from Yale University in 1973 and has previously taught at the University of Pennsylvania and Cornell University. He was a research scientist at AT&T Bell Laboratories and Bell Communications Research for twelve years. Dr. Kraut has broad interests in the design and social impact of computing and conducts research on everyday use of the Internet, technology and conversation, collaboration in small groups, computing in organizations, and contributions to online communities. His most recent work examines factors influencing the success of online communities and ways to apply psychology theory to their design. More information is available at

      Address: Human-Computer Interaction Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213 USA


Turnover in online communities is very high, with most people who initially post a message to an online community never contributing again. In this paper, we test whether the responses that newcomers receive to their first posts influence the extent to which they continue to participate. The data come from initial posts made by 2,777 newcomers to six public newsgroups. We coded the content and valence of the initial post and its first response, if it received one, to see if these factors influenced newcomers’ likelihood of posting again. Approximately 61% of newcomers received a reply to their initial post, and those who got a reply were 12 percentage points more likely to post to the community again; their probability of posting again increased from 44% to 56%. They were more likely to receive a response if they asked a question or wrote a longer post. Surprisingly, the quality of the response they received (its emotional tone and whether it answered a newcomer’s question) did not influence the likelihood of the newcomer’s posting again.


One of the most visible uses of the Internet is to support online groups or communities. These are collections of individuals, typically with a common interest, whose primary method of communicating is exchanging text messages over the Internet. By many criteria, online communities are widely successful. They allow people to exchange information on a wide variety of technical, professional, health-related, and recreational topics, provide members with social support and a site to form friendships, and create a source of entertainment and distraction (Ridings & Gefen, 2004). In the United States over 50% of all Internet users regularly stay in contact with an online group. Usenet, perhaps the oldest collection of online communities, continues to grow; in 2004 it had over 190,000 public groups containing over 250 million messages from over nine million unique participants (Smith, 2004). Newer offerings, including Google Groups and Yahoo! Groups, each with hundreds of thousands of groups, support both private groups and newer user interfaces.

Despite this success, building and maintaining online communities is difficult. Participation is often sparse and uneven. For instance, Butler (1999, reported in Cummings, Butler, & Kraut, 2002) examined a random sample of listserv-based online groups. One-third of all lists had no communication during a three-month observation period, and, among those that were active, traffic was low, with the median list having a message every 3.6 days. In addition, among lists with traffic, participation was very unevenly distributed among subscribers. The median list had only 15% of subscribers who contributed even a single message during a three-month period. In addition, churn in membership is high, with most people who initially contribute to an online community never contributing again. For example, Jones, Ravid, and Rafaeli (2004) examined a sample of 578 Usenet newsgroups active between August and December 1999 and found that only 11.5% of people who posted in one month returned to post in a second month.

One can decompose the problem of participation in and contribution to groups into two components, which are probably controlled by different mechanisms: (1) motivation to join an online group and (2) motivation to continue participation after joining. Moreland and Levine’s (2001) theory of commitment and socialization to groups, for example, distinguishes between an investigative phase, in which groups recruit potential members and individuals reconnoiter possible groups, and a socialization phase, in which individuals and groups become committed to each other. The investigation phase, in part, involves groups marketing themselves and potential members choosing among the groups based on attributes visible to an outsider. The socialization phase, however, is more personal and is based on transactions between the individual and the group. In this phase both the individual and the group attempt to change one another in ways that might make their relationship more rewarding. Whether an individual eventually becomes committed to the group is likely to depend on the type of contributions the individual makes to the group and the responses the person gets after contributing.

In the case of most online groups, participation typically consists of reading and posting messages. Reading other people’s posts is itself a valuable activity in online groups, for without an audience, what is the value of posting (Nonnecke & Preece, 2000)? However, for many other online groups, including the Usenet groups we study in this article, participation is visible only through posts and other explicit communication. Active participants in an online group cannot directly see “lurkers,” those members of a group who read messages but do not post any themselves, and can only estimate their numbers by extrapolating from the numbers of people who do post and by noting the declarations from ex-lurkers that they are “delurking” when they finally reveal themselves (for more on lurking and delurking behaviors, see Rafaeli, Ravid, & Soroka, 2004).

Factors in Group Commitment

Our goal in this paper is to examine factors that will cause a person who initially posts to an online group to contribute to it again. For groups to be successful they must recruit and retain enough new members to sustain themselves over time (McGrath, 1984). As Lampe and Johnston (in press) suggest, first-time participants in online communities are necessary to replenish losses due to natural attrition and to energize the group. However, they are a vulnerable population, as they may not understand the norms of that group and so may not receive acknowledgement from it. Newcomers to a group, posting for the first time, are far less likely to return to it than those who have posted previously (Arguello et al., in press). According to Smith and his colleagues (Brush, Wang, Turner, & Smith, 2005; Smith, 2004), online groups differ substantially in their likelihood of responding to the initial posts of newcomers; technical and illness support groups are among the most welcoming (Fisher, Smith, & Welser, 2006).

Not only must newcomers return so that an online community can sustain itself over time, but their returns also show how well the group is meeting their needs. Moreland and Levine (2001) predict that people will become more committed to a group to the extent that it satisfies their requirements. Thus, when newcomers return to a group by posting again, this action represents a minimal expression of commitment. Drawing from Moreland and Levine’s model, we examine the nature of the transactions between a contributor and the group that increase subsequent participation. In particular, we look at the nature of the communication exchanged and whether an exchange occurs at all. We therefore also examine features of the initial post and features of any response to it.

Interaction as a Basis of Commitment

The Response and Its Characteristics

It is likely that newcomers’ interaction with a group—their posting a message and others responding to it—will be a first step in their commitment to the group. There are at least three routes through which newcomers’ interaction with a group may increase their commitment to it. The first is described by a reinforcement model which predicts that people repeat actions that lead to positive reinforcements. In the case of conversations, getting a response, getting a positive response, and getting a response that fulfills some explicit need are all reinforcing events. Therefore, people will be more likely to continue their participation in an online community if they receive a response and if the response is a positive one. This type of model emerges from Skinner’s findings that we are conditioned to respond to messages to us (Ferster & Skinner, 1957). For example, almost fifty years ago, Verplanck (1955) showed that in conversation speakers are more likely to offer opinions the more their conversational partners agree with them.

A reciprocity model makes similar predictions. According to Gouldner (1960), if someone does something for us, we will do something for him or her. This can be generalized to the level of the group, so that if one group member performs a favor for us, we are likely to return a favor to the group. An essential point in this model is that reciprocal exchange does not have to be balanced exactly; unequal exchange can set up obligation to the group, thus ensuring stronger group cohesion.

In addition to these two routes, individuals may become committed to a group through the formation of personal bonds with group members developed through interacting with them. Conversation is a major mechanism through which friendship and other social relationships are enacted (Duck, 1998). In fact, conversations are both the causes and consequences of successful interpersonal relationships (Duck, Rutt, Hurst, & Strejc, 1991). Sassenberg (2002) notes that attachment to particular others in a group can lead to commitment to the group as a whole and that in online groups this attachment is created through communication exchanges.

All three models predict that getting a response will increase a poster’s likelihood of posting to the group in the future, and that this effect will be larger if the response is a positive one. In addition, the reinforcement and reciprocity models suggest that this effect will be especially powerful if the initial post asks a question and the response provides an answer. The answer is reinforcing and sets up obligations of reciprocity.

We examine these ideas by looking at the interactions occurring when a potential member of an online group first posts. We believe that a person’s first participation in the group is key to predicting continued participation, largely because the majority of active participants of an online group post only once if they post at all (Butler, 1999; Nonnecke & Preece, 2000). Understanding the impact of the poster’s contribution to the group and the quality of a response to it may clarify that poster’s motives for further participation or continued nonparticipation or minimal participation.

H1: Receiving a response to an initial post will increase the likelihood that the poster will post again.

Moreover, the nature of the response—whether it offers useful advice, whether it is positive in tone, and whether it agrees with the initial post—may also be important in encouraging continued participation. Patterson (1994) and Davis and Holtgraves (1984) argued that the quality of a response contributes to the quality of an interaction. A response that addresses the participant’s concerns, that agrees with the initial post, or that expresses positive emotion can make a conversation continue, while a negative response or one that disagrees with the poster’s position may end the conversation. In addition, good responses can clarify the current topic of conversation and enhance the likelihood that a positive relation might develop (Reis & Patrick, 1996). The response to a posting can, therefore, control the extent of future participation, by cultivating that first experience, by requesting clarification and thus developing the conversation, or by undermining the conversation by responding to that first post negatively.

Although there are many ways for a member of an online community to respond positively to an initial post, we consider three here. A response to a post can be instrumental. It can provide useful information, which is one of the main benefits that members derive from the communication exchanged in online groups. Participants in online support groups, for example, report that these groups provide useful information about diseases and their treatment, practical advice about how to cope with the disease, and emotional support (Rodgers & Chen, 2005). In technical and professional groups, participants exchange useful information about software, statistics, technical writing, and a host of other professional topics. Even in social and hobby groups, useful information is often a benefit of participating. This information can be highly related to the focus of the group (e.g., patterns, knitting advice, and information about turning a hobby into a business among a group of knitters) or more peripheral (e.g., advice about how to care for a sick pet among a chat group of older women [Kraut, Scherlis, Mukhopadhyay, Manning, & Kiesler, 1996]). Even though unsolicited information is often valuable, the information will be especially valuable for a new member of an online group if it answers an explicit question raised in an initial post.

H2: An initial post that receives a response that provides information rather than asks a question will increase the likelihood that the poster will post again.

H3: An informative response will be more likely to encourage someone to post again if the response is a direct answer to an initial question.

Another type of positive response is the emotional tone associated with the reply. It is likely that new members of a community will feel welcomed if they receive a welcoming reply with a friendly and upbeat tone, but be put off if they receive a hostile reply with a negative tone. Flames, or passionate negative responses, occur in online groups (Sproull & Kiesler, 1991). Herring (1992, 1993) reported that women are particularly discouraged from participating by hostile messages. This would predict a relationship between negative emotion in the reply and subsequent (non-) posting for women, more so than for men. Many new participants in online groups report that they receive welcoming and solicitous responses from more established members. Others, however, are criticized for offering posts that are off topic, ignorant, or redundant. For example, the acronym RTFM¹ was coined for newcomers who waste the time of more experienced participants by asking questions that could be answered by reading a manual or list of frequently asked questions (FAQs).

H4: An initial post that receives a response that has a positive emotional tone rather than a negative one will increase the likelihood that a newcomer will post again.

The extent to which a response agrees or disagrees with an initial post may also change its reinforcement value. Agreement can lead new posters to contribute more to a group through a reinforcement process (Verplanck, 1955) and may encourage their continued participation by causing them to like members of the group more and to feel that they have found acceptance in a group of similar others (Byrne & Griffitt, 1966).

H5: An initial post that receives a response that affirms what the poster said rather than disagrees with it will increase the likelihood that a newcomer will post again.

Finally, for reasons of reciprocity, we would expect that a longer reply would encourage a newcomer to post again because it shows greater investment of time and energy on the part of the replier and may be interpreted as a display of concern or of more general interest, leading the recipient to continue to participate in the community in order to reciprocate this effort.

H6: Receiving a longer reply to an initial post rather than a shorter one will increase the likelihood that a newcomer will post again.

Eliciting a Response

Characteristics of the initial post should influence whether it receives a reply and what the quality of the reply will be, if one is received. Because these characteristics themselves may predict whether the poster will post again, they must be controlled to estimate the influence of a reply on continued participation in an online group. In particular, the length of an initial message, its emotional tone, and whether it is a question or statement may influence whether anyone replies and will likely influence the nature of the replies, if any. Responders are more likely to offer information and advice when confronted with an initial question (e.g., Sacks, Schegloff, & Jefferson, 1978). For reasons of both reciprocity and modeling, repliers may mimic the tone, form, or style of the initial post, for example responding more negatively to negative posts (e.g., Hatfield, Cacioppo, & Rapson, 1993) or offering longer replies to long posts.


The goal of this research is to assess whether the existence of responses to a newcomer’s initial post to an online group and the characteristics of these responses will cause the newcomer to continue participating in the group by posting again. In addition, we include exploratory analyses to identify characteristics of the initial post that elicit replies from other members of the group and shape the characteristics of these replies. Our data come from six online groups. One is the Mozilla User Interface newsgroup (netscape.public.mozilla.ui); the other five are Usenet groups: a diet support group, a breast cancer support group, a New York Rangers hockey group, alt.politics.usa.constitution.gun-rights, and alt.baldspot. We selected these groups to have variability in the groups’ purposes, topics of conversation, and gender composition.

Mozilla is an open-source Web and email application suite descended from the original Netscape browser. The mozilla.ui group includes users of Mozilla, who often report bugs in the user interface and request changes, and developers, who monitor requests and occasionally reply to the users. The mozilla.ui group is an open-source software development community whose members are generally unpaid, contributing ideas and code voluntarily. It is task oriented but sits somewhere between work and pleasure. People contribute in part for extrinsic reasons: the reputation, education, and financial compensation that sometimes result from participating in open-source development communities (Hertel, Niedner, & Herrmann, 2003; Lakhani & Hippel, 2003). They also contribute for intrinsic reasons: pleasure from problem solving, altruism, and commitment to a community (Holohan & Garg, 2005; Rossi, 2004). The Mozilla group is an active newsgroup with over 12,000 posts from March 10, 1998, to April 8, 2001. Like many open-source communities, it is primarily male.

To complement the primarily male and task-based nature of the Mozilla group, we collected data about three support groups and two interest groups. Support groups focus on a central, often emotionally laden, topic of concern for their members, one that is often central to their definitions of themselves. Herring (2004) identifies these types of communities as formed around a “shared purpose” that we see as setting the members apart from other people. The baldspot group primarily consists of men who share the dilemma of hair loss. The diet support group contains mostly women who are overweight and share the goal of weight loss. The breast cancer group includes women with breast cancer and their caregivers, who exchange information and social support about this life-threatening disease (Rodgers & Chen, 2005).

The interest groups are also composed of participants devoted to a particular topic, in this case, gun rights and the New York Rangers hockey team. Compared to the support groups, these interest groups are discussing topics less central to the members’ identities. In contrast to both the Mozilla and support groups, the communication exchanged is less likely to contain instrumental solutions to real-world problems, and more likely to contain exchanges of opinions.

Like many online communication forums, these groups construct their conversations in reply structures that Internet technology makes visible in the form of threads, so that an initial post by an individual can be followed by zero, one, or more replies. Generally, the software that people use to read these conversations indicates that one message is a reply to another by placing a “Re:” in the subject field of the message. Reader software generally also allows people to view the messages in chronological order, with messages from multiple threads interspersed, or organized by thread, making it possible to view all messages from a single thread together. For example, Google Groups represents threads visually, so that the indentation of a message beneath another identifies it as a response to the first one.


We collected six months of data for each group. We divided the data collection period into two phases. During the first three months of each data collection period, we identified the first post for all new posters (i.e., user-ids who had not posted to this group in the three-month period before the start of data collection). We then followed these individuals for the remainder of the data collection period, from day one to six months, to identify which of them posted again. The data from Mozilla started in January 2000, the diet and baldness data started in April 2002, and the Rangers, breast cancer and gun rights data started in September 2001.
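The two-phase design described above can be sketched in code. This is a minimal illustration only: it assumes a hypothetical list of (user_id, post_date) records for a single group and approximates three months as 91 days. The function name and data layout are invented for the example and do not describe the program actually used.

```python
from datetime import date, timedelta

THREE_MONTHS = timedelta(days=91)  # assumption: "three months" approximated as 91 days


def find_newcomers(posts, start):
    """Return {user_id: (first_post_date, posted_again)} for newcomers.

    posts: iterable of (user_id, post_date) tuples for one group (hypothetical layout).
    start: first day of the six-month data collection period.
    """
    phase1_end = start + THREE_MONTHS        # newcomers must first post before this day
    period_end = start + 2 * THREE_MONTHS    # end of the six-month follow-up
    lookback_start = start - THREE_MONTHS    # start of the pre-collection window

    by_user = {}
    for user_id, day in posts:
        by_user.setdefault(user_id, []).append(day)

    newcomers = {}
    for user_id, days in by_user.items():
        # Exclude anyone who posted in the three months before data collection.
        if any(lookback_start <= d < start for d in days):
            continue
        in_period = sorted(d for d in days if start <= d < period_end)
        # A newcomer's first post must fall in the first three months.
        if not in_period or in_period[0] >= phase1_end:
            continue
        # posted_again is True if any later post falls inside the six-month period.
        newcomers[user_id] = (in_period[0], len(in_period) > 1)
    return newcomers
```

In this sketch, a user whose only activity is a single post in the first phase is recorded with posted_again set to False, which corresponds to a one-time poster.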

By limiting the data collection to six months, we ran the risk of missing people whose second post occurred more than three to six months after the first. This risk is low, however. According to our analysis of four years of posts from the Mozilla User Interface group, only 4.6% of all posters post for longer than a three-month period.

The data consist of 2,777 records, representing information about the first posts of 1,120 unique individuals in the Mozilla group, 426 individuals in the baldness group, 541 in the diet group, 164 in the breast cancer group, 224 in the gun rights group, and 302 in the Rangers group. This information includes measures of their subsequent participation in the group, attributes of their first post, whether anyone replied to this post, and attributes of the reply, if one was made.

Our goal was to identify a participant’s first post to the newsgroup, the reaction to that post, and the poster’s subsequent participation in the newsgroup. The following data were gathered for each user in the sample: date of their first post; number of responses to their first post; number of posts they sent in the six months following their initial post; number of words in their first post and the total number of words they contributed to the newsgroup in their first three months of activity; date of their last post; number of posts they sent over the entire data set; and number of days from their first posting to their last one. We used a program to harvest data from Usenet and Google Groups and supplemented it with data from the Netscan project at Microsoft (Smith, 2004).

Dependent Variable

Post-Again. Post-again is a binary variable indicating whether newcomers made a second post during the data collection period. It is coded 0 if they made only an initial post and 1 if they posted at least once more after their first post. Of those making an initial post, 44% (1,222) made at least one additional post, as shown in Table 1. We also examined the number of posts they made and the duration of their participation (i.e., the interval between their initial post and their last one). On average, these newcomers, including the one-time posters, posted 7.5 times, with a median of 1. As the discrepancy between mean and median shows, this distribution was highly skewed. Newcomers’ mean survival in the group (i.e., the interval between their first and last post) was 45 days, with a median of 0 days. Again, the discrepancy between the median and mean shows that most people did not post a second time, and if they did, they did so within 24 hours. However, 25% of the sample posted again in the group at least 11 days after their initial post and 10% of the sample posted again at least 135 days after their first post. Figure 1 shows the proportion of initial posters who were still active in the group over the period of data collection.
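The survival measure and the gap between mean and median can be illustrated with a short sketch. The survival intervals below are invented for the example and are not the study's data; the function name is likewise hypothetical.

```python
from statistics import mean, median


def proportion_active(survival_days, t):
    """Share of newcomers whose last post came at least t days after their first.

    survival_days: intervals (in days) between each newcomer's first and last
    post; one-time posters contribute 0.
    """
    if not survival_days:
        raise ValueError("survival_days must not be empty")
    return sum(1 for d in survival_days if d >= t) / len(survival_days)


# Invented example: most newcomers never return, a few stay for months.
example = [0, 0, 0, 0, 0, 0, 2, 11, 40, 135]
example_mean = mean(example)      # pulled upward by the few long-lived posters
example_median = median(example)  # stays at 0 because most never post again
```

Evaluating proportion_active over a range of t values gives the points of a survival curve of the kind summarized in Figure 1.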

Table 1.  Descriptive statistics and correlations

Correlation columns, in order: Initial posts (Got Reply, Post Again, Word Count (Log), Is Question, Positive Emotion, Negative Emotion, Assent, Negate); Replies (Reply Word Count (Log), Reply Is Question, Reply Positive Emotion, Reply Negative Emotion, Reply Assent).

| Variable | N | Mean | Standard Deviation | Correlations |
| Got Reply | 2777 | .61 | .49 | |
| Post Again | 2777 | .44 | .50 | .12 |
| Word Count (Log) | 2777 | 1. | | |
| Positive Emotion | 2777 | 2.31 | 3.39 | .00 .00 −.01 −.01 |
| Negative Emotion | 2777 | 1. | | −.03 −.04 −.05 |
| Reply Is Question | 1662 | . | | −.10 −. −.01 −.13 |
| Reply Positive Emotion | 1569 | 2.52 | 5. | −.05 .09 −.03 .00 −.04 −.10 .02 |
| Reply Negative Emotion | 1569 | 1.35 | 2. | −.03 .10 −.01 .03 −.04 −.02 −.01 |
| Reply Assent | 1569 | .40 | 2.97 | .01 .03 −.01 −.02 −.01 .01 .04 −.01 −.09 −.02 .47 .06 |
| Reply Negate | 1569 | 2. | | −.01 .01 −.04 −.03 −.04 −.02 −.03 |
Figure 1.  Survival of initial posters.

We conducted preliminary analyses predicting alternate measures of continued participation in the groups. The following three measures yielded quantitatively similar results: (1) the binary variable of whether a newcomer will post again, (2) the continuous variable of numbers of posts (in the log scale), and (3) the continuous variable of the length of time they continued posting in the group. Given the similarity, we report only the results for predicting the binary variable of posting again.

Independent Variables

GotReply. GotReply is a binary variable that indicates whether or not a first post received a direct response. It is coded 0 if the initial post received no response and 1 if the post received a response. If a newcomer’s first post was the initial post in a thread, we coded it as receiving a reply if the post received any reply. If the newcomer’s first post was itself a reply to another message, then we used both the content of the message and the reply structure to assess whether a subsequent post was a reply to the initial message or to the newcomer’s post. Of the 2,613 first-time posters, 60.7% received a reply, as shown in Table 1.

We also measured the number of direct replies received by an initial post, the total number of messages involved in a thread, including replies to replies, and the number of generations involved in the thread. The results described below are qualitatively similar whether we use GotReply as an independent variable or any of these alternative measures of interactivity. Therefore, for simplicity, we report results using only the GotReply variable.
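The structural part of these measures can be sketched as follows. The example assumes a hypothetical mapping from each message id to the id of the message it replies to (None for a thread starter) and that the reply structure is a tree. It counts only messages that descend from the newcomer's post, which is a simplification, and it does not reproduce the content-based judgment used when a first post was itself a reply.

```python
def reply_metrics(parent_of, post_id):
    """Summarize replies beneath one post.

    parent_of: dict mapping message id -> parent message id (None for a
    thread starter). Hypothetical layout; assumes an acyclic reply tree.
    post_id: id of the newcomer's first post.
    """
    # Invert the parent links so each message lists its direct replies.
    children = {}
    for msg_id, parent_id in parent_of.items():
        children.setdefault(parent_id, []).append(msg_id)

    direct = len(children.get(post_id, []))
    descendants = 0
    generations = 0
    frontier = children.get(post_id, [])
    # Walk the tree one generation of replies at a time.
    while frontier:
        generations += 1
        descendants += len(frontier)
        frontier = [c for m in frontier for c in children.get(m, [])]

    return {
        "got_reply": int(direct > 0),   # the binary GotReply coding
        "direct_replies": direct,
        "descendants": descendants,     # replies plus replies to replies
        "generations": generations,
    }
```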

Characteristics of Replies

Length of Message. Length of message is the number of words in a reply. The distribution of length was skewed: 75% of messages were shorter than 125 words, but 5% of messages were longer than 317 words. To bring the distribution closer to normal, we used the log of the number of words in a message in analyses.
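A minimal sketch of the transformation follows. The base of the logarithm is not stated in the text; base 10 is assumed here, and words are assumed to be whitespace-separated tokens. The function name is hypothetical.

```python
import math


def log_word_count(text):
    """Return log10 of the number of whitespace-separated words in a message.

    Assumptions: base-10 logarithm and whitespace tokenization; neither is
    specified in the text.
    """
    n_words = len(text.split())
    if n_words == 0:
        raise ValueError("message contains no words")
    return math.log10(n_words)
```

Under the base-10 assumption, the 125-word and 317-word cut points mentioned above map to roughly 2.10 and 2.50, so the long right tail is compressed considerably.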

Statement. We coded whether a post or its reply was a question or a statement (IsQuestion). A question asked for advice, a suggestion, or an opinion (coded as 1), while a statement offered advice, a suggestion, an opinion, or some kind of orientation (coded as 0). A sample question is, “I wanted to ask everyone if they have ever had any experiences with the two diet aids I’ve been hearing about on the radio all the time—Trim Spa and Body Solutions. I’ve been on Trim Spa for three weeks now and can’t see a damn bit of difference.” A statement would be, “The best advice I can give you is watch portion sizes, exercise, lift weights and read the great posts here. Go to and record what you are eating and you will be surprised at the amount you may still be eating.” Two judges coded a randomly selected 10% of the messages for question/statement three times with discussion in between and achieved 99% agreement on the last round, indicating high reliability in the coding. One of the judges then coded the rest of the messages individually.
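The agreement figure can be illustrated with a simple computation. This sketch assumes that "agreement" means the raw percentage of messages on which the two judges assigned the same code, not a chance-corrected statistic; the function name is hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two judges assigned the same code.

    codes_a, codes_b: equal-length sequences of codes (1 = question,
    0 = statement), one entry per message. Assumes raw agreement.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("judges must code the same messages")
    if not codes_a:
        raise ValueError("no messages to compare")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100.0 * matches / len(codes_a)
```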

Question-Statement Pair. Because a statement may be more valuable to a newcomer if it is a direct response to an explicit question, we created a derived variable for this case. A question-statement pair is coded 1 if the first post asks for information and a reply to it offers information. It is coded 0 for all other combinations of initial posts and replies.
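The derived variable follows directly from the IsQuestion coding of the post and its reply. In this sketch the function name is hypothetical, and a missing reply is represented by None, which is an assumption about the data layout.

```python
def question_statement_pair(post_is_question, reply_is_question):
    """Code 1 when a question in the first post is met by a statement in the reply.

    post_is_question: 1 if the first post is a question, 0 if a statement.
    reply_is_question: 1 or 0 for the reply, or None if there was no reply
    (the None convention is an assumption of this sketch).
    """
    if reply_is_question is None:
        return 0
    return int(post_is_question == 1 and reply_is_question == 0)
```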

Emotional Tone. We coded the positive and negative emotional tone of the replies using the Linguistic Inquiry and Word Count (LIWC2001) program (Pennebaker, Francis, & Booth, 2001). This is thesaurus-based software that matches individual words in the messages with predefined categories. The approximately 2,300 unique words in its dictionary account for approximately 80% of the words used in a broad sampling of texts written in American English (Pennebaker, Francis, & Booth, 2001). These words have been assigned to categories of affect, spatial and temporal orientation, and “personal concern.”

Pennebaker and Francis (1996) have validated LIWC by having judges rate the dictionaries on which the program is based and by comparing evaluations of texts by human judges with the output of the program. The reliability of ratings using this program (i.e., the correlation between the program’s output and the judgment of human coders) is as high as the reliability between pairs of human judges. Pennebaker and King (1999) also report that the program is sufficiently discerning that it can identify language differences among individuals. In addition, they verify that words for positive and negative emotions, an important factor in this study, are relatively stable across a large number of participants.

We evaluated each message by comparing the text of the messages to the two categories in LIWC representing positive and negative emotions. The Positive Emotion category consists of 261 words, such as happy, pretty, joy, certainty, pride, and win. The Negative Emotion category consists of 345 words, such as hate, worthless, enemy, nervous, afraid, tense, pissed, grief, cry, and sad. The LIWC score for each message is the percentage of the words in each message that fits in each category. Thus, a positive emotional tone of 2.47 for replies means that about two and a half percent of the words in the reply messages came from the Positive Emotion category.
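The percentage score can be sketched with a small word-matching routine. This is not the LIWC program: the word set below contains only the example words quoted above, and the tokenization rule is an assumption made for illustration.

```python
import re

# Illustrative subset only: the example words quoted in the text, not the
# full 261-word Positive Emotion category.
POSITIVE_WORDS = {"happy", "pretty", "joy", "certainty", "pride", "win"}


def category_percentage(text, category_words):
    """Percentage of a message's words that belong to a category.

    Tokenization (lowercased runs of letters and apostrophes) is an
    assumption of this sketch, not LIWC's own rule.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in category_words)
    return 100.0 * hits / len(tokens)
```

For instance, a message of eight words containing two words from the category scores 25.0, meaning a quarter of its words came from that category.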

Agreeableness. We coded the extent to which replies agreed or disagreed with the initial post by using the assent and negation categories from the LIWC program (Pennebaker, Francis, & Booth, 2001). The assent category included 18 word stems indicating affirmation or agreement, such as accept, agree, alright, fine, granted, indeed, o.k., and yes. The negation category included 31 word stems indicating negation, dissent, or disagreement, such as cannot, doesn’t, isn’t, never, no, not, nowhere, shouldn’t, wasn’t, weren’t, and without.

Control Variables

Characteristics of the Initial Post. The characteristics of the initial post will likely affect whether it receives a reply and, if so, what the reply’s characteristics will be. Therefore we coded the initial posts in the same way we coded the replies: length of the post (Word Count (logged)), whether the post was a statement or question (IsQuestion), its emotional tone (Positive Emotions and Negative Emotions), and its agreeableness (Assent and Negate).

We also controlled for the groups from which the messages came. Preliminary analyses showed that the groups differed both in the probability of a newcomer posting again and in the characteristics of the initial posts and their replies (see Table 4). Therefore, we included group as a set of dummy variables in all the analyses, using Mozilla as the omitted group.
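As an illustration of this kind of dummy coding, the sketch below turns a group label into 0/1 indicators with Mozilla as the omitted (reference) category. The group labels follow the row names in the tables; the function name and the dictionary representation are assumptions for the example, not the procedure actually used in the analyses.

```python
GROUPS = ["Mozilla", "Baldspot", "Diet", "BreastCancer", "GunRights", "Rangers"]
REFERENCE_GROUP = "Mozilla"  # omitted category, as described in the text


def dummy_code(group):
    """Return 0/1 indicators for every group except the reference group."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    return {name: int(name == group) for name in GROUPS if name != REFERENCE_GROUP}


print(dummy_code("Diet"))
print(dummy_code("Mozilla"))
```

A post from the Mozilla group has all five indicators set to zero, so each group coefficient is read as a difference from Mozilla.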

Table 4.  Predicting characteristics of replies from characteristics of initial posts

| Predictor | Reply Word Count (log) [1]: Coefficient | Robust Std. Err. | p | Reply IsQuestion [2]: Change in probability | Robust Std. Err. | p | Reply Positive Emotion [1]: Coefficient | Robust Std. Err. | p | Reply Negative Emotion [1]: Coefficient | Robust Std. Err. | p | Reply Assent [1]: Coefficient | Robust Std. Err. | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | 1.348 | .077 | .000 *** | | | *** | 2.580 | .715 | .015 * | .768 | .462 | .157 | .744 | .631 | .291 |
| Baldspot | −.178 | .024 | .001 *** | −.004 | .006 | .504 | −.440 | .166 | .045 * | 1.265 | .057 | .000 *** | −.153 | .051 | .031 * |
| Diet | −.085 | .017 | .005 ** | −.058 | .004 | .000 *** | 1.220 | .116 | .000 *** | 1.040 | .071 | .000 *** | .033 | .039 | .430 |
| BreastCancer | −.748 | .053 | .000 *** | .031 | .015 | .026 * | .002 | .532 | .997 | .434 | .124 | .017 * | −.250 | .286 | .422 |
| GunRights | −.711 | .038 | .000 *** | .027 | .011 | .010 ** | −.890 | .344 | .049 * | .331 | .171 | .111 | −.228 | .265 | .430 |
| Rangers | −.694 | .056 | .000 *** | .064 | .019 | .000 *** | −.594 | .565 | .341 | .388 | .216 | .132 | −.304 | .389 | .469 |
| Word Count (Log) | .173 | .049 | .017 * | −.022 | .013 | .089 | .126 | .575 | .835 | −.076 | .171 | .676 | −.123 | .367 | .751 |
| IsQuestion | .043 | .082 | .618 | −.058 | .024 | .025 * | −.720 | .569 | .261 | .050 | .172 | .783 | −.098 | .137 | .505 |
| Positive Emotion | −.005 | .001 | .012 * | .002 | .002 | .382 | .126 | .075 | .153 | −.016 | .007 | .072 | −.015 | .005 | .042 * |
| Negative Emotion | .005 | .008 | .588 | .000 | .003 | .927 | −.093 | .039 | .062 | .080 | .047 | .148 | .009 | .017 | .614 |
| Assent | .002 | .010 | .849 | −.002 | .005 | .609 | −.114 | .012 | .000 *** | −.023 | .030 | .487 | .120 | .164 | .497 |
| Negate | .011 | .003 | .023 * | .000 | .003 | .931 | −.154 | .029 | .003 ** | .057 | .048 | .288 | −.019 | .010 | .099 |

  • 1

    OLS regression analysis. Coefficient shows predicted change in the continuous dependent variables. The intercept is the expected value of the dependent variable when all binary variables are zero and all continuous variables are at their mean levels.

  • 2

    MLE probit analysis. Coefficient shows predicted change in the probability of the dependent variable being true, given a unit change in a continuous independent variable or a discrete change in a binary independent variable.


Table 1 shows the descriptive statistics and Pearson product-moment correlations for the variables used in this research.

Predicting Getting a Reply

As stated earlier, of the 2,777 initial posts, 61% received at least one reply. Table 2 shows the results of a probit analysis, using the dprobit procedure in Stata, predicting the likelihood that a newcomer would receive a reply to an initial post based on characteristics of that first posting. This procedure estimates the marginal increase in the probability of receiving a reply produced by an infinitesimal change in continuous independent variables (df/dx) and by changing binary independent variables from false to true. Because messages within a group are not independent of each other, we used the clustering option, by group type, to adjust the standard errors.
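The two quantities described above can be illustrated with a small sketch of the standard probit formulas: for a continuous predictor, the marginal effect is the normal density at the linear index times the coefficient; for a binary predictor, it is the difference between the normal cumulative probabilities with the indicator on and off. The function names and numeric inputs are hypothetical and are not estimates from this study; the sketch also omits the clustered standard errors.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginal_effect_continuous(beta, index_at_base):
    """df/dx for a continuous predictor: phi(x'b) * beta at the base point."""
    return normal_pdf(index_at_base) * beta


def discrete_change_binary(beta, index_with_dummy_off):
    """Change in probability when a 0/1 predictor moves from false to true."""
    return normal_cdf(index_with_dummy_off + beta) - normal_cdf(index_with_dummy_off)


# Hypothetical values for illustration only (not estimates from Table 2).
base_index = 0.0  # linear index x'b at the base point, where the probability is 0.5
print(marginal_effect_continuous(0.4, base_index))
print(discrete_change_binary(0.4, base_index))
```

Because the normal density is largest at an index of zero, the same probit coefficient implies a smaller change in probability when the base rate is far from 50%.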

Table 2.  Predicting probability of getting a reply from characteristics of an initial post

| GotReply | Change in probability | Robust Std. Err. | p |
|---|---|---|---|
| Breast Cancer | .188 | .019 | .000 *** |
| Gun Rights | .152 | .033 | .000 *** |
| Word Count (Log) | .144 | .043 | .001 *** |
| Positive Emotion | .002 | .003 | .636 |
| Negative Emotion | .005 | .008 | .532 |
| Assent | .005 | .012 | .647 |
| Negate | .001 | .002 | .721 |

The group in which the initial message was posted influenced the likelihood of someone responding to it. Participants in the Mozilla group were significantly less likely to reply than members of any of the other groups in the sample. Reply rates ranged from 55% in the Mozilla group to 74% in the breast cancer group. The average response rate for the non-Mozilla groups was 70%.

Both the length of the initial post and its form influenced whether it received a reply. A one-unit increase in the logged word count of a post was associated with a 14.4 percentage-point increase in the probability of at least one reader replying to it (p < .001). In addition, questions were 16.4 percentage points more likely to receive a reply than were posts that presented information, advice, or opinions (p < .003). Emotional tone and agreeableness, however, were not associated with changes in the probability of receiving a reply.

Predicting Posting Again

Forty-four percent of the newcomers posted at least once more after their initial post. We conducted probit analyses in four stages predicting whether a newcomer would post again. The first stage included control variables, predicting posting again by group type and characteristics of the initial post. The second stage examined whether receiving a reply led to posting again. The third stage examined, among the posts that received a reply, whether characteristics of the reply (its length, reply type, emotional tone, and agreeableness) changed the likelihood of posting again. Finally, the fourth stage examined whether question-statement pairs increased the likelihood of posting again more than other combinations of initial posts and replies. Again, we report the results in terms of the change in the probability of posting again from the base rate, when all binary variables are false and continuous variables are set to their average values.

We use Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC) to compare the fit of nested models. Because both the AIC and BIC metrics penalize models with more variables, reductions in them indicate better models as additional variables are introduced.
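For readers unfamiliar with these criteria, the sketch below computes them from a model's maximized log-likelihood using their standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The log-likelihood values in the example are invented for illustration and are not taken from the models in Table 3.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical nested models: the larger model adds one parameter
# but barely improves the log-likelihood.
small = {"ll": -100.0, "k": 3}
large = {"ll": -99.5, "k": 4}
n_obs = 100

for name, model in (("small", small), ("large", large)):
    print(name, aic(model["ll"], model["k"]), bic(model["ll"], model["k"], n_obs))
```

In this invented case both criteria are higher for the larger model, which is the pattern that indicates the extra variable does not earn its penalty.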

Model 1 in Table 3 shows that the different groups had different base-rates. In particular, newcomers in the breast cancer group (38.8%) and the gun-rights group (37.8%) had a lower probability of posting again than those in other groups (mean = 47.1%). These figures represent the adjusted means, when the control variable IsQuestion is false and the other control variables are set to their mean levels.

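Table 3.  Predicting probability of posting again

| PostAgain | Model 1 (Post characteristics): Change in probability | Robust Std. Err. | p | Model 2 (Adding Got Reply): Change in probability | Robust Std. Err. | p | Model 3 (Adding Reply Characteristics): Change in probability | Robust Std. Err. | p | Model 4 (Adding Question-Statement Sequence): Change in probability | Robust Std. Err. | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baldspot | −.017 | .010 | .392 | −.026 | .011 | .021 * | −.035 | .006 | .000 *** | −.035 | .006 | .000 *** |
| Diet | −.002 | .010 | .100 | −.005 | .008 | .518 | −.012 | .009 | .168 | −.012 | .009 | .168 |
| GunRights | −.068 | .018 | .000 *** | −.117 | .010 | .000 *** | −.068 | .050 | .180 | −.068 | .050 | .180 |
| Rangers | −.057 | .019 | .171 | −.040 | .013 | .002 ** | −.081 | .050 | .112 | −.081 | .050 | .112 |
| Word Count (Log) | .065 | .021 | .001 *** | .046 | .016 | .004 ** | .080 | .017 | .000 *** | .080 | .017 | .000 *** |
| IsQuestion | −.033 | .037 | .173 | −.071 | .033 | .035 * | −.017 | .082 | .833 | −.017 | .082 | .833 |
| Positive Emotion | −.001 | .002 | .244 | −.002 | .001 | .186 | −.002 | .001 | .279 | −.002 | .001 | .279 |
| Negative Emotion | .007 | .005 | .434 | .004 | .006 | .522 | .006 | .005 | .208 | .006 | .005 | .208 |
| Assent | .011 | .008 | .000 *** | .012 | .004 | .003 ** | .010 | .010 | .329 | .010 | .010 | .329 |
| Negate | .005 | .002 | .358 | .002 | .002 | .427 | .006 | .004 | .149 | .006 | .004 | .149 |
| GotReply | | | | .124 | .025 | .000 *** | | | | | | |
| Reply Word Count (Log) | | | | | | | −.028 | .041 | .506 | −.028 | .041 | .506 |
| Reply IsQuestion | | | | | | | .049 | .024 | .038 * | .049 | .024 | .038 * |
| Reply Positive Emotion | | | | | | | .000 | .001 | .926 | .000 | .001 | .926 |
| Reply Negative Emotion | | | | | | | −.002 | .003 | .535 | −.002 | .003 | .535 |
| Reply Assent | | | | | | | .011 | .007 | .125 | .011 | .007 | .125 |
| Reply Negate | | | | | | | .005 | .003 | .100 | .005 | .003 | .100 |
| Question-Statement Sequence | | | | | | | | | | −.011 | .059 | .852 |
| N | 2766 | | | 2766 | | | 1476 | | | 1476 | | |
| AIC | 3762.105 | | | 3723.965 | | | 2025.024 | | | 2025.02 | | |
| BIC | 3791.73 | | | 3753.59 | | | 2051.51 | | | 2051.51 | | |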

Newcomers whose initial posts were longer were about 6.5% more likely to post again than those who wrote shorter posts. The length of their initial post may reflect participants’ pre-existing interest in the topic of the group. In addition, those whose posts contained a higher proportion of words indicating agreement were slightly more likely (about 1%) to post again than those with a lower proportion.

Model 2 adds the presence of a reply as a predictor of whether a newcomer returns to the group. Adding the variable GotReply to the base model (Model 1) improved the fit of the model, reducing both the AIC and BIC measures of model fit. The effect of getting a reply was powerful. Newcomers who received a reply were 12.4 percentage points more likely to post again than those who did not receive a reply, holding constant differences among groups and characteristics of the initial post that elicited replies. Only 39% of those who failed to receive a reply posted again over the next three months, compared to 51% of those who did receive one.

These predictors of posting again did not differ systematically among the six groups examined in this research. A model including all the interactions of the groups by the independent variables in Model 2 fit the data worse than the Model 2 that we present. (In comparing the two models, the AIC fit statistic is 3546 for the interaction model vs. 3534 for the main-effects-only model; the BIC fit statistic is 3795 for the interaction model vs. 3564 for the main-effects-only model.)

Model 3 answers the question of whether the nature of the reply changes the probability of posting again among newcomers who received a reply to their initial posts. We compared Model 3 with a base model (not shown) that included only the variables in Model 1 and was fit to the subset of newcomers who received a reply. Adding characteristics of the reply improved on this base model, reducing the AIC from 2,282 to 2,028 and the BIC from 2,309 to 2,055 (see note 2). In particular, replies that asked questions were associated with a 5% increase in newcomers’ subsequently posting again to the group, over and above the boost associated simply with receiving a reply. The emotional tone of the reply was not associated with changes in the probability of posting again. There was, however, very weak evidence that the agreeability of the reply may have influenced the probability of posting again. In particular, those receiving replies with a higher proportion of words indicating either agreement or disagreement were slightly more likely to post again to the groups, but neither relationship reached conventional levels of significance (ps = .13 and .10, respectively).

Model 4 asks whether a message sequence in which an initial post asked a question and the reply provided a statement, opinion, or advice led to a higher likelihood of posting again than other sequences of initial posts and replies. Adding the question-statement sequence to Model 3 does not improve the model fit, and the question-statement sequence was not a significant predictor of posting again.

Predicting Characteristics of the Replies

Because characteristics of the replies were associated with the likelihood of posting again, we conducted exploratory analyses to determine whether characteristics of the initial posts had any influence on characteristics of the replies. We use a probit analysis for binary characteristics of the replies (Reply IsQuestion) and ordinary least squares regression for predicting continuous characteristics of the replies. Table 4 shows these results.

Length of the Replies. Model 1 in Table 4 predicts the length of the replies using OLS regression. The Mozilla group had longer replies than the other five groups (note the significant negative coefficients for all groups in Model 1). Controlling for group, replies were longer when they responded to longer posts, suggesting some type of speech entrainment. An order-of-magnitude increase in the length of the initial post was associated with a 17% increase in the size of the reply. We had previously cleaned the data so that replies did not include embedded quotations of the initial message, so the association of length of initial post and length of replies was not an artifact of longer replies quoting longer posts.

Model 2 in Table 4 predicts the probability of the reply being either a question or a statement of information, advice, or opinion. Most replies were statements (89%). The groups differed in the likelihood that replies in them would be questions. More interestingly, initial posts that were questions were about 6% more likely to receive statements as replies than were initial posts that themselves were statements (93.4% vs. 88.7%, respectively). This finding suggests that some variant of a norm involving adjacency pairs (Sacks, Schegloff, & Jefferson, 1978), that questions should be followed by answers, applies in the online world as it does in the world of conventional conversation.

Model 3 predicts the proportion of words in the reply that would fit into the LIWC positive-emotion category. The groups differed, with the diet group using a higher proportion of positive emotion words than the other groups, and the baldspot and gun rights groups using a lower proportion. Perhaps this difference in use of positive emotional terms reflects the gender composition of the groups. Contrary to our expectations, though, entrainment did not occur. That is, replies contained fewer positive words when the initial post contained a higher proportion of words of assent or negation, indicating a decline in emotional expression in the reply, the greater the emotional valence (either positive or negative) of that first post.

Model 4 predicts the proportion of words in the reply that would fit into the LIWC negative-emotion category. Groups differed in their use of negative-emotion terms, with the health support groups (diet, baldspot, and breast cancer) using these terms more than other groups. However, none of the features of the initial posts that we measured predicted the use of negative-emotion words in the reply.

Model 5 predicts the proportion of words in the reply that were in the LIWC assent category. The baldspot group used words of assent more than the other groups. Replies to posts containing fewer positive-emotion words themselves contained more words of assent.

Model 6 predicts the proportion of words in the reply that were in the LIWC negate category. Again, groups differed in their use of negative words, with the diet group using them more than average and the breast cancer, gun rights, and the N.Y. Rangers group using them less than average.


Figure 2 summarizes the empirical relationships among the variables in the analyses, and Table 5 compares the results to the hypotheses. Attributes of the initial post and whether the initial post received a reply both predicted whether a newcomer posted again. People who wrote longer messages were more likely to return to the group. It is possible that the length of their initial post was a reflection of pre-existing commitment or propensities to engage with the group. For example, people who are already interested in the group or the topic around which the group is organized may write longer messages as a result of their pre-existing interests. Alternatively, it is possible that these contributory behaviors may build commitment and not just index it. For example, writing more may increase investment in the community through processes of cognitive dissonance (Aronson & Mills, 1959) or self-perception (Bem, 1967). In other words, the more a newcomer writes, the more investment he or she is making in the group before joining it. This cost of initiation may itself cause people to become more committed to the group.

Figure 2. Summarizing the relationships.

Table 5.  Summary of results

| Hypothesis | Prediction | Result |
|---|---|---|
| 1 | Receiving a response to an initial post will increase the likelihood that the poster will post again. | Supported |
| 2 | Receiving a response that provides information rather than asks a question will increase the likelihood that the poster will post again. | Disconfirmed |
| 3 | Receiving a response will be more likely to encourage someone to post again if the response is a direct answer to an initial question. | Not supported |
| 4 | Receiving a response that has a positive emotional tone rather than a negative one will increase the likelihood that a newcomer will post again. | Not supported |
| 5 | Receiving a response that affirms what the poster said rather than disagreeing with it will increase the likelihood that a newcomer will post again. | Not supported |
| 6 | Receiving a longer reply to an initial post rather than a shorter one will increase the likelihood that a newcomer will post again. | Not supported |

In addition to this endogenous influence, interaction among participants in the group also influenced whether newcomers posted again. In particular, newcomers were more likely to post again if anyone responded to their initial post. This confirms Patterson’s (1994) thesis, discussed in the introduction, that getting a response continues the conversation, making it more likely that posters will want to further their participation in the group. The way others treat a member of a group (acknowledging or ignoring) clearly influences that member’s future behavior in the community. These results are consistent with Williams, Cheung, & Choi’s (2000) experimental work, showing that cyberostracism (being ignored online) makes participants in a group unhappy and reduces their sense of belonging to the group.

It is possible that receiving a reply can build the newcomers’ commitment to the individuals who responded or to the group as a whole, at least for the short term. That is, the newcomers may feel a sense of obligation to continue the conversation or the relationship with the person who responded to them. Alternatively, they may take the responsiveness of the individual who responded to them and generalize this to the group as a whole, thinking perhaps that this group is a friendly or useful place.

Since earlier research (Butler, 1999) suggests that the majority of people who post to a specific online group post only once, lack of commitment from potential new recruits to a group can be a major problem for its survival. To encourage development of commitment, one could alert members that a newcomer has posted but has received no response within a specified time period. These alerts, for example, could be used to recruit volunteer responders who might welcome newcomers or answer their questions (Kim, 2000).
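A minimal sketch of such an alert, under stated assumptions, appears below. The `Post` record, its field names, and the `unanswered_newcomer_posts` function are hypothetical; they stand in for whatever message store a given community uses and do not describe an existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    # Hypothetical record layout for a thread-starting message.
    post_id: str
    author: str
    posted_at: datetime
    is_first_post_by_author: bool
    reply_count: int


def unanswered_newcomer_posts(posts, now, max_wait):
    """Return newcomers' first posts that have waited longer than `max_wait` with no reply."""
    return [
        post for post in posts
        if post.is_first_post_by_author
        and post.reply_count == 0
        and now - post.posted_at > max_wait
    ]


# Invented example data: three posts checked at noon with a 24-hour threshold.
now = datetime(2005, 4, 2, 12, 0)
posts = [
    Post("a1", "newcomer_1", datetime(2005, 4, 1, 9, 0), True, 0),   # waiting 27 hours
    Post("a2", "newcomer_2", datetime(2005, 4, 2, 8, 0), True, 0),   # waiting 4 hours
    Post("a3", "regular_1", datetime(2005, 3, 30, 9, 0), False, 0),  # not a newcomer
]
for post in unanswered_newcomer_posts(posts, now, timedelta(hours=24)):
    print(post.post_id, post.author)
```

The waiting period is left as a parameter because the appropriate threshold would likely depend on how quickly a given group normally responds.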

Surprisingly, we found little evidence in this study that once we accounted for the existence of a reply, the nature of the reply influenced newcomers’ commitment to the group. The likelihood of posting again was not associated with the length of the reply, whether it was filled with words indicating agreement or negation or with words indicating positive or negative emotions. The one exception was that newcomers were more likely to return to the group to post again when the reply to their initial post was a question, suggesting that they felt bound by the norms of conversation to answer questions. One might have expected that newcomers would return at higher rates when they received responses that offered advice, opinions, or information rather than questions, but this was not the case.

The research we presented here is preliminary, a first step in a larger agenda to understand what makes online groups successful. It examined only one aspect of the success of an online group—the likelihood that a newcomer will return to a group after posting to it once. Success of online groups exists at multiple levels of analysis. One could examine success at the level of the single conversation (e.g., is an exchange civil and informative?), the individual participant (e.g., does the individual get information or social support at the level desired?), or the group as a whole (e.g., can the group recruit sufficient new members to maintain a stable size or to grow?).

Successes at these multiple levels of analysis are not necessarily compatible. The dialogue that is essential for developing success at the conversational and participant level of analysis could lead to information overload that might harm the group as a whole. For example, interventions to increase the likelihood that someone will respond to newcomers’ initial posts may increase newcomers’ commitment. Paradoxically, these same interventions may alienate already existing group members. The classic problem is that newcomers ask questions that have been asked and answered multiple times in the past. Demands from old-timers that newcomers read the FAQs can be rude and may alienate the newcomers, while the rehashing of well-known materials may alienate the old-timers. More generally, increases in message traffic in an online group lead to defection of existing members (e.g., Jones et al., 2004; Lampe & Johnston, in press). One solution to this paradox of message traffic could be some partitioning of the message space in online communities so that some of the interaction between newcomers and already existing participants can be hidden from the membership at large. This design suggestion, however, would require natural language processing to assess the redundancy in initial posts.

This research is also preliminary in the sense that it looked at a limited number of precursors of group commitment in a small number of groups. Although these groups differed in their likelihood of newcomers’ receiving a reply or posting again, they did not differ in terms of the interaction qualities that were associated with newcomers’ returning. That is, in statistical terms, in analyses not reported here we found no significant interactions between group type and the variables listed in Table 3 in predicting the likelihood that newcomers would return to the group. These null results, however, do not mean that all groups are identical. As of April, 2005, there were well over 180,000 Usenet groups alone, organized around a wide variety of topics or member characteristics. Undoubtedly, the same factors will not lead to commitment in all types of groups. For example, the emotional expression that one would expect to be important to build commitment in online illness support groups may be unnecessary or even counterproductive in technical support, political discussion, task, or hobby groups. Further research needs to sample types of groups systematically to examine these differences.

In addition, this research used a minimalist indicator of commitment to an online group—the likelihood that a newcomer will post even one additional time in the group—and examined only a small fraction of variables that could influence commitment. We looked at the interaction among participants, including structure (e.g., whether a post received a response) and content (e.g., the type, emotional tone, and agreeability of both the post and the reply). We used the LIWC program to assess emotional tone and agreeability. This program has clear limitations (e.g., it cannot identify speech acts, conversational intent, or such subtleties as irony, sarcasm, or word nuance [Pennebaker & King, 1999]). However, in the future other, more sophisticated natural language processing techniques could be used to assess other attributes of message texts in an automated process (e.g., Cohen, Carvalho, & Mitchell, 2004). Other language features are likely to be important in determining members’ commitment to groups, including such acts as the provision of emotional support or the expression of solidarity. In addition to language features, structural features of the group (e.g., the proportion of new members already present or communication volume) and features of the group environment (e.g., the number of other groups offering similar content) may also influence the commitment of new members.

Ours is not experimental research. Therefore, we cannot definitively say that the empirical relationships shown here between message characteristics and the likelihood of getting a reply, and between getting a reply and posting again, are causal. Because we know the temporal ordering of the conversational events, we can rule out some threats to causal inference often characteristic of correlational data. In particular, because the posters’ construction of the messages precedes others’ decisions to reply to them or not, we can rule out the possibility that getting a reply caused posters to construct particular types of messages. Similarly, because the presence of a reply precedes the decision to post again, we can rule out the possibility that the likelihood of posting again caused the community to respond to an initial post in a particular way. However, we cannot rule out the possibility that the relationships between messages and the likelihood of getting a reply, and between getting a reply and posting again, are artifacts, conditioned on the existence of some unmeasured additional variables that cause both the independent variables and the outcomes. For example, it is possible that some unmeasured features of the group, post, or messages are correlated with the message-level variables in Table 3 and directly influence the likelihood of a message getting a reply.

Despite these limitations, however, this research provides evidence that interaction in online groups increases newcomers’ commitment to the group, or at least their willingness to return to those groups for at least one additional post, which we believe is an initial step on the road to commitment.


This research was supported by National Science Foundation grants IIS-0325049 and SGER-0450515 and by Edinboro University of Pennsylvania’s Senate Faculty Research Grants and Office of the Dean of Graduate Studies. Kiana Matthews, Umit Guvenc, and Claire Palmgren participated in initial data gathering and data coding. Prescott Tollinger wrote programs for capturing newsgroup posts. Steve Mozel, Marc Smith, and the Netscan project at Microsoft Research provided additional data, and David Housman and Zoe Ouyang assisted in accessing it.


  • 1

    Read the fucking manual.

  • 2

    Model fit statistics between Models 1 and 2 and Models 3 and 4 cannot be compared, because they are based on different samples. Models 1 and 2 are based on the complete sample and Models 3 and 4 are based on only newcomers who received a reply.

About the Authors

  1. Elisabeth Joyce is an associate professor in the Department of English and Theatre Arts at Edinboro University of Pennsylvania. She received her Ph.D. in modern American poetry from Temple University in 1991. She has a book from Bucknell University Press, Cultural Critique and Abstraction: Marianne Moore and the Avant-Garde (1999), and she is currently working on a book-length project on poetry and space and Susan Howe’s work. She is also working on developing ways to use machine learning to carry out conversation analysis and assess group behaviors in online communities.

     Address: Department of English and Theatre Arts, Edinboro University of Pennsylvania, 500 Meadville St., Edinboro, PA 16444 USA

  2. Robert Kraut is Herbert A. Simon Professor of Human-Computer Interaction at Carnegie Mellon University. He received his Ph.D. in Social Psychology from Yale University in 1973 and has previously taught at the University of Pennsylvania and Cornell University. He was a research scientist at AT&T Bell Laboratories and Bell Communications Research for twelve years. Dr. Kraut has broad interests in the design and social impact of computing and conducts research on everyday use of the Internet, technology and conversation, collaboration in small groups, computing in organizations, and contributions to online communities. His most recent work examines factors influencing the success of online communities and ways to apply psychology theory to their design. More information is available at

     Address: Human-Computer Interaction Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213 USA