Diffusion of SuggestBot in Wikipedia
This paper studies the diffusion of SuggestBot, an intelligent task recommendation system that helps people find articles to edit in Wikipedia. We investigate factors that predict who adopts SuggestBot and its impact on adopters' future contributions to this online community. Analyzing records of participants' activities in Wikipedia, we found that both individual characteristics and social ties influence adoption. Specifically, we found that highly involved contributors were more likely to adopt SuggestBot; interpersonal exposure to innovation, cohesion, and tie homophily all substantially increased the likelihood of adoption. However, connections to prominent, high-status contributors did not influence adoption. Finally, although the SuggestBot innovation saw limited distribution, adopters made significantly more contributions to Wikipedia after adoption than nonadopter counterparts in the comparison group.
All communities, online and off, seek to motivate members to participate and continue contributing to the betterment of the group (Kanter, 1972; Olson, 1965). Whether posting messages, welcoming newcomers, building information databases, or helping to administer the group's policies, online communities need member contributions to survive1. This is a serious problem for both new and existing communities because many face the challenge of undercontribution and/or inactivity over extended periods of time (Cummings, Butler, & Kraut, 2002; Ling et al., 2005). Even in active communities, the levels of contribution among participants can be extremely uneven. For instance, in open-source development communities, Lakhani and von Hippel (2003) found that 4% of members contributed 50% of the answers on a user-to-user help site, while Mockus, Fielding, and Herbsleb (2002) found that 4% of developers contributed 88% of new code and 66% of code fixes. In our data we found that during a randomly selected 28-day data collection window for this study, 10% of the 6,570 randomly selected participants did not edit any Wikipedia content at all. In contrast, the most motivated contributor made 62,838 edits2; the top 5% of contributors made 44% of the total edits during this time. These contributors are obviously valuable, but uneven participation also has costs: It can lead to a few voices dominating the group and leave the group vulnerable if those few contributors depart. Thus, tools that encourage participation may help online communities thrive.
However, motivating contributions to these groups is difficult. As many scholars have observed, online communities and the resources they generate often take the form of a public good, in which all members of the community can enjoy the good regardless of their individual levels of contribution (e.g., Ling et al., 2005). Because community members can free ride on others' contributions, people will in general contribute less than would be optimal for the group. Although the critical mass model of collective action (Marwell & Oliver, 1993; Oliver & Marwell, 2001) predicts that a public good can be realized with the contribution of a small number of highly resourceful individuals so long as the provision level of the collective good reaches a level of self-sustainability, involving more contributors can make the group's participation patterns more democratic and robust.
One general strategy to increase participation is to reduce contribution costs, as suggested by the critical mass model of Marwell and Oliver (1993). The cost of contributing to online communities can take multiple forms, including financial cost, emotional cost, and cost in the time and effort in uploading/downloading information. Empirical studies on knowledge management in organizations show that employees were more likely to contribute their expertise to corporate knowledge repositories when contribution did not require too much time or effort (e.g., Yuan et al., 2005). Scholars of online communities have also found that reducing the cost of contribution by improving the design of technologies, e.g. by making it easier to find contributions a person would like to make, could motivate more contributions to a movie website's database (Cosley, Frankowski, Terveen, & Riedl, 2006) or to a discussion group (Ludford, Cosley, Frankowski, & Terveen, 2004). Following a similar logic, one of the authors created a recommendation tool, SuggestBot (Cosley, Frankowski, Terveen, & Riedl, 2007), and deployed it in Wikipedia to motivate more contributions to this online information commons. Wikipedia has hundreds of thousands of articles marked as needing improvement (usually lengthening), but no tools to help people find articles they are likely to be able to contribute to. Thus, there is a high cost to finding useful contributions to make. Building on the theory of collective action, SuggestBot uses a strategy called “intelligent task routing” to reduce a person's cost of finding articles to work on by recommending articles that both need attention and that are similar to articles that person has edited in the past3. Such articles are likely to be close to a person's interests, making it easier for them to contribute.
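SuggestBot's actual recommendation engines are described in Cosley et al. (2007); purely as an illustration of the general idea behind intelligent task routing, a minimal content-based sketch might look like the following. The article titles are hypothetical, and simple word-overlap cosine similarity stands in for SuggestBot's real recommenders.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(user_history, articles_needing_work, k=3):
    """Rank articles flagged as needing work by their similarity
    to the titles a user has edited in the past."""
    profile = Counter(w for title in user_history for w in title.lower().split())
    scored = [(cosine(profile, Counter(t.lower().split())), t)
              for t in articles_needing_work]
    return [t for s, t in sorted(scored, reverse=True)[:k] if s > 0]

# Hypothetical titles, for illustration only
history = ["History of jazz music", "Jazz piano"]
needing_work = ["Bebop jazz", "Quantum field theory", "Jazz music festivals"]
print(suggest(history, needing_work))
```

The point of the design is cost reduction: instead of searching hundreds of thousands of flagged articles, a contributor receives a short list already filtered toward their interests.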
In this study, we examine two aspects of SuggestBot's use: First, how did it diffuse through the community, and second, how did it affect the contribution behavior of those who used it, compared to those who did not? Both questions are crucial when introducing technologies into online communities: If potential users do not adopt the tool, or it has minimal effects on their behavior, the technology will not benefit the community.
Diffusion of innovation has attracted decades of attention from scholars from diverse disciplines (Burt, 1987; Strang & Soule, 1998; Valente, 1996). However, the difficulties in tracking diffusion processes impose constraints on empirical research. Most studies use retrospective self-report data to examine the diffusion process, and the few studies that collect actual behavior data have sporadic information. For instance, in Coleman et al.'s (1966) study on the diffusion of tetracycline, doctors' prescriptions of the drug were sampled for only three consecutive days a month. Errors in recall or gaps in data sampling can add substantial noise to the data, influencing both statistical analysis and conceptual interpretation. The rise of the Internet has opened up new possibilities for observing diffusion processes. A plethora of digital traces of human online activities can be logged unobtrusively for academic research, giving scholars the opportunity to use objective measures of human behavior over time that are not contaminated with recall biases (Welser, Smith, Gleave, & Fisher, 2008). This may allow researchers to confirm and replicate findings from earlier studies on a much larger scale, as well as to investigate some of the issues that used to be too demanding to study empirically.
Wikipedia makes an excellent site for digital research because almost all activities on the site, including details of article edits and interpersonal communication, are archived and freely available for download. This data, plus our access to SuggestBot's internal logs, allowed us to obtain a complete, time-stamped record4 of (a) who has adopted SuggestBot, (b) who has interacted with whom, and (c) who has edited articles suggested by SuggestBot. The data also allow us to find nonadopters, a much overlooked segment in existing diffusion research (Rogers, 2003) to compare and contrast with adopters. Finally, because all the data collected have a precise time stamp, the resulting empirical measurements can be arranged along a clear temporal order, which gives us more power to make causal inferences. Overall, we believe that our work can contribute to diffusion of innovation research from multiple dimensions.
In addition to furthering our understanding of diffusion, the project improves our understanding of how to motivate contributions to online communities. Motivating contribution to electronic commons is a challenging task because when community members are distributed globally, some conventional incentive strategies such as fostering strong local norms of cooperation (Coleman, 1988) become more difficult to implement. A promising alternative, suggested by Kraut (2003), is to use social science theories to inform the design of tools that motivate participation. SuggestBot, as briefly described above, dovetails with Kraut's call in that its design followed the basic premises of collective action theory (Olson, 1965, Marwell & Oliver, 1993) and theories of individual motivation to participate in groups (Karau & Williams, 1993), with a goal to involve more people in community development via cost reduction. Through examining SuggestBot's diffusion process, as well as the effect of its adoption, we can better understand how to motivate contributions to online communities.
Using a sample of 6,570 Wikipedia contributors, we explored possible answers to the following questions: (a) Which factors influenced adoption of SuggestBot? and (b) Did adoption of SuggestBot make a difference in individuals' contributions to the community? In the following section of the paper, we first review related literature about factors that may influence diffusion of innovation and online contributions. We then present an empirical test of the research questions and hypotheses raised. The paper ends with a discussion of the substantive implications of our findings, practical implications, and directions for future research.
Hypotheses 1 to 6 examine factors that influence adoption. Since they share the same dependent variable, the predictor variables were entered into the logistic regression model in steps; the models are shown in Table 2 for both the whole sample and the subsample. In general, the results hold in both cases; we report the results from both samples below.
Table 2. Results of Logistic Regression Analysis (Odds Ratios)

| Variable | Model 1, Whole Sample | Model 1, Subsample | Model 2, Whole Sample | Model 2, Subsample | Model 3, Whole Sample | Model 3, Subsample |
| --- | --- | --- | --- | --- | --- | --- |
| Number of active months | 1.02** | 1.02* | 1.00 | 1.01 | 1.00 | 1.00 |
| Admin status | | | 1.05 | 1.45 | .77 | 1.05 |
| Preadoption contribution | | | 11.23** | 11.89** | 5.14** | 5.42** |
| Interpersonal exposure | | | | | 3.08** | 5.01* |
| Cohesion | | | | | 1.93** | 2.45* |
| Tie homophily | | | | | 1.97** | 2.34** |
| Ties to opinion leaders | | | | | .88 | .46 |
Model 1 contained only the control variable, number of active months. When number of active months was the only predictor variable in the analysis, it had a significant influence on likelihood of adoption in both the whole sample and the subsample (B = .02, odds ratio = 1.02, p < .05 for both). The changes in the deviance scores from the null baseline model showed that the improvement in model fit was significant (χ2whole, df = 1, p < .05; χ2sub, df = 1, p < .05). However, Nagelkerke's pseudo R-square for Model 1 was .01, indicating that the overall fit was poor.
Model 2 added the attribute variables of admin status and total activity. Conceptually, both are indicators of individual-level involvement in Wikipedia. The control variable, number of active months, became nonsignificant when the attribute variables were added. Counter to Hypothesis 1, admin status was not a significant predictor of likelihood of adoption (Bwhole = .05, o.r.whole = 1.05, p > .05; Bsub = .37, o.r.sub = 1.45, p > .05). That is, although the odds ratios showed that those with admin status had a higher likelihood of adopting SuggestBot, the increase was not statistically significant. Consistent with Hypothesis 2, preadoption contribution was a significant predictor of adoption (Bwhole = 2.42, o.r.whole = 11.23, p < .05; Bsub = 2.48, o.r.sub = 11.89, p < .05). That is, heavy contributors had a much higher likelihood of adoption. Nagelkerke's pseudo R-square for Model 2 increased to .15, compared to .01 for Model 1; the change in likelihood ratios between the control-variable-only model and the current model showed that this improvement in model fit was significant (χ2whole = 709.64, df = 2, p < .05; χ2sub = 105.56, df = 2, p < .05). To sum up, although both administrator status and volume of contribution prior to adoption had clear theoretical reasons for being positively associated with adoption, the substantial increase in model fit was largely attributable to preadoption contribution, the only significant variable in Model 2.
Model 3 added the four network variables. Consistent with Hypothesis 3, interpersonal exposure was a significant predictor of likelihood of adoption (Bwhole = 1.12, o.r.whole = 3.08, p < .05; Bsub = 1.61, o.r.sub = 5.01, p < .05). Also consistent with Hypothesis 4, cohesion through reciprocal ties was a significant predictor of likelihood of adoption (Bwhole = .66, o.r.whole = 1.93, p < .05; Bsub = .90, o.r.sub = 2.45, p < .05). Supporting Hypothesis 5, tie homophily was a significant predictor of likelihood of adoption (Bwhole = .68, o.r.whole = 1.97, p < .05; Bsub = .85, o.r.sub = 2.34, p < .05). However, counter to Hypothesis 6, ties to opinion leaders did not increase likelihood of adoption (Bwhole = −.13, o.r.whole = .88, p > .05; Bsub = −.79, o.r.sub = .46, p > .05). Nagelkerke's pseudo R-square for Model 3 increased to .22, compared to .15 for Model 2. Again, the change in likelihood ratios showed that the observed improvement in model fit from Model 2 was statistically significant (χ2whole = 359.46, df = 4, p < .05; χ2sub = 71.32, df = 4, p < .05).
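Each reported odds ratio is simply the exponential of the corresponding logistic coefficient, o.r. = exp(B). A quick check against the whole-sample Model 3 values reported above (small discrepancies reflect rounding of the published coefficients):

```python
from math import exp

# Logistic coefficients (B, whole sample) reported in the text for Model 3
coefficients = {
    "interpersonal exposure": 1.12,
    "cohesion": 0.66,
    "tie homophily": 0.68,
    "ties to opinion leaders": -0.13,
}

# An odds ratio is the multiplicative change in the odds of adoption
# per one-unit increase in the predictor: o.r. = exp(B)
odds_ratios = {name: round(exp(b), 2) for name, b in coefficients.items()}
print(odds_ratios)
```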
Comparing odds ratios and model fit between Model 2 and Model 3 sheds light on how both individual commitment (preadoption contribution) and social influence (communication network variables) affect adoption. Adding social influence variables significantly increased model fit, and three of the four social influence variables had positive and clearly significant effects. However, the addition of these variables did not eclipse the role of involvement, as preadoption contribution remained strongly and positively associated with adoption. Thus, our results suggest that both individual involvement and social influence from network ties can have important and independent positive effects on the likelihood of adoption in communities of collaborators such as Wikipedia.
According to Models 2 and 3, comparison of odds ratios and model fit seems to suggest that preadoption contribution accounts for most of the explanatory power of the final model. However, because variable effects are multiplicative and vary across the range of a predictor variable, odds ratios cannot be directly interpreted in terms of effect on probability. Further, because they are unstandardized, they cannot be easily compared across variables (see Menard 2001)8. To get a better sense of the relative importance of the model variables, we used Model 3 to compute the probability of adoption for several hypothetical users who differ on the key theoretical variables that predicted adoption. Table 3 reports the results.
Table 3. Predicted Probabilities Computed From Model 3, Full Sample

| Variable | Model coefficient | Sample average | High homophily | High cohesion | High contribution | High exposure | High composite |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept | −6.485 | | | | | | |
| Number of active months | −0.002 | 13.65 | | | | | |
| Admin status | −0.258 | 0.037 | | | | | |
| Preadoption contribution | 1.635 | 1.735 | | | 1.995 | | 1.995 |
| Interpersonal exposure | 1.122 | 1.165 | | | | 1.745 | 1.745 |
| Cohesion | 0.654 | 1.248 | | 1.745 | | | 1.745 |
| Tie homophily | 0.682 | 1.242 | 1.495 | | | | 1.495 |
| Ties to opinion leaders | −0.127 | 1.229 | | | | | |
| Predicted probability | | .30 | .33 | .37 | .39 | .45 | .67 |
| Increase in predicted probability over baseline | | — | +.03 | +.07 | +.09 | +.15 | +.37 |
The first column of Table 3 shows the coefficients from Model 3 based on the whole sample. The second column presents a hypothetical average contributor who has the sample average for each of the variables, along with the probability that such a person would adopt SuggestBot. The next four columns explore how much impact each individual variable has by estimating the probability of adopting for a person who is average in every respect except that they have an unusually high value for one of the four key predictors. The last column presents the probability of adoption for a “supercontributor” who has high values for all four of the key predictors9.
The model computes a baseline probability of adoption for the “average” contributor of .3010. Ceteris paribus, elevating tie homophily has only a modest effect on adoption (+.03 over the baseline). Increasing cohesion to a similarly high level roughly doubles that increase, to +.07; setting contribution to a high level raises the predicted probability by +.09 over the baseline; and interpersonal exposure has the greatest effect on the probability of adoption, raising it by +.15 over the baseline. Hypotheses 1–6 all referred to mechanisms predicted to increase the probability of adoption. While we found support for Hypotheses 2–5, these results suggest that the strongest effects arise from high levels of contribution (Hypothesis 2) and interpersonal exposure (Hypothesis 3). Furthermore, when these mechanisms work in concert with tie homophily and cohesion, the model shows evidence of a strong increase in the probability of adoption of +.37 compared to the baseline.
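These predicted probabilities follow directly from the logistic model: plugging the Table 3 coefficients and sample-average predictor values into the inverse-logit function reproduces the .30 baseline.

```python
from math import exp

# Model 3 coefficients (whole sample) and sample-average predictor
# values, as listed in Table 3
intercept = -6.485
rows = [
    ("number of active months",  -0.002, 13.650),
    ("admin status",             -0.258,  0.037),
    ("preadoption contribution",  1.635,  1.735),
    ("interpersonal exposure",    1.122,  1.165),
    ("cohesion",                  0.654,  1.248),
    ("tie homophily",             0.682,  1.242),
    ("ties to opinion leaders",  -0.127,  1.229),
]

# Linear predictor (log-odds), then inverse logit to get a probability
logit = intercept + sum(b * x for _, b, x in rows)
p_baseline = 1 / (1 + exp(-logit))
print(round(p_baseline, 2))  # the baseline probability in Table 3
```

Replacing any sample-average value with the corresponding "high" value from Table 3 yields the other columns of predicted probabilities.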
The second research question explores how adoption affects contribution. Hypothesis 7 predicts that adopters would contribute more in the future than nonadopters. To test this, we conducted independent-samples t-tests comparing the mean levels of contribution between the adopter and nonadopter groups both before and after adoption. Levene's test showed that equal variance between the two groups in the whole sample could not be assumed (F(1, 6568) = 6.37, p < .05). The corresponding t value was statistically significant, t(4776) = −2.34, p < .05, indicating that before adoption, adopters on average contributed less to the community than nonadopters (Madopters = 926.52, SD = 1,875.71 versus Mnonadopters = 1,049.55, SD = 2,067.72). However, the contribution pattern was reversed after adopters adopted SuggestBot. An independent-samples t-test on postadoption contribution found a significant difference in means, t(4317) = 4.99, p < .01, between adopters and nonadopters (Madopters = 788.86, SD = 1,741.31 versus Mnonadopters = 562.82, SD = 1,713.83), indicating that adopters contributed significantly more than nonadopters after they adopted SuggestBot.
The same independent-samples t-tests were conducted with the randomly generated subsample and produced similar results. Before adoption, adopters and nonadopters (Madopters = 769.65, SD = 1,613.94 versus Mnonadopters = 910.51, SD = 1,677.96) did not differ significantly in their levels of contribution to the community (equal variance between the two groups could be assumed, F(1, 958) = 2.41, p > .05; t(958) = −1.24, p > .05). However, after adoption, the adopters contributed significantly more to the community than nonadopters (Madopters = 787.36, SD = 1,805.67 versus Mnonadopters = 479.75, SD = 1,806.83), t(638) = 2.49, p < .01 (equal variance between the two groups could not be assumed, F(1, 958) = 4.30, p < .05). Taken together, the results from both the whole sample and the subsample supported Hypothesis 7, suggesting that adoption of SuggestBot enabled adopters to make significantly more contributions to the community even though nonadopters had contributed as much as or more than adopters prior to adoption.
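When Levene's test rejects equal variances, as it did for the whole sample, the appropriate comparison is the unequal-variance (Welch) t-test. A minimal sketch of the statistic, using made-up edit counts in place of the actual data:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and degrees of freedom for two independent
    samples whose variances cannot be assumed equal."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)   # sample variances
    se = sqrt(v1 / n1 + v2 / n2)        # standard error of the mean difference
    t = (mean(x) - mean(y)) / se
    # Welch–Satterthwaite approximation for the degrees of freedom,
    # which is why the reported df values are not simply n1 + n2 - 2
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

# Made-up edit counts, for illustration only
adopters = [1, 2, 3, 4]
nonadopters = [2, 4, 6, 8]
t, df = welch_t(adopters, nonadopters)
```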
About the Authors
Y. Connie Yuan is an Assistant Professor in the Departments of Communication and Information Science at Cornell University. She received her Ph.D. from the University of Southern California. Dr. Yuan's research focuses on social networks, communication technology, online communities, knowledge management and computer-supported distributed work in organizations. She has recently received funding from the National Science Foundation (2008–2011) to study how the development of network relations and the usage of communication technology influence the transfer and retention of organizational knowledge via the development of transactive memory systems. She has received best papers or distinguished article awards from the annual conferences of the Academy of Management and the National Communication Association. Her work has been published in Communication Research, Human Communication Research, the Information Society, and Journal of Computer-Mediated Communication, among others. She is on the editorial board of Journal of Applied Communication Research and Journal of Computer-Mediated Communication.
Address: 308 Kennedy Hall, Department of Communication, Cornell University, Ithaca, NY 14853.
Dan Cosley is an assistant professor of information science at Cornell University. His primary interest is helping groups make sense of, use, and reuse information; he studies this by using social science theory, HCI design principles, and models of behavior to build and evaluate systems in real contexts. His work spans a broad range of problems, most recently using people's behavior around community goods such as Wikipedia to motivate them to contribute more, and reusing content people create in social media systems to support individual and social reminiscence. He is also interested in the more general problem of how to move from theory, principles, and models to actual design, and how those designs can then inform the principles that inspired them. He has a Ph.D. in computer science from the University of Minnesota and is a recent recipient of an NSF CAREER award.
Address: 301 College Ave., Department of Information Science, Cornell University, Ithaca, NY 14850.
Howard T. Welser is an Assistant Professor of Sociology at Ohio University. He received his Ph.D. in 2006 from the University of Washington. Dr. Welser's research investigates how microlevel processes generate collective outcomes, with application to status achievement in avocations, the development of institutions and social roles, the emergence of cooperation, and network structure in computer-mediated interaction. He recently received a grant from Microsoft Research to study emergent social roles in online question-and-answer systems. His work has been published in Rationality and Society, the Journal of Social Structure, the Journal of Computer-Mediated Communication, and the proceedings of the International AAAI Conference on Weblogs and Social Media, the Hawaii International Conference on System Sciences, and the World Wide Web Workshop, as well as in chapters in e-Research: Transformation in Scholarly Practice and the Sage Handbook of Online Research Methods.
Address: Bentley Annex 109, Department of Sociology & Anthropology, Ohio University, Athens, Ohio 45701.
Ling Xia is a graduate student in the Department of Communication at Cornell University. She received her M.S. degree in Communication from Cornell University. She is interested in knowledge management and social network analysis, especially in the context of adversarial networks. She has received a top paper award from the annual conference of the National Communication Association. Her work has been published in the Journal of the American Society for Information Science and Technology and Management Communication Quarterly.
Address: 209 Kennedy Hall, Department of Communication, Cornell University, Ithaca, NY 14853.
Dr. Geri Gay is the Kenneth J. Bissett Professor and Chair of Communication at Cornell University and a Stephen H. Weiss Presidential Fellow. She is also a member of the Faculty of Computing and Information Science and the director of the Human Computer Interaction Lab at Cornell University. Her research focuses on social and technical issues in the design of interactive communication technologies. Specifically, she is interested in social navigation, affective computing, social networking, mobile computing, and design theory. Professor Gay has received funding for her research and design projects from NSF, NASA, the Mellon Foundation, Intel, Google, Microsoft, NIH, the Robert Wood Johnson Foundation, the AT&T Foundation, and several private donors. She teaches courses in interactive multimedia design and research, computer-mediated communication, human-computer interaction, and the social design of communication systems. Recently, she has published in IEEE venues, the International Journal of Human-Computer Interaction, the Journal of Computer-Mediated Communication, the Journal of Communication, CHI, HICSS, ACM Digital Libraries, SIGIR, JASIST, and CSCW.
Address: 325 Kennedy Hall, Department of Communication, Cornell University, Ithaca, NY 14853.