• Wiki;
• reputation;
• reliability;
• Wiki mining;
• Wikipedia;
• Web 2.0


Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs, and shared forums. By their very nature, these systems contain resources and information of varying quality. Their openness, however, makes it difficult for users to determine the quality of the available information and the reputation of its providers. Here, we first parse and mine the complete set of English Wikipedia history pages in order to extract detailed user edit patterns and statistics. We then use these patterns and statistics to derive three computational models of a user's reputation. Finally, we validate these models using ground-truth Wikipedia data associated with vandals and administrators. When used as a classifier, the best model produces an area under the receiver operating characteristic (ROC) curve (AUC) of 0.98. Furthermore, we assess the reputation predictions generated by the models on other users, and show that all three models can be used efficiently for predicting user behavior in Wikipedia.

Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 126-139, 2010
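
To make the validation metric concrete, the following minimal sketch shows how an AUC can be computed when a reputation score is used to separate two ground-truth groups. It is an illustration only, not the procedure used in the paper: the function name `roc_auc`, the scores, and the labels are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, in which the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    in which the positive example has the higher score; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data (assumption, not from the paper):
# label 1 = administrator, label 0 = vandal; scores are made-up reputations.
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every administrator is scored above every vandal, while 0.5 corresponds to a ranking no better than chance. The quadratic loop is adequate for a small illustration; a rank-based computation would be preferable for data at Wikipedia scale.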