
A link graph-based approach to identify forum spam


  • Shin participated in this work while he was a PhD student at Indiana University. He is currently with the Korea Internet and Security Agency, Seoul, South Korea.


Web spammers have taken note of the popularity of public forums such as blogs, wikis, webboards, and guestbooks. They now exploit these forums to drive traffic to their malicious or fraudulent websites, such as those used for phishing, distributing malware, or selling counterfeit pharmaceuticals. A popular technique is to spam the forums with URLs that point to the spammers' websites. We consider the problem of classifying URLs posted to forums as spam or legitimate by examining the link structure of the graph rooted at the posted URL. We investigate various graph metrics and associated metadata to analyze these link structures. To reduce noise in the structural characteristics of the link graphs used for spam classification, we also examine two techniques: varying the depth to which the link graphs are explored and aggregating sub-graphs of the link graphs. Our results show that a support vector machine classifier based on combinations of graph metrics and metadata of link graphs can achieve performance high enough to be practical for forum spam detection. Copyright © 2014 John Wiley & Sons, Ltd.
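As a rough illustration of the idea of describing a posted URL by the link graph rooted at it, the following Python sketch explores a link graph to a chosen depth and computes a few structural metrics. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `link_graph_metrics`, the dictionary-based graph representation, and the specific metrics (node count, edge count, average out-degree, density) are all hypothetical choices made for illustration, since the abstract does not list the metrics actually used. In a full pipeline, such values, computed at different depths, could form part of the feature vector passed to a classifier.

```python
from collections import deque


def link_graph_metrics(links, root, max_depth):
    """Return simple metrics for the link graph rooted at `root`.

    `links` maps each page URL to the list of URLs it links to
    (a hypothetical in-memory representation of crawled pages).
    Only pages within `max_depth` hops of `root` are considered.
    """
    # Breadth-first search that records the hop distance of each page.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not expand pages at the depth limit
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    nodes = set(depth)
    # Count distinct directed edges between discovered pages,
    # ignoring self-links.
    edges = sum(
        1
        for src in nodes
        for dst in set(links.get(src, []))
        if dst in nodes and dst != src
    )
    n = len(nodes)
    return {
        "nodes": n,
        "edges": edges,
        "avg_out_degree": edges / n,
        "density": edges / (n * (n - 1)) if n > 1 else 0.0,
        "max_depth_reached": max(depth.values()),
    }


# Toy link graph (made-up page names, not real data).
toy_links = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["e"],
}
shallow = link_graph_metrics(toy_links, "a", max_depth=1)
deeper = link_graph_metrics(toy_links, "a", max_depth=2)
```

Comparing `shallow` and `deeper` shows how the same root URL yields different metric values as the exploration depth changes, which is the kind of variation the depth technique mentioned above would examine.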