Keywords:

  • compression-based model
  • spam filter
  • text categorization
  • knowledge-based system
  • machine learning

ABSTRACT

E-mail spam is no longer a novelty, but it remains an important problem with a high economic impact. Spam filtering is a special case of text categorization whose defining characteristic is that filters face an active adversary who constantly attempts to evade them. In this paper, we present a novel approach to spam filtering that uses a compression-based model. We conducted an empirical evaluation on eight public, real-world, non-encoded datasets. The results indicate that the proposed filter is fast to construct, is incrementally updateable, and clearly outperforms established spam classifiers. Copyright © 2012 John Wiley & Sons, Ltd.