Compression-based spam filter

Authors

  • Tiago A. Almeida

    Corresponding author
    1. Department of Computer Science, Federal University of São Carlos – UFSCar, Sorocaba, SP, Brazil
    • Correspondence

      Tiago A. Almeida, Department of Computer Science, Federal University of São Carlos – UFSCar, 18052-780, Sorocaba, SP – Brazil.

      E-mail: talmeida@ufscar.br

  • Akebo Yamakami

    1. School of Electrical and Computer Engineering, University of Campinas – UNICAMP, Campinas, SP, Brazil

Abstract

E-mail spam is no longer a novelty, but it remains an important problem with a high economic impact. Spam filtering is a special case of text categorization whose defining characteristic is an active adversary that constantly attempts to evade the filter. In this paper, we present a novel approach to spam filtering that uses a compression-based model. We conducted an empirical evaluation on eight public, real, non-encoded datasets. The results indicate that the proposed filter is fast to construct, can be updated incrementally, and clearly outperforms established spam classifiers. Copyright © 2012 John Wiley & Sons, Ltd.
