OSN Crawling Schedulers and Their Implications on k-Plexes Detection

Authors

  • Cristina Pérez-Solà

    Corresponding author
    • Departament d'Enginyeria de la Informació i les Comunicacions, Universitat Autònoma de Barcelona, Bellaterra, Cerdanyola del Vallès (Barcelona), Spain
  • Jordi Herrera-Joancomartí

    1. Departament d'Enginyeria de la Informació i les Comunicacions, Universitat Autònoma de Barcelona, Bellaterra, Cerdanyola del Vallès (Barcelona), Spain
    2. Internet Interdisciplinary Institute, Universitat Oberta de Catalunya, Barcelona, Spain

Author to whom all correspondence should be addressed; e-mail: cperez@deic.uab.cat.

Abstract

Web crawlers are complex applications that explore the Web for different purposes. They can be configured to crawl online social networks (OSNs) and obtain relevant data about their global structure. Before a web crawler can be launched, a large number of settings have to be configured. These settings define the crawler's behavior and have a strong impact on the collected data: they affect both the amount of data collected and the quality of the information it contains. By configuring these settings properly, a crawl can therefore be targeted at specific goals. In this paper, we review the configuration choices that an attacker who wants to obtain information from an OSN by crawling it has to make to conduct the attack. We analyze different scheduler algorithms for web crawlers and evaluate how useful each is in pursuing a set of different adversary goals.
