Mining and tracking evolving web user trends from large web server logs

Authors

  • Basheer Hawwash

    1. Knowledge Discovery and Web Mining Laboratory, Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
  • Olfa Nasraoui

    Corresponding author
    1. Knowledge Discovery and Web Mining Laboratory, Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA

Abstract

Recently, online organizations have become interested in tracking users' behavior on their websites to better understand and satisfy those users' needs. In response, web usage mining tools were developed to help them use web logs to discover usage patterns or profiles. However, because website usage logs are generated continuously, in some cases amounting to a dynamic data stream, most existing tools still cannot handle their changing nature or growing size. This paper proposes a scalable framework that is capable of tracking the changing nature of user behavior on a website and representing it as a set of evolving usage profiles. These profiles can offer the best usage representation of user activity at any given time, and they can be used as input to higher-level applications such as a web recommendation system. Our specific aim is to make the hierarchical unsupervised niche clustering (HUNC) algorithm more scalable, and to add integrated profile tracking and cluster-based validation to it. Our experiments on real web log data confirm the validity of our approach for large data sets that previously could not be handled in one shot. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 106-125, 2010
