7. Preprocessing for Web Usage Mining

  1. Zdravko Markov, Ph.D. [1] and
  2. Daniel T. Larose, Ph.D., Professor of Statistics, Director [2]

Published Online: 17 JUL 2006

DOI: 10.1002/9780470108093.ch7

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

How to Cite

Markov, Z. and Larose, D. T. (2007) Preprocessing for Web Usage Mining, in Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470108093.ch7

Author Information

  1. Department of Computer Science, Central Connecticut State University, New Britain, CT, USA, www.cs.ccsu.edu/~markov/

  2. Data Mining@CCSU, Department of Mathematical Sciences, Central Connecticut State University, New Britain, CT, USA, www.math.ccsu.edu/larose

Publication History

  1. Published Online: 17 JUL 2006
  2. Published Print: 11 APR 2007

ISBN Information

Print ISBN: 9780471666554

Online ISBN: 9780470108093

Keywords:

  • web usage mining preprocessing;
  • data cleaning and filtering;
  • directories and basket transformation

Summary

This chapter contains sections titled:

  • Need for Preprocessing the Data

  • Data Cleaning and Filtering

  • De-Spidering the Web Log File

  • User Identification

  • Session Identification

  • Path Completion

  • Directories and the Basket Transformation

  • Further Data Preprocessing Steps

  • References

  • Exercises