Abstract

We present the design and implementation of a web mining system that creates a hierarchical clustering of web documents retrieved by commercial web search engines. The cluster hierarchy is produced by a novel method called the Cluster Hierarchy Construction Algorithm (CHCA) and it can be used to explore the topics of interest related to the search query and their relationships. We discuss important design issues for our system, including stemming and dimensionality reduction, as well as some implementation details. We show examples of system results, compare them with results from similar systems, and analyze the responses to a survey of the system's users. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 607–625, 2005.