A personalized search engine based on Web-snippet hierarchical clustering


  • This work was done while the second author was a PhD student at the Dipartimento di Informatica, University of Pisa. It contains the complete description of the software system SnakeT and a full set of experiments on it; parts of this material were published in the Proceedings of the 14th International World Wide Web Conference, Chiba, Japan, 2005.


We propose a (meta-)search engine, called SnakeT (SNippet Aggregation for Knowledge ExtracTion), which queries more than 18 commodity search engines and offers two complementary views of their returned results. One is the classical flat-ranked list; the other is a hierarchical organization of these results into folders that are created on the fly at query time and labeled with intelligible sentences capturing the themes of the results they contain. Users can browse this hierarchy with various goals: knowledge extraction, query refinement and personalization of search results. In this novel form of personalization, the user interacts with the hierarchy by selecting the folders whose labels (themes) best fit her query needs. SnakeT then personalizes the original ranked list on the fly by filtering out the results that do not belong to the selected folders. This form of personalization is thus carried out by the users themselves, and is therefore fully adaptive, privacy preserving, scalable and non-intrusive for the underlying search engines. We have extensively tested SnakeT and compared it against the best available Web-snippet clustering engines. SnakeT is efficient and effective, and shows that a mutual reinforcement relationship between ranking and Web-snippet clustering does exist. In fact, the better the ranking of the underlying search engines, the more relevant the results from which SnakeT distills the hierarchy of labeled folders, and hence the more useful this hierarchy is to the user. Conversely, the more intelligible the folder hierarchy, the more effective the personalization offered by SnakeT on the ranking of the query results. Copyright © 2007 John Wiley & Sons, Ltd.
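
The folder-based personalization described above can be illustrated with a minimal sketch. It is not SnakeT's implementation: the `Folder` class, the `personalize` function, the integer result identifiers and the example labels are all hypothetical, and we assume that selecting a folder also selects every result held in its subfolders.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Folder:
    """Hypothetical folder node: a label, its own results, and subfolders."""
    label: str
    result_ids: Set[int] = field(default_factory=set)
    children: List["Folder"] = field(default_factory=list)

    def all_result_ids(self) -> Set[int]:
        """Collect the results of this folder and of all its descendants."""
        ids = set(self.result_ids)
        for child in self.children:
            ids |= child.all_result_ids()
        return ids


def personalize(ranked_ids: List[int], selected: List[Folder]) -> List[int]:
    """Keep only results that belong to a selected folder.

    The relative order of the original ranked list is preserved, so the
    ranking of the underlying search engines is filtered, not replaced.
    """
    keep: Set[int] = set()
    for folder in selected:
        keep |= folder.all_result_ids()
    return [rid for rid in ranked_ids if rid in keep]


# Illustrative data only: five results ranked 1..5 and two top-level folders.
ranked = [1, 2, 3, 4, 5]
cars = Folder("jaguar cars", {1, 4}, [Folder("car dealers", {5})])
animals = Folder("jaguar animal", {2, 3})

print(personalize(ranked, [cars]))  # [1, 4, 5]
```

Because the filter runs only over the results already returned for the query and uses only the folders the user selected, a step of this kind needs no stored user profile, which is consistent with the privacy-preserving and non-intrusive character claimed for the approach.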