Studying query reformulation strategies in search logs


Extended Abstract

Users frequently modify a previous search query in the hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. In this research, we aim to better understand how web searchers refine queries through the analysis of search logs, and to form a theoretical foundation for query reformulation. Effectiveness of reformulations is measured by user click behavior.

We study users' reformulation strategies in the context of the AOL search logs [9]. We create a taxonomy of query reformulation strategies by examining and merging query reformulation types that were identified in eight search log studies [1][2][4][5][6][7][10][11], and by introducing new types. A total of 13 query reformulation types were identified. These are: spelling correction; add words; remove words; whitespace and punctuation; abbreviation; URL stripping; substring; superstring; stemming; word substitution; word reorder; form acronym; expand acronym.
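As an illustration of how a few of these types might be detected from a query pair, the following is a minimal sketch only. The rules, their order, and the function names (normalize, strip_ws_punct, classify_pair) are assumptions made for this example and are not the classifier used in the study. Only six of the 13 types are covered; the rest fall through to "other".

```python
import re


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def strip_ws_punct(query):
    """Remove every non-alphanumeric character (and underscores)."""
    return re.sub(r"[\W_]+", "", query.lower())


def classify_pair(first, second):
    """Assign a query pair to one of a few reformulation types.

    Hypothetical rules for illustration; the check order is an assumption.
    """
    a, b = normalize(first), normalize(second)
    if a == b:
        return "same"
    if strip_ws_punct(a) == strip_ws_punct(b):
        return "whitespace and punctuation"
    words_a, words_b = a.split(), b.split()
    if sorted(words_a) == sorted(words_b):
        return "word reorder"
    set_a, set_b = set(words_a), set(words_b)
    if set_a < set_b:
        return "add words"
    if set_b < set_a:
        return "remove words"
    if b in a:
        return "substring"
    if a in b:
        return "superstring"
    return "other"


if __name__ == "__main__":
    pairs = [
        ("new york", "new york hotels"),
        ("hotels new york", "new york hotels"),
        ("new york", "newyork"),
        ("yahoo", "yaho"),
    ]
    for first, second in pairs:
        print(f"{first!r} -> {second!r}: {classify_pair(first, second)}")
```

Types such as spelling correction, stemming, and acronym handling would need additional resources (for example an edit-distance threshold or a stemmer) and are deliberately left out of this sketch.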

To validate the reformulation types, we used a set of 100 users and the queries they generated. For each user, session boundaries were manually marked. Then, the generated set of 9,091 query pairs was hand coded to determine whether the second query was a refinement of the first. Once this training task was completed, we analyzed the entire set of AOL queries to identify query reformulations and to categorize them based on the 13 reformulation types. Of the 36,389,567 AOL queries, the classifier identified 16,069,421 new queries, 14,861,326 same queries, and 3,411,706 reformulations, which were analyzed and are presented in this poster.

Analyzing query logs is popular in query reformulation research because it results in a high level of realism, precision, and generalizability compared to methods such as field studies and surveys [8]. However, log data lacks context, and therefore we cannot be completely confident of the intent behind the queries [3]. While we cannot measure user satisfaction directly, we use metrics related to the effectiveness of a reformulation to deduce user satisfaction.

A set of metrics, including precision, recall, and accuracy, was used as indicators of effectiveness. The metrics we used are based on click behavior and help show the usage pattern and effectiveness of specific reformulations. In the analysis, we also included new and same queries for comparison. Differences between reformulation strategies were all statistically significant, owing to the large number of events in our dataset.

A reformulation is composed of an original query and a reformulated query. For each of these queries, the user can decide to “click” or “skip” (not click) a result. Therefore, there are four possible click pattern pairs when interacting with the results of the original query and then the reformulated query. These click patterns are: SkipClick, ClickClick, ClickSkip, SkipSkip.
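The four click patterns can be derived mechanically from two click indicators. The sketch below is illustrative only; it assumes that the first half of each pattern name refers to the original query and the second half to the reformulated query, and the function and variable names are hypothetical.

```python
from collections import Counter


def click_pattern(original_clicked, reformulated_clicked):
    """Return the click pattern label for one reformulation.

    Assumes the label order is: original query, then reformulated query.
    """
    first = "Click" if original_clicked else "Skip"
    second = "Click" if reformulated_clicked else "Skip"
    return first + second


if __name__ == "__main__":
    # Made-up (original_clicked, reformulated_clicked) pairs, not study data.
    events = [(False, True), (True, True), (False, False), (False, True)]
    counts = Counter(click_pattern(o, r) for o, r in events)
    for pattern, count in counts.items():
        print(pattern, count)
```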

Effectiveness of query reformulations is measured by the change in user click behavior from the results of the initial query to those of the reformulated query. The measures include: click patterns as described above; same or different URL clicks; time between queries; and rank change of clicked results.
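To make the remaining measures concrete, the following sketch computes the time between queries, whether the same URL was clicked, and the rank change for a single query pair. The QueryEvent record, its field names, and the convention that a negative rank change means the clicked result was closer to the top are assumptions for this example, not the definitions used in the study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class QueryEvent:
    """One query with at most one clicked result (hypothetical record)."""
    query: str
    time: datetime
    clicked_rank: Optional[int] = None
    clicked_url: Optional[str] = None


def pair_measures(original, reformulated):
    """Compute click-based measures for an (original, reformulated) pair."""
    both_urls = (original.clicked_url is not None
                 and reformulated.clicked_url is not None)
    both_ranks = (original.clicked_rank is not None
                  and reformulated.clicked_rank is not None)
    return {
        "seconds_between": (reformulated.time - original.time).total_seconds(),
        # None when either query received no click.
        "same_url": (original.clicked_url == reformulated.clicked_url
                     if both_urls else None),
        # Negative values mean the second click was nearer the top.
        "rank_change": (reformulated.clicked_rank - original.clicked_rank
                        if both_ranks else None),
    }


if __name__ == "__main__":
    # Made-up events for illustration, not study data.
    first = QueryEvent("yaho mail", datetime(2006, 3, 1, 10, 0, 0), 5, "a.example")
    second = QueryEvent("yahoo mail", datetime(2006, 3, 1, 10, 0, 45), 2, "b.example")
    print(pair_measures(first, second))
```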

The results show that most reformulations positively affect these metrics, as expected. Certain reformulation strategies, such as add/remove words, word substitution, acronym expansion, and spelling correction, result in higher-ranked unique clicks. On the other hand, when forming acronyms and reordering words, users often select no results or click the same result as they did for their previous query. Perhaps the most surprising finding is that some reformulations are better suited to helping users when the current results are already fruitful, while other reformulations are more effective when the results are lacking.

This contribution will help our understanding of users and inform designs that improve search interaction, such as interfaces supporting query reformulation assistance and personalized interaction.