Image query reformulation over different search stages

Authors


Introduction

Because index terms cannot fully represent an image document, and users cannot fully express their needs in a query, an image retrieval system that relies entirely on matching index terms against queries often fails to return what users want. One plausible approach is to design an image retrieval system that helps users reformulate their initial queries effectively and explore image collections throughout the search process. This study investigates how initial queries are reformulated over the course of a search. In particular, it examines whether distinct query types emerge from the way users reformulate their queries. Its main aim is to understand the extent to which query types differ or remain similar across search stages.

Data Set

This study used the Excite 2001 Web search log, which has been used frequently in Web query studies (Spink, Jansen, Wolfram, & Saracevic, 2002; Eastman & Jansen, 2003; Jansen & Spink, 2005). The log contains 262,025 sessions and 1,025,910 queries (Spink et al., 2002). Of the 1,025,910 queries, 32,664 remained after image queries were selected. A total of 8,434 queries and 5,680 sessions remained. Of the 5,680 sessions, 4,204 (74%) included only initial queries, and the other 1,476 (26%) included revised queries. Sessions with three or more queries were then identified, leaving 592 sessions and 2,445 queries in the final data set.
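The session-level selection described above can be sketched in a few lines. This is a minimal illustration only: the row layout (a session identifier paired with a query string), the function names, and the toy log are assumptions made for the example and are not taken from the Excite data or from the study's own processing.

```python
from collections import defaultdict


def group_by_session(log_rows):
    """Group (session_id, query) rows into per-session query lists, in log order."""
    sessions = defaultdict(list)
    for session_id, query in log_rows:
        sessions[session_id].append(query)
    return dict(sessions)


def select_multi_query_sessions(sessions, min_queries=3):
    """Keep only sessions that contain at least `min_queries` queries."""
    return {sid: qs for sid, qs in sessions.items() if len(qs) >= min_queries}


# Hypothetical toy log (invented for illustration, not Excite data).
rows = [
    ("s1", "eiffel tower picture"),
    ("s2", "cat photos"),
    ("s2", "kitten photos"),
    ("s3", "sunset image"),
    ("s3", "sunset beach image"),
    ("s3", "sunrise beach image"),
]

sessions = group_by_session(rows)
final_sessions = select_multi_query_sessions(sessions)

# Share of sessions that contain only an initial query.
initial_only = sum(1 for qs in sessions.values() if len(qs) == 1)
share_initial_only = initial_only / len(sessions)
```

Applied to the figures reported above, the same ratio is 4,204 / 5,680, or about 74%.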

Findings

To investigate query changes across search stages, this study categorized queries from two perspectives. First, the overall intention expressed in a query was analyzed using Batley's four visual information types. Second, the individual terms constituting a query were analyzed using categorization schemes developed from previous query reformulation studies (Rieh & Xie, 2006; Lau & Horvitz, 1999; Bruza & Dennis, 1997) and image search query analysis studies. As shown in Figure 1, the overall distribution of Batley's types is consistent over the five search stages. The Specific and General/Namable types accounted for the majority of queries, while the General/Abstract and General/Subjective types accounted for few. In the fifth stage, however, the General/Subjective type accounts for a larger share than in the other stages.
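As a rough illustration of term-level analysis, consecutive queries in a session can be compared by their term sets. The rules below are a simplifying assumption made for this sketch, not the coding scheme used in the study: they cover only a few of the reformulation categories, and a category such as Replacement with synonyms would need a thesaurus or human judgment. The function name and the "Unclassified" label are hypothetical.

```python
def reformulation_type(previous, current):
    """Label the change between two consecutive queries by term overlap.

    Heuristic assumption: terms are lowercased, whitespace-separated tokens,
    and word order is ignored.
    """
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())

    if prev_terms == curr_terms:
        return "Use the exact query"
    if prev_terms < curr_terms:   # terms were only added
        return "Specification"
    if curr_terms < prev_terms:   # terms were only removed
        return "Generalization"
    if prev_terms & curr_terms:   # some terms kept, some swapped
        return "Parallel movement"
    return "Unclassified"         # no shared terms; outside this sketch


# Hypothetical session (invented for illustration).
session = ["sunset image", "sunset beach image", "sunrise beach image"]
labels = [reformulation_type(a, b) for a, b in zip(session, session[1:])]
```

Each label describes the move from one search stage to the next, so a session of n queries yields n - 1 labels.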

Figure 1.

Categories of search queries with five different stages

The second analysis concerns query reformulation patterns across search stages. Figure 2 shows a graphical representation of query changes over the stages. Three groups of patterns can be recognized: increasing, decreasing, and irregular. First, the increasing pattern includes Parallel movement in content and Operator usage in format: as searches progress, Parallel movement appears more often and more operators are used. Second, the decreasing pattern is found in Specification, Replacement with synonyms, and Use the exact query in content, and in Term variations in format. As users progress through their searches, they specify less, replace terms with synonyms less, and repeat exactly the same query less often; in terms of format, they use fewer term variations as they revise their queries. Third, some categories, such as Generalization and Interruption, show no identifiable regular pattern. It would be desirable to investigate this irregular pattern with other data sets in further studies.
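The three pattern groups can be expressed as a simple rule over a category's share at each stage. The sketch below assumes a strict monotonicity test, which is an assumption of this example; the study may have judged trends from the plotted data. The function name and the per-stage values are hypothetical and are not results from the study.

```python
def classify_trend(shares):
    """Label a per-stage series as increasing, decreasing, or irregular.

    A series counts as increasing (or decreasing) if it never moves in the
    opposite direction and changes at least once; anything else, including
    a flat series, is labeled irregular.
    """
    diffs = [later - earlier for earlier, later in zip(shares, shares[1:])]
    if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
        return "increasing"
    if all(d <= 0 for d in diffs) and any(d < 0 for d in diffs):
        return "decreasing"
    return "irregular"


# Hypothetical shares over five search stages (invented for illustration).
example_series = {
    "category_a": [0.10, 0.15, 0.20, 0.20, 0.30],
    "category_b": [0.30, 0.20, 0.20, 0.10, 0.05],
    "category_c": [0.10, 0.30, 0.20, 0.25, 0.10],
}
trends = {name: classify_trend(values) for name, values in example_series.items()}
```

A stricter or more tolerant rule, for example one that ignores small fluctuations, would move borderline categories between groups, so the grouping depends on the threshold chosen.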

Figure 2.

Categories of search queries with five different stages (S: Specification, G: Generalization, Replace: Replacement with synonyms, Add synonyms, P: Parallel movement, Request: Request for additional results, Interruption, U: Use the exact query, TV: Term variations, O: Operator usage, E: Error correction)

Discussion and Conclusion

Users are likely to search for images iteratively with reformulated queries, since it is almost impossible to obtain perfect search results in a single attempt. This study investigated how queries change over different search stages as searches progress. Using the Excite 2001 Web search log, it categorized 592 sessions and 2,445 queries based on Batley's four visual information types and on categorization schemes from query reformulation studies (Rieh & Xie, 2006; Lau & Horvitz, 1999; Bruza & Dennis, 1997). Because users commonly revise their initial queries when they are not satisfied with the initial results, it is desirable to understand how search queries change across search stages. Further studies and analyses are necessary to understand the characteristics of search queries and reformulation processes.
