Multivariate analysis of image search strategies

Authors


Abstract

This poster reports on work in progress on analyzing the search strategies of users of an online image database. We combine transaction logging with qualitative query content analysis, which enables a rich multivariate analysis of search behavior. The aim is to explore the suitability of transaction log analysis methodology for image retrieval and to generate a wider understanding of the uses of image databases in order to inform system design.

Introduction

Transaction log analysis has been applied to information retrieval in multiple contexts (Jansen 2006). It provides insight into how systems are used and into the strategies searchers adopt in their search processes. Bates (1979) defined a search strategy as a plan the searcher has for successfully completing a search. Search strategies consist of search tactics, i.e. moves made to further a search. Similar issues have been studied as “query reformulations” (Rieh & Xie 2001), “query refinement” (Chen & Dhar 1990), or “query transformations” (Whittle et al. 2007), all referring to the changes made to queries during a session.

Image search strategies have been studied previously by Goodrum et al. (2003), who examined search moves made by test subjects searching for images on the Web. Hung (2005) used a coding scheme adapted from Bates in her user study on image retrieval tactics. Jörgensen and Jörgensen (2005) used a framework containing search formulation and term tactics for analyzing logs from a commercial image database. We describe combining transaction logs with conceptual frameworks on image attributes to analyze image searching. The aim is to retain the analytical power of transaction log analysis while still accounting for the unique nature of image retrieval through query content analysis.

Data collection and coding

We collected 15001 queries submitted to a commercial online image database in late 2007. The transaction log contained the following information on each query: date, time (hh:mm:ss), unique session ID, user ID, search type, and query terms as input by the user. Our study combines quantitative and qualitative metrics of search strategies as captured by the log. Session identification and analysis of query content were done manually. We coded the following aspects for all queries submitted:

  1. Queries as formulated by terms
     a) Semantic level. We coded the semantic level of query terms based on Shatford (1986). Generic semantic terms describe types of objects or scenes, while specific terms refer to identified, named objects or scenes. Abstract terms refer to what the image represents, its symbolic aspects or theme.
     b) Type of term. The type of a query term was assigned by its function in the search string, following Jörgensen and Jörgensen (2005). Types included: Adjective, Noun, Verb, Date term, Concept term (thematic, non-visual term) and Visual construct (Keister 1994). We further added the type Proper noun.
     c) Image attributes. Image attributes referred to in queries were coded based on a framework modified from Jörgensen (1998). The following attribute classes were added: Material, Organization, Image source and Work of art. Proper names, Animals and Weather were also added, as suggested by Laine-Hernandez and Westman (2006).
  2. Query modifications

We coded “term tactics” and “new term tactics”, adapting tactics defined by Bates (1979) and Fidel (1985). Term tactics specify the addition, elimination, change and correction of terms. New term tactics specify the conceptual relationship of a new term to the terms in the previous query. The new term may be a narrower, broader, coordinate or opposite term, a synonym, a respelling or a correction of the previous term, or it may represent a change of word class, language or viewpoint.
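The coding in this study was done manually. Purely as an illustration of what the term tactics distinguish, the following Python sketch derives a coarse label from the terms of two consecutive queries. The function name, the whitespace tokenization, and the extra labels "repeat" and "new query" are assumptions made for this example and are not part of the coding scheme. Corrections and the conceptual new term tactics require human judgment and are not attempted here.

```python
def term_tactic(prev_query: str, query: str) -> str:
    """Return a coarse term-tactic label for a pair of consecutive queries.

    Hypothetical helper: compares lowercase, whitespace-separated term sets.
    """
    prev_terms = set(prev_query.lower().split())
    terms = set(query.lower().split())

    added = terms - prev_terms      # terms only in the new query
    removed = prev_terms - terms    # terms only in the previous query

    if not added and not removed:
        return "repeat"             # same term set (assumed extra label)
    if added and not removed:
        return "addition"
    if removed and not added:
        return "elimination"
    if prev_terms & terms:
        return "change"             # some terms kept, others swapped
    return "new query"              # no shared terms (assumed extra label)


if __name__ == "__main__":
    session = ["red car", "red car night", "red truck night", "sunset beach"]
    for prev, cur in zip(session, session[1:]):
        print(f"{prev!r} -> {cur!r}: {term_tactic(prev, cur)}")
```

A label of this kind covers only the surface form of a modification. The semantic level and attribute codes described above would still have to be assigned by a coder.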

We further noted the number of terms in each query and the number of queries per session. We also coded whether each query was unique within its session. Table 1 shows the codes assigned to one session.

Table 1. Example of query modifications and coded aspects of queries
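The quantitative measures noted above (terms per query, queries per session, uniqueness within a session) can be computed mechanically. The Python sketch below assumes a hypothetical input of (session ID, query string) pairs. The function name, the whitespace tokenization, and the choice to treat a query as unique on its first occurrence in a session are illustrative assumptions and do not describe the actual log format or coding procedure.

```python
from collections import defaultdict


def session_statistics(records):
    """Compute per-query and per-session counts from (session_id, query) pairs.

    Returns a list of per-query rows and a dict of queries per session.
    """
    seen = defaultdict(set)            # normalized queries seen per session
    queries_per_session = defaultdict(int)
    rows = []

    for session_id, query in records:
        normalized = " ".join(query.lower().split())
        is_unique = normalized not in seen[session_id]
        seen[session_id].add(normalized)
        queries_per_session[session_id] += 1
        rows.append({
            "session": session_id,
            "query": query,
            "n_terms": len(normalized.split()),
            "unique_in_session": is_unique,
        })

    return rows, dict(queries_per_session)


if __name__ == "__main__":
    log = [("s1", "red car"), ("s1", "red car night"), ("s1", "Red  car"),
           ("s2", "sunset")]
    rows, counts = session_statistics(log)
    for row in rows:
        print(row)
    print(counts)
```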

Analysis of search strategies

Based on multivariate analysis of the coded aspects, we attempt to typify the constructed queries and the modifications made to them during sessions. In our analysis of queries and query modifications we consider both the linguistic and the conceptual strategies of searchers. Based on our qualitative analysis of query terms, we also include semantic changes in query modifications. Multivariate statistical methods will be used to analyze the data in order to discover factors which explain image search strategies.

We also aim to typify image search strategies by discovering repeated patterns of query modifications. For this purpose we will test Maximal Repeating Pattern analysis and Markov models. Maximal Repeating Pattern analysis aims to uncover search patterns that recur across several sessions and searchers. Markov models may be used to reveal connections between observed queries or query modifications, indicating, for example, whether employing one type of term tactic is likely to lead to adopting another type of tactic in subsequent queries. Whittle et al. (2007) use graph analysis to represent sequences of queries and semantic links between the queries. We follow their logic and visualize search sessions as graphs of nodes and edges. Figure 1 displays examples of search graphs. The graph labeled “a” is drawn based on the session in Table 1, while b, c and d visualize less straightforward search sessions.

Figure 1. Examples of search strategy graphs
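As an illustration of the Markov model idea mentioned in this section, the following Python sketch estimates first-order transition probabilities between tactic labels from coded sessions. It is a minimal sketch under stated assumptions, not the analysis planned for the study: the function name, the input layout (one list of tactic labels per session) and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next tactic | current tactic) from sequences of labels.

    `sessions` is an iterable of lists, one list of tactic labels per session.
    """
    counts = defaultdict(Counter)
    for tactics in sessions:
        # Pair each tactic with the one that immediately follows it.
        for current, following in zip(tactics, tactics[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            nxt: n / total for nxt, n in followers.items()
        }
    return probabilities


if __name__ == "__main__":
    # Hypothetical coded sessions, for illustration only.
    coded = [
        ["addition", "addition", "elimination"],
        ["addition", "change"],
    ]
    for current, row in transition_probabilities(coded).items():
        print(current, row)
```

A first-order model of this kind looks only at a tactic and its immediate successor, so it cannot capture dependencies that span more than two consecutive queries.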

Conclusions

This poster presents an extension of past work on transaction log analysis in the field of image retrieval. We employ conceptual frameworks based on the literature and conduct multivariate data analysis of the transaction log in order to analyze search strategies. We are able to consider all the coded aspects of query formulation and modification as factors in our analysis. They may also be employed as properties of the links between the nodes in the graphs. However, our analysis is limited in that it considers only the link between query n and query n-1, in other words, between a query and its immediate predecessor. The viewpoint change tactic overcomes this to a certain extent by allowing analysis of holistic query modifications. The log analysis described here will be complemented by interviews with users of the image database in order to gain more qualitative insight into the cognitive processes behind the observed search behavior.
