Developing a detailed view of query reformulation: One step in an incremental approach


Abstract

A key goal of current research on interactive information seeking is to develop personalized search systems that respond to individual user needs in real time. Ideally, such systems will provide customized recommendations that help the user generate more effective queries. This paper reports on one experiment in a larger study that tests the hypothesis that the visual scanning of ranked search results interferes with the user's ability to recognize potentially useful query terms. Our experiment uses the well-documented phenomenon of semantic priming – people recognize a word more quickly when they have just encountered (e.g., seen, heard, or thought about) a semantically related word. Recognition is slower when they have not been “primed” by such an encounter. Using methods from standard semantic priming experiments, we carefully manipulated the semantic relationships between words in a display, and asked participants to indicate their recognition of the words. We also varied the task we asked participants to complete when indicating recognition. This allowed us to measure differences in the level of semantic priming for each type of task. Our results show that in a visual scanning task, word recognition is not advantaged by semantic priming. The finding supports our hypothesis. The primary contributions of this research include: 1) demonstration of the value of the semantic priming paradigm in the study of query formulation and search interaction; 2) development of a new task specifically designed to tease apart the impact of visual and linguistic processing during query reformulation; 3) bringing a novel technique to information retrieval research for the study of interaction, as part of a general movement to apply techniques from cognitive science in the study of cognitive factors that affect search interaction.

INTRODUCTION

Much of what people do when using a search system is similar to interaction with any personal computer application: we look at the screen, process whatever we see on the screen, maneuver a cursor, click active buttons and hyperlinks, and enter text. However, using a search system also involves formulation of a query, which is a unique and fundamental aspect of search interaction (assuming a text-based search system). Since the early days of information retrieval, query formulation has been recognized as a difficult task for users (Belkin, Oddy, & Brooks, 1991, 1982; Taylor, 1968; Swanson, 1960). One response to this difficulty has been the development of systems that suggest queries to users (Kelly, 2009; White & Marchionini, 2007; Xu & Croft, 1996). The limited success of such suggestions (Anick, 2003; Kelly & Fu, 2006) has led to attempts to develop systems that provide customized query recommendations (Stamou & Ntoulas, 2009; Teevan, Dumais, & Liebling, 2008). In order to develop more effective systems, we need a better understanding of how people go about the query reformulation process.

The research we report here is part of an ongoing project whose objective is to discover how interface design and the underlying functionality of the system hinder or facilitate efficient and effective reformulation. From a practical point of view, a descriptive model of query reformulation will help system designers develop more effective search systems. In order to investigate how the design of the system affects reformulation, we adopt methods from linguistics and psychology. These methods allow us to isolate and examine factors we hypothesize to be important in the model. Our current research focuses on the effect of interactive behavior, such as scanning a results list, on the use of lexical knowledge – the knowledge of word meaning and the associations between words (Miller et al., 1990). For example, salt and pepper are related in that they are widely used to add flavor to food.

We report here on one experiment in a larger study designed to test the hypothesis that the very act of visually scanning search results interferes with the user's ability to recognize potentially useful query terms. Successful query reformulation requires that the user select terms that represent an information need better than the prior query. The basic intuition behind our hypothesis is that the interaction with the system inhibits the user's ability to recognize relationships between words that they would otherwise notice.

The phenomenon that we study to assess recognition of related words is semantic priming, “an improvement in performance in a perceptual or cognitive task, relative to an appropriate baseline, produced by context or prior experience” (McNamara 2005, p. 3). The first convincing evidence of the existence of this phenomenon came from an experiment in which students were shown pairs of letter strings of three types (Meyer and Schvaneveldt, 1971). One type consisted of two related words, e.g., nurse-doctor. One type consisted of two words that are not related, e.g., bread-door. One type consisted of a word and a non-word, e.g., bread-marb. Each pair of strings was displayed separately and the participants were asked to perform what has come to be known as a lexical decision task, i.e., to decide, immediately upon seeing a pair of strings, whether both strings are real words. The interesting result is the difference in the amount of time it took the participants to decide whether the pairs were words. On average, participants made the decision 85 milliseconds (ms.) faster when the words were related than when the words were unrelated. The finding that word recognition is advantaged when preceded by processing of a related word has been reproduced many times under a wide variety of experimental conditions, including in our own baseline experiments (Smith & Wacholder, 2010). The finding is generally interpreted as evidence that lexical knowledge is organized in the brain in a manner that makes it easier (faster) to access related words in memory. The lexical decision task has proven to be a reliable basis for assessment because i) the response decision does not require that participants pay attention to the relationships between words and ii) with appropriate equipment, response time can be measured precisely at the moment of the response.

In theorizing about search interaction, we consider the cognitive processing associated with tasks that require the use of language. If retrieving word knowledge from memory is advantaged (made faster than it would be otherwise) by recent exposure to related words, one might surmise that a user engaged in scanning a list of search results would be “primed” to notice related words suggested by the system for query reformulation. But people often don't pick up on related words (Anick, 2003; Kelly & Fu, 2006), even when they might be useful. This suggests that related words have little or no advantage during search interaction.

We note that our hypothesis is not a claim that semantically related words are more valuable than any other type of word for query reformulation. Also, our experimental tasks do not fully duplicate or simulate the complex cognitive processing involved in interactive search and query reformulation ‘in the wild’. Instead, our intention is to isolate and examine specific effects in controlled conditions (Wacholder, in press). For example, we isolate the task of looking for a word on a screen and factors that affect that task. In our experimental design, we purposefully attempt to eliminate or control for other cognitive tasks and processes that are involved when a searcher scans a ranked results list.

The first part of this paper discusses lexical knowledge in relation to query formulation and reviews the literature on how people scan search results. Next, we describe the design of our experiment, for which we developed a novel decision task, which we use to differentiate lexical and visual processes. Then we report on our results.

BACKGROUND

The stage of the reformulation process on which we focus takes place immediately after the user has submitted a query to a ranked-list retrieval system and has received a list of results. If the user decides that these results are not helpful, the user will revise (reformulate) the query.

Figure 1. Simplified model of search interaction

Figure 1 shows a highly simplified model of the typical interaction. A user has a need, formulates and submits an initial query, scans the results page, and then either clicks, re-queries, or quits.

Descriptively, query reformulation involves (at least) two cognitive domains, visual processing and lexical processing. Visual processing involves recognition of visual patterns that form contextually meaningful symbols. For example, in reading, a key part of visual processing involves the recognition that a pattern consists of a set of letters that form a word, as opposed to, for example, a picture. Lexical processing involves understanding the meaning of words; to do this, the user must retrieve word knowledge from the mental lexicon. From a linguistic perspective, lexical processing is distinct from and less complex than syntactic (grammatical) processing.

The user engages in visual processing when scanning a list of search results. The user engages in linguistic processing when retrieving possible query terms from the storehouse of word knowledge accumulated in memory, and when determining the meaning of words displayed on a screen.

In this section, we briefly review some of what has been learned recently about how users process search results visually and lexically. A key point is that visual and lexical processing take place very quickly, below the level of user consciousness.

Visual processing

Recent studies of query logs and visual scanning behavior indicate that searchers tend to make very rapid decisions about their next search actions.

Eye-tracking studies support the view that searchers process results pages using fast and frugal heuristics. It is well established that when searchers scan a ranked list, they use the rank position of an item as a cue to the expected relevance of the underlying information source (Cutrell & Guan, 2007; Granka, Joachims, & Gay, 2004; Guan & Cutrell, 2007; Joachims, et al., 2005; Klockner, Wirschum, & Jameson, 2004; Lorigo, et al., 2006). Searchers typically scan only the top two items on the list before taking the next search action.

Two query log studies provide insight into the rapidity of search actions. In a large-scale query log study, Downey, Dumais, and Horvitz (2007) found that post-query action is most likely to occur within 55 seconds of a query submission. A click-through[1] is the most likely next action in the first 15 seconds after submission. After 15 seconds, the probability of a re-query exceeds that of a click-through and remains higher thereafter. The probability of a re-query peaks 20 seconds after submission. The probability of quitting grows over time, exceeding the probability of a re-query at 55 seconds, after which quitting is the most likely next action.

It is important to point out that Downey et al. defined a re-query as a query submission of any type after an initial query, including reformulation (a query with words that have been changed by the searcher) and page queries (a request for another section or page of a results list). In an earlier query-log study, Lau & Horvitz (1999) compared the probabilities for page queries and reformulation queries over time. They found that immediately after an initial query submission a re-query is most likely to be a page query (90%). This probability drops off over time, as the probability of reformulation grows. Twenty seconds after an initial submission, when the probability of a re-query peaks[2], about 45% of re-queries are reformulations. After 45 seconds, a page query and a reformulation are equally likely. Clearly, when reformulation occurs it is most likely to be completed within a minute of a prior query submission. These findings suggest that searchers make very rapid decisions about the words they use when revising queries.

In theorizing about query reformulation for purposes of modeling the process, we make several simplifying assumptions. We predicate the experiment reported here on the supposition that when searchers visually scan results pages, they are attempting to visually locate their immediately preceding query terms within the text on the page. Of course, professional searchers are trained to interact more strategically by scanning for related terms and new vocabulary. However, we are interested in the more typical user, someone who has been trained, as it were, by the ubiquity of the single-box-ranked-list search engine, to look at only the top-most items on the results page. Our theory is that this type of searcher uses relatively fast, visual procedures during interaction, and that these procedures are often performed without conscious awareness or control.

Lexical processing

To date, most research in information science has focused on sources of query terms external to the user; examples are the description of the information need, the list of search results, recently viewed web pages, query terms suggested by the search system, and search intermediaries (e.g., Kelly & Fu, 2006; Vakkari, 2002; Ferber et al., 1995; Spink, 1994). However, there is another source of query terms internal to the user – the individual's own lexical knowledge.

In linguistics, lexical knowledge is conceived of as residing in the mental lexicon, the individual's storehouse of word knowledge (Aitchison, 2003). The connection between the mental lexicon and the query terms that an individual will choose to use is quite direct – if the individual has never encountered a word, or doesn't know what it means, the word is not available for (correct) use in a search query. Lexical knowledge includes, but is not limited to, word meaning, form (e.g., the plural of mouse is mice), spelling, and pronunciation. The mental lexicon also stores the knowledge about word relationships that is manifested in semantic priming. Aitchison (2003, p. 86) lists four kinds of relationships: coordination, collocation, superordination, and synonymy. A variety of competing models have been proposed to account for the semantic priming phenomenon (see McNamara (2005) for a summary of the main issues), but in general they consist of different proposals about how knowledge about word relationships is stored so that lexical knowledge can be rapidly retrieved from memory.
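To make Aitchison's four relationship types concrete, the following sketch pairs each type with an illustrative example. The example pairs are our own illustrations (building on the salt/pepper example above), not items from the study materials:

```python
# Illustrative only: Aitchison's (2003) four word-relationship types,
# with example pairs of our own choosing (not from the study materials).
LEXICAL_RELATIONS = {
    "coordination":    ("salt", "pepper"),     # words that cluster at the same level
    "collocation":     ("salt", "water"),      # words that habitually co-occur
    "superordination": ("seasoning", "salt"),  # a category term and one of its members
    "synonymy":        ("ill", "sick"),        # words with (near-)identical meaning
}
```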

Figure 2. Sequence of screen displays and response in lexical decision task

In the next section, we report on an experiment designed to test the hypothesis that the process of visually scanning search results interferes with the user's ability to recognize the potential value of query terms that in other situations they might easily identify.

METHOD

In this section, we describe our experimental method in detail. Since readers of this paper may not be familiar with the semantic priming paradigm, we begin with an overview and provide explanatory detail.

As noted above, the standard methodology for measuring semantic priming is the lexical decision task. Here we refer to the variant we use as the word recognition task (WRT). During the task, a participant sees a sequence of simple computer screens (see Figure 2, above). The first screen displays a fixation point (a ‘+’ character), which draws the participant's eye to the center of the screen. Next, a real English word is displayed very briefly at the center of the screen (∼150 milliseconds); because it is processed first, this word is called the prime. A blank screen then flashes very quickly (∼50 ms) after the prime disappears. Finally, a second string of letters is displayed; this string is called the target. The target can be a real English word or a pronounceable non-word. The participant must decide very quickly (within 1 second) whether the target string is a real English word (the lexical decision). The participant indicates the decision by pressing one of two buttons. The time taken between the initial display of the target string and the button press is called the response time (RT). The software and equipment used in the experiment allow us to measure RT in milliseconds (1000ths of a second). We used this version of the standard task in our baseline study.
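The trial sequence can be summarized as a minimal sketch, using the approximate timings given above (~150 ms prime, ~50 ms blank, 1-second response window). The `show` and `wait_for_key` callables are hypothetical stand-ins for display and response-box I/O, and the fixation duration is our assumption; the actual study used dedicated experiment software (E-Prime) for millisecond-accurate presentation and timing:

```python
import time
from dataclasses import dataclass

FIXATION_MS = 500        # fixation duration is not specified in the text; assumed
PRIME_MS = 150           # prime display, ~150 ms
BLANK_MS = 50            # blank inter-stimulus screen, ~50 ms
RESPONSE_LIMIT_MS = 1000 # lexical decision must be made within 1 second

@dataclass
class TrialResult:
    target: str
    judged_word: bool  # the participant's lexical decision
    rt_ms: float       # response time: target onset to key press

def run_trial(show, wait_for_key, prime: str, target: str) -> TrialResult:
    show("+"); time.sleep(FIXATION_MS / 1000)   # fixation point centers the eye
    show(prime); time.sleep(PRIME_MS / 1000)    # prime, displayed very briefly
    show(""); time.sleep(BLANK_MS / 1000)       # blank screen
    onset = time.perf_counter()
    show(target)                                # target string
    key = wait_for_key(timeout_ms=RESPONSE_LIMIT_MS)  # "yes", "no", or None
    # A missed deadline is recorded as a zero RT (see the data section below).
    rt_ms = 0.0 if key is None else (time.perf_counter() - onset) * 1000
    return TrialResult(target, key == "yes", rt_ms)
```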

In experiments using the standard task, participants complete the task repeatedly, responding to a controlled series of word-sets (explained below) for which the target strings are varied systematically. Each iteration of the task is called a trial. For each trial, a participant may see one of three possible types of target strings:

  • Related-word: the target is a real word that is related to the prime

  • Unrelated-word: the target is a real word that is unrelated to the prime

  • Unrelated-nonword: the target is a nonword

We measure the effect of the relationship between words by comparing response times when the target is a related word (RT-rw) to response times when the target is an unrelated word (RT-uw). The “semantic priming effect” is the difference between the mean RTs for the two types of targets. Generally, RT is faster when the target is a related word. Responses to the non-word target strings are generally not a factor in the effect.
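The computation itself is simple; a minimal sketch follows, assuming a hypothetical list of per-trial records holding the target type and the response time for correct responses:

```python
from statistics import mean

# `trials` is a hypothetical list of (target_type, rt_ms) records for
# correct responses, where target_type is "related" or "unrelated".
def priming_effect_ms(trials):
    rt_related = [rt for kind, rt in trials if kind == "related"]
    rt_unrelated = [rt for kind, rt in trials if kind == "unrelated"]
    # A positive value means related targets were recognized faster,
    # e.g., the ~85 ms advantage in Meyer and Schvaneveldt (1971).
    return mean(rt_unrelated) - mean(rt_related)
```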

In semantic priming studies, the experimental materials (stimuli) are carefully designed to control relationships between strings in each display. On each trial, a draw is made from a small set of strings that have a known relationship to other words in the set. The strings that are drawn comprise the word-set for that trial. A word-set will always include at least one prime and at least one target.

Experimental materials

For purposes of our larger research program, we have developed a comprehensive set of experimental materials, which we organize in word families. Each word family consists of a set of strings that have very specific relationships. Word families are structured around targets, so that we can measure the effect of different types of primes on response times for the same target. Each target is associated with two word families. Table 1 shows the two word families that contain the target “bones.” “Bones” is related to the primes in Word Family 6, and not to the primes in Word Family 13.

Table 1. Two word-families for the target “bones”.
Word Family          #6          #13
Target: related      bones       orange
Target: unrelated    orange      bones
Prime                fossil      lemon
Prime                dinosaur    citrus
Non-word             huro        tege
Unrelated word       mirror      soccer

Each word family comprises a related target, an unrelated target, two words that prime the related target, a non-word, and an additional word that is unrelated to both primes and both targets. All of these strings are needed for balanced presentation of word-sets in our various experiments, though not all of these strings are used in this particular experiment.
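One possible representation of this structure, populated from the (reconstructed) entries in Table 1, is sketched below; the authors do not describe the actual format of their materials, so the field names and types are our assumptions:

```python
from dataclasses import dataclass

# A sketch of one word family as a record, mirroring Table 1.
@dataclass(frozen=True)
class WordFamily:
    family_id: int
    related_target: str
    unrelated_target: str
    primes: tuple[str, str]  # both primes are related to related_target
    non_word: str            # pronounceable non-word
    unrelated_word: str      # unrelated to both primes and both targets

FAMILY_6 = WordFamily(6, "bones", "orange", ("fossil", "dinosaur"), "huro", "mirror")
FAMILY_13 = WordFamily(13, "orange", "bones", ("lemon", "citrus"), "tege", "soccer")
```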

We constructed 160 word families in all. We continue to validate our materials in ongoing experiments. While building the word families is time-consuming, we expect to reuse these materials in many future experiments.

Objectives

The goal of the experiment reported here was to investigate how the availability of words in memory is affected by visually scanning to locate a word on a screen.

Table 2. Fourteen types of target pairs: example for the prime word CAT.
[Table 2 is available only as an image in the original.]

Experimental design

In this experiment, we extended the standard WRT paradigm and controlled the type of relationship between prime words and target strings. This allowed us to measure the priming effect of related words under the two conditions produced by our second independent variable: the type of response decision requested of participants (detailed below). In our design, the type of response decision is a “between-subjects factor”; each participant was assigned to one of the two response tasks and responded to all word-sets in the experiment by completing only that single type of task. As with the standard WRT, response time is our dependent variable.

Trials: For both tasks, each experimental trial mirrored the standard WRT with a fixation point, a single word prime, a blank screen, and a target display. In this experiment all target displays contained two strings (described in detail below). For both tasks, participants completed their responses by pressing a key. We used the display timings from the baseline study for fixation points, primes, blank screens, and targets (see above).

Response-tasks: Participants assigned to the WRT task responded by indicating whether both strings in the target were real words in English. Previous studies of semantic priming have used similar two-string designs (Meyer & Schvaneveldt, 1971).

Participants assigned to the visual scanning task (VST) responded by indicating whether the prime word was present among the two strings in the target. We know of no prior studies that have used this particular design; however, several studies have found that semantic priming is reduced or eliminated when a response task involves visual scanning of the prime (for a discussion, see McNamara, 2005, pp. 117–122).

For both response tasks (WRT and VST), the positive response was a left-hand key press, and the negative response was made with the right hand.

Word-sets: All the targets displayed in this experiment were composed of two strings. Each string in a target pair could be one of four types: (1) a repeated prime, (2) a related word, (3) an unrelated word, or (4) a non-word. The four types of strings, and two possible orders for each combination of types, generated 16 possible types of target pairs, as shown in Table 2 above. None of the target pairs contained two repeated primes or two non-words, so 14 target pairs were used in the experiment. For this experiment, only responses to the four bold and shaded cells were analyzed. When displayed on the screen, the two strings were always centered one above the other.
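The counting in the preceding paragraph can be verified with a short enumeration; the string-type names below are ours:

```python
from itertools import product

# Four string types in two ordered (top/bottom) positions give
# 4 x 4 = 16 pair types; excluding the two disallowed doubles
# (prime+prime and non-word+non-word) leaves the 14 types used.
TYPES = ("repeated_prime", "related_word", "unrelated_word", "non_word")
EXCLUDED = {("repeated_prime", "repeated_prime"), ("non_word", "non_word")}
pair_types = [p for p in product(TYPES, repeat=2) if p not in EXCLUDED]
assert len(pair_types) == 14
```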

In order to minimize potential biases in response time (McNamara, 2005), we controlled the frequency of presentation for each type. So that participants could not learn to anticipate the location of non-words and repeated primes, we controlled the probability of each type appearing in the top or bottom position; both positions were equally likely. For example, the probability that a non-word would appear in the top position was equal to the probability that it would appear on the bottom. As in our baseline study, each word family was assigned to one of four blocks, using the same blocks used in the baseline study. Within each block, word families were selected randomly without replacement for each trial, and within each family, word-sets were selected according to frequency controls.
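The position control amounts to a coin flip per trial; a minimal sketch, with function and parameter names of our own choosing:

```python
import random

# Each string of a target pair is assigned to the top or bottom position
# with equal probability, so participants cannot learn to anticipate where
# a non-word or a repeated prime will appear.
def display_positions(string_a: str, string_b: str, rng=random):
    if rng.random() < 0.5:
        return string_a, string_b  # (top, bottom)
    return string_b, string_a
```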

Procedure. In each experimental session up to five participants worked simultaneously, each assigned to the same response task, and seated at adjacent workstations (see Figure 3). Tasks were pre-assigned to each session so that the types were rotated. Participants learned about the experiment from a prerecorded introduction, which told them that the study was about how people recognize words, and that results would help in the design of better search systems. After participants completed a consent form and demographic questionnaire, the experimental equipment was introduced. Using the equipment, participants read silently about details of the experiment and then practiced their assigned task on two blocks of 20 trials each. Questions were answered throughout the introduction and practice. Once the final practice block was complete, participants started the experiment, completing four blocks of 40 trials each. A short rest was given between blocks. Participants worked simultaneously, but not in unison. When a participant completed the final block, they sat quietly waiting for others to finish. Once all participants in the room were finished, they were thanked, paid, and dismissed. Participants received $10 for the 30-minute experiment.

Figure 3. Laboratory setup for sessions with up to five participants

Data collection. The experiment was administered using Psychology Software Tools, Inc.'s E-Prime software running on Hewlett-Packard tablet computers. No other applications were active in the system during the experiment. Dell flat-screen monitors displayed the word-sets. A five-key serial response box recorded key-presses in milliseconds. In order to minimize timing differences due to hardware configurations, all hardware was permanently assigned to each workstation, so that all participants using a station used the same equipment.

Participants

112 participants were recruited from the general student population of a large mid-Atlantic university, including graduate and undergraduate students. All were over 18 years old. Two volunteers who attended an experimental session were ineligible because they did not meet the native-English-speaker criterion. The remaining 110 spoke English frequently either before or during their elementary and high school educations, considered themselves native speakers, and did not report any history of dyslexia or other learning difference. 55 participants were assigned to each task.

Response accuracy was computed for each participant. For the WRT, the mean accuracy rate was 93.3% (s.d. 5%). Mean accuracy for the VST was 96.4% (s.d. 8%). The difference between the rates is statistically significant (two-sample t(104)=2.4, p<.05). One participant in the VST group had a mean accuracy rate of 44% (6.9 s.d. below the VST mean); we excluded all 160 of this participant's trials from the analysis. 72 trials with zero response times were also removed: 66 from the WRT and 6 from the VST. A zero response time occurs when a participant fails to respond within the 1-second time limit.
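These exclusions reduce to two filters; a sketch follows, assuming hypothetical trial records and an accuracy map, with the 50% cutoff as our illustrative threshold (the paper excluded one extreme outlier at 44% accuracy):

```python
# `trials` is a hypothetical list of dicts with keys "participant" and
# "rt_ms"; `accuracy` maps each participant to mean response accuracy.
def clean_trials(trials, accuracy, min_accuracy=0.5):
    kept = []
    for t in trials:
        if accuracy[t["participant"]] < min_accuracy:
            continue  # drop all trials from extreme-outlier participants
        if t["rt_ms"] == 0:
            continue  # zero RT = no response within the 1-second limit
        kept.append(t)
    return kept
```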

Analysis

As with the standard WRT, we are interested in the effect of relationships between words, and not in the effect of non-words and repeated primes. Thus, we analyze only the trials in which (1) the target contains only real words, (2) neither word repeats the prime, and (3) one of the following three conditions is true (a short classification sketch follows the list):

  • A: both words are related to the prime (kitten and tiger in Table 2), or
  • B: one word is related to the prime and the other is unrelated (glive and kitten, or in reverse order, kitten and glive), or
  • C: both words are unrelated to the prime (army and table).
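For the trials that pass filters (1) and (2), the condition label depends only on how many of the two target words are related to the prime; a minimal sketch, with names of our own choosing:

```python
# Given whether each of the two target words is related to the prime,
# count the related words to assign condition A (2), B (1), or C (0).
def condition_label(top_related: bool, bottom_related: bool) -> str:
    return {2: "A", 1: "B", 0: "C"}[int(top_related) + int(bottom_related)]
```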

RESULTS AND DISCUSSION

Our results show that there is strong evidence of semantic priming in the word recognition task (WRT) but no evidence of semantic priming in the visual scanning task (VST). This constitutes evidence that the WRT invoked both visual and lexical processing, while the VST invoked only visual processing. Figure 4 illustrates the overall difference in response times for the two types of tasks.

Figure 4. Comparison of response times for the word recognition task and the visual scanning task

We analyzed the data using one-way ANOVAs, one for each task type, treating participants as a random factor in the model, which included only main effects.
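A simplified sketch of the per-task test is shown below: a one-way ANOVA over response times in the three target conditions, using hypothetical lists rt_a, rt_b, and rt_c. Note that this plain F-test omits the participants-as-random-factor component of the authors' model:

```python
from scipy import stats

# One-way ANOVA over the three target conditions for one task type.
def target_type_anova(rt_a, rt_b, rt_c):
    f_stat, p_value = stats.f_oneway(rt_a, rt_b, rt_c)
    return f_stat, p_value
```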

Not surprisingly, we find strong evidence of semantic priming in the word recognition task. There is a main effect for target-type: F(2, 2857) = 62.32, p<.001. A post-hoc Tukey HSD test indicates that response times were significantly different for each of the three types of targets. Figure 4 shows that response times were fastest for targets in which two related words appeared (labeled A on the chart). WRT RT-A was significantly faster than both other types (MRT-A = 877 ms., CI [863 ms., 891 ms.], p<.001). Response times were slowest for targets in which two unrelated words appeared (labeled C on the chart; MRT-C = 975 ms., CI [965 ms., 985 ms.], p<.001). For targets with one related word and one unrelated word, RTs were significantly different from both other types and fell between them (MRT-B = 936 ms., CI [922 ms., 950 ms.], p=.001); the presence of just one related word among the two produced a significant priming effect.

In contrast, for the visual scanning task, we find no evidence of semantic priming. There is no effect for target-type, F(2, 2827) = 1.78, p=.187.

Over all trials analyzed, response time for the WRT (MRT-WRT = 941 ms.) was significantly slower than for the VST (MRT-VST = 631 ms.), F(1, 5795) = 2278, p<.001. Clearly, it takes more cognitive processing, hence more time, to determine whether two strings are real words than it does to determine whether a string is present on a visual level.

Our results show that in a visual scanning task, word recognition is not advantaged by semantic priming. The finding supports our hypothesis.

In our theorizing, we assume that when searchers scan results pages they spend at least some of their time and attention on determining whether their query terms appear on the page. We theorize that this involves cognitive processing akin to the visual scanning task in our experiment. As discussed above, we know that searchers scan ranked results pages very quickly and in a rank-dependent pattern. We propose that this learned visual scanning behavior has the effect of reducing the utility of typical query suggestion mechanisms. We conclude the paper by considering the implications of this finding and future work.

IMPLICATIONS AND NEXT STEPS

Broadly, our research endeavors to describe, in detail, the lexical and visual processes searchers use during interaction. We report here on the first experiment in a planned incremental series of experiments.

Search interaction is complex and we humans have limited cognitive resources. The trade-offs involved in using fast visual “short-cuts” versus more taxing semantic processes (Gwizdka, in press) are no doubt affected by many factors, each of which must be investigated in turn.

Because this experiment is only one step in a longer series, we are reluctant to speculate on the implications of our findings for search system design. After many experiments, we may find that query reformulation (as performed by the searcher, not the system) is best supported by a system that somehow impedes speedy automatic visual scanning, but leaping from our finding to that conclusion is clearly too far a stretch.

We have demonstrated the measurement of semantic priming as a technique for understanding the effect of task demands during search interaction. As our work proceeds, we will examine priming effects in other controlled and isolated task conditions. For example, we know that searchers don't simply look for words that have been flashed on a screen; they look for words that they themselves have used to describe their own internal information needs. Our next experiments will investigate how priming is affected when participants produce their own “prime” words and then scan for them in the target.

In extending the external validity of our experiments, we will test our assumptions about what searchers look for when visually scanning a results page. As part of this goal, we are interested in how characteristics of the display of search results affect recognition of related words. Much of this work will combine measures of semantic priming and eye tracking, so that we can learn in detail about the behaviors searchers use during interaction with results pages, and how those behaviors are affected by the characteristics of the system.

And of course, we must study the relationship between semantic priming and the searcher's selection of words in query reformulation. This work will focus on the role of prior word knowledge, in conjunction with the impact of words to which the user has been exposed. For example, we are interested in how a description of an information need affects the subsequent selection of query terms in an experimental setting.

CONCLUSION

The primary contributions of this research include: 1) demonstration of the value of the semantic priming paradigm in the study of query formulation and search interaction; 2) development of a new task specifically designed to tease apart the impact of visual and linguistic processing during query reformulation; 3) bringing a novel technique to information retrieval research for the study of interaction, as part of a general movement to apply techniques from cognitive science in the study of cognitive factors that affect search interaction.

Acknowledgements

We thank our many patient and attentive participants. We also thank our anonymous reviewers for their thoughtful and constructive comments.

This research has been funded by a grant from Google.

This research was performed while the first author was a post-doctoral researcher at Rutgers University.

Footnotes

  1. A click-through occurs when a searcher clicks on an active hypertext link (URL) in the search results.

  2. According to Downey et al. (2007); see above.
