Users engaged in information search often reformulate or modify their queries. This paper reports on an investigation of how task type and task situation influence users' query reformulation behavior. A controlled experiment was conducted with 48 participants, each working on six web search tasks classified into three types according to the task structure: Simple, Hierarchical and Parallel. We developed a taxonomy of query reformulation and used an automated method to detect the reformulations. Our results showed that Specialization was most frequently used in Simple tasks, and Word Substitution was most frequently used in Parallel tasks. After visiting and saving a useful web page, Generalization was less likely to be used while New query was more likely to be used. We also found that the effectiveness of each query reformulation type varied in different task types. The results of this study demonstrate the effect of task type on users' query reformulation behavior and have implications for the design of query suggestions that are offered to users during searching.
In a typical query-based online information search, searchers enter a query into the search system, get a result list from the system, and evaluate the search results; during the search, they may modify or reformulate previous queries until their information goals are accomplished. Search log analyses show that such modifications are common. Ozmutlu (2006) found that about 28% of queries were reformulations of previous queries; Jansen, Spink, Blakely and Koshman (2007) reported that about 37% of search queries were reformulations when repeated queries were excluded. Query analysis in user experiments has also found that the more complex a search task, the more query reformulations users issue (Li, 2008).
Query reformulation is an iterative process of interaction between users and search systems, through which users try to find information that satisfies their search goals. Current search systems often treat a user's multiple queries in one search task as separate queries, but in fact all the queries in a search task are related to the task the user is working on. In addition, users may change their queries based on their judgment of the previous search results. Understanding how people reformulate their queries in different tasks or situations may help search systems provide better query suggestions and improve search results.
The goal of this study was to examine the usage of query reformulation types when users were engaged in pre-designed search tasks. The contextual factors we considered in this study included task type and users' satisfaction with previous search results. We also investigated the influence of these contextual factors on the usage of each query reformulation type, and its effectiveness. The results of this study will enhance our understanding of the influence of contextual factors on how people reformulate their queries, and on which type of reformulation is more effective in different circumstances.
Reasons Causing Query Reformulation
Fidel (1985) observed that searchers reformulated queries to improve query performance in three problematic situations: (1) the retrieved set was too large; (2) the retrieved set was too small; or (3) the retrieved set was off-target. Hsieh-Yee (1998) analyzed users' search tactics in three situations: (1) at the starting point; (2) when too many items were retrieved; and (3) when nothing useful was retrieved. With respect to query reformulation, she found that adding terms to the first query, using a more specific term, and trying a different search term were the three main tactics when too many items were retrieved, whereas using another term, using another search engine, and asking others for help were the main tactics when nothing relevant was retrieved.
Shenouda (1991) found in his user studies that most searchers began with “more general terms” and then reformulated by adding appropriate synonyms and more specific terms. He also found that searchers most frequently reformulated a query when all the documents they had examined in a subset had been judged irrelevant.
In this study, we compared users' query reformulation types in two situations defined by previous search results: 1) when some useful information was found in the previous search results, and 2) when no useful information was found in the previous search results.
Classification of Query Reformulation Types
Fidel (1985) classified query reformulation moves into two categories: operational moves and conceptual moves. Operational moves modified a query without changing its meaning, while conceptual moves changed the meaning of query components. Rieh and Xie (2006) expanded Fidel's (1985) classification and identified three facets of query reformulation: content, format, and resource, each with several sub-facets. With respect to content change, they analyzed whether users attempted to narrow or broaden the meaning of their queries, and distinguished four sub-facets: specification, generalization, replacement with synonyms, and parallel movement. Such analyses provide a comprehensive description of how people reformulate their queries, but these taxonomies are difficult to implement as automatic algorithms.
Recently, researchers have developed several taxonomies of query reformulation types based on automatic detection methods applied in query log analysis. Using an Excite search log, Lau and Horvitz (1999) developed a method to automatically classify queries into four mutually exclusive types. Ignoring identical queries, the reformulation types within one search task were Generalization, New, Reformulation and Specialization, distinguished by the change in query content and query length. This taxonomy was also used by He, Goker and Harper (2002) and Jansen et al. (2007) to detect search boundaries in search logs automatically, and the New type was often treated as an indicator of the start of another search session. Huang and Efthimiadis (2009) developed a more extensive taxonomy of 12 query reformulation types. In addition to three of the types identified by Lau and Horvitz (1999), namely remove words (same as Generalization), add words (same as Specialization), and Word Substitution (same as Reformulation), they detected other types such as word reorder, stemming, and abbreviation. All these analyses were conducted on server-side query logs, whereas our study applies such methods to detect query reformulation types in a client-side search log captured in a controlled experiment. Identifying reformulation types automatically allows researchers to analyze large samples of search logs quickly. It could also enable search systems to detect the reformulation type in real time and to provide search assistance to users when necessary.
Factors Influencing Query Behaviors
In prior work, researchers have examined the effects of some contextual factors on users' query behaviors. For example, it has been shown that search experience and domain knowledge affect search tactics (e.g. Hembrooke, Granka, Gay & Liddy, 2005; Hsieh-Yee, 1993; Wildemuth, 2004).
The effect of task type on users' query behaviors has also been examined. Hsieh-Yee (1998) compared users' search tactics in four types of search tasks but found no significant differences between searching for text and graphic information, nor between known-item and subject searches. Toms et al. (2008) examined how query behavior differs according to two types of task structure and three types of task information goal. The two types of task structure were hierarchical and parallel, and the three types of information goal were decision-making (DM), fact-finding (FF) and information-gathering (IG). They found that for Hierarchical tasks users formulated fewer queries but took longer to process the results of each query than for Parallel tasks, indicating that on most metrics other than the number of queries, Hierarchical tasks required more effort. They also found that DM and FF tasks involved more queries than IG tasks, and that users were more likely to add unprompted terminology in IG than in DM tasks. Liu, Gwizdka and Belkin (2010) investigated the effect of task type and of one cognitive ability (working memory) on query reformulation behaviors. Only task type had a significant effect on query reformulations; users' working memory did not.
Building on prior studies, the current study focused on the influence of task type (defined by task structure) on users' query reformulation types and their effectiveness.
Effectiveness of Query Reformulations
When the search results of a query contain information that is relevant or useful to the user, the query can be considered effective. Huang and Efthimiadis (2009) analyzed an AOL search log and examined the effectiveness of query reformulation types using click patterns. They treated a click following a query as an indicator of relevance and, for each reformulation type, compared the number of occurrences followed by a click with the number followed by no click on any search result. They found that substring and superstring reformulations were the least helpful, while add words, word substitution, stemming, spelling correction, and expand acronym were the most helpful.
However, assuming that every page a user clicks is useful is problematic, because users may judge a page not useful after clicking on it and reading it. In our experiment, participants were asked to bookmark and tag Web pages that would help them complete their assigned search tasks; in this sense, a reformulated query is effective if it leads to tagging a useful Web page.
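As an illustration of the click-based measure described above, the following Python sketch computes, for each reformulation type, the ratio of occurrences followed by a click to occurrences followed by no click. The input format (a list of (type, clicked) pairs) and the function name click_ratios are our own assumptions; this is not Huang and Efthimiadis's (2009) implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def click_ratios(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Ratio of clicked to non-clicked occurrences for each reformulation type.

    `events` holds hypothetical (reformulation_type, clicked) pairs, one per
    reformulated query; `clicked` is True if any result was clicked afterwards.
    """
    clicks: Dict[str, int] = defaultdict(int)
    no_clicks: Dict[str, int] = defaultdict(int)
    for rtype, clicked in events:
        if clicked:
            clicks[rtype] += 1
        else:
            no_clicks[rtype] += 1
    ratios: Dict[str, float] = {}
    for rtype in set(clicks) | set(no_clicks):
        # A type never followed by a no-click outcome gets an infinite ratio.
        if no_clicks[rtype]:
            ratios[rtype] = clicks[rtype] / no_clicks[rtype]
        else:
            ratios[rtype] = float("inf")
    return ratios
```

A ratio above 1 means the type was followed by a click more often than not; as the next paragraph argues, a click is only a proxy for usefulness.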
In general, this study examined the influence of task type and situations of the previous query on the usage of each reformulation type, as well as the effectiveness of each reformulation type. Specifically, there are six research questions in this study:
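1. Does task type influence the usage of each reformulation type?
Considering two situations of the previous search results (at least one useful item; no useful items):
2. Which situation is more likely to lead to query reformulation?
3. Does the situation influence the usage of each reformulation type?
4. In each situation, does task type influence the usage of each reformulation type?
Evaluating the effectiveness of query reformulations:
5. What is the general effectiveness of users' query reformulations in different types of tasks?
6. Does task type influence the effectiveness of each query reformulation type?
To address these questions, we first constructed, based on prior studies, a taxonomy of the query reformulation types used by participants in our experiment.
Forty-eight participants took part in a question-driven, web-based information search study conducted in a controlled experimental setting. Participants were undergraduate and graduate university students. Most were very frequent Web searchers; only one person searched the Web relatively infrequently (once or twice a week).
Each study session took one and a half to two hours and was conducted in a university lab on a desktop computer running Microsoft Windows XP. Each session consisted of the following steps: an introduction to the study, a consent form, search task practice, a background questionnaire, six search tasks, and a post-session questionnaire. The searchers bookmarked and tagged the Web pages that they considered useful in completing the search task. User interaction with the computer (visited and bookmarked URLs, mouse and keyboard events, and video from a screen cam) was recorded using Morae software. The start and end of each search task were controlled by an external program that started and ended a Web browser session (Internet Explorer version 6).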
In order to address these problems, we first constructed a taxonomy of query reformulation types used by participants in our experiment, based on prior studies.
Forty-eight subjects participated in a question-driven, web-based information search study conducted in a controlled experimental setting. Participants were university students, from undergraduate and graduate programs. Most participants were very frequent Web searchers and only one person searched the Web relatively infrequently: once or twice a week11 .
Each study session took an hour and a half to two hours and was conducted in a university lab on a personal desktop computer running the Microsoft Windows XP operating system. Each session consisted of the following steps: an introduction to the study, consent form, search task practice, background questionnaire, six search tasks, and post-session questionnaire. The searchers bookmarked and tagged the web pages that they considered useful in completing the search task. User interaction with the computer (visited and bookmarked URLs, mouse and keyboard events, and video from a screen cam) was recorded using Morae software22 . The start and end of each search task was controlled by an external program that was used to start and end a Web browser session (Internet Explorer version 6).
The study search tasks were designed as questions that described what information needed to be found and provided a context for the search. The tasks were designed to differ in their structure. A total of twelve questions were used in the study: eight were created by Toms and her colleagues (Toms et al., 2008), and four were created by us. Three types of tasks were designed according to the structure of the underlying information need (Toms et al., 2008): 1) Simple (S), where the information need is satisfied by a single, independent piece of information (by definition, a simple task is of the fact-finding type); 2) Hierarchical (H), where the information need is satisfied by finding multiple characteristics of a single concept; this is a depth search, in which a single topic is explored; 3) Parallel (P), where the information need is satisfied by finding multiple concepts that exist at the same level in a conceptual hierarchy; this is a breadth search. The tasks were constructed using Simulated Work Task Situations (Borlund, 2003). The simulated situations were created using task scenarios that provided participants with a search context and a basis for relevance judgments. A sample task of each type is shown below:
Simple task: You love history and, in particular, you are interested in the Teutonic Order (Teutonic Knights). You have read about their period of power, and now you want to learn more about their decline. What year was the Order defeated in a battle by a Polish-Lithuanian army?
Hierarchical task: You recently heard about the book “Fast Food Nation,” and it has really influenced the way you think about your diet. You note in particular the amount and types of food additives contained in the things that you eat every day. Now you want to understand which food additives pose a risk to your physical health, and are likely to be listed on grocery store labels.
Parallel task: Friends are planning to build a new house and have heard that using solar energy panels for heating can save a lot of money. Since they do not know anything about home heating and the issues involved, they have asked for your help. You are uncertain as well, and do some research to identify some issues that need to be considered in deciding between more conventional methods of home heating and solar panels.
Query Reformulation Type
Based on prior work on query reformulation type detection, we identified five reformulation types according to the terms shared by, and the difference in length between, two successive queries. The taxonomy of five reformulation types is shown in Table 1. This taxonomy extends Lau and Horvitz's (1999) taxonomy by adding Repeat queries, in which the reformulated query uses the same query terms and only adds or deletes operators or punctuation. None of the more detailed reformulation types identified by Huang and Efthimiadis (2009) were detected in our comparatively small sample of experimental data.
Table 1. Definition of five query reformulation types
Generalization: Qi and Qi+1 contain at least one term in common; Qi+1 contains fewer terms than Qi. Example: “harmful chemicals in food” → “chemicals in food”
Specialization: Qi and Qi+1 contain at least one term in common; Qi+1 contains more terms than Qi. Example: “2007 car” → “2007 car sales”
Word Substitution (WR): Qi and Qi+1 contain at least one term in common; Qi+1 has the same length as Qi, but contains some terms that are not in Qi. Example: “castle in canada” → “fortress in canada”
Repeat: Qi and Qi+1 contain exactly the same terms, but the format of these terms may be different. Example: “Danmark fortress” → “fortress, danmark”
New: Qi and Qi+1 do not contain any common terms. Example: “anthill” → “ant bites”
Note: Qi+1 is the query immediately following the query Qi in the same session.
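The definitions in Table 1 can be applied mechanically to each pair of successive queries. The Python sketch below is one possible reading of them, not the detection code used in this study. It assumes that terms are distinct, lower-cased word tokens (so case, punctuation, word order and repeated words are ignored) and that query length is the number of distinct terms; the function names terms and classify_reformulation are hypothetical.

```python
import re
from typing import Set


def terms(query: str) -> Set[str]:
    """Distinct lower-cased word tokens; punctuation and symbol operators are dropped."""
    return set(re.findall(r"\w+", query.lower()))


def classify_reformulation(q_i: str, q_next: str) -> str:
    """Assign the pair (Qi, Qi+1) to one of the five types in Table 1."""
    a, b = terms(q_i), terms(q_next)
    if a == b:
        return "Repeat"             # same terms; only format or order differs
    if not a & b:
        return "New"                # no term in common
    if len(b) < len(a):
        return "Generalization"     # shared term(s), fewer terms
    if len(b) > len(a):
        return "Specialization"     # shared term(s), more terms
    # Same number of distinct terms but different sets, so Qi+1
    # necessarily contains at least one term that is not in Qi.
    return "Word Substitution"
```

Running it on the five examples in Table 1 returns the five types in the order listed there. Measuring length in distinct terms is a design choice of this sketch: it guarantees that the same-length, different-terms case always contains a new term, as the Word Substitution definition requires.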
There were 48 participants in this experiment, each of whom conducted six search tasks, giving 288 search sessions in total. In 98 sessions participants issued only one query; these sessions contained no query reformulations and were excluded. The remaining 190 sessions, each containing at least two queries, were analyzed.
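Assuming that each query after the first in a session counts as one reformulation of the query before it (our reading of how the reformulation counts reported in the next subsection were obtained), the filtering step can be sketched in Python. The list-of-lists session format and the function name reformulation_pairs are hypothetical.

```python
from typing import List, Tuple


def reformulation_pairs(sessions: List[List[str]]) -> List[Tuple[str, str]]:
    """Return every (Qi, Qi+1) pair from sessions containing at least two queries."""
    pairs: List[Tuple[str, str]] = []
    for queries in sessions:
        if len(queries) < 2:
            continue  # single-query sessions contain no reformulation
        pairs.extend(zip(queries, queries[1:]))
    return pairs


# Session bookkeeping reported above: 48 participants x 6 tasks,
# minus the 98 single-query sessions.
total_sessions = 48 * 6
analyzed_sessions = total_sessions - 98
```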
Overall Frequency of Query Reformulation Types
Participants issued 712 reformulations across all tasks, of which 116 (16.29%) were issued in Simple tasks, 257 (36.10%) in Hierarchical tasks, and 339 (47.61%) in Parallel tasks. Parallel tasks thus contained the most reformulations and Simple tasks the fewest.
For the following comparison analysis, the frequency of each reformulation type for each search session was calculated. Since the frequency of each reformulation type was not normally distributed, non-parametric tests were used in examining significance of differences.
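Because the tests that follow compare per-session values, zeros matter: a session in which a type never occurs still contributes a 0 for that type. A Python sketch of this step, assuming each session is represented by the list of type labels of its reformulations (for example, as produced by applying the Table 1 definitions); the text does not say whether "frequency" is a raw count or a within-session proportion, so the sketch leaves that as a parameter. The names TYPES and session_frequencies are hypothetical.

```python
from collections import Counter
from typing import Dict, List

TYPES = ("Generalization", "Specialization", "Word Substitution", "Repeat", "New")


def session_frequencies(sessions: List[List[str]],
                        as_proportion: bool = False) -> List[Dict[str, float]]:
    """One row per session giving how often each reformulation type occurs in it.

    Types that never occur in a session get an explicit 0, so every session
    contributes a value for every type to the later non-parametric tests.
    """
    rows = []
    for labels in sessions:
        counts = Counter(labels)          # missing types count as 0
        total = sum(counts.values())
        row = {}
        for t in TYPES:
            row[t] = counts[t] / total if (as_proportion and total) else counts[t]
        rows.append(row)
    return rows
```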
The mean frequency of each reformulation type when all tasks were considered is shown in Figure 1. Specialization (32.30%) was the most frequently used reformulation type, followed by Word Substitution (22.48%), Generalization (22.42%) and New (19.52%); Repeat (2.14%) was rarely used.
Task Type Effect on Reformulation Type
First we examined the effect of task type on the usage of reformulation types. Figure 2 shows the frequency of each reformulation type in the three types of tasks. Kruskal-Wallis H tests revealed that the usage of two reformulation types differed significantly across task types: Specialization (H=18.88, p<.01) and Word Substitution (H=21.12, p<.01). No significant difference was found in the usage of Generalization (H=1.22, p=.54), New (H=2.50, p=.29), or Repeat (H=5.44, p=.07).
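The Kruskal-Wallis H statistic can be computed from the per-session values of one reformulation type, grouped by task type. The Python sketch below is a textbook formulation (average ranks for ties, standard tie correction), not the statistical package used in the study. For exactly three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(−H/2); the reported p-values are consistent with this: exp(−5.44/2) ≈ .066, exp(−2.50/2) ≈ .29 and exp(−1.22/2) ≈ .54.

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H for two or more non-empty groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_three_groups(h: float) -> float:
    """Upper tail of chi-square with 2 df; valid only for exactly three groups."""
    return math.exp(-h / 2)
```

In this setting the call would be kruskal_wallis_h([simple, hierarchical, parallel]), where each list holds one type's per-session frequencies for one task type.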
A post-hoc analysis using Tamhane tests found that participants used Specialization in Parallel tasks significantly less frequently than in Simple and Hierarchical tasks (p<.05); and that they used Word Substitution more frequently in Parallel tasks than in the Simple and Hierarchical tasks (p<.05).
Query Reformulation after Two Situations
In this section, we compare the usage of query reformulation type in two situations: 1) when no page was tagged in the previous search results (after not tag); 2) when some pages were tagged in the previous search results (after tag).
Occurrence of query reformulations in two situations
First we compare the percentage of query reformulations issued in the two situations, considering all tasks and then the three types of tasks separately. As shown in Figure 3, 64.04% of reformulations were issued in the “after not tag” situation when all tasks were considered. This percentage varied with task type: 80.17% of reformulations were issued after not tag in Simple tasks, 67.32% in Hierarchical tasks, and 56.05% in Parallel tasks.
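The two situations can be assigned mechanically from an interaction log. The Python sketch below assumes a hypothetical, time-ordered list of query and tag events for one task session, and treats a reformulation as "after tag" when at least one page was tagged between it and the preceding query. It also checks the percentages above for internal consistency: the counts 93, 173 and 190 are our back-calculation from the reported percentages and the task totals of 116, 257 and 339 reformulations, and they pool to the reported 64.04%.

```python
from typing import List, Tuple


def label_situations(events: List[Tuple[str, str]]) -> List[str]:
    """Label each reformulation in one task session by what followed the previous query.

    `events` is a hypothetical time-ordered log of ("query", text) and
    ("tag", url) entries.
    """
    labels: List[str] = []
    seen_query = False
    tagged_since_last_query = False
    for kind, _ in events:
        if kind == "query":
            if seen_query:  # every query after the first is a reformulation
                labels.append("after tag" if tagged_since_last_query else "after not tag")
            seen_query = True
            tagged_since_last_query = False
        elif kind == "tag":
            tagged_since_last_query = True
    return labels


def percent(part: int, whole: int) -> float:
    return round(100 * part / whole, 2)


# (reformulations after not tag, all reformulations) per task type;
# counts back-calculated from the reported percentages.
after_not_tag = {"Simple": (93, 116), "Hierarchical": (173, 257), "Parallel": (190, 339)}
per_task = {task: percent(k, n) for task, (k, n) in after_not_tag.items()}
overall = percent(sum(k for k, _ in after_not_tag.values()),
                  sum(n for _, n in after_not_tag.values()))
```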
Comparison of usage of each reformulation type in two situations
In this part, we examine the effect of situation on the frequency of each reformulation type. Because the frequencies were not normally distributed in the two situations, Wilcoxon Signed-Rank Tests were conducted to compare the frequency of each reformulation type between the two situations. When all tasks were considered, Generalization was used significantly more frequently in the “after not tag” situation than in the “after tag” situation (Z=−3.25, p<.01), and New was used significantly more frequently in the “after tag” situation than in the “after not tag” situation (Z=−3.29, p<.01); no other reformulation type differed significantly between the two situations (Table 2).
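In this paired comparison, each session contributes one frequency of a given type in the “after tag” situation and one in the “after not tag” situation. The Python sketch below shows the normal approximation of the Wilcoxon signed-rank statistic under that assumption; it is a textbook version, not the software used for Table 2. It reports Z from the smaller of the two rank sums, a common convention that makes Z non-positive, as in the values above.

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Normal-approximation Z for the Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped, tied |differences| get average ranks,
    and the variance is tie-corrected.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    tie_term = sum(c ** 3 - c for c in Counter(abs(d) for d in diffs).values()) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    return (t - mean) / math.sqrt(variance)
```

Because Z is taken from the smaller rank sum, its sign says nothing about direction; which situation had the higher frequency is read from whether w_plus or w_minus is larger.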
Here we examine whether the same pattern holds in each type of task. As shown in Table 2, no reformulation type was found to be significantly different between the two situations in Simple and Hierarchical tasks. For Parallel tasks we found the same pattern as for all tasks: comparing “after tag” situation with “after not tag” situation, Generalization was less frequently used (Z= −3.25, p<.01) and New was more frequently used (Z= −3.29, p<.01).
Comparison of query reformulation usage in each situation among different types of tasks
Here we examine whether the percentage of each reformulation type is the same across different types of tasks in each situation. The results of Kruskal-Wallis tests are shown in Table 3. The results demonstrate that Specialization and Word Substitution were the only two reformulation types that were significantly different across the three types of tasks in both situations. A post-hoc analysis using Tamhane tests found that, in the “after not tag” situation, Specialization was more frequently used in Simple tasks than in Parallel tasks, and Word Substitution was more frequently used in Parallel tasks than in Simple tasks. In the “after tag” situation, Specialization was more frequently used in Simple tasks than in Parallel tasks, and Word Substitution was more frequently used in Parallel tasks than in Hierarchical tasks.
Table 2. Comparison of the usage of each reformulation type in two situations
Table 3. Comparison of the percentage of each reformulation type in three types of tasks in both situations
Effectiveness of Query Reformulation Type
Some reformulations led users to find and tag useful pages, while others did not lead to tagging. Reformulations that led to tagging were considered “effective” reformulations. In this part of the analysis, we examine the percentage of effective reformulations and the effectiveness of each reformulation type in different types of tasks.
The percentage of effective query reformulations
First, we calculated the effective rate, defined as the percentage of effective reformulations among all reformulations, for all tasks and for each of the three types of tasks (Table 4). Overall, approximately half of all reformulations were effective. In Parallel tasks, slightly more than half of the reformulations (50.74%) were effective; the other task types had lower effective rates (approximately 44%).
Table 4. The percentage of effective reformulations (rows: number of effective reformulations, number of all reformulations, and percentage of effective reformulations; reported for all tasks and for each task type)
Comparison of effectiveness of each query reformulation type
In this part, we calculate the effective rate of each query reformulation type and compare it with the pooled effective rate of all other reformulation types, to obtain a relative effective rate for each type. For example, when all tasks are considered, the effective rate of Generalization is 39.88% and the effective rate of all other reformulation types is 49.54%; the relative effective rate of Generalization is therefore (39.88 − 49.54) / 49.54 ≈ −0.19. This means that, compared with other reformulation types, Generalization is less likely to lead to a tag. The results are shown in Table 5.
When all tasks were considered, New and Word Substitution were more effective than the other reformulation types; of these two, Word Substitution had the higher relative effective rate.
In Simple tasks, New, Specialization and Word Substitution were relatively more effective, and among them New had the highest relative effective rate. In Hierarchical tasks, Generalization, New, and Specialization were relatively more effective; however, the effective rates of Generalization and Specialization were only slightly higher than those of the other types, and only New had a much higher effective rate. In Parallel tasks, Word Substitution and New were relatively more effective, and the effective rate of Word Substitution was much higher than that of the other types.
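The relative effective rate described above is (r_type − r_others) / r_others, where r_type is the effective rate of one type and r_others is the pooled effective rate of all remaining types. The Python sketch below computes it from (effective, total) counts per type; the input format and the function names are hypothetical. Applied to the Generalization percentages given above, the same formula reproduces the reported −0.19.

```python
from typing import Dict, Tuple


def effective_rate(effective: int, total: int) -> float:
    """Share of reformulations that were followed by tagging a page."""
    return effective / total


def relative_effective_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """(rate of a type - pooled rate of all other types) / pooled rate of the others.

    `counts` maps each reformulation type to (effective, total) for one task
    type, or for all tasks combined. Assumes the other types have a non-zero
    effective rate.
    """
    all_eff = sum(e for e, _ in counts.values())
    all_tot = sum(t for _, t in counts.values())
    result: Dict[str, float] = {}
    for rtype, (eff, tot) in counts.items():
        own = effective_rate(eff, tot)
        others = effective_rate(all_eff - eff, all_tot - tot)
        result[rtype] = (own - others) / others
    return result


# Worked example from the text: Generalization, all tasks (39.88% vs 49.54%).
generalization_all_tasks = (0.3988 - 0.4954) / 0.4954
```

A positive value means the type led to a tag more often than the other types pooled; 0.62, for example, corresponds to an effective rate 62% higher than the others.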
Task Effect on Query Reformulation Behaviors
The results of this study show a significant effect of task type on users' query reformulation behaviors. Simple tasks contained the fewest query reformulations, Hierarchical tasks more, and Parallel tasks the most. This could be partially explained by the quantity of information needed for the tasks: in Simple tasks, users were asked to find one piece of information, whereas in the other two types of tasks they needed to find multiple pieces of information and therefore had to issue more queries. This pattern is consistent with Li (2008), who found that the more information was needed to complete a task, the more query reformulations were issued during searching.
Task type was also found to influence the frequency of two query reformulation types: Specialization and Word Substitution. Specialization was used more frequently in Simple and Hierarchical tasks than in Parallel tasks, while Word Substitution was used more frequently in Parallel tasks than in Simple and Hierarchical tasks. This result is plausible because in Simple tasks users were searching for one specific piece of information, so their queries needed to be very specific. In Hierarchical tasks, reformulated queries were sometimes dependent on previous search results, and users needed to find more specific information in their later queries. In contrast, in Parallel tasks, where users searched for similar information about several related concepts, they sometimes needed to change only a single concept while keeping the other query terms. These results suggest that search systems might be able to distinguish different types of search tasks by observing the proportions of certain query reformulation types.
Table 5. Comparison of the effective rate of each reformulation type against all others
Situations' Effects on Query Reformulation Behaviors
Our results revealed that nearly two-thirds (64.04%) of all query reformulations were issued when the previous search results did not contain any useful pages, and that this pattern differed across the three types of tasks. In Simple tasks, about 80% of all query reformulations were issued when the previous search results did not contain any useful information, a higher percentage than in the other two types of tasks. This result is plausible because in Simple tasks users often stop searching after finding the required information, whereas in Hierarchical and Parallel tasks users are searching for more than one piece of information, so they are more likely to continue searching after finding some of it.
Two query reformulation types had significantly different usage frequencies in the two situations. Generalization was used more frequently when no useful information was found in the previous search results than when some useful information was found. A likely explanation is that Generalization broadens the meaning of a query, and hence the search results, and users were more likely to widen their search scope when they had not found any useful pages.
The results also revealed that New was used more frequently when useful information was found in the previous search results than when none was found. In a New reformulation, the reformulated query has no terms in common with the previous query. This type of reformulation has often been considered the start of another search session in search log analysis (e.g. Jansen et al., 2007; Huang & Efthimiadis, 2009). In our experiment, however, New was also frequently used while users were working on the same search task within one search session. The terms in New reformulations may come from previous search results or from users' background knowledge. It is therefore plausible that users issued a higher proportion of New queries after finding useful pages, which could supply new query terms.
When the frequency of each query reformulation type was compared between the two situations within each task type, Generalization and New differed significantly only in Parallel tasks; no reformulation type was used differently between the two situations in Simple or Hierarchical tasks. This indicates that the situation of the previous search results affected the usage of reformulation types only in Parallel tasks. The frequencies of Generalization and New in Simple and Hierarchical tasks followed a pattern similar to that in Parallel tasks, but the differences between the two situations were not significant. A potential reason is that these two reformulation types occurred more often in Parallel tasks than in Simple and Hierarchical tasks, so the tests for Parallel tasks had larger samples and were more likely to yield significant results.
We also examined the effect of task type on the frequency of each reformulation type within each of the two situations. As with the overall frequencies, Specialization and Word Substitution were the two reformulation types that differed across the three types of tasks, in both situations. These results indicate that task type had a stronger effect on the frequency of each query reformulation type than the situation of the previous search results.
Effectiveness and Usage of Query Reformulation Types
Our results show that, for all tasks and for each task type considered individually, approximately half of users' query reformulations were effective, that is, led users to find some useful information.
We then compared the effective rate of each query reformulation type with that of the other reformulation types to see whether some types were more effective than others. When all tasks were considered, New and Word Substitution were more effective than the other reformulation types, and Word Substitution had the higher effective rate of the two.
In Simple tasks, New, Specialization and Word Substitution were relatively more effective than the other types. Specialization, the most frequently used reformulation type in Simple tasks, also had a high effective rate.
In Hierarchical tasks, only New had a much higher effective rate than the other reformulation types. This result can be explained by the requirements of Hierarchical tasks, which called for in-depth searches on one concept: the searches for multiple facets of the concept often depended on what the user had found in the previous search results, and a New query sometimes meant that the user had found new information relevant to the task.
In Parallel tasks, New and Word Substitution were more effective than the other reformulation types; in particular, Word Substitution had an effective rate 62% higher than that of the other types. Word Substitution was also the most frequently used reformulation type in this type of task.
Implications and Future Studies
Understanding how users reformulate queries for different types of tasks and their effectiveness can help search systems provide better query suggestions for users. Results of the current study show that task type had an effect on the usage of query reformulation types and their effectiveness. Capturing the frequency of each query reformulation type may help systems identify the type of search task a user is working on, and then provide effective query reformulation suggestions accordingly.
The current study showed that the effectiveness of different reformulation types varied with task type. This result suggests that search systems should offer different types of query reformulation suggestions according to the type of search task. In addition, the usage of a reformulation type seems to be related to its effectiveness; for example, Word Substitution was the most effective type in Parallel tasks, and it was also the most frequently used type in those tasks. Future studies will examine whether users are able to select effective query reformulation types according to their search context, and whether individual differences (e.g. domain knowledge or cognitive ability) influence how users select effective reformulation types for the search tasks they are performing.
The five query reformulation types we identified in this study considered only common term usage and the change in query length between successive queries. Whenever the reformulated query had no term in common with the previous query, we classified it as New, as Lau and Horvitz (1999) and Jansen et al. (2007) did. However, the terms in a New query may still be related to the previous query, for example as acronyms or synonyms. For a more detailed examination of how users reformulate queries, further studies will examine the relationship between the terms in the reformulated query and those in the previous query, as well as the source of the new terms.
In this paper, we examined the relationship between task type, the situation of previous search results, and query reformulation behaviors. The results demonstrate that Specialization was used significantly more in Simple and Hierarchical tasks than in Parallel tasks, and that Word Substitution was used significantly more in Parallel tasks than in Simple and Hierarchical tasks. These patterns also held in both situations of previous search results. The comparison of the usage of each reformulation type between the two situations revealed that, after visiting and saving a useful web page, Generalization was less likely to be used while New was more likely to be used. With respect to the effectiveness of each query reformulation type, in Simple tasks New and Specialization were relatively more effective; in Hierarchical tasks New was more effective; and in Parallel tasks Word Substitution was the most effective type. While these findings can only be generalized to the task types and search result situations defined in our controlled experiment, they suggest that similar results are likely for the same task types in other experiments.
This research was supported by IMLS grant LG-06-07-0105-07.
Another ASIST 2010 paper from our group (Liu, J., Gwizdka, J., Liu, C., Belkin, N.J. (2010). “Predicting task difficulty for different task types”) focuses on task difficulty and presents an analysis of different data collected in the same experiment.