Studying scatter/gather browsing for web search

Authors


Abstract

This study investigates the effectiveness of Scatter/Gather browsing for information retrieval. We conduct a within-subject study of 24 subjects, in which each subject conducts searches on a Scatter/Gather system and a classic web search system and provides feedback and comments. Eleven of the 24 subjects consider that the Scatter/Gather system helps them complete the tasks more effectively because it sorts the results and filters out useless information. We discuss the strengths and weaknesses of Scatter/Gather compared to a classic web search interface and examine the influences of topic characteristics on Scatter/Gather effectiveness. The results suggest that topic familiarity and specificity significantly influence several categories of user-perceived retrieval effectiveness. The influences appear to be greater with the Scatter/Gather system. We find that viewing time has a moderate positive correlation with the relevance and usefulness of a web page, which are highly correlated with each other.

1. INTRODUCTION

Scatter/Gather is a document browsing and information retrieval method that uses text clustering as its primitive operation (Cutting et al., 1992). Unlike keyword-based search, a Scatter/Gather system scatters a dataset (e.g., of web pages) into a small number of clusters (groups) and presents short summaries of them to the user. After the user selects one or more clusters of interest, the system gathers the documents in those clusters and scatters them into multiple clusters again.

Scatter/Gather iterations may lead to a more focused data subset from which relevant information can be identified. This technique is potentially useful for information access with non-specific goals, for which query formulation is often difficult. Due to the computational complexity of text clustering, however, Scatter/Gather has not been widely adopted in large-scale IR systems and its effectiveness on the web has not received sufficient research attention.

Our research has focused on the investigation of efficient methods for large-scale clustering operations and the use of Scatter/Gather in supporting retrieval effectiveness (Ke et al., 2009). This research aims to study Scatter/Gather effectiveness in general web search tasks. One objective is to investigate influences of user and topic characteristics on users' searching behaviors in Scatter/Gather browsing iterations.

2. RELATED WORK

Searching and browsing are basic paradigms of information retrieval. While searching relies primarily on text queries, browsing supports user exploration of an information space without explicit articulation of information needs. In many situations, however, an information need cannot easily and accurately be translated into a set of query terms. Bates (1989) argued that the classic IR querying paradigm prevents the user from taking a more flexible approach to representing information needs, and proposed the Berrypicking approach to dynamic information exploration, selection, and collection over the course of an evolving search.

Following the Berrypicking approach, Pirolli and Card (1998) developed information foraging theory, in which users follow information scent in seeking, collecting, and using online information. Recognizing that search tasks range from “known-item” lookup to investigative tasks, researchers have proposed the exploratory search framework to better support user searching and learning (White et al., 2008).

The Scatter/Gather modality was proposed to facilitate user articulation of information needs through iterative clustering and interactive browsing (Cutting et al., 1992). In each iteration, the system presents a set of clusters (main topics) to the user based on the information space being explored. The user can pick one or more clusters he or she is interested in and re-cluster that subset. After each step, the “query” is better defined by the user's input. The system and the user can thus reach a better mutual understanding of what is needed and together identify relevant information. This also helps the user explore the inherent associations among documents and topics in the collection being served, enabling exploratory learning (Hearst and Pedersen, 1996).

The effectiveness of Scatter/Gather depends on a variety of factors associated with the nature and context of an information need. In a recent user study, users found Scatter/Gather helpful in some search situations but not in others (Ke et al., 2009). In particular, topic specificity and familiarity appeared to affect the user's perception of Scatter/Gather effectiveness in exploratory searches.

We reason that known-item searches are specific and that queries for them are easy to formulate, whereas exploratory searches often involve topics that are broad (less specific) and hard to define. A user's familiarity with a search topic may also influence how the user explores, investigates, and learns in the search process.

In addition, learning is a major component of exploratory search. To understand the potential of Scatter/Gather browsing in supporting exploratory investigation, it is essential to evaluate user learning and understanding in the related search tasks.

3. RESEARCH QUESTIONS

Based on the above discussion, we are interested in the following research questions (see Footnote 1):

  1. Is Scatter/Gather browsing more effective in exploratory tasks than in known-item searches?
  2. What is the impact of topic familiarity on Scatter/Gather effectiveness? Is Scatter/Gather less useful/effective, as compared to classic ranked retrieval, when the user is more familiar with the search topic?
  3. What is the impact of topic specificity on Scatter/Gather effectiveness? Is Scatter/Gather more useful/effective, as compared to classic ranked retrieval, when the search topic is less specific (broader, harder to articulate)?
  4. What is the impact of Scatter/Gather browsing on the user's ability to learn about and investigate the searched topic? Does Scatter/Gather facilitate user learning and investigation, and if so, how well?
  5. What does user behavior reveal about (user perception of) document relevance? How well are user data such as clickthroughs and viewing time associated with relevance?

4. METHODOLOGY

We conducted a within-subject user study of 24 undergraduate students at Drexel University to address these research questions, to examine the strengths and weaknesses of a Scatter/Gather system, and to obtain user feedback for improving the system.

4.1 Systems

Two systems were developed for this study: 1) a classic web search system that returns search results as a ranked list, and 2) a hybrid Scatter/Gather system that clusters the search results and then performs Scatter/Gather iterations on them.

The Scatter/Gather system integrates the Bing search API for search and the Weka machine learning package for text clustering. After the user enters a search query, the system sends the query to Bing, retrieves the first 200 results, and clusters them into a number of groups (seven clusters by default). The classic search system also uses the Bing search API, but it presents the search results directly to the user without clustering.

Figure 1 is a screenshot of the Scatter/Gather interface. The user can select clusters and choose the desired number of clusters for the next iteration. The results in the selected (checked) clusters are gathered and then re-clustered into the desired number of clusters. The user can continue iterating until he or she finds what is needed.
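The scatter and gather steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the system's implementation: the paper's system used Weka for clustering and does not state which algorithm or document representation it used. Here we assume TF-IDF vectors over result snippets and a simple spherical k-means (cosine similarity) seeded with the first k documents; `scatter`, `gather`, and `summarize` are illustrative names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """Unit-length TF-IDF vectors with a smoothed IDF, so no weight is zero."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                       for t, c in Counter(toks).items()})
            for toks in tokenized]


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means; the seeds are simply the first k vectors (an assumption)."""
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            total = {}
            for v, lab in zip(vectors, labels):
                if lab == c:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # keep the old centroid if a cluster empties
                centroids[c] = normalize(total)
    return labels


def scatter(docs, k=7):
    """Scatter step: split docs into at most k non-empty clusters (7 by default)."""
    if not docs:
        return []
    labels = kmeans(tfidf_vectors(docs), k)
    clusters = [[d for d, lab in zip(docs, labels) if lab == c]
                for c in range(max(labels) + 1)]
    return [c for c in clusters if c]


def gather(clusters, selected):
    """Gather step: pool the documents of the selected (checked) clusters."""
    return [d for i in selected for d in clusters[i]]


def summarize(cluster, n=3):
    """Crude cluster summary: the n most frequent words longer than 3 letters."""
    counts = Counter(w for d in cluster for w in tokenize(d) if len(w) > 3)
    return [w for w, _ in counts.most_common(n)]
```

One iteration then reads `clusters = scatter(snippets)`, followed by, for example, `scatter(gather(clusters, [0, 2]), k=4)` once the user has checked clusters 0 and 2 and asked for four new clusters.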

After viewing a web page, the user is asked to provide feedback on its relevance and usefulness, which is saved to the usage log. The times at which the user clicks a link and closes the pop-up window are also recorded.
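Because the log stores both the click time and the pop-up close time, viewing time can be derived as their difference. The record below is a hypothetical schema for illustration only: the field names and rating scales are our assumptions, and the paper does not describe its log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PageView:
    """One viewed page in a hypothetical usage-log record."""
    user_id: str
    url: str
    clicked_at: datetime  # when the result link was clicked
    closed_at: datetime   # when the pop-up window was closed
    relevance: int        # relevance rating entered after viewing (scale assumed)
    usefulness: int       # usefulness rating entered after viewing (scale assumed)

    @property
    def viewing_time(self) -> float:
        """Seconds between clicking the link and closing the pop-up."""
        return (self.closed_at - self.clicked_at).total_seconds()


# Hypothetical values, not study data.
view = PageView(
    user_id="s01",
    url="http://example.com/page",
    clicked_at=datetime(2011, 3, 1, 10, 0, 0),
    closed_at=datetime(2011, 3, 1, 10, 0, 45),
    relevance=4,
    usefulness=3,
)
print(view.viewing_time)  # 45.0
```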

4.2 Tasks

Four tasks with different levels of topic specificity were chosen: two known-item tasks and two exploratory tasks. Three tasks were taken from the TREC 2005 HARD track, and we created the fourth, an exploratory task, based on a well-discussed search scenario (White et al., 2008). The selected topics were tropical storms, price fixing, overseas tobacco sales (exploratory), and travel and sightseeing (exploratory).

4.3 Procedure

The investigators and subjects met one-on-one in a usability lab at the authors' institution. Each subject first completed a demographic questionnaire and then watched a 5-minute video tutorial. After the tutorial, the subject was given some time to try out and become familiar with the systems. The subject was then given up to 15 minutes to search on each task. Two tasks were performed on the classic search system and the other two on the Scatter/Gather system, with the order rotated among subjects to counterbalance potential learning effects. The order in which the systems were used and the tasks were conducted was arranged according to a Graeco-Latin square.
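The paper does not say how its Graeco-Latin square was constructed or how the symbols were mapped onto the four tasks and two systems. As a hedged illustration, the sketch below shows one standard way to build a 4x4 Graeco-Latin square: superimpose two orthogonal Latin squares defined over the finite field GF(4). The row, column, and symbol labels are hypothetical.

```python
# GF(4) = {0, 1, 2, 3}: 2 encodes alpha and 3 encodes alpha + 1, with alpha**2 = alpha + 1.
# Addition in GF(4) is bitwise XOR; multiplication uses this table.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def graeco_latin_square(latin, greek):
    """Superimpose two orthogonal 4x4 Latin squares, L1[i][j] = i + j and
    L2[i][j] = 2*i + j over GF(4), returning (latin, greek) symbol pairs."""
    if len(latin) != 4 or len(greek) != 4:
        raise ValueError("this construction is for order 4 only")
    return [[(latin[i ^ j], greek[GF4_MUL[2][i] ^ j]) for j in range(4)]
            for i in range(4)]


# Hypothetical labels: rows = subject groups, columns = session positions.
square = graeco_latin_square(["T1", "T2", "T3", "T4"], ["A", "B", "C", "D"])
for row in square:
    print(row)
```

Every row and column contains each Latin and each Greek symbol exactly once, and each of the 16 (Latin, Greek) pairs occurs exactly once, which is what lets both factors be balanced across order positions.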

The systems recorded activity information such as user clickthroughs and viewing time. Subjects rated the relevance and usefulness of each document they viewed during searches, and these ratings were also recorded. After each task, the subject evaluated the system in a post-task questionnaire, which assessed, among other things, the subject's familiarity with the task topic, overall satisfaction with the system, and overall satisfaction with the results received. After completing the four tasks and four post-task questionnaires, the subject filled out an exit questionnaire to provide feedback on the Scatter/Gather system and the research.

Two versions of the post-task questionnaire were used, one for each system (classic search and Scatter/Gather). Both included questions about topic familiarity, topic specificity, the effort to start the search (Easy Start) and to conduct it (Easy Search), satisfaction with the results (Satisfaction), confidence in the search (Confidence), usefulness of the system (Usefulness), the role of previous knowledge (Knowledge), and whether the subject had enough time to finish the task (Time). The questionnaire for tasks conducted on the Scatter/Gather system included three additional questions: the effect of clustering on the user's ability to complete the task, satisfaction with the clusters, and satisfaction with the documents in the clusters. The exit questionnaire collected information about subjects' overall experiences with the two systems, including system preference, usability, and the usefulness of the Scatter/Gather system.

Figure 1. Scatter/Gather browser interface

5. RESULTS

In this section, we discuss the data collected in the study and present preliminary results of statistical analysis. We focus on research questions 2 (impact of familiarity), 3 (impact of specificity), and 5 (relevance evidence).

According to their responses to the exit questionnaire, none of the subjects had used a system similar to Scatter/Gather before. Thirteen subjects indicated that Scatter/Gather was more helpful in completing the tasks because it filtered out some useless information and provided concentrated results, while seven subjects found the classic search system better because it spared them the difficulty of choosing clusters. Four subjects indicated no difference between the two.

In terms of usability, half of the subjects said that the classic search system was easier to use, because they were more familiar with the interface of a ranked retrieval system and found the Scatter/Gather system more complex. Nine subjects considered Scatter/Gather easier, mainly because the results were well sorted and organized, and three subjects indicated no difference. Overall, 11 subjects preferred Scatter/Gather, while another 11 preferred the classic search system.

5.1 Topic Familiarity

In earlier discussions, we reasoned that the two systems (classic search and Scatter/Gather) may support different types of search tasks, with Scatter/Gather likely working better for exploratory topics. Hence the impact of task-related variables on (perceived) retrieval effectiveness was likely to differ between the two systems. We performed interaction analyses of the system variable (0 for Scatter/Gather and 1 for classic search) and task-related variables (e.g., topic familiarity and specificity) to examine these potential influences.

Using linear regression, we analyzed the impacts of topic familiarity (x1), system (binary variable x2), and their interaction (x1·x2) on user perceptions of system effectiveness in the post-task questionnaire. Table 1 presents coefficient estimates (with significance codes) of x1, x2, and x1·x2 for each effectiveness perception (dependent variable). The user's familiarity with a search topic had a positive and significant influence on perceived system effectiveness for most post-task questions.
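The regression described above fits y = b0 + b1·x1 + b2·x2 + b3·x1·x2 for each questionnaire item. The paper does not name the statistics software it used; the sketch below is a self-contained, assumption-laden illustration that fits the same model by solving the normal equations, and it omits the standard errors and p-values that a statistics package would report alongside the significance codes. The data in the test are synthetic, not the study's.

```python
def fit_interaction_ols(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 + b3*x1*x2.

    Returns [b0, b1, b2, b3]. Solves the normal equations (X^T X) b = X^T y
    by Gauss-Jordan elimination with partial pivoting; no standard errors.
    """
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]  # design matrix X
    p = 4
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]
```

A design choice worth noting: coding the system as 0 for Scatter/Gather makes b1 the familiarity slope for Scatter/Gather itself, which is how Table 1 is read.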

The system variable also had a positive influence in some aspects. For example, the classic search engine (system=1) was significantly easier to start with (Easy Start) and to perform searches with (Easy Search) than Scatter/Gather (system=0). The negative, mostly significant, x1·x2 coefficients indicate that topic familiarity had a greater impact on perceived effectiveness in the Scatter/Gather system (system=0; e.g., 0.415 on Satisfaction) than in the classic search system (e.g., 0.415 − 0.376 ≈ 0.04 on Satisfaction).
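Writing the model out makes this arithmetic explicit. Because of the interaction term, the slope of perceived effectiveness with respect to familiarity depends on the system: it is β1 for Scatter/Gather and β1 + β3 for classic search. The numbers below are the Satisfaction estimates from Table 1.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2,
\qquad
\frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_3 x_2 =
\begin{cases}
  \beta_1 = 0.415, & x_2 = 0 \ \text{(Scatter/Gather)},\\
  \beta_1 + \beta_3 = 0.415 - 0.376 = 0.039 \approx 0.04, & x_2 = 1 \ \text{(classic search)}.
\end{cases}
```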

Table 1. Impact of user's familiarity with topic on perceived effectiveness of the systems. System is a binary variable: 0 for Scatter/Gather and 1 for classic search.
Dependent variable | Familiarity (x1) | System (x2) | x1·x2
Easy Start | 0.385** | 1.74** | −0.346*
Easy Search | 0.467*** | 1.67* | −0.37*
Time | 0.184* | 0.953* | −0.216·
Satisfaction | 0.415*** | 1.19· | −0.376*
Confidence | 0.252** | 0.733 | −0.23·
Usefulness | 0.322** | 0.85 | −0.342*
Knowledge | 0.765*** | 0.832 | −0.23

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘·’ 0.1 ‘ ’ 1

5.2 Topic Specificity

We conducted a similar analysis of the impacts of topic specificity (x1), system (binary variable x2), and their interaction (x1·x2). As shown in Table 2, topic specificity significantly influenced the user's perceived ease of starting a search on the Scatter/Gather system (Easy Start) and the perception of whether there was sufficient time to conduct searches (Time). Its influences on satisfaction and on the use of prior knowledge in searches were significant at the 0.1 level. However, the interaction terms with the system variable were not significant.

Table 2. Impact of topic specificity on user perception about the systems.
Dependent variable | Specificity (x1) | System (x2) | x1·x2
Easy Start | 0.512** | 1.04 | −0.121
Time | 0.389** | 1.4 | −0.243
Satisfaction | 0.295· | 0.145 | −0.0712
Knowledge | 0.443· | 1.14 | −0.274

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘·’ 0.1 ‘ ’ 1

5.3 User evidence and relevance

We analyzed clickthrough data and users' relevance judgments to identify potential associations. On average, subjects spent 57 seconds viewing a page after clicking its link. Table 3 shows Pearson correlations among user-perceived relevance, usefulness, viewing time, and relative viewing time (see Footnote 2). Perceived usefulness and relevance are highly correlated (Pearson r = 0.864): users tended to regard a useful page as a relevant one as well. Viewing time and relative viewing time are also positively correlated with both relevance and usefulness (r between 0.266 and 0.319).
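The two quantities used in this analysis, relative viewing time (defined in Footnote 2 as viewing time normalized by the user's average) and the Pearson coefficient, can be computed as sketched below. This is an illustrative sketch under our own assumptions: the paper does not say whether correlations were computed over all page views pooled across users, and the sample values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if either sequence is constant


def relative_viewing_times(views):
    """Divide each (user, seconds) viewing time by that user's mean viewing time."""
    total, count = defaultdict(float), defaultdict(int)
    for user, secs in views:
        total[user] += secs
        count[user] += 1
    return [secs / (total[user] / count[user]) for user, secs in views]


# Hypothetical page views, not study data.
views = [("s01", 30), ("s01", 90), ("s02", 40)]
print(relative_viewing_times(views))  # [0.5, 1.5, 1.0]
```

Normalizing per user removes differences in individual reading speed, which is one plausible reason the relative times correlate slightly more strongly with the judgments in Table 3.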

Table 3. Correlations among relevance evidence
(Pearson) | Usefulness | View Time | Relative Time
Relevance | 0.864 | 0.266 | 0.288
Usefulness | - | 0.303 | 0.319

6. CONCLUSION

Based on data collected from a user study, we discussed users' perceptions of the Scatter/Gather system versus a classic search interface, analyzed the influences of task-related variables on perceived system effectiveness, and examined correlations among relevance-related variables in user logs. Overall, subjects were divided in their preferences toward Scatter/Gather. The classic search interface appealed to those who viewed Scatter/Gather as more complex, whereas others found Scatter/Gather easier to interact with because of its new way of presenting and organizing results. Topic familiarity and specificity had significant influences on several categories of user-perceived retrieval effectiveness. The influences appeared to be greater with the Scatter/Gather system, suggesting that there are particular situations in which interactive clustering better supports information retrieval. We found positive correlations of the user's (relative) viewing time with both the perceived relevance and the usefulness of a web page, which are highly correlated with each other. This paper represents our first step in analyzing the user study data. We plan to conduct further analysis and report more in-depth results.

Footnotes

  1. We present preliminary results from the study and do not discuss all research questions in this poster paper.

  2. Relative viewing time is the time a user spent on a specific page, normalized by the user's average viewing time.
