Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. RESEARCH DESIGN
  5. RESULTS
  6. DISCUSSION
  7. CONCLUSION
  8. Acknowledgements
  9. References

Clustering web search results into dynamic clusters and cluster hierarchies has been shown to be promising in reducing the information overload typically found in ranked list search engines. This study compared sixteen participants' search performance and subjective satisfaction when using textual clustering and ranked list search interfaces to conduct assigned and self-designated search tasks. The results show that participants searched slightly faster and better, and were more satisfied, using the ranked list interface. However, participants performed slightly better on easy questions with the clustering interface, and obtained non-repetitive relevant results not found with the ranked list interface. The study shows that the clustering interface highlights prominent concepts and offers richer context for exploring, learning, and discovering related concepts; yet it also induces a certain degree of uncertainty, disorientation, and anxiety. The contrasting views of clustering search and suggestions for future studies are also discussed.


INTRODUCTION

Finding information through search engines has become one of the most important online activities. In response to a user's query, web search engines mostly return a ranked list of search results. The user often needs to sift through large amounts of mixed results to locate pages of interest. Web users struggle with information overload, coping with an overabundance of information that lacks a comprehensible organization (Kules & Shneiderman, 2006). One of the key techniques used to address this issue is to organize and present search results in a way that helps users find documents of interest (Venkatsubramanyan & Perez-Carballo, 2007). Classifying or categorizing information appropriately has been suggested as a feasible way to help users search or browse information more quickly and efficiently (Samler & Lewellen, 2004).

Clustering web search results into dynamic clusters, in which results are grouped hierarchically by similarity measures, has been shown to be promising in reducing the information overload typically found in ranked list search engines. Many studies have compared different combinations of search result organization (e.g., categorization and clustering) and presentation methods (e.g., list and graphic interfaces) using self-developed research prototypes. Julien, Leide, & Bouthillier (2008) reviewed 31 controlled user studies of information visualization tools for textual information retrieval. These studies indicate that the proposed techniques have promise, but it is clear that user experiments are required for a more thorough evaluation (Leouski & Croft, 1996; Käki & Aula, 2008). Very few studies investigate the nature of interaction with real-world clustering search engines; even fewer explain whether, why, and under what circumstances clustering of web search results is effective.

The study investigates users' search performance and satisfaction when using a textual clustering interface in web search. Usability and comprehension tests were conducted using multiple data collection methods, and various objective and subjective measures were applied. The usability test examines whether and how users perform better in web search with and without a clustering interface. The comprehension test examines users' subjective feelings about a clustering interface consisting of clusters and cluster hierarchies. Sixteen master's and doctoral students were recruited from various disciplines: 8 Library & Information Science (LIS) students conducted the assigned search tasks, divided by question type and task level, in a within-subjects arrangement, while the other 8 non-LIS students, treated as a between-subjects group, conducted self-designated search tasks. The results can serve as a basis for further discussion of user evaluation research in this area.

RESEARCH DESIGN

The study consists of two evaluations: a usability test of the clustering interface, measured by users' search performance, and a comprehension test, measured by users' satisfaction levels. Various methods were used to support the two tests, including experiment, observation, questionnaire, interview, and search log analysis. The study also adopted some of the design concepts from Käki & Aula (2008), who suggested controlling the complexity of user-centered evaluations of search user interfaces through within-subjects designs, balanced task sets, time limitations, pre-formulated queries, cached result pages, and limits on users' access to result documents. The details of the research design are described below.

Test Platform and Participants

The study chose Vivisimo as the test platform because it presents search results in both textual clustering and ranked list interfaces; it was also one of the most popular clustering search engines at the time, and its search scope was judged appropriate based on previous studies. Vivisimo's original interface is therefore treated as the clustering search interface in this study (hereinafter referred to as CS), and a simple ranked list search without clustering was designed as the baseline for comparison (hereinafter referred to as RLS). Sixteen master's and doctoral students in Taiwan (12 females and 4 males) with adequate English proficiency were recruited from different subject domains: 8 from LIS, and the other 8 from social science, engineering, business, and medical science. The 16 participants used the Internet frequently, spending an average of 3 hours per day on Internet activities. They were familiar with web search engines and searched on a daily basis; their most frequently used search engines included Google and Yahoo! Kimo (a localized Yahoo! search engine in Taiwan). Only 3 of the 16 participants had previously used Vivisimo or similar clustering search engines (e.g., Grokker); all 3 had an LIS background, and their usage of web clustering search engines was low. As all participants were completing their thesis or dissertation during the study period, their information needs and domain knowledge for searching related literature were stronger than those of general users.

Usability Test

The purpose of the usability test was to determine with which interface participants would search faster and better. The experiments included assigned and self-designated search tasks. A previous analysis of search goals suggested that the informational goal, i.e., obtaining information about the query topic, was more common than other goals such as the navigational goal (Rose & Levinson, 2004); the assigned tasks were therefore designed around informational goals. The study developed 4 questions along 2 dimensions, question type and task level. As shown in Table 1, the question type is either closed or open, and the task level is either easy or difficult. All 4 questions focus on topics related to information organization in the LIS field. The 8 LIS students were invited to perform the assigned tasks. A brief explanation was given of the context of the questions and the functionality of the two search interfaces. Each participant was also guided through practice searches in each interface before performing the formal tasks. Each formal task was limited to five minutes, based on the average time spent answering each question in the study's pilot test. Participants were required to use pre-formulated query terms to ensure that the same search results were accessed. For the self-designated search tasks, the other 8 non-LIS participants performed their own searches based on their individual research interests. The purpose was to observe participants searching in a more natural way; therefore, no limitations were imposed on the search process.

Table 1. 

The study also used Morae client logging software to record each participant's search process. The log data were used to measure participants' search efficiency and effectiveness, and the items collected focused on search actions such as click streams on web links, pages, and clusters.

In addition, the 8 LIS participants were divided equally into Group A and Group B to avoid a possible learning effect caused by the order of use: participants in Group A used CS first and then RLS, and those in Group B used RLS first and then CS. During each search, the researcher observed the participants' reactions without interfering. The observation data were further clarified by combining them with the analysis of the interviews and search logs.
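The counterbalancing described above can be sketched as follows. This is an illustration only: the participant labels, the seeded random shuffle, and the function name `assign_counterbalanced` are assumptions, since the study does not state how participants were allocated to Group A and Group B.

```python
import random

# Interface orders for the two groups, as described in the text:
# Group A uses CS then RLS; Group B uses RLS then CS.
ORDERS = (("CS", "RLS"), ("RLS", "CS"))


def assign_counterbalanced(participants, orders=ORDERS, seed=0):
    """Split participants evenly across interface orders.

    The shuffle is an assumption; any allocation that yields equal
    group sizes would satisfy the design described in the paper.
    """
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    # Alternate orders so each order receives the same number of people.
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical labels for the 8 LIS participants.
    groups = assign_counterbalanced([f"P{i}" for i in range(1, 9)])
    for participant, order in sorted(groups.items()):
        print(participant, "->", " then ".join(order))
```

With 8 participants and 2 orders, each order is used by exactly 4 participants, so any practice effect from the first interface is spread evenly over CS and RLS.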

Comprehension Test

The purpose of the comprehension test was to understand whether and how clustering of search results would help users' web search. Questionnaires and in-depth interviews were used to collect users' subjective satisfaction levels and experiences with CS and RLS. Upon completing the search tasks, each participant filled out an evaluation questionnaire divided into search efficiency, search effectiveness, and satisfaction level; the satisfaction level was further divided into overall satisfaction and satisfaction with the structure and content of the cluster hierarchies. There were 25 evaluation items in total, each requiring the participant to rate the performance of CS and RLS on a scale from 1 (completely disagree) to 5 (completely agree). The structure and content items were specifically designed to evaluate the quality of CS. Structure mainly refers to the hierarchical structure in which the search results are presented, and content refers to the names or labels of the clusters. A one-to-one interview was conducted after the evaluation questionnaire was finished. The interview aimed to collect participants' thoughts and feelings about using CS and RLS for web search, as well as their intention to continue using CS in the future.

A few possible sources of variability in the research design are briefly described here. First, automatic clustering techniques differ greatly among search engines, which may result in different user experiences with their interfaces. To reduce the bias caused by deficiencies in system scale and functionality, the study chose one of the most popular clustering search engines as the study target, and used its search results for both CS and RLS to keep the data consistent. Second, given the diverse and dynamic nature of web resources, the study arranged for participants to finish the experiments within a week, to reduce the inconsistency caused by changing clustering results. Third, the sample size was not large enough to give the study high statistical power, so the study mainly provides descriptive statistics and focuses on qualitative analysis of users' experience. Finally, the five-minute limit on assigned tasks brought users' behavior closer to real search behavior and served as a cutoff time that made it convenient to compare performance with CS and RLS.

RESULTS

Search Performance

Search Efficiency

Efficiency was defined in this study as the average time and number of clicks taken to complete a task. As shown in Table 2, RLS (4'15”) performed slightly better than CS (4'33”) in terms of average task completion time. Participants also obtained relevant pages more quickly in RLS than in CS: on average it took 58” to retrieve one relevant page in RLS versus 1'32” in CS, and 1'07” to locate the first relevant page in RLS versus 1'39” in CS. Moreover, participants obtained each relevant page with fewer clicks in RLS (4.40) than in CS (4.87). However, participants likely produced more clicks in CS because of their clicks on clusters; and as CS provides more information for searching, it is reasonable to assume that participants spent more time browsing, which slowed their search. Although there was a trend for participants to retrieve relevant pages faster with RLS than with CS for almost all types of questions, there seemed to be no significant difference in efficiency between RLS and CS.
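The efficiency measures described above can be sketched as follows, under stated assumptions. The session records, field names, and the pooling of time and clicks over all relevant pages are hypothetical; the study does not specify how its per-page averages were aggregated from the logs.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes'seconds” reading such as 4'15” into seconds."""
    return minutes * 60 + seconds


def efficiency(sessions):
    """Summarize efficiency for one interface.

    Each session is a dict with 'seconds', 'clicks', and 'relevant_pages'
    (hypothetical field names). Per-page figures pool all sessions, which
    is one possible reading of the paper's averages.
    """
    total_time = sum(s["seconds"] for s in sessions)
    total_clicks = sum(s["clicks"] for s in sessions)
    total_relevant = sum(s["relevant_pages"] for s in sessions)
    per_page = total_relevant > 0  # avoid dividing by zero
    return {
        "mean_task_seconds": total_time / len(sessions),
        "seconds_per_relevant_page": total_time / total_relevant if per_page else None,
        "clicks_per_relevant_page": total_clicks / total_relevant if per_page else None,
    }


if __name__ == "__main__":
    # Gap between the reported mean completion times (CS 4'33”, RLS 4'15”).
    print(to_seconds(4, 33) - to_seconds(4, 15), "seconds")

    # Invented sessions, for illustration only.
    sample = [
        {"seconds": 255, "clicks": 20, "relevant_pages": 5},
        {"seconds": 273, "clicks": 18, "relevant_pages": 3},
    ]
    print(efficiency(sample))
```

Expressed this way, the reported difference in mean completion time between the two interfaces is 18 seconds on tasks capped at five minutes.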

Table 2. 

Effectiveness was defined as the quantity and quality of relevant pages retrieved. Quality was measured by the accuracy of the retrieved pages as judged by a tester other than the participants. As shown in Table 3, participants obtained more relevant pages using RLS than CS, with an average of 4.63 using RLS versus 2.88 using CS. The study invited one LIS expert as the tester to rate the accuracy of the relevant pages retrieved by the participants on a 1-5 point scale. The average quality of the retrieved web pages was judged to be roughly the same for RLS and CS, although RLS achieved higher accuracy on the Difficult questions and CS on the Easy questions. In addition, more participants on average failed to complete the assigned tasks within 5 minutes with CS (5.25) than with RLS (4.25), particularly on the Difficult/Closed question.

It was also found that the repetitive rate of RLS (54.05%) was higher than that of CS (33.33%). In other words, over half of the relevant pages found in RLS were the same across participants, while the results were more diverse in CS. It is reasonable to assume that participants were inclined to select the same relevant pages because those pages appear in the ranked list in the same order. Further, the repetitive rate of pages appearing in both RLS and CS was rather low: 22.58% for the Easy/Closed question, 15.79% for the Easy/Open question, 24.14% for the Difficult/Closed question, and 21.88% for the Difficult/Open question. The low repetitive rate reveals that participants did locate some relevant web pages in CS that were not found in RLS. In other words, participants may obtain more relevant pages with the aid of the clustering interface. In short, participants found more relevant pages using RLS than CS, with the same quality, while CS also provided good and diverse relevant pages not found in RLS. Participants may search better using RLS together with CS than using only one of them.
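The study does not give a formula for its repetitive rates, so the following is only a sketch of one plausible reading. It assumes that the within-interface rate is the share of distinct relevant pages chosen by more than one participant, and that the cross-interface rate is the share of distinct pages found in both interfaces. The function names and the sample page labels are hypothetical.

```python
from collections import Counter


def repetitive_rate(selections):
    """Share of distinct pages selected by more than one participant.

    `selections` is a list with one set of page identifiers per participant.
    This definition is an assumption, not the paper's stated formula.
    """
    counts = Counter(page for pages in selections for page in set(pages))
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)


def cross_interface_overlap(pages_a, pages_b):
    """Share of distinct pages that were found in both interfaces."""
    union = pages_a | pages_b
    return len(pages_a & pages_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Invented selections for three participants per interface.
    rls = [{"a", "b"}, {"a", "c"}, {"a", "b"}]
    cs = [{"a", "d"}, {"e"}, {"f", "b"}]
    print("RLS repetitive rate:", repetitive_rate(rls))
    print("CS repetitive rate:", repetitive_rate(cs))
    print("Overlap:", cross_interface_overlap(set().union(*rls), set().union(*cs)))
```

Under this reading, a low cross-interface overlap means that most relevant pages were found in only one of the two interfaces, which is the basis for the claim that CS surfaces pages not found in RLS.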

Table 3. 

Satisfaction Level

Satisfaction level describes the participant's subjective satisfaction when using the two interfaces. The overall satisfaction level of the 16 participants was slightly higher with RLS (3.29) than with CS (3.10) on a 1-5 point scale (the larger the number, the higher the satisfaction). As shown in Table 4, participants in the assigned tasks reported higher satisfaction than those in the self-designated tasks. RLS scored slightly higher than CS on reducing cognitive load. It is assumed that browsing the information in CS takes participants more time and effort. It was observed that participants tended to grasp the search results and feel in control of them more quickly with RLS than with CS; yet some participants indicated that CS helped reduce information overload when they were not very familiar with the search topics or were overwhelmed by the number of search results. However, it was also observed that CS sometimes triggered a certain degree of anxiety: some participants felt lost among the clusters and hierarchies and were unable to stay focused on the topics searched; some found it easy to be diverted to other search results; some worried about missing important results buried in the clusters; and some found it exhausting to browse clusters with duplicated or unwanted results.

Table 4. 

As for satisfaction with the usefulness of search results, participants in the assigned tasks did not perceive great differences between RLS and CS, while participants in the self-designated tasks considered the search results in RLS more useful. According to the interviews, some participants had doubts about the completeness of the clustered search results in CS: they were not sure where the possible results would reside, or were afraid of bypassing important results because of the small number of search results in each cluster. Regarding the assignment of search results to multiple clusters, some participants thought it helped them avoid missing important results, while others found it annoying to sift through clusters with repetitive results. In short, participants were satisfied with both interfaces in reducing cognitive load and providing useful search results, though the average satisfaction level was not very high. In addition, the non-LIS participants reported lower satisfaction than the LIS participants, and even lower satisfaction when using CS. According to the analysis of the interviews, non-LIS students seemed to have less trust in search engines, and often had the doubts mentioned above.

Evaluation of Clusters and Cluster Hierarchies

Usage of Clusters and Cluster Hierarchies

Cluster usage was measured by the clicks on the clusters and cluster hierarchies. Table 5 shows the cluster usage among the 8 LIS students in the assigned tasks. The average click rate of clusters (44.24%) was rather high. The click rate of clusters was higher when participants were less familiar with the questions: the Easy/Open question had a lower level of familiarity (2.25) and a higher click rate (51.95%), compared with the Easy/Closed question, which had a higher level of familiarity (3.13) and a lower click rate (42.78%). For participants in the self-designated tasks, the average click rate of clusters (21.51%) was lower than in the assigned tasks, yet still not low. Moreover, 15 of the 16 participants clicked on the clusters. In general, the usage of clusters was not low.

Table 5. 

Table 6 shows the usage of cluster hierarchies. For both kinds of tasks, the click rate of the first layer was the highest, and there were almost no clicks on the third layer and beyond. Participants in the self-designated tasks concentrated more on the first layer, while participants in the assigned tasks clicked deeper. Overall, participants mainly used the first and second layers, and they considered the first layer the most informative and useful.
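The cluster usage measures can be sketched as follows. The sketch assumes that the click rate of clusters is the share of all logged clicks that landed on a cluster, and that layer usage is the share of cluster clicks at each hierarchy depth; the study does not state its exact definitions. The log format and field names are hypothetical and do not reflect the Morae log structure.

```python
from collections import Counter


def cluster_usage(clicks):
    """Return (cluster click rate, share of cluster clicks per layer).

    Each click is a dict with a 'target' of 'cluster' or 'result'; cluster
    clicks also carry a 'layer' number (1 = top of the hierarchy).
    """
    cluster_clicks = [c for c in clicks if c["target"] == "cluster"]
    rate = len(cluster_clicks) / len(clicks) if clicks else 0.0
    layers = Counter(c["layer"] for c in cluster_clicks)
    shares = {layer: n / len(cluster_clicks) for layer, n in sorted(layers.items())}
    return rate, shares


if __name__ == "__main__":
    # Invented click log for one session.
    log = (
        [{"target": "result"}] * 4
        + [{"target": "cluster", "layer": 1}] * 3
        + [{"target": "cluster", "layer": 2}]
    )
    rate, shares = cluster_usage(log)
    print(f"cluster click rate: {rate:.2%}")
    print("share by layer:", shares)
```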

Table 6. 

Satisfaction Level of Clusters and Cluster Hierarchies

As shown in Table 7, participants were in general satisfied with the quality of individual clusters (3.77) and cluster hierarchies (3.22), and the overall impression (3.33) was above average. For individual clusters, participants indicated that the concepts presented and the labels used were appropriate, clear, and comprehensible. On the other hand, participants' average satisfaction with cluster hierarchies was lower than with individual clusters; in particular, the cohesiveness, isolation, and completeness of the clusters were all rated below average. The lower satisfaction with cluster hierarchies corresponds to the doubts described earlier, where participants were concerned about whether search results had been appropriately and thoroughly presented in the cluster hierarchies. As for the balance of the overall cluster hierarchy, some participants suggested that 10 clusters or fewer would be an appropriate breadth, since this would let them identify the overall concept and structure of the search results more easily. As for depth, most participants found a two-layer structure sufficient, which corresponds to the click analysis of cluster usage above. As for readability, most participants indicated that the overall cluster hierarchy fit well with their understanding; nevertheless, some participants expressed doubts about the clustering logic, particularly when they were familiar with the search topics. In short, participants were quite satisfied with the quality of each individual cluster, but had some doubts about the cluster hierarchies. In addition, most participants agreed that both clusters and cluster hierarchies were valuable for associative or related search.

Table 7. 

DISCUSSION

In summary, most participants searched slightly faster and better, and were more satisfied, with RLS than with CS in terms of search efficiency, search effectiveness, and satisfaction level for most types of questions, though the difference was not pronounced. The lack of a measurable performance advantage could be explained by most participants' familiarity with ranked list interfaces, which could put the clustering interface at a disadvantage. However, participants performed slightly better on easy questions with CS than with RLS in terms of the quality of search results. Participants may have found relevant results more easily because the clustering interface presents prominent concepts more visibly, and because easy questions, unlike difficult ones, do not immediately require digging into the details of search results. Further, participants obtained relevant search results from CS that were not found in RLS, and the result sets that different participants obtained from CS differed from one another. This reveals the value of CS in improving recall and enhancing diversity in web search.

As for satisfaction with the clusters and cluster hierarchies, participants were more satisfied with individual clusters and less satisfied with cluster hierarchies in terms of the comprehensibility and relevance of the search results presented. Overall, participants found CS interesting and intuitive to use, particularly for serendipitous browsing of associated concepts among clusters and cluster hierarchies. This stimulated their associative thinking whether or not they were familiar with the search topics, and thus enabled them to discover new information. All participants expressed interest in revisiting CS; half of them even added CS to their personal bookmarks after the experiments. Yet when asked about their intention for future use, most participants stated they would still use RLS as their first choice and CS as an extended search aid. This brings out some interesting contrasts in how a clustering interface is used. Some observations are briefly discussed below.

Highlight Important Concepts vs. Miss Important Results

Clusters and hierarchies make it easier for users to get a quick glimpse of the overall and related concepts contained in the search results. Many related studies have pointed out that a clustering interface is helpful when categories or clusters give users access to results located far down the ranked list (Chen & Dumais, 2000; Käki & Aula, 2005; Turetken & Sharda, 2005). However, it sometimes causes users to overlook other important relevant search results because of their limited attention and effort. In this study, although users obtained some relevant results not found in the ranked list, they also missed some important results that were found in the ranked list. For many participants, the main value of the extra effort spent browsing the clusters and hierarchies was serendipitous exploration of related concepts, with obtaining more relevant search results coming second. Some also indicated that they would use a clustering interface when there is no immediate time pressure. This suggests that clustering and ranked list interfaces both satisfy the need for a few relevant results, while a clustering interface adds value for 'researching' the concepts presented in the search results. Nevertheless, users need to use both interfaces to achieve better recall of search results.

Increase Multi-faceted Views vs. Decrease Search Focus

According to the study, most participants agreed that the contents and structure of clusters and hierarchies help in clarifying the topic searched, learning related concepts, stimulating diverse thinking, and discovering new information. In other words, users gain a more multi-faceted view of both the topics searched and the search results. The process is rewarding, yet it unavoidably costs participants more cognitive effort in comprehending, analyzing, and synthesizing the concepts and results obtained. These cognitive activities clearly differ from those involved in using a ranked list alone, where participants mostly focus on judging the relevance of search results. Further, users may occasionally be attracted or distracted by other information presented in the clusters and hierarchies, and hence become less focused on their original search tasks. It was observed that the extra information was mostly useful for participants with medium familiarity with the topics searched. Participants with little or extensive knowledge of the search topics were often less interested in browsing the clusters and hierarchies. One participant in the self-designated tasks ignored the clustering information entirely, indicating that clustering would divert the search focus and was thus less useful for the search task itself.

Reduce Information Overload vs. Raise Information Anxiety

As many studies have noted, one of the most important purposes of clustering is to reduce information overload. Organizing search results into clusters and hierarchies does allow users to focus on items in clusters of interest rather than having to browse through all the results sequentially. However, this raises other possible anxieties. Some participants worried about missing important results and kept checking each cluster to make sure they had not bypassed any. Where results were assigned to multiple clusters, participants felt annoyed by the repeated search results. Moreover, some participants noted that each cluster contained too few search results, so they had to keep flipping through the clusters to check the results within. The situation would get even worse with broad hierarchies containing many clusters. However, if several clusters were merged into a larger cluster, the clusters would become less distinguishable. The adequate number of sub-clusters and of search results within each cluster is still an open problem, yet it is a critical factor affecting users' search experience. Some participants mentioned that if they used the clustering search engine first, they would still use a ranked list search engine afterwards to make sure the search was complete, but not the other way around. Most participants regarded clustering search as an additional and advanced function rather than a necessity. For simple searches requiring only a few results, there seems to be no difference between clustering and ranked list search; when the search is difficult or complex, the anxiety caused may overshadow the benefit of reduced information overload in clustering search.

CONCLUSION

Clustering web search results into dynamic clusters, in which results are grouped hierarchically by similarity measures, has been shown to be promising in reducing the information overload typically found in ranked list search engines. The study shows that such an interface offers richer and more diverse information, and brings users new search experiences of exploring, learning, and discovering related concepts concerning the search topics and results; yet it also induces a certain degree of uncertainty, disorientation, and anxiety. Clustering adds value to the ranked list, acting more as an extended and advanced service for ranked list search than as a competitor to it. The authors recognize that the generalizability of these results is limited by the small sample size and by the comparison being based on one specific search engine. However, the evaluation performed in this study can form a basis for recommendations and design considerations for interfaces that cluster search results. A few suggestions for future studies follow. First, since the study focuses on the informational goal of search, it would be valuable to design exploratory or navigational tasks to investigate how, and in what specific contexts, users explore large sets of search results. Second, longitudinal and repeated studies are necessary to evaluate changes in search behavior as users adopt and adapt to new interfaces. Finally, as users in the study accepted individual clusters more readily than cluster hierarchies, it would be interesting to compare the practical value of clusters with that of other prevalent related search techniques such as term suggestion.

Acknowledgements

This work was supported in part by the National Science Council, Taiwan, under the grant NSC96-2413-H-003-025.

References

  • Chen, H., & Dumais, S. (2000). Bringing order to the Web: automatically categorizing search results. ACM SIGCHI-00, The Hague, The Netherlands.
  • Julien, C-A, Leide, J. E., & Bouthillier, B. (2008). Controlled user evaluations of information visualization interfaces for text retrieval: literature review and meta-analysis. Journal of the American Society for Information Science & Technology, 59(6), 1012-1024.
  • Käki, M., & Aula, A. (2005). Findex: improving search result use through automatic filtering categories. Interacting with Computers, 17(2), 187-206.
  • Käki, M., & Aula, A. (2008). Controlling the complexity in comparing search user interfaces via user studies. Information Processing & Management, 44(1), 82-91.
  • Krishnapuram, R., & Kummamuru, K. (2003). Automatic taxonomy generation: issues and possibilities. Lecture Notes in Computer Science, 2715, 52-63.
  • Kules, B., & Shneiderman, B. (2006). Using meaningful and stable categories to support exploratory web search: two formative studies. Retrieved Jan. 28, 2009, from http://hcil.cs.umd.edu/trs/2005-31/2005-31.htm
  • Leouski, A. V., & Croft, W. B. (1996). An Evaluation of Techniques for Clustering Search Results. Technical Report IR-76, Department of Computer Science, University of Massachusetts, Amherst.
  • Rivadeneira, W., & Bederson, B. B. (2003). A Study of Search Result Clustering Interfaces: Comparing Textual and Zoomable User Interfaces. (Report No. HCIL-2003-36) College Park, MD: University of Maryland.
  • Rose, D. E., & Levinson, D. (2004). Understanding user goals in web search. WWW 2004. New York, NY, USA.
  • Samler, S., & Lewellen, K. (2004). Good taxonomy is key to successful searching. EContent, 27(7/8), S20.
  • Turetken, O., & Sharda, R. (2005). Clustering-based visual interfaces for presentation of web search results: an empirical investigation. Information Systems Frontiers, 7(3), 273-297.
  • Venkatsubramanyan, S., & Perez-Carballo, J. (2007). Techniques for organizing and presenting search results: a survey. Journal of Information Science and Technology, 4(2).