Abstract

  1. Top of page
  2. Abstract
  3. Research statement
  4. Hypothesis and Research framework
  5. System design
  6. Data collection
  7. Findings
  8. Conclusion and Future Research
  9. Author Notes
  10. Literature

This study investigated whether mediated access through a collection in a specific domain, or a combined display of a clustered structure and a linear ranked list of search results, would affect three main measures of the user's Web searching behavior: effectiveness, efficiency, and usability. Mediated access through a structured collection lets the user explore a small, well-structured document collection covering a specific subject domain in order to clarify her/his information needs before searching the World Wide Web. The results showed that all three measures were influenced by the user's adaptation to the system and the search task. The findings contribute to a better understanding of how the mediation system or the combined display supports a Web information user.


Research statement

Information users still have difficulty choosing search terms to express their problem and reformulating their query into a better one (Nordlie, 1999). This situation frequently occurs when they use an unorganized database such as the World Wide Web, especially when their tasks are complicated. The purpose of this study is to investigate whether a new intermediary information retrieval (IR) system for Web information, one that holds structured domain knowledge, can improve information users' search results and behaviors. As the intermediary, the source collection concept (Muresan & Harper, 2004) was applied. A source collection is defined as a small, thematically focused collection that helps a user acquire proper knowledge of a specific domain for her/his specific information needs. That is, mediated access through a structured collection helps the user explore a small, well-structured document collection covering a specific subject domain in order to clarify her/his information needs before searching the World Wide Web.

Hypothesis and Research framework

It was hypothesized that this mediated IR system would outperform a direct IR system without the support of a source collection (e.g., traditional Web search engines) in information search performance. We expected that a user would gain clearer knowledge of what she/he needs by browsing the outcomes from a small structured source collection, and that this would lead to improved search results and performance. The second hypothesis concerns differences in search performance that arise from different ways of displaying the search results. Based on previous research, combining the traditional ranked list of search results with a structured display (e.g., classification or clusters) would help a user better understand her/his search results. The combined interface would give a user better knowledge of the structure of either the source collection or the Web results. The goal of this research project is to test these two hypotheses.

Table 1. Framework of the study

System design

The system was developed by Dr. Muresan, and the New Jersey Environment Digital Library (NJEDL) collection was adopted as the source collection. Of the two representative document structuring methods, clustering and classification, the former was selected for this study because clustering has considerable strengths in the Web environment. The design of the mediated system interface basically follows the functionality of the ClusterBook (Muresan et al., 2001): the user explores the domain of interest represented by a structured source collection, selects documents or cluster representatives for a certain information need, edits the query if necessary, and searches the target collection, the Web. Thus, a user can benefit either from browsing the entire NJEDL collection (N = 1,300 documents) with its static, tightly topic-bound structure, or from searching for a relevant Web document in the topic-bound clusters built from a small retrieved Web document set (N = 100). As explained in Table 1, there are four system modes, determined by whether a source collection is offered and whether a cluster interface is offered. While the simplest mode, Non-mediated and Linear (Figure 1), is similar to a normal Web search engine interface, the most complicated mode, Mediated and Combined (Figure 2), offers both the source collection (NJEDL) and the clustered structure interface. The other two modes have either the source collection or the cluster interface.
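The four modes are the cross product of the two design factors. As a minimal sketch (the function name, dictionary keys, and two-letter codes are illustrative choices, not part of the system described here), they can be enumerated as follows:

```python
from itertools import product

# The two binary design factors described in the text.
MEDIATION = ("Non-mediated", "Mediated")  # is a source collection offered?
DISPLAY = ("Linear", "Combined")          # is a cluster view shown beside the ranked list?


def system_modes():
    """Return the four system modes keyed by a two-letter code such as 'NL'."""
    modes = {}
    for mediation, display in product(MEDIATION, DISPLAY):
        code = mediation[0] + display[0]
        modes[code] = {
            "mediation": mediation,
            "display": display,
            "has_source_collection": mediation == "Mediated",
            "has_cluster_view": display == "Combined",
        }
    return modes


for code, mode in system_modes().items():
    print(code, mode["mediation"], mode["display"])
```

Under this encoding, the simplest mode has neither feature and the most complicated mode has both.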

Figure 1. Non-mediated Linear list System.

Figure 2. Mediated Combined display System.

Data collection

Data analyses for the two hypotheses were conducted. In addition, we analyzed effectiveness, efficiency, and usability across different units of the data: by mediation condition (Non-mediated and Mediated); by display condition (Linear and Combined); by task order (4 tasks in total); and by two subject groups based on their system usage order (Non-mediated & Mediated order group vs. Mediated & Non-mediated order group). The 32 subjects were randomly divided in half, and each group (16 subjects) was assigned to one of the two display conditions for the IR search results: a linear ranked list, or a combination of a linear ranked list and a classified display of search results. Each subject was given 4 different topic searches: two with a non-mediated system and two with a mediated system. Therefore, the first hypothesis (Mediated vs. Non-mediated) was tested within subjects, based on the belief that a user should have a chance to experience both IR conditions, and the second hypothesis (Linear vs. Combined) was tested between subjects. Detailed user behaviors were recorded by a logger. Subjects also completed a Topic Questionnaire per topic and a System Questionnaire per system, and took part in an Exit Questionnaire Interview to provide feedback on their experiences and opinions.
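The mixed design can be sketched as an assignment plan. The text does not say how the system usage order was distributed within each display group or how topics were rotated, so this sketch assumes that the order alternates within each display group and that each subject completes two consecutive tasks on each system; the function name and the fixed seed are also illustrative.

```python
import random


def assign_subjects(n_subjects=32, seed=0):
    """Build a hypothetical assignment plan for the mixed design.

    Display condition is between subjects (half Linear, half Combined).
    Mediation condition is within subjects (two tasks on each system).
    """
    rng = random.Random(seed)
    subjects = list(range(1, n_subjects + 1))
    rng.shuffle(subjects)  # random split into the two display groups
    half = n_subjects // 2
    groups = (("Linear", subjects[:half]), ("Combined", subjects[half:]))

    plan = {}
    for display, group in groups:
        for i, subject in enumerate(group):
            # Assumption: alternate the system usage order within each group.
            if i % 2 == 0:
                order = ("Mediated", "Non-mediated")
            else:
                order = ("Non-mediated", "Mediated")
            plan[subject] = {
                "display": display,
                # Assumption: tasks 1-2 use the first system, tasks 3-4 the second.
                "system_per_task": [order[0], order[0], order[1], order[1]],
            }
    return plan


plan = assign_subjects()
print(plan[1])
```

With 32 subjects this yields 16 subjects per display condition and 16 per system usage order, which matches the group sizes reported in the order-of-usage tables.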

Findings

Mediated vs. Non-mediated

The mediation function for Web searching was influenced, and improved, by the adaptation process for a complicated multifaceted topic task and a novel system. Both the task order and the system usage order were found to be strongly related to the measures under the two mediation conditions.

First, our findings indicate that the more search tasks a user conducted, the better the mediated condition performed relative to the non-mediated one in every aspect of Web searching. Second, the subjects who used the non-mediated system before the mediated one were able to go through an adaptation process before being exposed to a new IR system. This had a clear impact on the relationship between the two mediation conditions across all the measures: effectiveness, efficiency, and usability. Even though some differences were not statistically significant, when a user had such an adaptation process with the search tasks and a rather familiar information system, such as the non-mediated one, most of the results showed that the mediated condition was better with regard to effectiveness, efficiency, and usability. Only the most important results are presented in this paper.

Table 2. Effectiveness-Objective and Subjective (per Task Order)
Measure | Task 1 M | Task 1 N | Task 2 M | Task 2 N | Task 3 M | Task 3 N | Task 4 M | Task 4 N
Aspectual recall for saved aspects (percentage) | .12 (.17) | .18 (.17) | .18 (.21) | .17 (.20) | .17 (.19) | .16 (.19) | .28 (.24) | .20 (.21)
Satisfaction with the search results (7-point Likert scale) | 2.19 (1.27) | 3.94 (1.73) | 3.31 (1.53) | 3.75 (1.98) | 4.00 (2.16) | 4.12 (1.99) | 5.00 (1.71) | 4.00 (1.78)
Perception of the effective search time (7-point Likert scale) | 2.31 (1.49) | 3.63 (1.66) | 2.81 (1.68) | 3.44 (1.63) | 4.25 (2.11) | 4.31 (1.92) | 4.63 (1.99) | 4.00 (1.96)
M = Mediated condition, N = Non-mediated condition.

In their final task (task 4), subjects not only scored the best aspectual recall for the saved aspects in the mediated condition, but the difference between the non-mediated and the mediated conditions was also the largest among all the tasks (Table 2). The subjects also became increasingly more satisfied with the mediated condition than with the non-mediated condition as they conducted more tasks (Table 2). Finally, as subjects conducted more tasks, they perceived that the mediated condition was better than the non-mediated one with regard to time effectiveness (Table 2).
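The paper does not spell out how aspectual recall was computed. A minimal sketch, assuming the usual definition of the measure (the share of a topic's known aspects that are covered by what the subject saved), is shown below; the function name and the example aspect sets are hypothetical.

```python
def aspectual_recall(saved_aspects, known_aspects):
    """Fraction of a topic's known aspects covered by the saved items.

    Assumed definition: |saved ∩ known| / |known|. Aspects may be any
    hashable identifiers, for example strings or integers.
    """
    known = set(known_aspects)
    if not known:
        raise ValueError("known_aspects must not be empty")
    return len(set(saved_aspects) & known) / len(known)


# Hypothetical topic with 25 known aspects, of which the subject covered 7.
known = range(25)
saved = range(7)
print(round(aspectual_recall(saved, known), 2))  # 0.28
```

Aspects that are saved but not in the known set do not raise the score, so the value always stays between 0 and 1.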

Table 3. Efficiency-Objective (per Task Order)
Measure | Task 1 M | Task 1 N | Task 2 M | Task 2 N | Task 3 M | Task 3 N | Task 4 M | Task 4 N
Task completion time (seconds) | 1190.63 (36.71) | 1194.19 (16.53) | 1188.56 (41.19) | 1143.56 (169.63) | 1172.00 (66.22) | 1170.06 (70.71) | 1053.19 (216.15) | 1147.31 (127.95)
M = Mediated condition, N = Non-mediated condition.

In objective efficiency, the average task completion time in the final task (task 4) was 1 minute and 34 seconds shorter in the mediated condition than in the non-mediated condition, a far larger gap in favor of the mediated condition than in any of the previous three tasks (Table 3).
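The task 4 figure follows directly from the Table 3 means. The short sketch below reproduces the arithmetic; the variable and function names are illustrative.

```python
# Mean task completion times in seconds, copied from Table 3 (tasks 1 to 4).
MEDIATED = [1190.63, 1188.56, 1172.00, 1053.19]
NON_MEDIATED = [1194.19, 1143.56, 1170.06, 1147.31]


def time_saved(mediated_s, non_mediated_s):
    """Seconds saved by the mediated condition (negative if it was slower)."""
    return non_mediated_s - mediated_s


def as_minutes_seconds(seconds):
    """Split a non-negative duration into whole minutes and rounded seconds."""
    return divmod(round(seconds), 60)


gaps = [time_saved(m, n) for m, n in zip(MEDIATED, NON_MEDIATED)]
minutes, seconds = as_minutes_seconds(gaps[3])
print(f"Task 4: mediated faster by {minutes} min {seconds} s")  # 1 min 34 s
```

The gap is computed as non-mediated minus mediated, so a positive value means the mediated condition was faster.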

Table 4. Usability-Subjective (per Task Order)
Measure | Task 1 M | Task 1 N | Task 2 M | Task 2 N | Task 3 M | Task 3 N | Task 4 M | Task 4 N
Ease of starting on a topic (7-point Likert scale) | **3.44 (1.50) | **5.19 (1.42) | *3.94 (1.48) | **5.25 (1.48) | 5.37 (1.31) | 4.56 (1.67) | *5.69 (1.40) | *4.44 (1.67)
Topic easiness (7-point Likert scale) | 2.75 (1.44) | 4.31 (1.30) | 3.63 (1.50) | 4.37 (1.63) | 4.44 (1.71) | 4.00 (1.79) | 5.06 (1.69) | 4.44 (1.83)
M = Mediated condition, N = Non-mediated condition.
* p < 0.05.
** p < 0.01.

In subjective usability, subjects gave significantly higher scores on the 7-point Likert scale for ease of starting the topic task to the non-mediated condition in the first two tasks. The opposite held in the latter two tasks; in particular, subjects gave significantly higher scores to the mediated condition in their last task (Table 4). In addition, as they conducted more tasks, they rated the topic as easier in the mediated condition than in the non-mediated one (Table 4).

Table 5. Usability-Objective (per Task Order)
Measure | Task 1 M | Task 1 N | Task 2 M | Task 2 N | Task 3 M | Task 3 N | Task 4 M | Task 4 N
Number of queries (frequency) | 7.94 (4.88) | 4.31 (2.87) | 9.69 (4.24) | 5.06 (2.69) | 8.06 (4.22) | 6.06 (2.69) | 6.13 (2.89) | 5.88 (2.92)
M = Mediated condition, N = Non-mediated condition.

In objective usability, even though the subjects changed their queries more often or used more queries in the mediated condition than in the non-mediated condition across all the tasks, in the final task (task 4), the difference was the smallest (Table 5).

Table 6. Effectiveness-Objective and Subjective (per Order of System Usage)
Measure | Mediated & Non-mediated: M | Mediated & Non-mediated: N | Non-mediated & Mediated: M | Non-mediated & Mediated: N
Aspectual recall for saved documents (percentage) | .36 (.28) | .43 (.28) | .38 (.26) | .35 (.22)
Satisfaction with the search results (7-point Likert scale) | 2.75 (1.50)** | 4.06 (1.87)** | 4.50 (1.98) | 3.84 (1.83)
Perception of the effective search time (7-point Likert scale) | 2.56 (1.58)** | 4.16 (1.92)** | 4.44 (2.03)* | 3.53 (1.63)*
M = Mediated condition, N = Non-mediated condition.
* p < 0.05.
** p < 0.01.

When a subject experienced the non-mediated condition first, the aspectual recall for the saved documents was higher in the mediated condition than in the non-mediated condition; the opposite held in the opposite order of system usage (Table 6). In addition, when the subjects used the non-mediated condition first, they were more satisfied with the results and felt significantly better about the time effectiveness of their searches in the mediated condition than in the non-mediated condition, which again was the exact opposite of the results in the opposite order of system usage (Table 6).

Table 7. Efficiency-Objective and Subjective (per Order of System Usage)
Measure | Mediated & Non-mediated: M | Mediated & Non-mediated: N | Non-mediated & Mediated: M | Non-mediated & Mediated: N
Task completion time (seconds) | 1189.59 (38.40) | 1158.69 (102.34) | 1112.59 (168.44) | 1168.88 (121.31)
M = Mediated condition, N = Non-mediated condition.

Task completion time was shorter in the mediated condition than in the non-mediated one when the non-mediated condition was offered prior to the mediated condition, and vice-versa when the mediated condition was offered prior to the non-mediated condition (Table 7).

Table 8. Usability-Subjective (per Order of System Usage)
Order of system usage | Mediated & Non-mediated | Non-mediated & Mediated

Degree of ease per topic
Ease of starting on a topic (7-point Likert scale) | M = 3.69 (1.49)*, N = 4.50 (1.64)* | M = 5.53 (1.34), N = 5.22 (1.43)
Topic easiness (7-point Likert scale) | M = 3.19 (1.51), N = 4.22 (1.79) | M = 4.75 (1.70), N = 4.34 (1.45)

Non-mediated vs. Mediated for ease of learning between the two systems (frequency)
Non-mediated | 12** | 3
Mediated | 1** | 4
NA | 3 | 9

Non-mediated vs. Mediated for ease of using between the two systems (frequency)
Non-mediated | 11* | 3
Mediated | 2* | 6
NA | 3 | 7

Usefulness
Usefulness of the system for tasks (7-point Likert scale) | N = 3.87 (1.36), M = 3.75 (1.00) | N = 4.56 (1.21), M = 5.00 (1.27)

Preference between the two systems (frequency)
Non-mediated | 9 | 2
Mediated | 4 | 10
NA | 3 | 4

M = Mediated condition, N = Non-mediated condition.
* p < 0.05.
** p < 0.01.

While there were no differences between the two mediation conditions in ease of starting on a topic or in topic easiness in the Non-mediated - Mediated system usage order, the subjects in the opposite system usage order answered that the non-mediated condition was significantly better for starting on a topic and also better in topic easiness (Table 8). The subjects' attitudes towards the ease of the system were interesting as well. In the Non-mediated - Mediated system usage order, there was little difference in how easy the subjects perceived learning the two systems to be (4 subjects chose the mediated condition and 3 the non-mediated), and twice as many subjects answered that the mediated condition was easier to use (6 vs. 3); in contrast, in the opposite system usage order, significantly more subjects felt that the non-mediated system was easier to learn and use (Table 8). Finally, the subjects perceived the mediated condition as more useful than the non-mediated condition in the Non-mediated - Mediated system usage order, unlike in the Mediated - Non-mediated order; the preference between the two mediation conditions was likewise reversed between the two system usage orders (Table 8).

Linear vs. Combined Display

Second, the effect of the combined display of the document clusters and the linear ranked list was also clearly influenced by the adaptation process for a search task and a novel system. For the display condition, effectiveness and usability again depended on two factors: the order of the four given tasks and the system usage order between the two mediation conditions. Several effectiveness and usability results revealed that the combined display was more effective and useful than the linear display when a user had conducted more tasks, and when he/she had experienced a familiar non-mediated system before using a new mediated one. Again, only the most important results are presented in this paper.

Table 9. Effectiveness-Objective and Subjective (per Task Order)
Measure | Task 1 L | Task 1 C | Task 2 L | Task 2 C | Task 3 L | Task 3 C | Task 4 L | Task 4 C
Aspectual recall for saved aspects (percentage) | .15 (.18) | .15 (.17) | .17 (.24) | .18 (.16) | .17 (.21) | .16 (.17) | .22 (.20) | .26 (.26)
L = Linear display, C = Combined display.

In the final task (task 4), the subjects scored the best aspectual recall from the saved aspects in the combined display; the difference between the combined display and the linear display was the biggest in their final task (Table 9).

Table 10. Usability-Objective (per Task Order)
Measure | Task 1 L | Task 1 C | Task 2 L | Task 2 C | Task 3 L | Task 3 C | Task 4 L | Task 4 C
Number of saved documents (frequency) | 3.63 (2.03) | 2.81 (2.43) | 3.31 (1.49) | 3.06 (2.17) | 3.75 (2.24) | 3.06 (1.73) | 3.63 (1.15) | 3.88 (1.93)
L = Linear display, C = Combined display.

In objective usability, the linear display helped its subjects to find and save the documents that contain more aspects compared to the combined display in the first task; however, it became reversed in the last task (Table 10).

Table 11. Effectiveness-Objective and Subjective (per Order of System Usage)
Measure | Mediated & Non-mediated: L | Mediated & Non-mediated: C | Non-mediated & Mediated: L | Non-mediated & Mediated: C
Aspectual recall for saved aspects (percentage) | .17 (.19) | .16 (.19) | .18 (.22) | .22 (.19)
Perception of the effective search time (7-point Likert scale) | 3.66 (1.99) | 3.06 (1.83) | 4.47 (1.85) | 3.50 (1.81)
L = Linear display, C = Combined display.

The results regarding the user's perception of task performance showed that the linear display condition was rated better regardless of the order of system usage (Table 11). The findings imply that subjects tend to perceive that they had better search results in a more familiar interface, such as the linear display, regardless of their actual retrieval performance in either system usage order.

Table 12. Usability-Subjective (per Order of System Usage)
Measure | Mediated & Non-mediated: L | Mediated & Non-mediated: C | Non-mediated & Mediated: L | Non-mediated & Mediated: C

Degree of ease per topic
Ease of starting on a topic (7-point Likert scale) | 4.56 (1.70) | 3.63 (1.38) | 5.56 (1.34) | 5.19 (1.42)
Topic easiness (7-point Likert scale) | 4.09 (1.87) | 3.31 (1.49) | 4.53 (1.66) | 4.56 (1.52)

Degree of ease per system
Difficulty of learning a system (7-point Likert scale) | 1.87 (1.50) | 2.75 (1.06) | 1.81 (1.28) | 2.13 (1.02)
Difficulty of using a system (7-point Likert scale) | 2.19 (1.56) | 2.94 (1.06) | 1.88 (1.50) | 2.31 (1.14)
Understanding the way to use the system (7-point Likert scale) | 5.12 (1.31) | 4.19 (1.38) | 4.81 (2.10) | 5.44 (1.09)

Usefulness
Support from Source Collection (mediated condition group, 7-point Likert scale) | 3.94 (1.29)* | 2.69 (1.49)* | 4.25 (1.29) | 4.13 (1.59)
Usefulness of the system for tasks (7-point Likert scale) | 4.00 (1.26) | 3.63 (1.09) | 4.63 (1.41) | 4.94 (1.06)

L = Linear display, C = Combined display.
* p < 0.05.

In subjective usability, the subjects in the Mediated - Non-mediated system usage order answered that both starting on the search tasks and the topics themselves were easier in the traditional linear display condition than in the combined display condition; however, the difference between the two display conditions was smaller on both measures in the opposite system usage order (Table 12). Also, in the Mediated - Non-mediated system usage order, the subjects in the combined display answered that the system was more difficult to learn and use than the subjects in the linear display did; the difference was smaller in the opposite order (Table 12). In terms of usefulness, when the subjects experienced the Mediated system first, those in the linear display answered that they received significantly more support from the source collection (NJEDL) than those in the combined display; in contrast, there was no difference between the two display conditions when they faced the Non-mediated condition first (Table 12). Also, in the Non-mediated - Mediated system usage order, the combined display of the search results was rated slightly more useful for searching than the linear display, whereas the opposite held in the opposite order (Table 12).

Table 13. Usability-Objective (per Order of System Usage)
Measure | Mediated & Non-mediated: L | Mediated & Non-mediated: C | Non-mediated & Mediated: L | Non-mediated & Mediated: C
Number of queries (frequency) | 8.78 (4.28)* | 6.00 (3.24)* | 5.59 (3.93) | 6.19 (2.93)
L = Linear display, C = Combined display.
* p < 0.05.

Finally, in objective usability, the subjects who used the Mediated condition before the Non-mediated condition issued significantly more queries in the linear display than in the combined display; on the other hand, when the subjects used the Non-mediated condition prior to the Mediated condition, those in the combined display used slightly more queries than those in the linear display (Table 13).

Third, the comparison of the four modes (see note 1) suggests that it is not ideal to offer a Web user the combination of the mediation system and the structured display, despite the system's powerful functionalities; the users did not favor the most complicated Mediated-Combined mode in many aspects. The complexity of the mediation plus the combined display affected the combined display users' preference judgments; the subjects tended to favor a less complicated system (the non-mediated) over a more complicated one (the mediated).

Conclusion and Future Research

This study contributes to research on the development of Web IR systems that have a mediation function and a structured display of search results. The findings contribute to system testing in this area because mediated Web IR systems have rarely been tested with real subjects or with a specific task type such as the multifaceted topic. Furthermore, few studies have considered comparing the effectiveness of the ranked list of search results with that of the combination of the ranked list and a clustered document display.

The results of this research suggest additional research agendas involving both the mediation condition and the display condition. Future research is needed to investigate what results the mediation system and the combined display would produce if a subject went through a sufficient adaptation process; such a process should comprise not only adaptation to a new system but also adaptation to the characteristics of the search tasks. More work needs to be conducted to understand other factors that may influence the measures of the display condition for Web searching. For example, more research seems necessary to investigate how user groups with different levels of domain knowledge would perform their Web searching with the mediation system or the combined display. Also, a study of different levels of topic difficulty needs to be conducted to find out which mediation condition or display condition better fits a specific level of topic difficulty. Such research should have several practical implications for designing a better user-oriented Web IR system.

Author Notes

I would like to thank Nick Belkin, my dissertation advisor, as well as my committee members, Gheorghe Muresan, Dan O'Connor and David Harper, for their contributions and continuous support to my doctoral dissertation work.

Note 1. The four modes are NL (Non-mediated and Linear), NC (Non-mediated and Combined), ML (Mediated and Linear), and MC (Mediated and Combined).

Literature

  • Allan, J., Leuski, A., Swan, R., & Byrd, D. (2000). Evaluating combinations of ranked lists and visualizations of inter-document similarity. Information Processing & Management, 37, 435–458.
  • Carpineto, C., & Romano, G. (2004). Exploiting the Potential of Concept Lattices for Information Retrieval with CREDO. Journal of Universal Computer Science, 10(8), 985–1013.
  • Leuski, A., & Allan, J. (1999). The best of both worlds: Combining ranked list and clustering. CIIR technical report, 1999.
  • Leuski, A., & Allan, J. (2000). Improving interactive retrieval by combining ranked lists and clustering. In Proceedings of the 6th Conference on Content-Based Multimedia Information Access (RIAO 2000), College de France, Paris, France, Apr. 12–14, 2000 (pp. 665–681).
  • Nordlie, R. (1999). User revealment - a comparison of initial queries and ensuing question development in online searching and in human reference interactions. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, August 15–19, 1999. ACM.
  • Muresan, G., Harper, D. J., & Goker, A. (2001). Document Clustering and Language Models for System-Mediated Information Access. In Proceedings Series: Lecture Notes in Computer Science: Research and Advanced Technology for Digital Libraries 6th European Conference, ECDL 2001, Darmstadt, Germany.
  • Muresan, G., & Harper, D. J. (2004). Topic Modeling for Mediated Access to Very Large Document Collections. Journal of the American Society for Information Science, 55(10), Special Topics Issue: Document Search Interface Design for Large-Scale Collections and Intelligent Access, August 2004, 892–910.