Exploring information seeking processes in collaborative search tasks

Authors


Abstract

Many theories and models exist for understanding and explaining information seeking processes (ISP) for individuals. Such is not the case for collaborative information seeking (CIS), despite its growing importance. In this paper we take Kuhlthau's ISP model, designed for individual information seeking, and map it to a CIS situation. We present a laboratory study with 84 participants in 42 pairs and demonstrate how their information seeking processes over two sessions can be mapped to various stages of the ISP model. In addition, we explore the affective dimension of information seeking as well as perceived relevance expressed by the participants through their interactions. We discuss similarities and disparities of ISP for individuals and collaborative information seeking. In particular, we show that there is a logical progression from uncertainty about the task to being satisfied about the collected information among the participants; and at the same time, there is a lack of clear segmentation between stages of formulating information need, exploring information, and collecting it. The latter can be attributed to exploratory search tasks and interactions among the collaborators.

INTRODUCTION

Information seeking is often considered an individual pursuit, a view that several researchers have challenged. For example, Twidale and Nichols (1996) argued that use of library resources should not be stereotyped as a solitary activity, and that introducing support for collaboration into information retrieval systems would help users to learn and use the systems more effectively. The issue of people working together in information-intensive tasks has been studied extensively by communities such as Computer-Supported Cooperative Work (CSCW), and has been receiving increasing attention in recent years (e.g., Foster, 2006; Shah, 2008). Despite these efforts, CIS lacks theories and models comparable to those that exist for individual information seeking, such as Belkin (1980), Marchionini (1989), and Wilson (1999).

In this paper we take Kuhlthau's information seeking process (ISP) model (Kuhlthau, 1991) and map it to collaborative information seeking (CIS). We do so by conducting a laboratory study and analyzing the actions and conversations of the participants. Using this analysis we demonstrate how the ISP for CIS, along with affective relevance, matches and differs from the individual ISP model.

BACKGROUND

One of the most representative models in information seeking, Kuhlthau's information seeking process (ISP) model (1991) formally describes, from the user's perspective, the flow of activities that a user goes through while seeking information. Unlike other models, this model includes the cognitive and affective dimensions as distinctive elements of each stage. More specifically, the model has six stages (Table 1), each of which includes feelings, thoughts, and actions that were identified through empirical studies.

A key aspect of this model is the idea of uncertainty, defined by the author as a negative feeling; however, as pointed out by other authors, such as Anderson (2006), uncertainty can also be considered positive. Note that the identification of this particular affective state as well as the others was achieved subjectively through self-reports and interviews. To date, few studies have explored the affective dimension from a physiological, linguistic, and expressive perspective.

As mentioned before, information seeking as a social and collaborative phenomenon has been understudied. Only in recent years have some researchers embarked on the development of models specific to CIS, work that includes evaluating the applicability of traditional information seeking models such as the one proposed by Kuhlthau. Some examples in this regard are the works of Hyldegard (2006, 2009), Reddy and Jansen (2008), Reddy, Jansen, and Krishnappa (2008), and Yue and He (2010). In particular, Hyldegard (2009) studied Kuhlthau's ISP in the context of teams in educational settings, and found that although individual and collaborative information seeking behaviors were similar at the level of the general stages, there were also important differences in contextual aspects associated with social factors. As a result, the author concluded that Kuhlthau's ISP did not completely account for the social dimension of CIS, and also that the affective states (negative and positive) of participants did not necessarily coincide with those specified in the original model.

Table 1. Kuhlthau's model of ISP (Kuhlthau, 1991, p. 367)
Stage | Feelings (Affective) | Thoughts (Cognitive) | Actions
Initiation | Uncertainty | General/Vague | Seeking Background Information
Selection | Optimism | |
Exploration | Confusion/Frustration/Doubt | | Seeking Relevant Information
Formulation | Clarity | Narrowed/Clearer |
Collection | Sense of Direction/Confidence | Increased Interest | Seeking Relevant or Focused Information
Presentation | Relief/Satisfaction or Disappointment | Clearer or Focused |

Our interest in exploring Kuhlthau's model in a collaborative context is not limited to the study of the stages and the affective dimension, but also extends to some of the actions associated with ISP, in particular those related to the identification of relevant information. When people seek information collaboratively, they also share and evaluate the collected information. In this regard, as pointed out by Kuhlthau (1991, p. 363), people's affective state may influence their relevance judgments, which is directly related to the concept of affective relevance (Saracevic, 2007). According to Saracevic, affective relevance corresponds to the “relation between the intents, goals, emotions, and motivations of a user, and information (retrieved or in the systems file, or even in existence)” (p. 1931). In addition, our particular interest in studying affective relevance in the context of CIS and ISP is motivated by works such as Carasik and Grantham (1988), who argue that in collaborative environments emotions play a role that should be studied in detail to understand the behaviors of teams. Indeed, it has been shown that both positive and negative feelings are necessary for improving the performance of teams (Losada and Heaphy, 2004). Several other works have also shown that by improving the emotional resources of the communication media used for collaborating, teams may achieve better results (e.g., García, Favela, and Machorro, 1999; González, 2006).

To identify and study ISP stages, as well as affective relevance in a CIS setting, we designed a laboratory study that is described in the following section.

METHOD

We conducted a laboratory study involving 42 pairs, each given two exploratory search tasks and asked to work through two sessions using our experimental CIS system. The details of this system and the study are provided in the following subsections.

Coagmento – a system for collaborative information seeking

We have developed Coagmento, a CIS system that allows multiple people to work together on online information seeking tasks, in synchronous or asynchronous mode, and whether co-located or remotely situated. Coagmento is implemented as a plug-in for Firefox, allowing one to perform various information seeking and synthesis activities, as well as communication activities, from right within the browser (Shah, 2010). The design and implementation of Coagmento are inspired by similar systems, such as Ariadne (Twidale and Nichols, 1996) and SearchTogether (Morris and Horvitz, 2007), as well as several design studies and cognitive walkthroughs (Shah et al., 2009).

A screenshot of Coagmento is given in Figure 1. As we can see, it includes a toolbar and a sidebar. The toolbar has several buttons that help one collect information and be aware of the progress in a given collaboration. The toolbar has three major parts:

  1. Buttons for collecting information and making annotations. These buttons help one save or remove a webpage, make annotations on a webpage, and highlight and collect text snippets.
  2. Page-specific statistics. The middle portion of the toolbar shows various statistics, such as the number of views, annotations, and snippets, for the displayed page. A user can click on a given statistic and obtain more information. For instance, clicking on the number of snippets will bring up a window that shows all the snippets collected by the collaborators from the displayed page.
  3. Task-specific statistics. The last portion of the toolbar displays the task/project name and various statistics about the current project, including the number of pages visited and saved. Clicking on that portion brings up the workspace where one can view all the collected objects (pages and snippets) brought in by the collaborators for that project, as well as organize the snippets in subcategories for a given task (Figure 2).

Figure 1. Coagmento with enhanced views of its toolbar and sidebar.

Figure 2. Coagmento workspace for organizing collected information.

The sidebar features a chat window, under which there are three tabs with the history of search engine queries, saved pages, and snippets. With each of these objects, the user who created or collected that object is shown. Anyone in the group can access an object by clicking on it. For instance, one can click on a query issued by anyone in the group to re-run that query and bring up the results in the main browser window.

Participants

We recruited 84 participants in 42 pairs from UNC Chapel Hill. These participants were asked to come to the lab for two different sessions, which were one to two weeks apart. Since the participants had to sign up in pairs, the participants in a given pair already knew each other. In addition, we required that the participants in a given pair had done some collaborative work with each other before, ensuring that they not only knew each other but were also comfortable working together on a collaborative project. The approval of a pair's participation in this study was based on these criteria. Participants were compensated $25 each for their participation in two sessions. Of the 84 participants, 27 were male and 57 were female, and their ages ranged from 17 to 50 with a median age of 21. Several of the pairs were co-workers or spouses. A majority of the participants were undergraduate or graduate students, while a few were university employees.

Sessions

Each individual pair of participants came to the lab for two sessions that were one to two weeks apart. Each session lasted about one and a half hours. The flow for each session is depicted in Figure 3.

During the first session the participants were shown a video tutorial demonstrating the use of Coagmento and the process of collecting relevant information (snippets of text). After the tutorial, the participants were placed in different rooms so that they could not talk to each other directly or see what the other person was doing. Both the participants used typical mid-end PC workstations, running Windows XP, with Ethernet connectivity and 19” monitors.

The supervisor (a researcher) took his place outside, stationed so that he could see both the participants. Once the participants logged in, they filled out a demographic questionnaire and began working on the first task.

As discussed below, the tasks were simulated work tasks. About 20 minutes into their work the researcher sent out a message via the sidebar chat asking them to stop the task and fill in a set of online questionnaires asking about their progress on the task, as if their supervisor had requested an update. This questionnaire included specific questions about the number of webpages they had viewed and bookmarked, the number of search queries they had used, and the number of snippets they had collected. Once both the participants finished their individual questionnaires, they were asked to start working on the second task. The participants were once again interrupted about 20 minutes later, and asked to complete the same questionnaires for that task.

For the second session, the participants were given a refresher of the system and shown how to compile their final report by visiting the workspace and grouping their collected snippets into different categories for a given task. The categories were presented in the task statement and corresponded to different aspects of the work task (see task statements below). The participants were then asked to take their places in the room other than the one they had used the previous session, to address any bias the participants may have had for the machine or the room they used before.

After 15 minutes of additional work on task one, they completed the post-task questionnaire with questions on task progress, and were asked to organize their collected snippets by placing each relevant snippet in one of the categories for a given task. When they had finished organizing their snippets, they worked through the second task, including collecting their information, post-task questionnaire, and organizing the snippets.

Tasks

The participants were asked to collect relevant information for two exploratory tasks that were designed to be realistic work tasks that might be of interest to the participant pool (Borlund and Ingwersen, 1999). Rather than asking participants to create their own organizations for the pertinent snippets, the task statements identified specific issues that should be addressed, and these issues were used as organizing bins for the collected snippets. The task descriptions are provided below as they were given to the participants; all pairs received the tasks in the same order. Note that the two tasks were relatively similar in complexity.

Task-1: Economic recession

“A leading newspaper has hired your team to create a comprehensive report on the causes and consequences of the current economic recession in the US. As a part of your contract, you are required to collect all the relevant information from any available online sources that you can find.

To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Later, you can use these snippets to compile your report. You may also want to save the relevant websites as bookmarks, but remember - your main objective here is to collect as many relevant snippets as possible.

Your report on this topic should address the following issues: reasons behind this recession, effects on some major areas, such as health-care, home ownership, and financial sector (stock market), unemployment statistics over a period of time, proposal, execution, and effects of the economy stimulation plan, and people's opinions and reactions on economy's downfall.”

Figure 3. Study sessions.

Task-2: Social networking

“The College Network News Channel wants to do a documentary on the effects of social networking services and software. Your team is responsible for collecting various relevant information (including statistics) from the Web. As a part of your assignment, you are required to collect all the relevant information from any available online sources that you can find.

To prepare this report, search and visit any website that you want and look for specific aspects as given in the guideline below. As you find useful information, highlight and save relevant snippets. Later, you can use these snippets to compile your report. You may also want to save the relevant websites as bookmarks, but remember - your main objective here is to collect as many relevant snippets as possible.

Your report on this topic should address the following issues: emergence and spread of social networking sites, such as MySpace, Facebook, Twitter, and del.icio.us, statistics about popularity of such sites (How many users? How much time they spend? How much content?), impacts on students and professionals, commerce around these sites (How do they make money? How do users use them to make money?), and examples of usage of such services in various domains, such as health-care and politics.”

ANALYSIS

In this section we present an analysis of the data that we collected and coded. First, we show how we identified the various ISP stages from the study, and what they mean in a collaborative setting. We then discuss affective relevance and its implications. An explanation of our coding method is also provided here.

Mapping ISP stages

Since we did not measure the participants' mental states, nor directly observe their behavior during the various phases of their tasks, we depend on the logs and other forms of data collected during the study to map the six ISP stages for our analysis. These mappings, as they pertain to the presented study, are described below.

  1. Initiation: This is the part when the participants read the task and greet each other. It is measured by the number of chat messages exchanged between the participants during this phase. Due to the interactive nature of our study, we decided to expand this stage to also include the messages that were exchanged in-between stages for checking on each other's status.
  2. Selection: This is when the participants discuss how they want to divide up the task and proceed. It is measured by the number of messages exchanged discussing the strategy for a given task.
  3. Exploration: This is mapped to the number of search queries used by a given team.
  4. Formulation: This is measured by the number of webpages looked at by a given team.
  5. Collection: This is measured by the number of webpages or snippets collected by a given team.
  6. Presentation: This occurs only during the second session when the participants are asked to organize their collected snippets. It is measured by the number of moving actions performed by each team on their collected snippets.
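
The mapping above can be sketched as a simple tally over logged events. This is only an illustration: the event labels (`chat_status`, `query`, `snippet_move`, and so on) and the flat list-of-strings log layout are hypothetical and are not the actual Coagmento log schema.

```python
from collections import Counter

# Hypothetical event labels mapped to the six ISP stages described above.
STAGE_OF_EVENT = {
    "chat_greeting": "Initiation",   # greetings while reading the task
    "chat_status": "Initiation",     # status checks between stages
    "chat_strategy": "Selection",    # messages discussing how to divide the task
    "query": "Exploration",          # search queries issued
    "page_view": "Formulation",      # webpages looked at
    "page_save": "Collection",       # webpages saved
    "snippet_save": "Collection",    # snippets collected
    "snippet_move": "Presentation",  # snippets moved into categories
}

STAGES = [
    "Initiation", "Selection", "Exploration",
    "Formulation", "Collection", "Presentation",
]


def count_stages(events):
    """Count how many logged events fall into each ISP stage.

    `events` is an iterable of event-label strings; unknown labels are ignored.
    """
    counts = Counter()
    for event_type in events:
        stage = STAGE_OF_EVENT.get(event_type)
        if stage is not None:
            counts[stage] += 1
    # Report every stage, including those with zero events.
    return {stage: counts[stage] for stage in STAGES}


example_log = [
    "chat_greeting", "chat_strategy", "query",
    "query", "page_view", "snippet_save",
]
stage_counts = count_stages(example_log)
```

Keeping the mapping in one dictionary makes the operational definition of each stage explicit and easy to revise.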

Coding chat messages

Several works in CSCW literature have reported studying IM or chat messages in groups. Due to the high variability in study conditions and context, each of these works uses its own scheme for coding and analyzing chat messages (e.g., Handel, 2002; O'Neill, 2003). We decided to code chat messages for the following three aspects.

  1. Participants' use of chat to coordinate the start of a task (Initiation) or to discuss strategies for information seeking (Selection).

    Present status (Initiation) example:

    User–26: hows it going?

    User–27: It's been somewhat slow but its picking up a bit

    Information seeking strategy (Selection) example:

    User–27: do you want to divide the questions up?

    User–26: sure

    User–26: would you rather do causes or effects?

    User–27: Effects

    User–26: sounds good

  2. Positive or negative feeling expressed. This classification is an adaptation of the affective dimension of speech acts described in Losada and Heaphy (2004). Messages were classified as positive if they involved pleasant feelings, encouragement, positive judgments, satisfaction, and support, among others; negative messages included opposition, sarcasm, dissatisfaction, and so on. Since the positive-negative dichotomy may not apply to certain messages, especially objective ones, a neutral category was incorporated into the coding system.
  3. Perceived relevance expressed. An interesting aspect of the communication in CIS is that users sometimes tell their peers when they find information that is relevant according to their own criteria. Accordingly, in addition to the categories above, expressions such as “Hey! Check this article, it is awesome” and “thanks, this is cool” were also coded as reflecting affective relevance.

Coding for the first aspect was done by three independent coders, who achieved an inter-coder reliability of Cohen's kappa=0.762. Coding for the second and third aspects was done by two independent coders, who achieved Cohen's kappa=0.773. Both values indicate a high level of inter-coder reliability.
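
For reference, Cohen's kappa for two coders is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each coder's label frequencies. The following is a minimal sketch with made-up labels; it is not the study's data, and it does not address how the statistic was aggregated across the three coders of the first aspect, which is not stated here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (hypothetical, not taken from the chat logs).
coder_1 = ["positive", "positive", "negative", "neutral", "negative", "positive"]
coder_2 = ["positive", "negative", "negative", "neutral", "negative", "positive"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy example the coders agree on five of six messages, yet kappa is lower than the raw agreement because part of that agreement would be expected by chance.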

Results and discussion

To study ISP stages and participants' feelings, as well as affective relevance within a team, we analyzed the coded messages per session. Since each of the two sessions lasted about 60 minutes, discounting the preparation and ending phases (from login to filling in the end-session questionnaire), we divided each team's session into 60 segments, each roughly reflecting one minute, and looked at the six ISP stages as well as the polarity of feelings and the corresponding affective relevance expressed in each of those segments. The combined plots for all 42 teams for their two sessions are presented in Figures 4 to 7 (shown at the end of the paper). Let us analyze these plots by studying the most prominent parts.
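
The segmentation described above can be sketched as follows. The `(timestamp, label)` event layout, the use of seconds since login, and the clamping of boundary events into the last segment are assumptions made for illustration.

```python
def segment_index(timestamp, start, end, n_segments=60):
    """Map a timestamp to a segment index in the range 0..n_segments-1."""
    if end <= start:
        raise ValueError("end must be after start")
    fraction = (timestamp - start) / (end - start)
    index = int(fraction * n_segments)
    # Clamp so that events at (or slightly outside) the boundaries still count.
    return min(max(index, 0), n_segments - 1)


def bin_events(events, start, end, n_segments=60):
    """Count events per label and segment.

    `events` holds (timestamp, label) pairs; the result maps each label
    to a list of n_segments counts.
    """
    series = {}
    for timestamp, label in events:
        row = series.setdefault(label, [0] * n_segments)
        row[segment_index(timestamp, start, end, n_segments)] += 1
    return series


# Hypothetical events: seconds since login, for a session assumed to last 60 minutes.
events = [
    (30, "Initiation"),
    (95, "Selection"),
    (400, "Exploration"),
    (3599, "Collection"),
    (3600, "Collection"),
]
series = bin_events(events, start=0, end=3600)
```

Dividing by the actual session length rather than a fixed 60 seconds keeps the number of segments constant even when sessions differ slightly in duration, which is why each segment only roughly reflects one minute.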

We can see that during the first session (Figure 4), the participants exchanged a few messages in the beginning (task-1) as well as somewhere in the middle (task-2), around the time they were reading a given task (Initiation). Right after these parts, the participants engaged in discussing the task and devising a strategy (Selection). For the rest of the session, we see the participants working through three stages: querying (Exploration), looking through webpages (Formulation), and saving relevant webpages and snippets (Collection).

During the second session (Figure 6), in addition to the above five stages, we also see two distinctive peaks occurring at the end of each task. These peaks represent the parts when the participants were organizing their collected snippets (Presentation).

In addition to the analysis above, the affective dimension of the CIS process was explored through the presence of positive and negative messages in the chat logs, discarding the neutral ones. Here, positive messages were associated with pleasant feelings such as clarity, satisfaction, and relief, while negative messages were related to unpleasant feelings such as uncertainty, confusion, and disappointment. Based on this dichotomy, we can observe an abundance of positive messages in comparison to negative ones during the initial segments of the first session (Figure 5). These messages were typically related to greetings between the participants and their positive attitude prior to performing the tasks. After this short period, as the Initiation and Selection stages begin to converge and Exploration increases, negative messages grow and start varying similarly to positive messages. This may be due to frustration and confusion experienced by users in the Exploration stage, as well as some problems with the system.

Overall, positive messages are more numerous (63%) than negative ones (37%); however, in certain segments the latter exceed the former, especially when stages intersect and at the inflection points of the Collection stage. Moreover, as the participants make the transition from the first task to the second, negative messages show a surge. As the second task continues, negative messages decrease and variations occur in a similar way as in the first task. It is only in the last segments of the second task that we can once again see an important difference between positive and negative messages.

The second session exhibits similar behaviors to the first one in terms of the categories of the messages; however, a visible distinction can be observed at the end of each task, which coincides with the emergence of the Presentation stage. As presented in Figure 7, an abundance of positive feelings and low levels of negative ones appear in the segments associated with this stage. This may be related to the relief and satisfaction the participants experience as they finish each task.

The constant variations in the polarity of messages can be linked to the discussion between the participants about the information sources explored and shared. Indeed, it is in these periods that users expressed their individual and affective judgments about the information they collected, in addition to receiving affective (positive or negative) feedback from their collaborator. For each session, approximately 14% of the messages were coded as related to affective relevance.

Although the identification of specific affects within the ISP is limited when done by analyzing sentiments at the linguistic level, the polarity approach allows us to cover the diversity of feelings at a macro level. Hence, in contrast to Kuhlthau's ISP, individuals in CIS tasks may experience pleasant and unpleasant feelings in each stage, with one predominating over the other at certain points.

In relation to the affective relevance explained above, users expressed their feelings regarding the information they found and shared during the ISP, especially within the segments associated with high levels of Exploration, Formulation, and Collection. In the transitions from one task to another, as well as in the Presentation phase, affective relevance practically vanishes. It was also observed that the selection of relevant information was first done by an individual participant and then subjected to the group's judgment and reflection. Table 2 summarizes the predominant feelings and levels of affective relevance for the ISP stages and also for the transitions among them.

Table 2. Summary of participants' feelings and affective relevance through ISP stages.
ISP Stage | Predominant Feelings | Level of Affective Relevance
Initiation | Positive and Negative | No
Selection | Positive and Negative | Low
Exploration | Positive | High
Formulation | Positive | High
Collection | Positive | High
Presentation | Positive | Low
Transitions | Positive and Negative | Low

We found that for both the sessions, Initiation was highly correlated with Selection (p<0.001), whereas Exploration, Formulation, and Collection were strongly correlated (p<0.001). Not surprisingly, Presentation was negatively correlated with Exploration, Formulation, and Collection (p<0.001).
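
A correlation between two per-segment stage series can be illustrated with the following sketch. It assumes a Pearson coefficient, which is an assumption since the coefficient used is not named above, and it uses short made-up series rather than the study's data. Significance testing (the reported p-values) is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-segment counts (eight segments instead of sixty, for brevity).
exploration = [0, 2, 4, 6, 4, 2, 0, 0]
collection = [0, 1, 2, 3, 2, 1, 0, 0]
presentation = [0, 0, 0, 0, 0, 0, 3, 5]

r_exploration_collection = pearson_r(exploration, collection)
r_exploration_presentation = pearson_r(exploration, presentation)
```

In this toy data Exploration and Collection rise and fall together, giving a coefficient near 1, while Presentation peaks only after the other activity has stopped, giving a negative coefficient. This mirrors the qualitative pattern reported above without reproducing its values.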

CONCLUSION

While information seeking is a well-studied subject for individuals, models and theories for explaining information seeking processes (ISP) with collaborative search tasks are lacking. We conducted a laboratory study with 84 participants in 42 pairs, and collected log data as well as the messages the participants exchanged during the study. We showed how various actions and the kinds of messages exchanged in such a CIS project could be mapped to different stages of Kuhlthau's ISP model.

Mining the cumulative data of all the groups over two sessions, we identified the stages of Initiation, Selection, Exploration, Formulation, Collection, and Presentation. In addition, we analyzed all the chat messages for the feelings (positive or negative) and the relevance expressed by the participants. Our analysis revealed some interesting insights into ISP stages as well as affective relevance in CIS. We discovered that Exploration, Formulation, and Collection were not distinct stages. In fact, they were found to be highly correlated, indicating that the participants switched quickly between them. In other words, the participants went back and forth between trying search queries, exploring various sources, and collecting relevant information as they worked through the task while interacting with their collaborators.

In the larger picture, we found that ISP is a reasonable model with which to begin exploring the information seeking processes that take place during collaborative search tasks, and that it could provide interesting insights into individual and group dynamics during a CIS project. In addition to Hyldegard's finding that the ISP model lacks a social element in a collaborative setting, the work reported here indicates that the ISP stages in a CIS setting also need to be considered in light of the affective dimension for the collaborators, as well as the group's affective relevance.

Figure 4. ISP stages for session-1.

Figure 5. Positive/negative feelings, and affective relevance for session-1.

Figure 6. ISP stages for session-2.

Figure 7. Positive/negative feelings, and affective relevance for session-2.

Acknowledgements

This work was supported by the National Science Foundation under grant # IIS 0812363.
