
Abstract


This paper examines a key question in Information Seeking and Retrieval: how do people assess the usefulness of documents? In an experimental study, we presented 25 participants with five task-based search scenarios and asked them to assess and comment on the usefulness of Web documents from the Canadian government domain. Data were analyzed to test for the effect of five information task types: fact-finding, deciding, doing, learning and problem-solving. Participant assessments showed a low level of agreement on usefulness scores overall, but consistency varied by task type. The criteria used to assess usefulness varied by level of usefulness, by task type and by participant. Findings contribute to our understanding of consistency in relevance assessments and the impact of tasks on information behaviour.


INTRODUCTION


Making decisions about web documents is not a simple matter. Decades of research into the concept of relevance and relevance assessments have shown this to be the case: relevance is multi-dimensional, situational, dynamic and influenced by a range of contextual variables (Borlund, 2003a). As Harter (1996) points out, the inevitable result of this complexity is a lack of consistency in assessments that has never been fully addressed in experimental information retrieval (IR) research. Within the interactive IR research community, there has been a steady adoption of richer and more structured assigned search scenarios (Borlund, 2003b) in an attempt to provide study participants with enough context to simulate natural assessment behaviour. However, we still know very little about the impact of specific contextual variables on searchers' perceptions of the relevance or utility of documents.

The type of task in which a searcher is engaged is one aspect of context that is commonly considered to play a role in how searchers assess information. In purposive, goal-directed search domains, such as workplace search, serious leisure, or the case of e-government, which is our focus in this research, the “embedding task” (Pirolli, 2007) establishes the criteria for assessing usefulness and for determining success. Therefore, there is good reason to believe that the type of task may account for some of the variation in how people assess documents. While a small number of studies have examined the relationship between task types and document assessments (Larsen, Malik, & Tombros, 2008; Tombros, Ruthven & Jose, 2005; Kelly, 2006), this territory remains largely uncharted.

The identification of robust task effects on usefulness assessments has the potential to contribute to the design of improved search systems, as noted by Järvelin and Ingwersen (2004). In the e-government domain, the need to ensure that citizens can access the Web-based information and services they require to run their own lives and to participate in the body politic makes it a priority to design systems that are better able to predict the relevance or utility of government information.

This work examines task-centred assessments of Web documents in the e-government domain through an experimental study with 25 participants. The aim of the study was to better understand how people assess information in this domain, including assessment consistency and the criteria used to make assessments. In order to determine if assessments are affected by the nature of the embedding task, we tested for the effect of five different information task types: fact-finding, deciding, doing, learning and problem-solving.

E-GOVERNMENT


Digital government or e-government refers to the practice of making government information and services available to the public via the Internet. Governments are among the largest and most important providers of information to the public (Mitchinson & Ratner 2004) and use of online government information has grown rapidly (Horrigan, 2004; Larsen & Rainie, 2002). Common uses of U.S. government websites are to find tourism and recreation information, do research for work or school, download forms, find out what services an agency provides, and obtain information about public policy, health or safety issues (Larsen & Rainie, 2002). Furthermore, finding government information is essential for citizens in the course of key life events, such as paying taxes and getting married (Haraldsen, Päivärinta, Sein, & Stray, 2004; Vintnar & Leben, 2002).

To date, user-centred e-government research has focused more on why people use or don't use e-government and what they use it for, than on how they go about searching for, assessing and selecting Web-based government information.

RELEVANCE AND USEFULNESS


Relevance, the quality of one thing having direct bearing on another, has served as the conceptual basis for assessing information objects with respect to human information needs for decades. A substantial scholarly discourse has examined the concept in detail (see for example, Saracevic, 1996; Borlund, 2003a; Mizzaro, 1998; Cosijn & Ingwersen, 2000), identifying multiple levels or dimensions of relevance and some of the factors that influence relevance assessments.

From the outset, information retrieval research has focused primarily on topical relevance, rather than on the more complex notion of situational relevance (Cooper, 1971; Wilson, 1973). However, user-based approaches to IS&R have repeatedly emphasized the importance of the user's situation and/or task in assessing relevance. Borlund (2003a) defined situational relevance as a “user-centred, empirically based and realistic as well as potentially dynamic type of relevance…[which] expresses the relationship between the user's perception of usefulness of a retrieved information object, and a specific work task situation” (p. 922). Mizzaro's (1998) multi-dimensional model of relevance also emphasized the role of tasks, suggesting that certain characteristics of information are more desirable than others in the context of different tasks. The central role of task-based measures in determining relevance has also been demonstrated empirically (Toms et al., 2005). A natural extension of this path leads to the question of whether usefulness, which represents the value and/or applicability of something to a given situation, task or goal, is simply one dimension of relevance, or whether usefulness is, perhaps, a more valid overarching concept than relevance for the evaluation of interactive IR (Belkin et al., 2009; Belkin, 2010). In this work, we have opted to focus on usefulness, while recognizing its close relationship with relevance.

ASSESSMENT CRITERIA


Numerous studies of how people assess documents have identified a large pool of criteria, many of which relate to non-topical characteristics and features of information objects, as well as to aspects of the searcher and the situation (Barry, 1994; Park, 1993; Schamber, 1991; Wang, 1994). A meta-ethnography of sixteen relevance studies consolidated 133 criteria into 14, including document-based criteria: topicality, discipline, novelty, quality, recency, rigor, saturation and visibility; and situation-based criteria: availability, accessibility, affordability, intelligibility, social, and serendipity (Bales & Wang, 2005). Xu & Chen (2006) recently proposed and tested a model of relevance based on Grice's maxims on human communication, consisting of five criteria: scope, novelty, reliability, topicality, and understandability. The study found that topicality and novelty were the most important factors and that scope did not play a significant role in determining relevance; however, these results are of limited value due to the use of de-contextualized topics as experimental search tasks. Across these studies, there is substantial agreement as to the main categories of assessment criteria, but very little is known about the effect of contextual variables on the criteria employed or their relative importance.

TASK-BASED STUDIES OF DOCUMENT ASSESSMENT


While considerable work has been done on task-based information searching since the publication of Vakkari's seminal review article in 2003, research in this area is still somewhat scattered, with few definitive results (Hansen, 2005). Valuable conceptual work has been done to distinguish different levels of tasks, including work tasks, information seeking tasks and search tasks (Byström & Hansen, 2005) and characteristics of tasks (Hert & Marchionini, 1997; Kim & Soergel, 2005; Li, 2004). Methodological work is also ongoing to establish best practices for creating experimental search tasks (Borlund, 2003b; Wildemuth & Freund, 2009).

Prior research has identified significant relationships between various characteristics of tasks, such as stage of completion, and aspects of information behaviour, such as selection of channels and query formulation (Vakkari, 2003). Experimental studies have also identified some tentative relationships between task types and document assessments that are worthy of further examination. A study of municipal employees' information seeking behaviour found that as work task complexity increased, study participants drew upon a larger number and broader range of information types (Byström, 2002). In another workplace study, Freund (2008) identified a set of five common information task types: fact-finding, deciding, doing, learning, and problem-solving, and found a relationship between these tasks and the utility of document genres in use in the environment.

Toms and colleagues (2008) studied the effect of two task characteristics, topic structure (parallel and hierarchical) and information task type (fact-finding, information gathering and decision-making) on querying behaviour in a Web-based information system. They found that more queries were used for fact-finding and decision making, but information gathering queries were longer. Searchers viewed and assessed significantly fewer pages for fact-finding tasks. Using the same search tasks, Larsen, Malik and Tombros (2008) conducted an analysis of graded usefulness assessments carried out in the construction of test collections for XML retrieval. They found that there was little agreement (and indeed little overlap) between two types of assessments carried out on the collection (ad-hoc and interactive). In general, assessors identified more relevant information for information gathering tasks and less for fact-finding, with decision-making tasks falling in the middle.

Tombros and colleagues (2005) studied the impact of three types of search tasks (background, decision making and collecting) as well as task stage and search time, on the document features searchers reported using to assess the utility of Web pages. Overall, they found that content features were the most frequently mentioned. Mentions of content, layout and authority features were quite evenly distributed across the three tasks. However, other features seemed to be associated with certain tasks. For example, links were mentioned more frequently for the information collecting task and the background task.

Taking a different approach to a similar question, two studies were conducted to compare the frequency of html features in documents assessed as relevant and non-relevant for two types of search tasks: procedural and fact-finding (Kelly et al., 2002; Murdock et al., 2007). Findings showed task effects for some elements, such as contact information, forms and tables.

DESIGN


We conducted an experimental user study with a within-subjects design to collect graded usefulness assessments of documents for 20 scenarios. Twenty-five participants each completed five scenarios, one per task type, and assessed eight documents for each scenario. The five task types are those identified by Freund (2008). Documents were retrieved from the Canadian federal government web domain (gc.ca) using the Google search engine. Each of the 160 documents used in the study was assessed by at least five participants.

SCENARIOS & DOCUMENTS


We designed 20 situated work task scenarios (Borlund, 2003b) related to two government domains of considerable public interest: health and the environment. Four scenarios (two about health, two about environment) were created for each of the following five information task types (Freund, 2008):

Fact-Finding: seeking information with the goal of finding specific factual information to answer a well-specified question or fill a known gap in knowledge;

Deciding: seeking information with the goal of identifying and comparing alternatives in order to determine a course of action;

Doing: seeking information with the goal of accomplishing some procedural task by identifying the steps to take and the issues involved;

Learning: seeking information with the goal of becoming familiar with an unfamiliar topic; gaining a general orientation and an understanding of key concepts;

Problem-Solving: seeking information with the goal of identifying possible courses of action that would result in correcting a malfunction or overcoming some obstacle, in particular when the cause of the problem is unknown.

Five sample scenarios are presented in the Appendix and the full set of scenarios is available from the project website.

Post-task questionnaires asked participants to rate each scenario on 7-point Likert scales for realism (How realistic is the scenario you just completed as an example of a real-life search problem?) and difficulty (How difficult do you think it would be for an average person to find the information needed for this scenario by searching the Internet?). The mean realism rating was 5.24 and the mean difficulty rating was 3.32 (1=low; 7=high), and ANOVAs on these measures found no significant differences across the 20 scenarios, suggesting that participants generally considered the scenarios to be quite realistic and neither very easy nor very difficult.
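For readers unfamiliar with this kind of check, the sketch below illustrates a one-way ANOVA of realism ratings across scenarios. It is illustrative only: the authors report using SPSS, the scipy call is our substitute, and the ratings shown are invented.

```python
# Minimal sketch (not the authors' SPSS procedure): one-way ANOVA testing
# whether post-task realism ratings differ across scenarios.
from scipy.stats import f_oneway

# Invented 7-point realism ratings, one list per scenario
realism_by_scenario = [
    [5, 6, 5, 4, 6, 5],   # scenario 1
    [6, 5, 5, 6, 4, 5],   # scenario 2
    [4, 5, 6, 5, 5, 6],   # scenario 3
]
F, p = f_oneway(*realism_by_scenario)
print(f"F={F:.2f}, p={p:.3f}")  # a non-significant p is consistent with the reported result
```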

We formulated simple keyword queries to represent each scenario and used these to retrieve documents from the gc.ca domain using the Google search engine. From each set of 20 top-ranked documents per scenario, we manually selected eight to be used in the study.

INSTRUMENTS AND PROCEDURES


The study was self-guided and hosted online, but was conducted in a university computer lab to provide a more controlled environment. Participants were shown to workstations, given an overview of the experiment, instructed to quickly assess the documents as they normally would when conducting a search for information, and provided with a printed copy of the instructions for reference. Participants were asked to complete five scenarios, which were rotated to distribute the scenarios and task types evenly among participants and to compensate for order effects.

The study consisted of five parts: an instruction page, a demographic questionnaire, the assessment tasks, post-task questionnaires following each assessment task, and an exit questionnaire. The document assessment screen displayed the scenario description across the top, a document viewer pane below it to the left, the document assessment questionnaire to the right and navigation buttons at the bottom right. The assessment tasks asked participants to rate each document on a 7-point Likert scale (1=Not at all useful to 7=Very useful) and to comment on how they decided what score to assign. The post-scenario questionnaire asked participants to rate the realism and difficulty of the scenario, and the set of eight documents just assessed as a source of information for the scenario. The exit questionnaire asked participants to rate how difficult they found it to assess documents and to explain their assessments, and to identify challenges they experienced in the assessment task.

PARTICIPANTS


Participants were adults recruited from within the University of British Columbia community using listservs and flyers. Inclusion criteria were: a minimum age of 22, English language fluency and citizenship or long-time residency in Canada. These were meant to ensure comfort with English language documents and some familiarity with Canadian government information. The 14 male and 11 female participants ranged in age from 22 to over 50, with the majority between 22 and 25 years of age. Most were undergraduate (n=12) and graduate (n=8) students representing a wide range of academic disciplines within the sciences, social sciences and the humanities.

All participants reported being regular users of the Internet. Self-reported data on how often participants seek out different types of information on the Internet is summarized in Figure 1. Of the five categories of information, government information and services are sought least frequently.

When asked, “If you had to find information about a new Federal government initiative on global warming, where would you most likely start searching for that information?” participants most often responded by naming a popular search engine (n=13). A smaller number of responses (n=7) indicated some combination of a search engine and a government website. Four participants indicated that they would go directly to a government website, and one of the four noted the actual domain of the Canadian Federal government (gc.ca).

DATA ANALYSIS


Quantitative data examined in this analysis consists of interval data collected using Likert scales and continuous timestamp data collected as participants moved through the experimental system. Qualitative data consists of participants' comments explaining how they assessed the usefulness for each document. Comments were coded using inductive content analysis to identify the criteria mentioned and group them into related meta-categories for statistical analysis. Correlation, cross-tabulation, reliability and variance analyses were conducted on the data using SPSS 13.

Figure 1. Frequency of seeking different categories of Internet information

ASSESSING USEFULNESS


In this section, we report on findings with respect to usefulness assessments at three levels: participants, scenarios and individual document assessments.

Participant Level Analysis

On average, it took participants about 1.6 minutes to complete each of 40 assessments (M=97 s; SD=71 s). Experimental sessions took about 1.5 hours on average. Over the whole study, the time spent assessing documents was negatively correlated with the order in which documents were assessed (r(973)=−.141, p<.001), indicating that assessments were made more quickly as sessions progressed.

On a 7-point scale, participants rated the difficulty of assessing the usefulness of documents at M = 3.4, SD = 1.6 and of explaining why a given document was useful or not at M = 3.7, SD = 1.8.

In feedback collected at the end of each session, participants identified a number of challenges in assessing usefulness, including dealing with lengthy documents, distinguishing between documents that were similar to one another, and deciding how to assess documents with links. Some participants noted that it was difficult to assess trustworthiness and value from the documents on their own, without following links or seeing the document in the context of others. One participant found that the requirements of the scenario made assessment more difficult: “hard to judge it when it was a good report, just not for me!” One participant argued strongly that “the usefulness of a document is very subjective to the person's preference of input,” noting, for example, that some people prefer pictures, and other text. Finally, one participant indicated that it was difficult to focus on the assessment task for so long: “i just got bored near the end. i dont study this long. its hard to expect people to respond fully to the later scenarios after doing these for so long.”

Scenario Level Analysis

At the end of each scenario, participants were asked to rate their familiarity with the scenario, its realism, its difficulty, and the overall usefulness of the set of documents viewed. A correlation analysis of these measures found a positive correlation between realism of the scenario and overall usefulness of the document set (r(122)=.386, p<.001) and a slight but significant negative correlation between difficulty and overall usefulness (r(122)=−.194, p<.05).

Figure 2. Distribution of Usefulness Scores

Document Level Analysis

A total of 992 usefulness assessments were made over the 160 documents in the study. While scores of 1 and 7, representing, respectively, “Not at all useful” and “Very useful” were selected most frequently, the distribution of assessments among the 7 points of the scale is relatively even, as can be seen in Figure 2.

In our dataset, each document was assessed by 5 to 7 participants based on detailed task scenarios. To determine the consistency of these assessments we used Intraclass Correlation Coefficients (ICCs), which take into account both rater consensus and consistency (LeBreton & Senter, 2008). Because the ICC is a compound measure ranging from 0 to 1, a high value indicates both strong consensus on the assigned scores and consistency in their relative ranking, while a low value may reflect weakness in either component. Following Shrout and Fleiss (1979), we ran a one-way random model on data from five assessors for each document. The Single Measure ICC over the whole dataset is .284, indicating a very low level of agreement in these scores. This finding is reinforced if we consider only ratings at the two ends of the scale, as is typical of studies that use binary relevance assessments: of the 160 documents assessed, 46% received both very low (1 or 2) and very high (6 or 7) scores from different assessors.
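To illustrate the agreement measure used here, the following sketch computes a one-way random, single-measure ICC (ICC(1,1) in Shrout & Fleiss, 1979) for a small made-up ratings matrix. This is not the authors' SPSS procedure; the function name and example scores are ours.

```python
# Minimal sketch: one-way random, single-measure ICC -- ICC(1,1) in
# Shrout & Fleiss (1979) -- for a ratings matrix with one row per
# document and one column per assessor (complete data assumed).
import numpy as np

def icc_one_way(ratings: np.ndarray) -> float:
    """ratings: shape (n_documents, k_assessors)."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)

    # Between-documents and within-documents sums of squares
    ss_between = k * ((row_means - grand_mean) ** 2).sum()
    ss_within = ((ratings - row_means[:, None]) ** 2).sum()

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # ICC(1,1): proportion of variance attributable to documents
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Example: 7-point usefulness scores from 5 assessors for 3 documents
scores = np.array([[7, 6, 5, 7, 6],
                   [2, 5, 1, 6, 3],
                   [4, 4, 3, 5, 4]])
print(round(icc_one_way(scores), 3))
```

The same function, applied to the subset of documents belonging to a single task type, would yield the per-task ICCs discussed later in the paper.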

To further investigate this substantial lack of agreement in usefulness scores, we calculated the Average Deviation (AD) of scores for each document using MS Excel and sorted the documents to identify those with the least agreement. We identified a set of 15 documents with very high AD scores (≥2.0 on a 7-point scale) and examined the documents and the participants' comments to better understand this phenomenon.
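The AD index itself is straightforward; the short sketch below shows one way to compute it per document (cf. Burke & Dunlap, 2002). The function name and example scores are illustrative, not taken from the study data.

```python
# Illustrative only: Average Deviation (AD) around the mean for one
# document's set of usefulness scores (cf. Burke & Dunlap, 2002).
import numpy as np

def average_deviation(scores):
    scores = np.asarray(scores, dtype=float)
    return np.abs(scores - scores.mean()).mean()

# A document scored 1, 2, 6, 7, 7 by five assessors:
print(round(average_deviation([1, 2, 6, 7, 7]), 2))  # 2.48 -> above the 2.0 cut-off used here
```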

Comments indicate that some participants rate a document highly even if it contains information about only one aspect of the scenario. In a scenario asking participants to look for local bird-watching sites to visit with an enthusiast, several found a document with general background information on birds highly relevant. Two other participants commented that the same document contained "NO info on where to see birds" and that "[a]vid bird watcher knows all this". These same participants found a document with regional tourist information and very little information about birds highly relevant.

Another basis for disagreement seems to stem from different views on the importance of relevant links versus content within documents. There are a number of such examples in this set. For example, one document that consists of a list of links to external sources was rated 7 by a participant who wrote, "this is what I would have wanted–or at least it provides a list of topics in which the one I wanted to find was present." Several others felt that there was too much information on this page and assigned it low scores. In another scenario, participants disagreed on the usefulness of a cover page and table of contents that provided a link to a report that was clearly useful to the scenario. One participant rated this a 2, noting that it, "doesn't tell me much other then (sic.) where to find everything in the document". Another rated it a 7, commenting, “Exact information".

While the range of assessments in certain cases points to aspects of the document that have been interpreted differently, in a few cases it highlights differences in participants' personal preferences. In scenario 13, for example, participants were asked to find information to help protect apple trees from pests, including the possibility of using pesticides. One of the documents clearly describes an environmentally friendly method of getting rid of this pest. Ratings of this document span the entire scale. Participants who assigned low usefulness scores commented that it is "not the information I am looking for" and that it "[g]ives information on how to care for plants rather than risks associated with pesticide use." Participants who reported usefulness scores in the middle of the scale noted that the document may be biased and does not discuss pesticides at length. Those who rated the document as highly useful commented that it is "just what I was looking for", "lots of useful and relevant info.", and "great. straightforward information. provides how to treat the problem. why go anywhere else for info?"

Analysis of Relevance Criteria

To more fully understand the criteria and features of documents that participants in the study used when assigning usefulness scores, we conducted a content analysis of their comments in response to the question “How did you decide what score to assign this document?” Our analysis is based on 833 comments from 22 participants.

We identified six main codes, some of which have sub-codes, and assigned one or more codes per comment as applicable. Note that the categories do not differentiate between positive or negative references to the criteria; these are grouped together in this analysis. The main codes together with frequencies across all comments are shown in Table 1. Codes were developed iteratively and inductively from a close reading of the comments in conjunction with the assigned scenarios and the usefulness scores. One of the authors developed the coding scheme and coded the entire document set. An inter-rater reliability assessment based on a subset of 100 comments coded by the second author showed rates of agreement of between 64% and 88% depending on the criteria category.

Table 1. Frequency of Usefulness Criteria

Usefulness Criteria Codes | Freq. | %
Topic – refers to the subject matter, the specific focus or the coverage of a topic. (Sub-codes: Topic Coverage, Level of Detail) | 446 | 29%
Situation – refers to the usefulness or suitability to the scenario. (Sub-codes: Audience, Location/Context) | 372 | 24%
Purpose – refers to the purpose, intent or genre of the document. | 294 | 19%
Presentation – refers to the layout, design and navigation features of the document. (Sub-codes: Design, Links/References, Images, Information Findability) | 190 | 12%
Quantity – refers to the length or the amount of information contained in it. | 142 | 9%
Quality – refers to some aspect of document quality. (Sub-codes: Currency, Credibility, Readability, Importance/Interest) | 101 | 7%
Number of codes assigned to 833 comments | 1545 | 100%

Not surprisingly, topical relevance is the most commonly mentioned criterion, followed by the suitability of the information to the situation, and the purpose of the information. Discussions of information quality, quantity and presentation were less common. Of the sub-codes, the most commonly mentioned criteria were: Links, Topic Coverage, Level of Detail and Audience. The latter seems to have been prompted by the large number of documents addressed to a particular audience, such as pregnant women, children, or First Nations people.

Table 2. CrossTab: Criteria by Usefulness Score

Criteria | χ2 value | df | Sig. | Eta*
Topical | 25.563 | 6 | <.001 | .177
Situation | 21.115 | 6 | <.001 | .160
Purpose | 31.922 | 6 | <.001 | .197
Quantity | 22.508 | 6 | <.001 | .166
Quality | 11.099 | 6 | ns |
Presentation | 41.599 | 6 | .001 | .225
* Criteria dependent upon usefulness level

To better understand how participants drew upon these criteria when assessing documents that varied in usefulness, we conducted a crosstab analysis of criteria frequencies across the 7 levels of usefulness (Table 2). Chi Square tests are significant for all criteria except for Quality, rejecting the hypothesis that the remaining criteria are evenly distributed.
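As a concrete illustration of this kind of test, the sketch below runs a chi-square test on a hypothetical 2 × 7 table of criterion mentions by usefulness score. The counts are invented, and the scipy call stands in for the SPSS crosstab procedure the authors used.

```python
# Hedged sketch of the crosstab tests summarized in Table 2: chi-square on
# counts of comments mentioning / not mentioning one criterion across the
# seven usefulness levels. All counts below are made up for illustration.
from scipy.stats import chi2_contingency

observed = [[90, 55, 40, 45, 50, 60, 106],   # comments mentioning Topic, scores 1..7
            [60, 70, 85, 90, 95, 80, 59]]    # comments not mentioning Topic
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2={chi2:.3f}, df={df}, p={p:.4f}")  # df = (2-1)*(7-1) = 6, as in Table 2
```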

Figure 3. Criteria by Usefulness Score

The effect sizes indicated by Eta values are in the medium range, and strongest for Presentation and Purpose. A plot of this data (Figure 3) shows some interesting patterns. Topic is mentioned more at the two extremes of the scale, when documents are either not at all useful or very useful. In contrast, Purpose is mentioned more for documents with mid-range scores, and Situation peaks at both ends and in the middle. Mentions of Presentation increase steadily as the usefulness score increases.

THE EFFECT OF TASK


In this section, we report on task effects on usefulness assessments at the scenario level and the individual document assessment level.

Scenario Level Analysis

An analysis of post scenario user perception measures by the five task types does not reveal any clear differences. Mean scores do vary. For example, Difficulty ranges from M = 2.88 for Fact Finding to M = 4.08 for Problem Solving, and Overall Usefulness ranges from M = 3.6 for Doing to M = 4.76 for Learning. However, an ANOVA on these scores shows no significant differences.

Document Level Analysis

As reported above, the Single Measure ICC for the entire dataset is .284. When we tested the consistency of usefulness assessments at the task level, we found considerable variation that is masked in the general measure. Scores in order of decreasing consistency are: .437 (fact finding), .392 (deciding), .312 (doing), .190 (learning) and .064 (problem solving). Although the highest of these only indicates a moderate level of consistency in scores, these results do indicate that there is considerably more agreement when assessing documents for fact-finding tasks than for learning or problem solving tasks.

Analysis of Relevance Criteria

Based on the coding of usefulness comments described above, we conducted a crosstab analysis of criteria frequencies across the five task types (Table 3). Chi Square tests are significant for all criteria, although the result for Presentation is borderline. While results indicate that frequencies of criteria are not evenly distributed across these tasks, the Lambda and Goodman-Kruskal tau measures are low, indicating that the effects are not strong. The strongest task variation is for situational criteria, where task type explains in the range of 4%–11% of the variation, and for topic, where the range is 2%–8%.
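As a reminder of how these statistics are read, both Lambda and the Goodman-Kruskal tau are proportional-reduction-in-error (PRE) measures; the block below gives the textbook form (our gloss, not taken from the paper).

```latex
% Standard PRE form shared by lambda and the Goodman--Kruskal tau
% (textbook definitions, not the paper's own notation):
\mathrm{PRE} = \frac{E_1 - E_2}{E_1}
% E_1: errors predicting the criterion variable without knowing task type
% E_2: errors predicting the criterion variable when task type is known
% Lambda counts modal-category prediction errors; tau uses proportional
% (probabilistic) prediction. A tau of .044 therefore reads as "knowing
% the task type reduces prediction error for Situation mentions by about 4.4%".
```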

Table 3. CrossTab: Criteria by Task Type

Criteria | χ2 value | df | Sig. | Lambda* | G-K tau*
Topical | 22.390 | 4 | <.001 | .088 | .027
Situation | 36.386 | 4 | <.001 | .118 | .044
Purpose | 13.354 | 4 | <.05 | .000 | .016
Quantity | 13.141 | 4 | <.05 | .000 | .016
Quality | 12.634 | 4 | <.05 | .000 | .015
Presentation | 9.422 | 4 | =.051 | .000 | .011
* Criteria dependent upon usefulness level

While the effect of task does not seem to be a major source of variation in assessments, the patterns that emerge do make sense. Topical criteria are most frequently cited for Learning and Deciding tasks and least frequently for Doing, while Purpose criteria are most frequent for Doing. Situational is highest for Problem Solving and Doing and lowest for Fact Finding. Quality and Presentation are highest for Fact Finding and Doing.

Table 4. CrossTab: Criteria by Participant

Criteria | χ2 value | df | Sig. | Lambda* | G-K tau*
Topical | 92.957 | 20 | <.001 | .223 | .112
Situation | 100.845 | 20 | <.001 | .237 | .121
Purpose | 126.962 | 20 | <.001 | .102 | .153
Quantity | 76.094 | 20 | <.001 | .000 | .091
Quality | 62.951 | 20 | <.001 | .000 | .076
Presentation | 101.339 | 20 | <.001 | .095 | .122
* Criteria dependent upon usefulness level

Given that the task effect, while significant, seems to be relatively minor, we decided to explore the usefulness criteria data further. We conducted another crosstab analysis of criteria frequencies across all participants in the study to test for an effect of individual differences (Table 4).

Chi Square tests are significant for all criteria and the effects are much stronger than in the analysis by task. For Topical and Situational criteria in particular, as much as 23% of the variation may be the result of individual patterns of assessing and describing usefulness.

DISCUSSION


Our results confirm and deepen our understanding of the complexity and variability of human assessments of information. While the individuals who participated in our study proved to be quite capable and efficient at drawing upon a wide range of criteria to make nuanced assessments of usefulness, these assessments were highly inconsistent from one person to the next. This Goldilocks effect, in which each person seems to be seeking information that is “just right” from a personal perspective, was indicated by the low overall ICC score and the significant variation in usefulness criteria by participant. Study participants themselves noted that assessment is more difficult when documents are lengthy, viewed out of context, or contain many links. Furthermore, individual preferences, varying worldviews, differing approaches to scoring documents containing partial or peripheral information, and differing interpretations of the scenario all played a part in the lack of agreement.

Within the context of particular information tasks, this variation diminishes somewhat and certain patterns emerge. Participants engaged in fact-finding tasks, which have a clear and specific goal, were more likely to agree on what constituted a useful document and they found quality and presentation criteria most important. Doing and Deciding tasks were associated with a low to mid level of consistency in assessments. For Doing tasks, important criteria are situation, purpose, quality, and presentation, likely because they require information that is intended to support a particular activity, rather than being focused on a particular topic. In contrast, Deciding and Learning tasks are heavily focused on Topical criteria. Learning and Problem Solving tasks have the least specific goals, and they are both associated with very low levels of consistency in assessment. However, Problem Solving tasks are most associated with Situational criteria.

An additional finding of this study is that different assessment criteria come into play at different levels of usefulness. Documents that are not useful seem to be judged primarily on the basis of Topical, and to a lesser extent Situational, criteria. In the middle range, participants mention Topical criteria less, likely because it is assumed that the documents are on topic. The more salient criteria here are the secondary ones such as Purpose or Quality, which explain why a document was not assigned a higher score. For the most useful documents, participants mention a wide range of criteria, including a re-emphasis on Topical and Situational and an increased emphasis on Presentation.

Overall, participants mentioned topical usefulness criteria most often, which reinforces findings of other studies. However, situational and purpose criteria play a more dominant role here than has been shown in studies to date. In contrast, Xu & Chen (2006), who asked participants to assess documents with respect to a topic, found that novelty was second in importance to topicality. The difference is likely a function of the methodology used in this study, namely the focus on usefulness rather than relevance and the use of simulated work task situations (SWTS) as experimental tasks rather than topics. One of the implications of this work is that the use of de-contextualized topical and factual search tasks to study people and systems is likely to inform artificially simple and cohesive models that will be of little use in predicting natural search behaviour.

CONCLUSION


This work represents one step towards dealing with the Goldilocks effect in information assessment by taking into account the searchers' context and patterns of behaviour. Findings reinforce the notion that task effects shape assessment behaviour and clarify some of the differences between the task types examined here. The particular patterns of behaviour identified are valid with respect to the e-government domain, but given the generic nature of the information task types used, findings are likely to be applicable within other information domains.

This work is part of a broader project studying information access and use in e-government (E-Informing the Public). The next phase will build on these results through a naturalistic user study of members of the public searching for online government information.

Acknowledgements


This work was funded by a Social Science and Humanities Research Council Grant and a University of British Columbia HSS Grant to the first author. Thanks to research assistants: Francesca de Freitas, Amanda Leinberger and Christina Nilsen who contributed at various stages of this project, and to the participants who took part in the study.

REFERENCES

  • Bales, S., & Wang, P. (2005). Consolidating user relevance criteria: A meta-ethnography of empirical studies. In Proceedings of the 68th Annual Meeting of the American Society for Information Science and Technology. Silver Spring, MD: ASIS&T.
  • Barry, C. L. (1994). User-defined relevance criteria: An exploratory study. Journal of the American Society for Information Science, 45, 149–159.
  • Belkin, N. (2010). On the evaluation of interactive information retrieval systems. In The Janus Faced Scholar, a Festschrift in Honour of Peter Ingwersen. www.issi-society.info/peteringwersen/pif_online.pdf
  • Belkin, N. et al. (2009). Usefulness as the criterion for evaluation of interactive information retrieval. In Proceedings of the Third Workshop on Human-Computer Interaction and Information Retrieval. http://cuaslis.iorg/hcir2009
  • Borlund, P. (2003a). The concept of relevance in IR. Journal of the American Society for Information Science, 54, 913–925.
  • Borlund, P. (2003b). The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research, 8, from http://informationr.net/ir/8-3/paper152.html
  • Burke, M.J., & Dunlap, W.P. (2002). Estimating interrater agreement with the average deviation index: A user's guide. Organizational Research Methods, 5, 159–172.
  • Byström, K. (2002). Information and information sources in tasks of varying complexity. Journal of the American Society for Information Science, 53, 581–591.
  • Byström, K., & Hansen, P. (2005). Conceptual framework for tasks in information studies. Journal of the American Society for Information Science and Technology, 56, 1050–1061.
  • Byström, K., & Järvelin, K. (1995). Task complexity affects information seeking and use. Information Processing & Management, 31, 191–213.
  • Cooper, W. S. (1971). A definition of relevance for information retrieval. Information Storage and Retrieval, 7, 19–37.
  • Cosijn, E., & Ingwersen, P. (2000). Dimensions of relevance. Information Processing & Management, 36, 533–550.
  • Freund, L. (2008). Exploiting task-document relations in support of information retrieval in the workplace (Doctoral dissertation). University of Toronto, Toronto, Canada.
  • Hansen, P. (2005). Work task information-seeking and retrieval processes. In K. Fisher, S. Erdelez & L. McKechnie (Eds.), Theories of Information Behavior (pp. 392–396). Medford, NJ: ASIST.
  • Haraldsen, M., Päivärinta, T., Sein, M. K., & Stray, T. D. (2004, November). Developing e-government portals: From life-events through genres to requirements. Paper presented at the 11th Norwegian Conference on Information Systems, Stavanger, Norway.
  • Harter, S.P. (1996). Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47, 37–49.
  • Hert, C. A., & Marchionini, G. (1997). Seeking statistical information in federal websites: Users, tasks, strategies and design recommendations. Final report to the United States Bureau of Labor Statistics.
  • Horrigan, J. B. (2004). How Americans get in touch with government. Washington, DC: Pew Internet & American Life Project.
  • Järvelin, K., & Ingwersen, P. (2004). Information seeking research needs extension towards tasks and technology. Information Research, 10, from http://InformationR.net/ir/10-1/paper212.html
  • Kelly, D. (2006). Measuring online information seeking context, part 1: Background and method. Journal of the American Society for Information Science and Technology, 57, 1729–1739.
  • Kelly, D., Murdock, V., Yuan, X., Croft, W. B., & Belkin, N. J. (2002). Features of documents relevant to task and fact-oriented questions. In Proceedings of the Eleventh International Conference on Information and Knowledge Management. McLean, VA: ACM.
  • Kim, S., & Soergel, D. (2005). Selecting and measuring task characteristics as independent variables. In Proceedings of the 68th Annual ASIS&T Meeting, Charlotte, NC (Vol. 42). Medford, NJ: Information Today.
  • Larsen, B., Malik, S., & Tombros, A. (2008). A comparison of interactive and ad-hoc relevance assessments. In Focused Access to XML Documents (pp. 348–358). Berlin: Springer.
  • Larsen, E., & Rainie, L. (2002). The rise of the e-citizen: How people use government agencies' Web sites. Washington, DC: Pew Internet & American Life Project.
  • LeBreton, J.M., & Senter, J.L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11, 815–851.
  • Li, Y. (2004, November). Task type and a faceted classification of tasks. Poster presented at the American Society of Information Science and Technology, Providence, RI.
  • Mitchinson, T., & Ratner, M. (2004). Promoting transparency through the electronic dissemination of information. In E. L. Oliver & L. Sanders (Eds.), E-Government Reconsidered: Renewal of Governance for the Knowledge Age (pp. 89–105). Regina, SK: Saskatchewan Institute of Public Policy/Canadian Plains Research Center.
  • Mizzaro, S. (1998). How many relevances in information retrieval? Interacting with Computers, 10, 305–322.
  • Murdock, V., Kelly, D., Croft, W. B., Belkin, N., & Yuan, X. (2007). Identifying and improving retrieval for procedural questions. Information Processing & Management, 43, 181–203.
  • Park, S. Y. (1993). The nature of relevance in information retrieval. Library Quarterly, 63(3), 318–351.
  • Pirolli, P. (2007). Information foraging theory: Adaptive interaction with information. Oxford: Oxford University Press.
  • Saracevic, T. (1996). Relevance reconsidered. In P. Ingwersen & N. O. Pors (Eds.), Information Science: Integration in Perspective; Proceedings of CoLIS, the 2nd International Conference on Conceptions of Library and Information Science, Copenhagen, October 13–16 (pp. 201–218). Copenhagen: Royal School of Librarianship.
  • Schamber, L. (1991). User's criteria for evaluation in multimedia information seeking and use situations (Doctoral dissertation, Syracuse University). Dissertation Abstracts International, 52/12, AAT 9214390.
  • Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
  • Tombros, A., Ruthven, I., & Jose, J. M. (2005). How users assess web pages for information seeking. Journal of the American Society for Information Science and Technology, 56, 327–344.
  • Toms, E. G., O'Brien, H., Kopak, R., & Freund, L. (2005). Searching for relevance in the relevance of search. In F. Crestani & I. Ruthven (Eds.), Information Context: Nature, Impact, and Role: 5th International Conference on Conceptions of Library and Information Science, CoLIS 2005, Glasgow, UK, June 4–8. Berlin: Springer.
  • Toms, E. G., O'Brien, H., Mackenzie, T., Jordan, C., Freund, L., Toze, S., Dawe, E., & MacNutt, A. (2008). Task effects on interactive search: The query factor. In Proceedings of the Initiative for XML Retrieval (INEX) Workshop 2007.
  • Vakkari, P. (2003). Task-based information searching. Annual Review of Information Science and Technology, 37, 413–464.
  • Vintnar, M., & Leben, A. (2002). The concepts of an active life-event public portal. In Proceedings of the First International Conference on Electronic Government. Berlin: Springer.
  • Wang, P. (1994). A cognitive model of document selection of real users of IR systems (Doctoral dissertation). University of Maryland, College Park, MD.
  • Wildemuth, B.M., & Freund, L. (2009, October). Search tasks and their role in studies of search behaviors. Paper presented at the Third Annual Workshop on Human Computer Interaction and Information Retrieval, Washington, D.C.
  • Wilson, P. (1973). Situational relevance. Information Processing & Management, 9, 457–471.
  • Xu, Y.C., & Chen, Z. (2006). Relevance judgement: What do information users consider beyond topicality? Journal of the American Society for Information Science and Technology, 57(7), 961–973.

APPENDIX


SAMPLE SEARCH SCENARIOS

(Deciding) Although it's generally understood that tap water is safe to drink throughout most of Canada, you've begun wondering whether this is a valid assumption. You're considering whether or not to start purchasing bottled water to drink, but need more information about how water quality is assessed, and what the potential risks of drinking tap water might be before you decide. Search for information to help you make this decision.

(Doing) An elderly uncle has had a stroke and is now confined to a wheelchair. He and your aunt want to continue to live in their own home, but would like to do some minor renovations to make it wheelchair accessible as well as safer and more convenient for them as they grow older. They have asked you to help them with the project. Search for information to guide you in the process of adapting the home to their needs.

(Fact-Finding) As a volunteer at a local museum, you have been asked to help prepare a public school workshop on Canadian weather disasters. You remember that Hurricane Juan hit Nova Scotia a few years ago, and you decide to prepare an information sheet for the students. Search for factual information about Hurricane Juan including where and when it hit and how much damage it caused.

(Learning) You've heard that the upcoming year will be a La Niña year. You've heard La Niña and El Niño mentioned many times on television and radio weather reports over the years, but you've never been exactly sure of what these terms mean. You want to learn more about them and how they impact weather patterns in different regions across Canada. Search for information to help you learn about these weather phenomena.

(Problem Solving) You live outside of the city and there is a small stream that runs through the back of your property. In the past, the stream has always run clear and supported an active community of minnows, frogs and other creatures. Over the past few months, you have noticed the water becoming increasingly murky and full of algae and you are concerned that the stream has become polluted. Search for information that would help you better understand and deal with this problem.