Using Wikipedia to make academic abstracts more readable


Reading is a primary means by which people get information. Reading and understanding a document, however, often requires a certain level of background knowledge. For example, people reading academic papers published by ASIST will more than likely need a background in digital libraries, information retrieval, and human-computer interaction in order to understand them completely. It is common, though, for readers to have incomplete background knowledge. In those cases, they may encounter text passages that they do not understand, or on which they would like additional information in order to better grasp the document. These situations represent a specific information need: readers require explanatory information to help them understand the document they are reading.

On the Web, the traditional mechanism for finding this type of information is the search engine. The typical search engine requires users to create and submit search queries and review lists of search results. While the search engine is certainly useful for some information tasks, it is quite disruptive for people who are reading academic abstracts. Using a search engine interrupts their reading and requires them to start a separate search task. In other words, people must stop their current task and start a new one to get the information they need.

This paper describes a new reading tool, called Literary Mark, that retrieves information to help people while they are reading abstracts. Literary Mark is an extension to the Web browser. As shown in Figure 1, it allows readers to highlight the text passages on which they want additional information. The tool then retrieves the Wikipedia (www.wikipedia.org) article most related to the passage, taking into account the abstract in which the passage occurs, and displays this article in a pop-up box. Readers can glance at the article, expand the pop-up box, scroll through the entire article, and navigate its links without leaving the abstract they are currently reading. Although the goal is to return a single relevant article, the pop-up box also lets readers look at a ranked list of other related Wikipedia articles if they wish. Unlike systems that simply look up terms, such as Answers (www.answers.com), Literary Mark allows users to highlight any amount of text, including entire paragraphs, and uses the abstract as the context for the search for relevant Wikipedia articles.

Figure 1. Screenshot of Literary Mark. Here the user is reading the abstract of an ASIST conference paper. The phrase ‘implicit feedback’ has been highlighted and the Wikipedia article on relevance feedback was retrieved.

Interface Goals

By design, Literary Mark has three distinct advantages over the traditional search engine in retrieving information for users who are reading. The first is that no explicit query formulation is required. Query formulation is the process by which the user translates an information need into a search query. Explicit query formulation is not only disruptive; it also often leads to iterative query refinement, in which readers modify and resubmit their queries because they were not satisfied with the previous search results.

A second advantage is that the user context is included in the search. The task that the user is engaged in when using Literary Mark is reading academic abstracts, and the user's information needs relate to better understanding a particular passage in a given abstract. Consequently, the user task and the abstract form the context that Literary Mark uses in the search process. Traditional search engines do not have access to contextual information, nor do they have prior knowledge of the user task. They must rely on the typical user query of 2–4 words (Markey, 2007). The advanced search features and forms provided by many search engines typically go unused.

A third advantage that Literary Mark offers is that it does not require readers to leave the abstract that they are reading. The new information is displayed in a pop-up box on the page. Literary Mark displays the most related Wikipedia article in this pop-up box instead of a list of search results; however, users can also access such a list if they choose.

Current Status

Literary Mark strives to minimize the disruption that searching causes readers. Thus far, a prototype Firefox extension has been created, and preliminary work on a contextual retrieval algorithm for Literary Mark has been done. The algorithms that we have developed use the abstract and the highlighted text to search Wikipedia for a relevant article. Wikipedia is one of the largest encyclopedias, with millions of articles currently available, and is one of the most visited sites on the Web (Giles, 2005). Given that users of Literary Mark will be requesting explanatory information on text they highlight, Wikipedia is an ideal source for that information; future work will look at utilizing other knowledge bases. On a controlled set of abstracts, passages, and relevance judgments, our top-performing algorithm returns a relevant Wikipedia article 77% of the time. Our goal is to refine these algorithms in order to achieve performance between 80% and 90%.
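
As a minimal sketch of how a highlighted passage and its abstract might be combined into a single contextual query, the following Python example ranks a small in-memory set of candidate articles by TF-IDF cosine similarity. The weighting scheme, the `passage_weight` parameter, all function names, and the sample texts are illustrative assumptions and not the algorithms evaluated in this work; a real system would search Wikipedia itself. The `success_at_1` helper shows one way the proportion of passages whose top-ranked article is relevant could be computed from a set of relevance judgments.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "that", "this", "with", "as", "by", "are", "it", "we"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_query(passage, abstract, passage_weight=3.0):
    """Term weights: abstract terms supply context, passage terms count extra."""
    query = Counter(tokenize(abstract))
    for term, count in Counter(tokenize(passage)).items():
        query[term] += passage_weight * count
    return query


def rank_articles(passage, abstract, articles):
    """Return article titles sorted by TF-IDF cosine similarity to the query.

    `articles` maps a title to its text (hypothetical stand-in for Wikipedia).
    """
    docs = {title: Counter(tokenize(text)) for title, text in articles.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for tf in docs.values() for term in tf)

    def idf(term):
        # Smoothed inverse document frequency.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    def weigh(tf):
        return {term: count * idf(term) for term, count in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(term, 0.0) for term, w in u.items())
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    query_vec = weigh(build_query(passage, abstract))
    scores = {title: cosine(query_vec, weigh(tf)) for title, tf in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def success_at_1(cases, articles):
    """Fraction of (passage, abstract, relevant_titles) cases whose top hit is relevant."""
    hits = sum(1 for passage, abstract, relevant in cases
               if rank_articles(passage, abstract, articles)[0] in relevant)
    return hits / len(cases)


# Hypothetical sample data for illustration only.
ARTICLES = {
    "Relevance feedback": "Relevance feedback is a feature of information retrieval "
                          "systems. Implicit feedback is inferred from user behavior "
                          "such as clicks, and it is used to improve the ranking of "
                          "search results for a query.",
    "Audio feedback": "Audio feedback is a howling sound that occurs when a microphone "
                      "picks up the signal from a loudspeaker in a sound system.",
    "Digital library": "A digital library is an online collection of documents, images, "
                       "and other digital objects that users can browse and search.",
}
ABSTRACT = ("We study implicit feedback from user clicks to improve the ranking "
            "of search results in information retrieval systems.")
PASSAGE = "implicit feedback"

print(rank_articles(PASSAGE, ABSTRACT, ARTICLES))
```

In this sketch the abstract terms are what separate the information retrieval sense of "feedback" from the audio sense; a query built from the highlighted passage alone would carry much less of that signal.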

Future Work

The ongoing research addresses three research questions related to the usefulness of Literary Mark as a tool for people who are reading academic abstracts.

  • RQ1: Do users prefer using the reading tool prototype, Literary Mark, over a traditional keyword search engine for retrieving Wikipedia articles while reading abstracts?

  • RQ2: Does the use of the reading tool prototype, Literary Mark, improve the reported level of understanding that readers have while reading abstracts?

  • RQ3: Does the use of the reading tool prototype, Literary Mark, improve the reported level of confidence that readers have in their understanding while reading abstracts?

The first question deals with whether users will prefer Literary Mark to search engines while reading abstracts. The underlying hypothesis is that the disruption caused by explicit searching will negatively affect how people read an abstract; thus, they should prefer Literary Mark. The second and third questions relate to the impact that this new tool will have on reading comprehension.

To answer these questions, a user study will be conducted. Each participant will be presented with a series of academic abstracts and asked to read each abstract twice. After each reading, participants will report how much they felt they understood and how confident they were in that understanding. During the second reading, Literary Mark will be enabled for a random selection of the abstracts. The study will assess the system's effectiveness (correctness of retrieval), its efficiency (the time it takes to complete the task), and user preferences.
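
The random selection of abstracts and the comparison of self-reported ratings could be sketched as follows. This is only an illustration under stated assumptions: the number of abstracts with the tool enabled, the numeric rating scale, and the function names `assign_tool` and `mean_gain` are hypothetical and are not specified by the study design above.

```python
import random
from statistics import mean


def assign_tool(abstract_ids, n_enabled, seed=None):
    """Randomly choose which abstracts have the tool enabled on the second reading."""
    rng = random.Random(seed)
    enabled = set(rng.sample(list(abstract_ids), n_enabled))
    return {abstract_id: abstract_id in enabled for abstract_id in abstract_ids}


def mean_gain(ratings, assignment):
    """Mean change in a self-reported rating between the first and second reading.

    `ratings` maps an abstract id to (first_reading, second_reading) scores on
    an assumed numeric scale; results are split by whether the tool was enabled.
    """
    gains = {True: [], False: []}
    for abstract_id, (first, second) in ratings.items():
        gains[assignment[abstract_id]].append(second - first)
    return {
        "tool": mean(gains[True]) if gains[True] else None,
        "no_tool": mean(gains[False]) if gains[False] else None,
    }


# Hypothetical example: four abstracts, tool enabled for two of them.
assignment = assign_tool(["a1", "a2", "a3", "a4"], n_enabled=2, seed=0)
print(assignment)
```

The same `mean_gain` computation would apply to both the reported understanding and the reported confidence ratings.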
