Abstract

Library reference services can benefit from retrieval from a question/answer (Q/A) databank in which previously answered reference questions can be reused, either directly or with some modifications. However, most search engines rely heavily on keyword matching, and we believe there are situations in the library reference environment where the relevance or usefulness of a retrieved item is not limited to linguistic similarity between the query and the entries in the databank. For example, a user who wants to find criticisms or reviews of a particular contemporary author (such as “John Smith”) may not find a good answer readily available from a keyword search on the author's name. However, if the search returns recommendations for reliable information resources on reviews or criticisms of contemporary authors (e.g. the NY Times Review Archive of Contemporary Authors), the user can go to these resources and search for “John Smith” there. In this paper, we present and discuss an alternative retrieval system that offers a new way of finding useful information for library reference service. We also report an experiment in which the retrieval engine searches a knowledge bank for similar and/or useful question/answer pairs in response to a new query.


Introduction

In library science, it is very important to know where (i.e. in which high-quality, reliable print or online resources) one can find the answers to reference questions. In fact, there are Library Reference courses that focus on the information resources students can use to find answers for their library users. Answering reference questions is also time-consuming, and the work could be made more efficient by a mechanism that allows librarians to consult or reuse reference questions that have previously been asked and answered. This motivated our effort to develop a retrieval method that goes beyond the traditional approach to information retrieval, which relies on linguistic compatibility, namely keyword matching. We present here a hybrid retrieval approach that returns items based on keyword matching as well as on potential recommendations for reliable information resources.

The Experiment

The experiment used a Q/A bank of 916 question/answer pairs on the subject of Literature.

A categorization of the chosen subject (Literature) was derived from research on existing taxonomies, iterative analyses of the question/answer knowledge bank, and consultation with librarians. The categories and subcategories in this taxonomy formed the basis for representing the question/answer pairs in the knowledge bank. Each question/answer pair was coded with the categories and subcategories it most likely belongs to. These codes were used as attributes for case-based similarity matching during retrieval. Questions were also indexed for keyword retrieval.
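
The paper does not give its data format or its taxonomy, so the following is only a minimal Python sketch under stated assumptions: each pair is a hypothetical (id, question, categories) tuple, the category labels are invented for illustration, and the tokenizer (lowercased alphanumeric runs, no stemming) is our own choice. It shows the two indexes the text describes, questions indexed by keyword and pairs indexed by their category codes, plus a Boolean AND lookup over the keyword index (AND only; OR and NOT are omitted).

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_indexes(qa_bank):
    """Index a Q/A bank given as (qid, question, categories) tuples.

    Returns two inverted indexes: term -> set of qids (keyword retrieval over
    question text) and category code -> set of qids (category-based matching).
    """
    term_index = defaultdict(set)
    category_index = defaultdict(set)
    for qid, question, categories in qa_bank:
        for term in tokenize(question):
            term_index[term].add(qid)
        for code in categories:
            category_index[code].add(qid)
    return term_index, category_index

def boolean_and(term_index, query):
    # Boolean AND: a question matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(term_index.get(t, set()) for t in terms))

# Hypothetical sample bank; the category codes are illustrative only.
bank = [
    (1, "Where can I find reviews of contemporary authors?", {"Criticism"}),
    (2, "Who wrote Moby-Dick?", {"Authorship"}),
    (3, "Where are criticisms of John Smith's novels?", {"Criticism"}),
]
term_index, category_index = build_indexes(bank)
print(boolean_and(term_index, "contemporary authors"))  # {1}
print(sorted(category_index["Criticism"]))              # [1, 3]
```

Keeping the two indexes separate lets the category lookup return pair 3 even though it shares no words with the query “contemporary authors”.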

As discussed above, keyword matching is an important criterion for retrieval, but it should not be the only criterion in certain situations, such as the retrieval of useful reference question/answer (Q/A) pairs for reuse. Our method therefore combines keyword matching with category-based matching. The user has three options: 1) enter one or more keywords to run a keyword search only, 2) select a subject category to search the Q/A databank for question/answer pairs belonging to that category, or 3) enter keywords and select a category.

  • Keyword search uses the standard Boolean retrieval model.

  • Category-based retrieval relies on the Case-Based Reasoning methodology, which assumes that similar problems have similar solutions. Here, questions are treated as “problems” and answers as “solutions” (Schank & Abelson, 1977; Weber et al., 2006). Questions are deemed similar if they belong to the same category, and the answers to similar questions can point to the information resources where the answer to a new question may be found. Once the user selects a category, the system retrieves the questions that are most similar, i.e. those that belong to that same category.

  • Results are ranked by combining the keyword-match and category-match scores.
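
The paper states only that results combine a keyword-match score and a category-match score; it does not give the formula. The Python sketch below assumes the simplest reading: each match contributes a binary score scaled by hypothetical weights `w_kw` and `w_cat` (both 1.0 by default), results are cut off at the top ten, and category-only hits carry the “C” flag described in the experiment. Passing `None` for an input models the user not using that option, which covers the three search modes.

```python
def hybrid_rank(keyword_hits=None, category_hits=None, k=10, w_kw=1.0, w_cat=1.0):
    """Rank question ids by a weighted sum of binary keyword and category matches.

    keyword_hits / category_hits: sets of question ids, or None when the user
    did not use that option (keyword-only, category-only, or both).
    Returns up to k (qid, score, flag) tuples; flag is "C" when the item was
    retrieved by category match alone.
    """
    if keyword_hits is None and category_hits is None:
        raise ValueError("enter keywords, select a category, or both")
    keyword_hits = keyword_hits or set()
    category_hits = category_hits or set()
    ranked = []
    for qid in keyword_hits | category_hits:
        kw = qid in keyword_hits
        cat = qid in category_hits
        score = w_kw * kw + w_cat * cat  # bools count as 0 or 1
        flag = "C" if cat and not kw else ""
        ranked.append((qid, score, flag))
    # Highest combined score first; ties broken by qid so the order is stable.
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked[:k]

# Hybrid mode: keyword hits {1, 4}, category hits {1, 3}.
print(hybrid_rank({1, 4}, {1, 3}))
# [(1, 2.0, ''), (3, 1.0, 'C'), (4, 1.0, '')]
```

With equal weights, pairs matching both criteria rank first, while keyword-only and category-only hits tie; changing `w_kw` and `w_cat` shifts that balance. How the authors actually weighted the two scores is not stated.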

A test was conducted to evaluate this approach. A random sample of 30 questions was drawn from the knowledge bank for the test. The test questions were left in the knowledge bank to serve as “perfect” matches: when a query was formed from one of these questions, that question itself should appear among the top ten retrieved entries. Queries were formed based on the content of the selected questions. The retrieved question/answer pairs were evaluated for relevance by examining the answers to see whether they offered hints or resources that could potentially help in finding the answer to the posed query; that is, whether the user could consider the retrieved item useful (the retrieved answer was a close match and/or offered useful information resources for further investigation).
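
The sampling step and the self-retrieval check can be sketched in Python as follows. The seed, the function names, and the assumption that question ids run from 1 to 916 are illustrative, not taken from the paper; the relevance judgments themselves were made by people and are not modeled here.

```python
import random

def sample_test_questions(qids, n=30, seed=0):
    # Draw n distinct questions at random; they stay in the bank, so each
    # query formed from one of them has a "perfect" match available.
    return random.Random(seed).sample(list(qids), n)

def self_retrieved(ranked_qids, source_qid, k=10):
    # Protocol check: the question a query was formed from should appear in the top k.
    return source_qid in ranked_qids[:k]

test_ids = sample_test_questions(range(1, 917))
print(len(test_ids), len(set(test_ids)))  # 30 30
```

A fixed seed makes the sample reproducible, so the keyword-only and hybrid runs can be compared on exactly the same 30 questions.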

We used two metrics for our test: 1) precision, defined as the percentage of retrieved items that are relevant, and 2) the number of useful retrieved results (note that here we use “useful” instead of the usual “relevant”). The second metric tests our argument that the hybrid system may find useful similar matches that a keyword search alone would miss.
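
Both metrics are simple to compute once each retrieved item has a human usefulness judgment. The Python sketch below assumes that judgments are recorded as one boolean per retrieved item, that a query retrieving nothing scores a precision of 0 (a convention we chose; the paper does not say), and that the reported figures are macro-averages, i.e. each metric is computed per query and then averaged over the test queries. The sample judgments are made up purely to show the arithmetic.

```python
from statistics import mean

def precision(judgments):
    """Fraction of retrieved items judged relevant; judgments is one bool per item."""
    return sum(judgments) / len(judgments) if judgments else 0.0

def useful_count(judgments):
    """Number of retrieved items judged useful."""
    return sum(judgments)

def average_metrics(per_query_judgments):
    # Macro-average: score each query separately, then average over queries.
    return (mean(precision(j) for j in per_query_judgments),
            mean(useful_count(j) for j in per_query_judgments))

# Two made-up queries: 1 of 4 items useful, then 2 of 2 items useful.
print(average_metrics([[True, False, False, False], [True, True]]))  # (0.625, 1.5)
```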

The queries were run first using only the keyword search option, then again with both keyword and category search. Precision was much higher with the hybrid approach: the average was 11% for keyword matching and 46% for the hybrid approach. This is not unexpected, since keyword search tends to return many entries, quite a few of which are not relevant. The hybrid search did better because more of its returned entries point users to resources they could use to find the answer (even if those entries do not answer the question directly). Entries retrieved through category match were clearly marked with a “C” to tell the user that they might be useful even without a word match.

Results for the second metric, the number of returns that are potentially useful to the user, also favored the hybrid approach. These were retrieved results that either matched the question closely in linguistic content or offered resources from which the answer could be found. Here the averages were 1.33 for the keyword matching approach and 8.03 for the hybrid approach.

Conclusions

We have presented an alternative approach to Q/A retrieval that combines keyword matching with category-based matching. The experimental results show that this approach can be useful in the library reference environment: the hybrid approach achieved higher precision than keyword search alone and retrieved useful items that a keyword search would not have found.

References

  • Schank, R. C. & Abelson, R. P. (1977). Scripts, Plans, Goals, and Understanding. Erlbaum.
  • Weber, R., Ashley, K. & Bruninghaus, S. (2006). Textual Case-Based Reasoning. The Knowledge Engineering Review, 20(3), 255-260.