Query expansion using UMLS Tools for health information retrieval


Abstract

Four new automatic query expansion strategies based on the UMLS Metathesaurus are proposed to improve the effectiveness of health information retrieval: String index with Concept expansion (SC), String index with Term expansion (ST), Word index with Concept expansion (WC), and Word index with Term expansion (WT). Results from a comparative evaluation study using a Medline plus dataset indicated that 1) the Mean Average Precisions (MAPs) with term-level expansion are higher than those with concept-level expansion by 5.6% for the 30 queries and 10.9% for short queries; 2) the MAPs based on the string index strategy are better than those based on the word index by 15.5% for the 30 queries and 9.6% for short queries; and 3) String index with Term expansion (ST) has the highest MAPs for both the 30 queries and the short queries. These results will help us better understand the effectiveness of different automatic query expansion strategies using the UMLS Metathesaurus and further inform the design of future healthcare IR systems.

Introduction

The Internet has become a preeminent resource for health information. More than 70,000 websites disseminated health information according to a 2001 report (Cline & Haynes, 2001). In 2003, half of American adults had searched online for health information, and this figure increased to eighty percent in 2006 (Fox & Fallow, 2003; Fox, 2006). Because of the complex and sophisticated terminology involved, it has been widely acknowledged that health information seekers, especially health consumers, have difficulty searching for information on the Internet (Zeng et al., 2002; Soergel et al., 2004; Keselman, Browne & Kaufman, 2008).

Health information retrieval systems usually rely on correctly formulated user queries for accurate retrieval. It is the user's responsibility to select appropriate words to represent an information need. General Internet users, however, may lack the domain knowledge needed to choose the right terminology, and all of the related terminology, required to formulate well-designed queries. This is particularly true for large, full-text databases that contain multiple expressions of the same concept (Voorhees, 1994).

Query expansion is one technique used to address vocabulary mismatch problems. By adding extra related words (e.g., synonyms) to a user's initial input, query expansion can help users retrieve potentially relevant information not indicated by their initial queries. With the help of an appropriate medical taxonomy or thesaurus such as the Unified Medical Language System (UMLS), medical synonyms can be chosen and added automatically, without requiring the domain knowledge otherwise needed to formulate a comprehensive query.

In this paper we study four automatic query expansion strategies (described in detail in the methodology section) and compare their performance using traditional IR evaluation measures.

Related works

A typical challenge in health information retrieval is its complex and sophisticated terminology. Using transaction log analysis and multidimensional scaling, Zhang et al. visualized health subjects based on query-term co-occurrences, revealing significant differences between consumer health query-term usage and formal medical terminology (Zhang et al., 2008). Efforts have been made to address the vocabulary problem. Zeng et al. developed a Health Information Query Assistant (HIQuA) system to assist people in formulating health-related queries (Zeng et al., 2006). In a pilot study, Plovnick & Zeng examined the reformulation of consumer health queries with professional terminology; their findings indicated a trend toward increased precision when professional terminology is substituted for lay terms, abbreviations, and acronyms (Plovnick & Zeng, 2004). To help consumers overcome mismatches in representations of health information, Soergel et al. proposed a framework to inform the design of an “interpretive layer” that mediates between lay and professional perspectives (Soergel et al., 2004).

As a technique for improving the effectiveness of information retrieval, query expansion has been studied for decades. As early as 1987, Furnas et al. studied the vocabulary problem in human-system communication and suggested that alias lists could improve success rates (Furnas et al., 1987). Finding appropriate words with which to expand initial queries is a key step in query expansion. Based on the different approaches used to find additional related words, Baeza-Yates & Ribeiro-Neto (1999) summarize three major categories of query expansion: (a) approaches based on user feedback; (b) approaches based on initially retrieved documents; and (c) approaches based on global information.

Aiming to address the selection and weighting of additional search terms, Qiu & Frei presented a probabilistic query expansion model based on an automatically constructed similarity thesaurus; their concept-based expansion showed a notable improvement in retrieval effectiveness (Qiu & Frei, 1993). Mitra et al. provided an approach to refine the set of documents used in feedback so as to control the query drift problem that may arise when query expansion is driven by ad hoc feedback (Mitra et al., 1998). Unlike approaches that construct a similarity thesaurus from ad hoc feedback or local information, some query expansion studies have been based on existing thesauri (e.g., WordNet, the UMLS Metathesaurus). Voorhees examined the effectiveness of lexical query expansion on the large and diverse TREC collection, using WordNet synonym sets to represent concepts. The experimental results showed that the effectiveness of query expansion depends on how complete the original query is: expansion performs better when a detailed original query is not supplied than when it is (Voorhees, 1994). Interestingly, most casual users tend to use short queries rather than complicated ones (Wang, Barry, & Yang, 2003).

Continuing Srinivasan's work on query expansion in MEDLINE, Aronson & Rindfleisch conducted a comparable experiment using UMLS Metathesaurus concepts. They expanded the original queries with Metathesaurus concepts found by MetaMap, and manually assigned MeSH terms were indexed together with the original document text. Their best results showed a 14.1% improvement in average precision over the baseline, and they concluded that query expansion based on the UMLS Metathesaurus is an effective method, comparable to the results of document feedback (Aronson & Rindfleisch, 1997). Hersh et al. assessed query expansion using thesaurus relationships and definitions in the UMLS Metathesaurus (Hersh et al., 2000). Eight different expansion strategies were adopted in their experiment, including adding manually assigned Metathesaurus terms, synonym variants of a term, one level of child terms, all levels of child terms, one level of parent terms, all levels of parent terms, related terms, and text from term definitions. Their results showed degraded aggregate performance for every expansion, although improvements over the baseline were seen in some instances.

The Unified Medical Language System (UMLS) is a long-term R&D project started at the National Library of Medicine in 1986. The system consists of three knowledge sources: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon and associated tools. UMLS tools have been widely used in a variety of research experiments, including cross-language information retrieval (David et al., 1998), automatic extraction of information needs from clinical questions (Yu & Cao, 2008), and query expansion (Aronson & Rindfleisch, 1997; Hersh et al., 2000).

Even though the vocabulary problem is widespread in health information retrieval and query expansion using UMLS is a promising approach to solving it, few studies have compared the performance of specific query expansion techniques. In this study, we explore a new approach to using UMLS and four new strategies for expanding queries.

Research Methodology

We note that several factors account for the final outcome of an IR experiment: a) the indexing strategy; b) the IR system; c) the test collection; and d) the queries submitted to the experimental system. In this section we propose a new automatic query expansion approach and describe it with respect to each of these four factors.

Indexing Strategy

UMLS is used as the basis for our automatic query expansion. UMLS organizes its knowledge structure around concepts; the Metathesaurus is the part of UMLS in which synonymous terms are clustered into concepts. To give users access to UMLS concepts, indexes are provided through which an input string can be matched to corresponding concepts. In this experiment we use two kinds of indexes provided by the UMLS Knowledge Source Server (UMLSKS): a normalized string index and a normalized word index.

The normalization process involves breaking a string into its constituent words, lowercasing each word, converting each word to its uninflected form, and sorting the words in alphabetic order. Accordingly, the normalized string index connects the normalized form of a Metathesaurus string to all its related string, term, and concept identifiers, while the normalized word index connects each individual normalized English word to all its related string, term, and concept identifiers. Finding concepts by normalized string normalizes the input string and matches the result against the normalized forms in the normalized string index, yielding a set of Concept Unique Identifiers (CUIs). Similarly, finding concepts by normalized word normalizes the input string, splits it on word boundaries, and matches each resulting word against the normalized forms in the normalized word index to obtain CUIs.
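
To make the two lookup modes concrete, the following is a minimal Python sketch of the normalization and matching just described. The small UNINFLECT map and the two toy indexes are illustrative stand-ins for the SPECIALIST Lexicon normalization tools and the actual UMLSKS indexes, and the CUIs shown are placeholders rather than verified Metathesaurus identifiers.

```python
# Minimal sketch of normalization plus string-index vs. word-index lookup.
# UNINFLECT and the two indexes are toy stand-ins for the real UMLS resources;
# the CUIs are illustrative placeholders.

UNINFLECT = {"diseases": "disease", "kidneys": "kidney"}

def normalize(text: str) -> list[str]:
    """Lowercase, uninflect, and alphabetically sort the words of a string."""
    words = [w.lower() for w in text.split()]
    words = [UNINFLECT.get(w, w) for w in words]
    return sorted(words)

# Toy indexes: normalized string -> CUIs, normalized word -> CUIs.
NORM_STRING_INDEX = {"disease kidney": {"C0022658"}}
NORM_WORD_INDEX = {"kidney": {"C0022646", "C0022658"},
                   "disease": {"C0012634", "C0022658"}}

def concepts_by_norm_string(phrase: str) -> set[str]:
    """Match the whole normalized string against the normalized string index."""
    key = " ".join(normalize(phrase))
    return NORM_STRING_INDEX.get(key, set())

def concepts_by_norm_word(phrase: str) -> set[str]:
    """Match each normalized word against the normalized word index."""
    cuis: set[str] = set()
    for word in normalize(phrase):
        cuis |= NORM_WORD_INDEX.get(word, set())
    return cuis

print(concepts_by_norm_string("Kidney Diseases"))  # one concept
print(concepts_by_norm_word("Kidney Diseases"))    # broader: three concepts
```

The word-based lookup unions the matches of every word, which is why it typically returns more concepts than the string-based lookup, as noted in the results section.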

IR System

In this experiment, the Lemur Toolkit (Croft et al., 2002) was used to build the experimental retrieval system. The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval; it supports different automatic indexing strategies and a variety of retrieval models.

Our experiment program uses the UMLSKS web service API, which provides developers with functions for retrieving Metathesaurus, Semantic Network, and SPECIALIST Lexicon data. In addition, for the Lemur indexing we 1) applied a standard stoplist, 2) removed one-character words, and 3) used Porter stemming.
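
As an illustration of this preprocessing, the sketch below applies the three steps to a piece of text in Python. NLTK's PorterStemmer stands in for Lemur's built-in Porter stemmer, and the abbreviated STOPLIST is an assumption rather than the actual stoplist used in the experiment.

```python
# Sketch of the preprocessing applied before indexing: stopword removal,
# dropping one-character words, and Porter stemming. STOPLIST is abbreviated
# and illustrative; NLTK's stemmer stands in for Lemur's.
import re
from nltk.stem import PorterStemmer

STOPLIST = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOPLIST]   # 1) stoplist
    tokens = [t for t in tokens if len(t) > 1]          # 2) one-character words
    return [stemmer.stem(t) for t in tokens]            # 3) Porter stemming

print(preprocess("The relationship of blood and cerebrospinal fluid oxygen concentrations"))
```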

Data Set

The test collection is from Medline plus. Three plain-text files store the query descriptions, the document collection, and the relevance judgments for the queries, respectively. There are 30 information needs in total, each described by a short paragraph of text (e.g., “the relationship of blood and cerebrospinal fluid oxygen concentrations or partial pressures. A method of interest is polarography.”). The document collection includes 1,033 documents collected from the Medline plus database.

Submitted Query

With respect to query formulation, five types of queries are submitted to the experimental system: one serves as the baseline and the other four are the expansion strategies we propose for testing. We test two levels of expansion, concept-level and term-level, and two strategies for finding relevant concepts, finding concepts by normalized string and finding concepts by normalized word. Combining these gives four experimental designs, illustrated in the sketch that follows the list below:

  • Standard Baseline Query: The baseline queries are keywords and phrases extracted from the query descriptions by a domain expert; in this study a biomedical PhD student performed this work. The thirty baseline queries are listed in the appendix of this paper.
  • String index with Concept expansion (SC): For each phrase and keyword in the baseline, concepts are extracted using the normalized string method, and the returned concepts are added to the original query to form the expanded query.
  • String index with Term expansion (ST): For each phrase and keyword in the baseline, concepts are extracted using the normalized string method, and the returned concepts, together with the terms clustered in those concepts, are added to the original query to form the expanded query.
  • Word index with Concept expansion (WC): For each phrase and keyword in the baseline, concepts are extracted using the normalized word method, and the returned concepts are added to the original query to form the expanded query.
  • Word index with Term expansion (WT): For each phrase and keyword in the baseline, concepts are extracted using the normalized word method, and the returned concepts, together with the terms clustered in those concepts, are added to the original query to form the expanded query.

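The sketch below shows how the four strategies assemble an expanded query from the baseline phrases. The concepts_for lookup and the CONCEPT_NAMES / CONCEPT_TERMS tables are hypothetical stand-ins for the UMLSKS normalized string/word lookups and the Metathesaurus data they return, and the sketch assumes that “adding a concept” means adding the concept's preferred name; it only illustrates how concept-level and term-level additions differ, not the actual API calls.

```python
# Sketch of the four expansion strategies (SC, ST, WC, WT).
# concepts_for() and the two tables are toy stand-ins for UMLSKS lookups;
# CUIs and term names are illustrative placeholders.

CONCEPT_NAMES = {"C0022658": "Kidney Diseases"}
CONCEPT_TERMS = {"C0022658": ["Kidney Diseases", "Nephropathy", "Renal Disease"]}

def concepts_for(phrase: str, index: str) -> list[str]:
    """Return CUIs for a phrase via the 'string' or 'word' index (toy data)."""
    lookup = {("kidney diseases", "string"): ["C0022658"],
              ("kidney diseases", "word"): ["C0022658", "C0022646"]}
    return lookup.get((phrase.lower(), index), [])

def expand(baseline_phrases: list[str], index: str, level: str) -> str:
    """Build an expanded query: index in {'string', 'word'}, level in {'concept', 'term'}."""
    expansion: list[str] = []
    for phrase in baseline_phrases:
        for cui in concepts_for(phrase, index):
            expansion.append(CONCEPT_NAMES.get(cui, ""))        # concept-level addition
            if level == "term":
                expansion.extend(CONCEPT_TERMS.get(cui, []))    # plus clustered terms
    return " ".join(baseline_phrases + [t for t in expansion if t])

baseline = ["kidney diseases", "prednisone"]
for index, level, name in [("string", "concept", "SC"), ("string", "term", "ST"),
                           ("word", "concept", "WC"), ("word", "term", "WT")]:
    print(name, "->", expand(baseline, index, level))
```
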
Figure 1 illustrates the process of our indexing and query expansion. For performance evaluation, precision and recall are calculated for each run; specifically, we use Mean Average Precision (MAP) and 11-point precision versus recall.

Figure 1.

Research Methodology Flow Chart

Results and Analysis

As expected, term-level expansion adds more words to the original queries than concept-level expansion does, and finding concepts by normalized word returns more concepts than finding concepts by normalized string. Average precision was calculated for each of the 30 queries under each strategy; the results are listed in Table 1.

Table 1. Average Precision Summary of Each Expansion Strategy

Compared to the baseline, for the 30 query runs with the SC strategy, 26.67% of queries (8 out of 30) have better AP scores and 43.33% (13 out of 30) are unchanged. With ST, 30% of queries (9 out of 30) have improved AP scores and 16.67% (5 out of 30) are unchanged. With WC, 16.67% of queries (5 out of 30) improve and 23.33% (7 out of 30) are unchanged. With WT, 23.33% of queries (7 out of 30) improve and 13.33% (4 out of 30) are unchanged (Table 1).

Mean average precision was calculated as the mean of the average precisions across all 30 queries.
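
For readers unfamiliar with the measure, the following is a minimal sketch of how average precision and MAP can be computed from ranked result lists and relevance judgments; the toy runs and judgments are illustrative and do not correspond to our actual data.

```python
# Minimal sketch of Average Precision (AP) and Mean Average Precision (MAP).
# The toy rankings and relevance judgments are illustrative only.

def average_precision(ranked_docs: list[str], relevant: set[str]) -> float:
    """AP = mean of precision@k over the ranks k where a relevant doc appears."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """MAP = mean of AP across all queries in the run."""
    aps = [average_precision(runs[q], qrels.get(q, set())) for q in runs]
    return sum(aps) / len(aps) if aps else 0.0

runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
qrels = {"q1": {"d1", "d5"}, "q2": {"d2"}}
print(round(mean_average_precision(runs, qrels), 3))  # 0.625 on the toy data
```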

Figure 2.

Mean Average Precision of All Runs

Figure 2 shows the Mean Average Precision (MAP) for the baseline and the four query expansion strategies. Compared with the baseline, there is a slight improvement for ST (1.3%), whereas the other three strategies have lower MAP values (−3.05% for SC, −16.72% for WC, and −10.96% for WT).

Further investigating the queries with more than three phrases (queries 5, 14, 19, and 24), we plot the mean average precision for these queries in Figure 3:

Figure 3.

Mean Average Precision of Long Queries

From Figure 3, we find that the MAPs of all the expansion strategies decrease. Compared to the baseline, the declines are 3.24%, 11.28%, 34.01%, and 49.39% for SC, ST, WC, and WT, respectively. Similarly, we plot the mean average precision for short queries (queries with fewer than three phrases) in Figure 4.

Figure 4.

Mean Average Precision of Short Queries

From Figure 4, in contrast to the long queries, we find that the MAPs decline only slightly for SC, WC, and WT (−1.58%, −10.2%, and −0.48%, respectively), while ST shows an improvement of 9.05%. We also find that the MAP for the string index strategies (0.535 on average) is 9.6% better than that for the word index strategies (0.488 on average), and that term-level expansion (0.538 on average) is 10.9% better than concept-level expansion (0.485 on average). We use query 14 as an example to illustrate how the different strategies affect average precision. The 11-point precision approach is used, and the interpolated value at each point is the maximum precision value observed between the current point and the next point. Assuming rⱼ ∈ {0, 0.1, 0.2, …, 1}, the precision at rⱼ, P(rⱼ), is calculated by the following formula:

P(rⱼ) = max { P(r) : rⱼ ≤ r ≤ rⱼ₊₁ }    (1)
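
The sketch below applies the interpolation rule of Equation (1) literally, taking the maximum observed precision between adjacent standard recall levels; the toy ranking and judgments are illustrative only.

```python
# Sketch of the 11-point interpolated precision of Equation (1): at each
# standard recall level r_j, take the maximum precision observed at recall
# points r with r_j <= r <= r_(j+1). Toy ranking/judgments are illustrative.

def eleven_point_precision(ranked_docs: list[str], relevant: set[str]) -> list[float]:
    # Observed (recall, precision) pairs at each rank where a relevant doc appears.
    observed, hits = [], 0
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            observed.append((hits / len(relevant), hits / k))
    levels = [j / 10 for j in range(11)]               # r_j in {0, 0.1, ..., 1}
    curve = []
    for j, rj in enumerate(levels):
        upper = levels[j + 1] if j + 1 < len(levels) else 1.0
        window = [p for r, p in observed if rj <= r <= upper]
        curve.append(max(window) if window else 0.0)   # P(r_j) per Equation (1)
    return curve

print(eleven_point_precision(["d1", "d4", "d2", "d8", "d3"], {"d1", "d2", "d3"}))
```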

Figure 5.

11-point Precision of Query 14

With respect to precision versus recall, we calculated the precision at the maximum recall point for each query and averaged these values across all queries.
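
As a sketch of this measure, the code below takes, for each query, the precision at the deepest rank at which a relevant document is retrieved (the maximum recall point) and averages it across queries; the toy runs and judgments are again illustrative.

```python
# Sketch of "precision at the maximum recall point", averaged across queries.
# Toy data only; it does not reflect the experimental results.

def precision_at_max_recall(ranked_docs: list[str], relevant: set[str]) -> float:
    hits, last_prec = 0, 0.0
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            last_prec = hits / k          # precision at the latest relevant hit
    return last_prec                       # precision where recall is maximal

runs = {"q1": ["d3", "d1", "d7", "d5"], "q2": ["d2", "d9"]}
qrels = {"q1": {"d1", "d5"}, "q2": {"d2"}}
avg = sum(precision_at_max_recall(runs[q], qrels[q]) for q in runs) / len(runs)
print(round(avg, 3))
```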

Figure 6.

Average Precision at Maximum Recall

Figure 7.

Average Recall

From Figures 6 and 7, we find that the recall of all four query expansion strategies improves, particularly for the term-level expansion strategies (30.0% for ST and 43.6% for WT). Such expansion would usually be expected to lower precision as well. The results in Figure 6, however, indicate that precision declines only modestly for SC (−0.65%), WC (−7.93%), and WT (−8.02%), while it increases by 6.86% for ST. In other words, at the maximum recall point, the ST strategy achieves an improvement of 30% in recall and 6.86% in precision.

Discussion

In this study we confirmed that applying the UMLS Metathesaurus for query expansion yields no statistically significant improvement in Mean Average Precision. This is consistent with the findings of Hersh et al. (2000) using the MEDLINE test collection (OHSUMED). We found, however, that performance is enhanced under certain circumstances.

The study results indicate that three of the four query expansion strategies (SC, WC, and WT) did not improve retrieval performance, and further analysis focusing on short queries and on average precision/recall at the maximum recall point led to the same conclusion. We also found that the MAP declines are particularly pronounced for long queries. In other words, adding more words to a relatively well-described long query tends to introduce more “noise” than it contributes to retrieval performance; this is particularly true for the word index strategies (WC and WT).

Query expansion at the term level using the string index (ST), however, gained an average improvement of 1.3% over the 30 queries and 9.05% over the short queries. At the maximum recall point, the average precision and recall improvements are 6.9% and 30.0%, respectively. We also found that the decline in Mean Average Precision is mainly attributable to the long queries: adding more words to a well-described long query introduces more noise than it adds value, particularly for the word index approach. This result is consistent with the study by Voorhees (1994) using the TREC collection and WordNet.

Conclusion/Future Work

In this study we proposed four new automatic query expansion strategies based on the UMLS Metathesaurus for health information retrieval: String index with Concept expansion (SC), String index with Term expansion (ST), Word index with Concept expansion (WC), and Word index with Term expansion (WT). Results from the comparative evaluation study using the Medline plus dataset indicated that, at the maximum recall point, the string index with term-level expansion achieves better average precision (+6.86%). In addition, recall improves for all four query expansion strategies, particularly for the term-level expansion strategies (30.0% for ST and 43.6% for WT).

These results will help us better understand the effectiveness of different automatic query expansion strategies using the UMLS Metathesaurus and further inform the design of future healthcare IR systems.

In the future we will continue to test our query expansion strategies on a more comprehensive dataset. We would also like to further explore the connections between the terms added by expansion with the UMLS Metathesaurus.

Acknowledgements

Many thanks to Mr. Xiaofan Luo for his help in providing queries from the 30 topic descriptions in the MEDLINE test collection. We also thank the anonymous reviewers for their valuable comments.

Appendix

Appendix: Thirty queries used in the study

  • 1.crystalline lens; vertebrates
  • 2.blood oxygen; polarography
  • 3.electron microscopy; lung
  • 4.lung neoplasms
  • 5.fatty acid levels; transport; placenta; fetus
  • 6.ventricular septal defect; aortic regurgitation
  • 7.radioisotopes; pericardial effusions; applications
  • 8.drug effects; pesticides; bone marrow
  • 9.induced hypothermia; surgery
  • 10.neoplasm immunology
  • 11.steroids; breast neoplasms
  • 12.azathioprine; systemic lupus erythematosus
  • 13.bacillus subtilis phages; transduction
  • 14.renal amyloidosis; tuberculosis; steroids; prednisone; prednisolone; kidney diseases
  • 15.homonymous hemianopsia; gerstmann's syndrome; agnosia
  • 16.separation anxiety; infancy; children
  • 17.nickel; nutrition
  • 18.organic selenium compounds; toxicity
  • 19.parathyroid hormone; kidney; phosphate excretion;
  • 20.somatotropin; bone metabolism; growth
  • 21.language development; infancy
  • 22.mycoplasma; prenatal diseases
  • 23.infantile autism
  • 24.compensatory renal hypertrophy; hypertrophy; hyperplasia; kidney; unilateral nephrectomy
  • 25.nephrogenic diabetes insipidus; treatments; children;
  • 26.hydrocephalus; mechanism
  • 27.parasitic diseases; filaria
  • 28.palliation; cancer;
  • 29.liver pathology; hereditary implications
  • 30.hemophilia and christmas disease; pseudotumor formation
