Standard Article

Automatic concept identification in biomedical literature

Part 4. Bioinformatics

4.7. Structuring and Integrating Data

Short Specialist Review

  1. William H. Majoros

Published Online: 15 JAN 2005

DOI: 10.1002/047001153X.g408309

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Majoros, W. H. 2005. Automatic concept identification in biomedical literature. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.7:86.

Author Information

  1. The Institute for Genomic Research, Rockville, MD, USA

Abstract

Although the potential benefits of automatically extracting knowledge from biomedical literature appear substantial, nearly all of the techniques proposed for doing so depend on solving a central, nontrivial problem: unambiguously identifying the named entities that occur in a given text. This problem has two parts: (1) identifying the precise boundaries of the phrases in a text that refer to biomedical entities and (2) resolving those entity names to known concepts in a knowledge base such as an ontology. These two tasks represent the syntactic and semantic aspects of concept identification, respectively. This article describes both, along with the currently known methods and their limitations.
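As a rough illustration of the two parts of the problem, the sketch below finds phrase boundaries by greedy longest-match lookup against a small lexicon and then maps each matched phrase to a concept identifier. It is a minimal sketch under stated assumptions, not a method taken from the article: the lexicon entries, the `CONCEPT:` identifiers, and the function names are all hypothetical, and plain dictionary matching is only one simple way to approach the task.

```python
import re

# Hypothetical toy lexicon: lowercased token sequence -> concept identifier.
# The identifiers are made-up placeholders, not real ontology accessions.
LEXICON = {
    ("tumor", "necrosis", "factor"): "CONCEPT:0001",
    ("p53",): "CONCEPT:0002",
    ("breast", "cancer"): "CONCEPT:0003",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tokenize(text):
    """Split text into (token, start_offset, end_offset) triples."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def identify_concepts(text):
    """Return (surface_phrase, start, end, concept_id) for each lexicon match.

    Part 1 (syntactic): the character offsets mark the phrase boundaries.
    Part 2 (semantic): the lexicon lookup resolves the phrase to a concept.
    """
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for tok, _, _ in tokens[i:i + n])
            if key in LEXICON:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                hits.append((text[start:end], start, end, LEXICON[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return hits


if __name__ == "__main__":
    sentence = ("Mutations in p53 alter tumor necrosis factor "
                "signaling in breast cancer.")
    for surface, start, end, concept in identify_concepts(sentence):
        print(f"{start:3d}-{end:3d}  {surface!r} -> {concept}")
```

A lookup of this kind assumes that each surface form maps to exactly one concept. It does not handle spelling variants, abbreviations, or names shared by several entities, which is where the disambiguation step becomes necessary.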

Keywords:

  • concept identification;
  • disambiguation;
  • semantic resolution;
  • ontology;
  • lexicon;
  • part-of-speech tagger