Document Image Analysis and Recognition
Published Online: 16 MAR 2009
Copyright © 2007 by John Wiley & Sons, Inc.
Wiley Encyclopedia of Computer Science and Engineering
How to Cite
Manmatha, R. 2009. Document Image Analysis and Recognition. Wiley Encyclopedia of Computer Science and Engineering. 1022–1031.
Paper archives need to be converted to electronic form so that documents can be searched and organized. This article provides a brief introduction to the basic ideas of document image analysis and recognition and to the remaining challenges. The conversion involves first scanning the document, then discovering the page layout to separate text segments, and finally recognizing characters and words. The basics of scanning, image preprocessing, page segmentation, and recognition are described for printed documents, followed by a brief discussion of large-vocabulary handwriting recognition. Other issues discussed include the detection of text against image backgrounds, language identification, and datasets and evaluation. Modern, good-quality documents in printed fonts can be recognized well, but recognition of poorer-quality print and of handwriting remains a major research challenge.
- document imaging;
- character recognition;
- handwriting recognition;
- page segmentation