Standard Article

Document Image Analysis and Recognition

  1. R. Manmatha

Published Online: 16 MAR 2009

DOI: 10.1002/9780470050118.ecse667

Wiley Encyclopedia of Computer Science and Engineering

How to Cite

Manmatha, R. 2009. Document Image Analysis and Recognition. Wiley Encyclopedia of Computer Science and Engineering. 1022–1031.

Author Information

  1. University of Massachusetts, Amherst, Massachusetts


Paper archives must be converted to electronic form before their documents can be searched and organized. This article gives a brief introduction to the basic ideas of document image analysis and recognition and to the challenges that remain. The conversion involves three main steps: scanning the page to produce an image, discovering the page layout to separate out the text segments, and recognizing the characters and words. The basics of scanning, image preprocessing, page segmentation, and recognition are described for printed documents, followed by a brief discussion of large-vocabulary handwriting recognition. Other topics include the detection of text against image backgrounds, language identification, and datasets and evaluation. Modern, good-quality printed documents can be recognized well, but the recognition of poorer-quality print and of handwriting remains a major research challenge.


  • document imaging;
  • character recognition;
  • OCR;
  • handwriting recognition;
  • page segmentation