Grammatical and context-sensitive error correction using a statistical machine translation framework

Authors


Nava Ehsan, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.

E-mail: n.ehsan@ece.ut.ac.ir

SUMMARY

Producing documents electronically rather than on paper has considerable benefits, such as easier organization and data management. Automatic writing assistance tools such as spelling and grammar checkers/correctors can therefore increase the quality of electronic texts by removing noise and correcting erroneous sentences. Errors in a text can be categorized as spelling, grammatical and real-word errors. In this article, we present a language-independent approach based on a statistical machine translation framework to develop a proofreading tool that detects grammatical errors as well as context-sensitive spelling mistakes (real-word errors). A hybrid model for grammar checking is suggested by combining this approach with an existing rule-based grammar checker. Experimental results on both English and Persian indicate that the proposed statistical method and the rule-based grammar checker are complementary in detecting and correcting syntactic errors. The results of the hybrid grammar checker, applied to some English texts, show an improvement of about 24% in recall with nearly the same precision. Experiments on a real-world data set show that state-of-the-art results are achieved for grammar checking and context-sensitive spell checking for the Persian language. Copyright © 2012 John Wiley & Sons, Ltd.
