Grammatical and context-sensitive error correction using a statistical machine translation framework


Nava Ehsan, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.



Producing documents electronically rather than on paper has considerable benefits, such as easier organization and data management. Automatic writing assistance tools such as spelling and grammar checkers and correctors can therefore increase the quality of electronic texts by removing noise and correcting erroneous sentences. Errors in a text can be categorized as spelling, grammatical, and real-word errors. In this article, we present a language-independent approach based on a statistical machine translation framework for developing a proofreading tool that detects grammatical errors as well as context-sensitive spelling mistakes (real-word errors). We also propose a hybrid model for grammar checking that combines this approach with an existing rule-based grammar checker. Experimental results on both English and Persian indicate that the proposed statistical method and the rule-based grammar checker are complementary in detecting and correcting syntactic errors. Applied to some English texts, the hybrid grammar checker improves recall by about 24% while keeping precision nearly the same. Experiments on a real-world data set show that state-of-the-art results are achieved for grammar checking and context-sensitive spell checking for Persian. Copyright © 2012 John Wiley & Sons, Ltd.
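
To make the idea of complementary checkers concrete, the following is a minimal sketch of how the output of a rule-based checker and a statistical checker might be merged and scored with precision and recall. The abstract does not specify how the two systems are combined, so the edit representation, the `merge_edits` and `precision_recall` helpers, the tie-breaking rule, and the toy data are all assumptions made for illustration, not the method used in the article.

```python
from typing import Dict, Set, Tuple

# An edit is (token_index, replacement). This representation is an assumption.
Edit = Tuple[int, str]


def merge_edits(rule_based: Set[Edit], statistical: Set[Edit]) -> Set[Edit]:
    """Union the two checkers' edits.

    If both propose an edit at the same position, keep the rule-based one.
    This tie-breaking rule is hypothetical.
    """
    merged: Dict[int, str] = {pos: rep for pos, rep in statistical}
    merged.update({pos: rep for pos, rep in rule_based})  # rule-based wins clashes
    return set(merged.items())


def precision_recall(proposed: Set[Edit], gold: Set[Edit]) -> Tuple[float, float]:
    """Precision = correct / proposed; recall = correct / gold."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data invented for illustration only; not results from the article.
gold = {(1, "their"), (4, "has"), (7, "an")}
rule_based = {(4, "has")}                 # catches an agreement error
statistical = {(1, "their"), (7, "an")}   # catches real-word errors
hybrid = merge_edits(rule_based, statistical)

print("rule-based:", precision_recall(rule_based, gold))
print("hybrid:    ", precision_recall(hybrid, gold))
```

In this toy case each checker finds errors the other misses, so the merged set covers more of the gold edits without adding false alarms. That is the pattern the abstract describes: higher recall at nearly unchanged precision.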