
Statistical methods in language processing

Authors

  • Steven Abney

    Corresponding author
    1. Department of Linguistics, University of Michigan, Ann Arbor, MI 48109-1220, USA

Abstract

The term statistical methods here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a broader movement in artificial intelligence. Statistical methods have so thoroughly permeated computational linguistics that almost all work in the field draws on them in some way. These methods have, however, made little headway in general linguistics. The methods themselves are largely borrowed from machine learning and information theory. We limit our attention to methods with direct applicability to language processing, though the methods are quite general and have many nonlinguistic applications.

Not every use of statistics in language processing falls under statistical methods as we use the term. Standard hypothesis testing and experimental design, for example, are not covered in this article.

WIREs Cogn Sci 2011, 2:315–322. DOI: 10.1002/wcs.111
