Vector Space Models of Word Meaning and Phrase Meaning: A Survey



Distributional models represent a word through the contexts in which it has been observed. They can be used to predict similarity in meaning, based on the distributional hypothesis, which states that two words that occur in similar contexts tend to have similar meanings. Distributional approaches are often implemented in vector space models, which represent a word as a point in high-dimensional space: each dimension stands for a context item, and a word's coordinates are its counts for those contexts. Occurrence in similar contexts then translates into proximity in space. In this survey we look at the use of vector space models to describe the meaning of words and phrases: the phenomena that vector space models address, and the techniques that they use to do so. Many word meaning phenomena can be described in terms of semantic similarity: synonymy, priming, categorization, and the typicality of a predicate's arguments. But vector space models can do more than predict semantic similarity. They are a very flexible tool, because they can draw on all of linear algebra, with its data structures and operations. The dimensions of a vector space can stand for many things: context words, non-linguistic context such as images, or properties of a concept. And vector space models can use matrices or higher-order arrays instead of vectors to represent more complex relationships. Polysemy is a tough problem for distributional approaches, because a representation learned from all of a word's contexts conflates the word's different senses. It can be addressed using either clustering or vector combination techniques. Finally, we look at vector space models for phrases, which are usually constructed by combining word vectors. Vector space models for phrases can predict phrase similarity, and some argue that they can form the basis for a general-purpose representation framework for natural language semantics.
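The following minimal Python sketch makes the geometric picture concrete. It rests on several assumptions:

- a hypothetical five-sentence toy corpus;
- raw co-occurrence counts within a symmetric two-word window;
- cosine similarity as the measure of proximity, which is a common choice but not one fixed by the text above.

It also shows two simple ways of combining word vectors into a phrase vector, componentwise addition and componentwise multiplication. These are illustrative choices, not the specific composition methods surveyed here.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real models are built from millions of tokens.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "a car needs fuel",
]

WINDOW = 2  # assumption: symmetric context window of two words on each side


def context_vectors(sentences, window=WINDOW):
    """Map each word to a sparse vector (Counter) of context-word counts."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            counts = vectors.setdefault(target, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose_add(u, v):
    """Phrase vector as the componentwise sum of two word vectors."""
    return u + v  # Counter addition; all counts here are positive


def compose_mult(u, v):
    """Phrase vector as the componentwise product (shared dimensions only)."""
    return Counter({k: u[k] * v[k] for k in u.keys() & v.keys()})


vecs = context_vectors(CORPUS)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.923
print(cosine(vecs["cat"], vecs["car"]))            # 0.0
print(compose_mult(vecs["dog"], vecs["drinks"]))   # only "the" and "water" survive
```

In this toy corpus, "cat" and "dog" come out close because they share the contexts "the", "drinks" and "chases". By contrast, "cat" and "car" share no contexts at all. Much of the cat/dog similarity comes from the frequent word "the". Practical models usually reweight raw counts, for example with pointwise mutual information, to damp such effects.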