Statistical semantics 


Statistical semantics is the study of "how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access" (Furnas, 2006). How can we figure out what words mean, simply by looking at patterns of words in huge collections of text? What are the limits to this approach to understanding words?

History

The term Statistical Semantics was first used by Weaver (1955) in his well-known paper on machine translation. He argued that word sense disambiguation for machine translation should be based on the co-occurrence frequency of the context words near a given target word. The underlying assumption, that "a word is characterized by the company it keeps", was advocated by J.R. Firth (1957) and is known in linguistics as the Distributional Hypothesis. Delavenay (1960) defined Statistical Semantics as the "Statistical study of meanings of words and their frequency and order of recurrence." Furnas et al. (1983) is frequently cited as a foundational contribution to Statistical Semantics. An early success in the field was Latent Semantic Analysis.

Applications of statistical semantics

Research in Statistical Semantics has produced a wide variety of algorithms that use the Distributional Hypothesis to discover many aspects of semantics by applying statistical techniques to large corpora.
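As a minimal sketch of the distributional idea, the Python example below counts the context words that occur near each target word and compares two words by the cosine of their count vectors. The function names, the window size of 2, and the toy corpus are assumptions made for illustration only; they are not taken from any of the cited works, and real systems use far larger corpora and usually some weighting or dimensionality reduction on top of raw counts.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen only to make the effect visible.
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

vectors = cooccurrence_vectors(toy_corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])
print(f"cat/dog: {sim_cat_dog:.3f}  cat/fuel: {sim_cat_fuel:.3f}")
```

On this toy corpus, "cat" and "dog" share most of their context words and so score higher than "cat" and "fuel", which share none. That is the kind of evidence the Distributional Hypothesis relies on, although a corpus this small says nothing about how well the approach works in practice.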

Related fields

Statistical Semantics focuses on the meanings of common words and the relations between common words, unlike Text Mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations). Statistical Semantics is a subfield of computational linguistics and natural language processing.

Many of the applications of Statistical Semantics can also be addressed by lexicon-based algorithms, instead of the corpus-based algorithms of Statistical Semantics. One advantage of corpus-based algorithms is that they are typically less labour-intensive than lexicon-based algorithms. Another is that they are usually easier to adapt to new languages. However, the best performance on an application is often achieved by combining the two approaches (Turney et al., 2003).

References