Use of Morphology in Language Models

Especially for morphologically rich languages, basing language models only on the surface form of words may lead to sparse data problems. For such languages it may be beneficial to make explicit use of the morphological analysis of words.
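The sparsity argument can be illustrated with a minimal sketch. The hand-segmented Turkish-like word list, the names `TRAIN` and `HELD_OUT`, and the helper functions below are all assumptions made up for this example, not data or code from the cited work. The sketch only checks whether a held-out word is covered by the training vocabulary at the word level and at the morpheme level.

```python
# Toy, hand-segmented training data (hypothetical, for illustration only).
# Keys are surface forms, values are their assumed morpheme segmentations.
TRAIN = {
    "ev": ["ev"],
    "evler": ["ev", "ler"],
    "evde": ["ev", "de"],
    "evlerde": ["ev", "ler", "de"],
    "okul": ["okul"],
    "okullar": ["okul", "lar"],
    "okulda": ["okul", "da"],
}

# A held-out surface form that never occurs in TRAIN as a whole word.
HELD_OUT = {"okullarda": ["okul", "lar", "da"]}


def vocabulary(segmentations):
    """Return the set of surface forms and the set of morphemes."""
    words = set(segmentations)
    morphemes = {m for seg in segmentations.values() for m in seg}
    return words, morphemes


def coverage(segmentations, words, morphemes):
    """Count items covered as whole words and as sequences of known morphemes."""
    word_hits = sum(w in words for w in segmentations)
    morph_hits = sum(
        all(m in morphemes for m in seg) for seg in segmentations.values()
    )
    return word_hits, morph_hits


train_words, train_morphemes = vocabulary(TRAIN)
word_hits, morph_hits = coverage(HELD_OUT, train_words, train_morphemes)

print("distinct surface forms:", len(train_words))
print("distinct morphemes:", len(train_morphemes))
print("held-out covered as words:", word_hits)
print("held-out covered via morphemes:", morph_hits)
```

In this toy setting the held-out form is unseen as a surface word, yet every one of its morphemes occurs in training, which is the situation a morphology-aware model is meant to exploit.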

The topic Morphological Language Models is the main subject of 3 publications, all of which are discussed here.

Topics in Language Models

N Gram Language Models | Targeted Language Models | Morphological Language Models | Very Large Language Models

Publications

Kirchhoff, Katrin and Yang, Mei (2005): Improved Language Modeling for Statistical Machine Translation, Proceedings of the ACL Workshop on Building and Using Parallel Texts

Sarikaya, Ruhi and Deng, Yonggang (2007): Joint Morphological-Lexical Language Modeling for Machine Translation, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

Koehn, Philipp and Hoang, Hieu (2007): Factored Translation Models, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Factored language models were also used in statistical machine translation systems (Kirchhoff and Yang, 2005). For morphologically rich languages, better language models may predict individual morphemes. Sarikaya and Deng (2007) propose a joint morphological-lexical language model that uses maximum entropy classifiers to predict each morpheme. Factored translation models (Koehn and Hoang, 2007) allow for the use of sequence models over part-of-speech or morphological tags.
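To make the idea of a sequence model over tags concrete, here is a minimal sketch of a bigram model over part-of-speech tags with add-one smoothing. This is a generic stand-in, not the model used by Koehn and Hoang (2007); the tag sequences in `TAGGED` and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical tag sequences standing in for a tagged training corpus.
TAGGED = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["PRON", "VERB", "DET", "NOUN"],
]


def train_bigram(sequences):
    """Collect bigram counts, context counts, and the set of predictable tags."""
    bigrams, contexts, tags = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        tags.update(padded[1:])  # every symbol that can be predicted
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, tags


def log_prob(seq, model):
    """Add-one smoothed log-probability of a tag sequence."""
    bigrams, contexts, tags = model
    padded = ["<s>"] + seq + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(tags))
        total += math.log(p)
    return total


model = train_bigram(TAGGED)
fluent = log_prob(["DET", "NOUN", "VERB"], model)
scrambled = log_prob(["NOUN", "DET", "VERB"], model)
print("fluent order scores higher:", fluent > scrambled)
```

Because the tag inventory is far smaller than the word vocabulary, such a model can be estimated from much less data than a word-level model, which is one reason tag sequences are attractive for morphologically rich languages.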

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
