Use of Morphology in Language Models
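Especially for morphologically rich languages, language models based only on the surface forms of words may suffer from sparse data problems. A single stem can appear in many inflected forms, and many of these forms are rare or absent in the training data. For such languages, it may be beneficial to make explicit use of a morphological analysis of the words.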
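To make the sparse data problem concrete, the Python sketch below compares out-of-vocabulary (OOV) rates for whole words and for morphemes on a handful of Turkish word forms. The `+` segmentation is hand-made and stands in for the output of a morphological analyzer. It, the tiny data set, and the helper names `surface`, `morphemes` and `oov_rate` are assumptions of this illustration, and nothing here says anything about OOV rates on real data.

```python
# Toy data: words pre-segmented into morphemes with "+". The segmentation is
# hand-made and stands in for a real morphological analyzer (an assumption).
train = ["ev+ler+de", "ev", "kitap+lar", "kitap+ta", "okul+da"]
test = ["ev+de", "kitap+lar+da", "okul"]

def surface(token):
    """Surface word form: the morphemes joined back together."""
    return token.replace("+", "")

def morphemes(token):
    """Morpheme units of a segmented word."""
    return token.split("+")

def oov_rate(train_units, test_units):
    """Fraction of test units that never occur among the training units."""
    vocab = set(train_units)
    return sum(u not in vocab for u in test_units) / len(test_units)

word_oov = oov_rate([surface(t) for t in train], [surface(t) for t in test])
morph_oov = oov_rate([m for t in train for m in morphemes(t)],
                     [m for t in test for m in morphemes(t)])
print(f"word OOV rate: {word_oov:.2f}, morpheme OOV rate: {morph_oov:.2f}")
# prints "word OOV rate: 1.00, morpheme OOV rate: 0.00"
```

Every test word is unseen as a whole, while each of its morphemes occurs in training. The price is longer sequences: an n-gram over morphemes spans fewer words than an n-gram of the same order over words.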
Morphological language models are the main subject of three publications, all of which are discussed here.
Publications
Factored language models have also been used in statistical machine translation systems (Kirchhoff and Yang, 2005).
Kirchhoff, Katrin and Yang, Mei (2005):
Improved Language Modeling for Statistical Machine Translation, Proceedings of the ACL Workshop on Building and Using Parallel Texts, Ann Arbor, Michigan, June 2005, Association for Computational Linguistics, pages 125-128. http://www.aclweb.org/anthology/W/W05/W05-0821
For morphologically rich languages, better language models may be obtained by predicting individual morphemes instead of whole words.
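One simple way to predict individual morphemes is an n-gram model whose units are morphemes rather than words. The Python sketch below is a bigram model with add-one smoothing, written for illustration only; it is not the model of any publication listed on this page. It assumes words are pre-segmented with `+` and keeps a trailing `+` on non-final morphemes so that word boundaries remain recoverable. The names `to_morphemes` and `MorphemeBigramLM` and the toy Turkish corpus are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def to_morphemes(sentence):
    """Split '+'-segmented words into morphemes. Non-final morphemes keep a
    trailing '+' so that word boundaries can be recovered."""
    units = []
    for word in sentence.split():
        parts = word.split("+")
        units.extend(p + "+" for p in parts[:-1])
        units.append(parts[-1])
    return units

class MorphemeBigramLM:
    """Bigram language model over morphemes with add-one smoothing."""

    def __init__(self, corpus):
        self.history = Counter()  # how often each unit is followed by another
        self.bigrams = Counter()
        self.vocab = {EOS}        # units the model can predict
        for sentence in corpus:
            seq = [BOS] + to_morphemes(sentence) + [EOS]
            self.history.update(seq[:-1])
            self.bigrams.update(zip(seq[:-1], seq[1:]))
            self.vocab.update(seq[1:])
        self.size = len(self.vocab) + 1  # +1 slot for unknown morphemes

    def prob(self, prev, unit):
        return (self.bigrams[(prev, unit)] + 1) / (self.history[prev] + self.size)

    def logprob(self, sentence):
        seq = [BOS] + to_morphemes(sentence) + [EOS]
        return sum(math.log(self.prob(p, u)) for p, u in zip(seq[:-1], seq[1:]))

corpus = ["ev+ler+de oku+du+m", "kitap+lar+da oku+du+k", "ev+de otur+du+m"]
lm = MorphemeBigramLM(corpus)
# "kitaplarda okudum" never occurs as a word sequence, but all of its
# morpheme bigrams do, so it outscores the same words in reverse order.
print(lm.logprob("kitap+lar+da oku+du+m") > lm.logprob("oku+du+m kitap+lar+da"))  # prints True
```

Because the probabilities of a history are spread over morphemes rather than full word forms, the model can assign reasonable scores to word combinations it has never seen, as long as their morpheme transitions are familiar.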
Sarikaya and Deng (2007) propose a joint morphological-lexical language model that uses maximum entropy classifiers to predict each morpheme.
Sarikaya, Ruhi and Deng, Yonggang (2007):
Joint Morphological-Lexical Language Modeling for Machine Translation, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, Rochester, New York, April 2007, Association for Computational Linguistics, pages 145-148. http://www.aclweb.org/anthology/N/N07/N07-2037
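A maximum entropy model for the next morpheme is a multinomial logistic regression over features of the history. The Python sketch below trains such a classifier by batch gradient ascent on the L2-regularized log-likelihood. Its feature set (previous morpheme, previous complete word, bias), the `+` segmentation, the hyperparameters, the toy corpus and all names are assumptions chosen for illustration. Conditioning on both a morpheme and a word loosely mirrors the idea of mixing morphological and lexical context, but the sketch does not reproduce Sarikaya and Deng's actual features or training procedure.

```python
import math
from collections import defaultdict

def extract_events(sentence):
    """Turn a '+'-segmented sentence into (history, next morpheme) events.
    The history is (previous morpheme, previous complete word)."""
    events, prev_morph, prev_word = [], "<s>", "<s>"
    for word in sentence.split():
        for morph in word.split("+"):
            events.append(((prev_morph, prev_word), morph))
            prev_morph = morph
        prev_word = word.replace("+", "")
    return events

def features(history):
    """Binary indicator features of the history (a hypothetical feature set)."""
    prev_morph, prev_word = history
    return ["m=" + prev_morph, "w=" + prev_word, "bias"]

class MaxEntMorphemeModel:
    """Multinomial logistic regression p(morpheme | history)."""

    def __init__(self, events, lr=0.5, l2=0.01, epochs=200):
        self.labels = sorted({y for _, y in events})
        self.w = defaultdict(float)  # one weight per (feature, label) pair
        n = len(events)
        for _ in range(epochs):
            grad = defaultdict(float)
            for hist, y in events:
                probs = self.predict_proba(hist)
                for f in features(hist):
                    grad[(f, y)] += 1.0            # observed feature count
                    for label, p in probs.items():
                        grad[(f, label)] -= p      # expected feature count
            # ascent step on mean log-likelihood minus (l2 / 2) * ||w||^2
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] / n - l2 * self.w[key])

    def predict_proba(self, hist):
        feats = features(hist)
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

corpus = ["ev+ler+de oku+du+m", "kitap+lar+da oku+du+k", "ev+de otur+du+m"]
events = [e for s in corpus for e in extract_events(s)]
model = MaxEntMorphemeModel(events)
probs = model.predict_proba(("oku", "kitaplarda"))
print(max(probs, key=probs.get))
```

Additional overlapping features, such as longer morpheme histories or stems, can be added to `features` without changing the training code, which is the main practical appeal of the maximum entropy formulation.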
Factored translation models (Koehn and Hoang, 2007) allow for the use of sequence models over part-of-speech or morphological tags.
Koehn, Philipp and Hoang, Hieu (2007):
Factored Translation Models, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 868-876. http://www.aclweb.org/anthology/D/D07/D07-1091
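In a factored representation each token carries several factors, and a sequence model can be trained on any single factor. The Python sketch below assumes tokens written as `surface|lemma|tag` (the factor order, the tiny English corpus and the function names are assumptions for illustration), extracts the tag factor, and scores candidate sentences with an add-one-smoothed tag bigram model. It only illustrates the idea of a sequence model over tags; it is not the implementation described by Koehn and Hoang (2007).

```python
import math
from collections import Counter

def factor_stream(sentence, index):
    """Extract one factor from tokens of the form surface|lemma|tag."""
    return [token.split("|")[index] for token in sentence.split()]

def train_bigrams(corpus, index=2):
    """Count bigrams of one factor, with sentence boundary markers."""
    hist, big, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        seq = ["<s>"] + factor_stream(sentence, index) + ["</s>"]
        hist.update(seq[:-1])
        big.update(zip(seq[:-1], seq[1:]))
        vocab.update(seq[1:])
    return hist, big, len(vocab) + 1  # +1 reserves mass for unseen values

def factor_logprob(sentence, model, index=2):
    """Add-one-smoothed bigram log-probability of one factor stream."""
    hist, big, v = model
    seq = ["<s>"] + factor_stream(sentence, index) + ["</s>"]
    return sum(math.log((big[(p, t)] + 1) / (hist[p] + v))
               for p, t in zip(seq[:-1], seq[1:]))

corpus = [
    "the|the|DT house|house|NN is|be|VBZ small|small|JJ",
    "the|the|DT dog|dog|NN is|be|VBZ old|old|JJ",
    "a|a|DT car|car|NN is|be|VBZ new|new|JJ",
]
tag_model = train_bigrams(corpus)
good = "the|the|DT cat|cat|NN is|be|VBZ young|young|JJ"
bad = "the|the|DT is|be|VBZ cat|cat|NN young|young|JJ"
print(factor_logprob(good, tag_model) > factor_logprob(bad, tag_model))  # prints True
```

Both candidates contain words absent from the training data, yet the tag model still prefers the one with a familiar tag order. Passing a different `index` selects another factor, for example a morphological tag factor if the data contains one.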