N-Gram Language Models
All competitive statistical machine translation systems use n-gram language models, which predict the probability of a word from a window of at most n-1 preceding words.
N-Gram Language Models is the main subject of 7 publications, 6 of which are discussed here.
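The prediction of a word from its preceding words can be illustrated with a minimal sketch. It assumes unsmoothed maximum-likelihood estimates from raw counts; the function names, the `<s>`/`</s>` padding symbols, and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter

def train_ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word histories in tokenized sentences."""
    ngrams, histories = Counter(), Counter()
    for tokens in sentences:
        # Pad so that the first real word also has a full history.
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            ngrams[history + (padded[i],)] += 1
            histories[history] += 1
    return ngrams, histories

def mle_prob(word, history, ngrams, histories):
    """Relative-frequency estimate of p(word | history); 0.0 for unseen histories."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0
    return ngrams[history + (word,)] / histories[history]

# Toy corpus (an assumption for illustration only).
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
ngrams, histories = train_ngram_counts(corpus, 2)
p_cat_given_the = mle_prob("cat", ["the"], ngrams, histories)
```

Unseen n-grams receive probability zero under this estimate, which is the problem that the smoothing methods discussed in the publications are meant to address.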
Publications
The most commonly used discounting methods for smoothing language models were proposed by
Irving J. Good (1953):
The population frequency of species and the estimation of population parameters, Biometrika
@Article{GoodTuring,
author = {Irving J. Good},
title = {The population frequency of species and the estimation of population parameters},
url = {http://www.ling.upenn.edu/courses/cogs502/GoodTuring1953.pdf},
googlescholar = {2593334280557910938},
journal = {Biometrika},
volume = {40},
pages = {237--264},
year = 1953
}
Good (1953) (see also the description by Gale and Sampson, 1995),
Ian H. Witten and Timothy C. Bell (1991):
The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression, IEEE Transactions on Information Theory
@article{witten-bell,
author = {Ian H. Witten and Timothy C. Bell},
title = {The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression},
url = {http://vuz.zaznai.ru/tw\_files2/urls\_5/27/d-26334/7z-docs/2.pdf},
googlescholar = {13874685675690416934},
journal = {IEEE Transactions on Information Theory},
volume = {37},
number = {4},
pages = {1085--1094},
year = 1991
}
Witten and Bell (1991), as well as
Reinhard Kneser and Hermann Ney (1995):
Improved Backing-Off for M-Gram Language Modeling, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
@InProceedings{kneser-ney,
author = {Reinhard Kneser and Hermann Ney},
title = {Improved Backing-Off for M-Gram Language Modeling},
booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
volume = {1},
year = 1995
}
Kneser and Ney (1995). A good introduction to the topic of language modelling is given by
Stanley F. Chen and Joshua Goodman (1998):
An Empirical Study of Smoothing Techniques for Language Modeling
@Techreport{ChenGoodman,
author = {Stanley F. Chen and Joshua Goodman},
title = {An Empirical Study of Smoothing Techniques for Language Modeling},
url = {http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf},
googlescholar = {3282313242724057405},
month = {August},
institution = {Computer Science Group, Harvard University},
number = {TR-10-98},
year = 1998
}
Chen and Goodman (1998).
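As an illustration of how such smoothing redistributes probability mass to unseen events, the following minimal sketch implements one common interpolated formulation of Witten-Bell smoothing for bigrams, p(w|h) = (c(h,w) + T(h) p(w)) / (c(h) + T(h)), where T(h) is the number of distinct word types observed after h. This is a simplified sketch, not the exact formulation of any of the cited papers: it assumes an unsmoothed unigram distribution as the lower-order model, and all names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def witten_bell_bigram(sentences):
    """Return a function prob(word, prev) using interpolated Witten-Bell smoothing."""
    unigrams, bigrams, history_counts = Counter(), Counter(), Counter()
    followers = defaultdict(set)  # distinct word types seen after each history
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            unigrams[word] += 1
            bigrams[(prev, word)] += 1
            history_counts[prev] += 1
            followers[prev].add(word)
    total = sum(unigrams.values())

    def prob(word, prev):
        # Lower-order model: unsmoothed unigram relative frequency (an assumption).
        p_uni = unigrams[word] / total
        c_h = history_counts[prev]
        t_h = len(followers.get(prev, ()))
        if c_h == 0:
            return p_uni  # unseen history: fall back to the unigram estimate
        return (bigrams[(prev, word)] + t_h * p_uni) / (c_h + t_h)

    return prob

# Toy corpus (an assumption for illustration only).
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
prob = witten_bell_bigram(corpus)
vocab = ["the", "cat", "dog", "sat", "ran", "</s>"]
```

In this sketch the bigram "the sat" never occurs in the toy corpus but still receives non-zero probability, while the probabilities over the vocabulary for a given history continue to sum to one. Words that never occur at all would still get zero, since the unigram model here is not smoothed.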
Instead of training language models, large corpora can also be exploited by checking whether potential translations occur in them as sentences
Radu Soricut and Kevin Knight and Daniel Marcu (2002):
Using a large monolingual corpus to improve translation accuracy, Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings
@inproceedings{Soricut:2002,
author = {Radu Soricut and Kevin Knight and Daniel Marcu},
title = {Using a large monolingual corpus to improve translation accuracy},
editor = {Stephen D. Richardson},
booktitle = {Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {2499},
isbn = {3-540-44282-0},
bibsource = {DBLP, http://dblp.uni-trier.de},
year = 2002
}
(Soricut et al., 2002).
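The basic idea of checking candidate translations against a corpus can be sketched as a simple exact-match lookup. This is only an illustration under strong assumptions (whitespace and case normalization, an in-memory set), not the method of Soricut et al. (2002); all function names and example sentences are hypothetical.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(sentence.lower().split())

def build_sentence_index(corpus_sentences):
    """Store every normalized corpus sentence in a set for constant-time lookup."""
    return {normalize(s) for s in corpus_sentences}

def attested_candidates(candidates, index):
    """Keep only the candidate translations that occur in the corpus as full sentences."""
    return [c for c in candidates if normalize(c) in index]

# Toy monolingual corpus and candidate translations (assumptions for illustration).
index = build_sentence_index(["The house is small .", "It is raining ."])
candidates = ["the house is small .", "the house is little ."]
matches = attested_candidates(candidates, index)
```

A lookup of this kind only helps when a candidate matches a corpus sentence exactly, so it would typically complement a language model rather than replace it.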
New Publications
Markus Freitag and Minwei Feng and Matthias Huck and Stephan Peitz and Hermann Ney (2013):
Reverse Word Order Model, Machine Translation Summit XIV
@inproceedings{MTS2013-Freitag,
author = {Markus Freitag and Minwei Feng and Matthias Huck and Stephan Peitz and Hermann Ney},
title = {Reverse Word Order Model},
url = {http://www.mt-archive.info/10/MTS-2013-Freitag.pdf},
pages = {159--166},
booktitle = {Machine Translation Summit XIV},
year = 2013
}
Freitag et al. (2013)