EM Training of Phrase Based Models

The dominant method for obtaining a phrase table starts from a given word alignment. Alternatively, we may define a probabilistic model that explains the parallel corpus and train this model directly with the expectation maximization (EM) algorithm.
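To make the idea of training a phrase model directly with EM more concrete, here is a minimal sketch on a toy corpus. It is not the model of any paper discussed on this page. It assumes a strongly simplified joint model in which a sentence pair is segmented monotonically (no reordering) into phrase pairs, each drawn from a single joint distribution, and it enumerates all derivations by brute force. The names `derivations`, `em_train`, `max_len` and the toy corpus are illustrative assumptions.

```python
from collections import defaultdict
from math import log, prod


def derivations(src, tgt, max_len=3):
    """Yield every monotone segmentation of (src, tgt) into phrase pairs."""
    if not src and not tgt:
        yield []
        return
    # If only one side is empty, the ranges below are empty and nothing is yielded.
    for i in range(1, min(max_len, len(src)) + 1):
        for j in range(1, min(max_len, len(tgt)) + 1):
            pair = (tuple(src[:i]), tuple(tgt[:j]))
            for rest in derivations(src[i:], tgt[j:], max_len):
                yield [pair] + rest


def em_train(corpus, iterations=5, max_len=3):
    """Run EM over joint phrase pair probabilities (toy-sized corpora only)."""
    all_derivs = [list(derivations(s, t, max_len)) for s, t in corpus]
    pairs = {p for ds in all_derivs for d in ds for p in d}
    prob = {p: 1.0 / len(pairs) for p in pairs}  # uniform initialization
    history = []  # objective value before each update
    for _ in range(iterations):
        counts = defaultdict(float)
        loglik = 0.0
        for ds in all_derivs:
            scores = [prod(prob[p] for p in d) for d in ds]
            z = sum(scores)
            if z == 0.0:  # sentence pair without any derivation
                continue
            loglik += log(z)
            # E-step: collect fractional counts weighted by derivation posterior.
            for d, s in zip(ds, scores):
                for p in d:
                    counts[p] += s / z
        # M-step: renormalize the expected counts into a joint distribution.
        total = sum(counts.values())
        prob = {p: counts[p] / total for p in pairs}
        history.append(loglik)
    return prob, history


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table, history = em_train(corpus)
```

In this toy setting, the unconstrained model tends to shift probability mass toward the longest phrase pairs, that is, toward memorizing whole sentence pairs. This is one way to see why constraints and over-fitting countermeasures play such a large role in this line of work.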

Phrase Based Model EM is the main subject of 10 publications. 9 are discussed here.


Publications

The popular joint phrase model was proposed by

Marcu, Daniel and Wong, William (2002): A Phrase-Based, Joint Probability Model for Statistical Machine Translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

Marcu and Wong (2002). The joint model may be improved by constraining it with alignment points from the intersection of IBM Model alignments

Birch, Alexandra and Callison-Burch, Chris and Osborne, Miles and Koehn, Philipp (2006): Constraining the Phrase-Based, Joint Probability Statistical Translation Model, Proceedings on the Workshop on Statistical Machine Translation

(Birch et al., 2006;

Birch, Alexandra and Callison-Burch, Chris and Osborne, Miles (2006): Constraining the Phrase-Based, Joint Probability Statistical Translation Model, 5th Conference of the Association for Machine Translation in the Americas (AMTA)

Birch et al., 2006b) or by not strictly requiring a unique phrase alignment

Moore, Robert C. and Quirk, Chris (2007): An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation

(Moore and Quirk, 2007).

DeNero, John and Gillick, Dan and Zhang, James and Klein, Dan (2006): Why Generative Phrase Models Underperform Surface Heuristics, Proceedings on the Workshop on Statistical Machine Translation

DeNero et al. (2006) point to some problems when using EM training with conditional probabilities.

Cherry, Colin and Lin, Dekang (2007): Inversion Transduction Grammar for Joint Phrasal Translation Modeling, Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

Cherry and Lin (2007) show that the ITG constraint helps the joint phrase model approach, partly by enabling a faster algorithm with fewer search errors. The phrase alignment problem is NP-complete

DeNero, John and Klein, Dan (2008): The Complexity of Phrase Alignment Problems, Proceedings of ACL-08: HLT, Short Papers

(DeNero and Klein, 2008).

Wuebker, Joern and Mauser, Arne and Ney, Hermann (2010): Training Phrase Translation Models with Leaving-One-Out, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Wuebker et al. (2010) use leave-one-out to overcome the problem of over-fitting when re-aligning the training data with a model that was obtained from it. They use the obtained alignment to re-estimate translation probabilities (similar to one iteration of EM). Using only the best derivation from the forced alignment drastically reduces the size of the phrase table but hurts performance

Sanchis-Trilles, Germán and Ortiz-Martínez, Daniel and González-Rubio, Jesús and González, Jorge and Casacuberta, Francisco (2011): Bilingual segmentation for phrasetable pruning in Statistical Machine Translation, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)


(Sanchis-Trilles et al., 2011).
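The leaving-one-out idea attributed to Wuebker et al. (2010) above can be illustrated with a small sketch: before a sentence pair is re-aligned, the counts that this sentence pair itself contributed are subtracted from the global counts, so that phrase pairs seen only in that sentence no longer receive a high probability. The function name, the count layout and the `floor` value for phrase pairs without remaining evidence are assumptions made for illustration. The sketch does not reproduce how the paper treats such singleton phrases.

```python
from collections import Counter


def leave_one_out_prob(src, tgt, joint, marginal, sent_joint, sent_marginal,
                       floor=1e-6):
    """Relative-frequency estimate of p(src | tgt) without the current sentence."""
    num = joint[(src, tgt)] - sent_joint[(src, tgt)]
    den = marginal[tgt] - sent_marginal[tgt]
    if num <= 0 or den <= 0:
        # No evidence outside the current sentence: back off to a small
        # constant (hypothetical placeholder value).
        return floor
    return num / den


# Hypothetical counts collected from the whole training corpus.
joint = Counter({("haus", "house"): 3, ("heim", "house"): 1, ("xyz", "rare"): 1})
marginal = Counter({"house": 4, "rare": 1})

# Hypothetical counts contributed by the sentence pair being re-aligned.
sent_joint = Counter({("haus", "house"): 1, ("xyz", "rare"): 1})
sent_marginal = Counter({"house": 1, "rare": 1})

p_common = leave_one_out_prob("haus", "house", joint, marginal,
                              sent_joint, sent_marginal)
p_single = leave_one_out_prob("xyz", "rare", joint, marginal,
                              sent_joint, sent_marginal)
```

Here the frequent pair keeps a sizeable probability from the remaining evidence, while the pair that occurs only in the current sentence falls back to the floor value.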

Benchmarks

Discussion

New Publications

Mylonakis, Markos and Sima'an, Khalil (2008): Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
@InProceedings{mylonakis-simaan:2008:EMNLP,
author = {Mylonakis, Markos and Sima'an, Khalil},
title = {Phrase Translation Probabilities with {ITG} Priors and Smoothing as Learning Objective},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {630--639},
url = {http://www.aclweb.org/anthology/D08-1066},
year = 2008
}
Mylonakis and Sima'an (2008)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
