EM Training of Phrase-Based Models
While the dominant method to obtain a phrase table is based on a given word alignment, we may also define a probabilistic model that explains the parallel corpus and then train this model directly with the expectation maximization (EM) algorithm.
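As a rough illustration of training a phrase model directly with EM, the following Python sketch estimates a joint distribution over phrase pairs. It is not Marcu and Wong's exact model. It assumes monotone segmentations (no reordering), a maximum phrase length `max_len`, and uniform initialization, and the function names `spans` and `em_monotone_joint` are hypothetical. Expected counts come from a forward-backward pass over all joint segmentations of each sentence pair.

```python
from collections import defaultdict


def spans(n, max_len):
    """Yield half-open spans (start, end) of 1..max_len words in a sentence of n words."""
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            yield start, end


def em_monotone_joint(corpus, max_len=2, iterations=5):
    """Toy EM estimate of a joint phrase-pair distribution t(f_phrase, e_phrase).

    Assumptions (hypothetical, for illustration): monotone segmentations only,
    phrases of at most max_len words, and p(f, e) = sum over segmentations of
    the product of the phrase-pair probabilities.
    """
    # Uniform initialization over every phrase pair co-occurring in some sentence pair.
    pairs = set()
    for f, e in corpus:
        for i, i2 in spans(len(f), max_len):
            for j, j2 in spans(len(e), max_len):
                pairs.add((tuple(f[i:i2]), tuple(e[j:j2])))
    t = {pair: 1.0 / len(pairs) for pair in pairs}

    for _ in range(iterations):
        counts = defaultdict(float)
        for f, e in corpus:
            n_f, n_e = len(f), len(e)

            def prob(i, i2, j, j2):
                return t[(tuple(f[i:i2]), tuple(e[j:j2]))]

            # Forward: alpha[i2][j2] = prob. of generating f[:i2] and e[:j2] as phrase pairs.
            alpha = [[0.0] * (n_e + 1) for _ in range(n_f + 1)]
            alpha[0][0] = 1.0
            for i2 in range(1, n_f + 1):
                for j2 in range(1, n_e + 1):
                    for i in range(max(0, i2 - max_len), i2):
                        for j in range(max(0, j2 - max_len), j2):
                            alpha[i2][j2] += alpha[i][j] * prob(i, i2, j, j2)
            # Backward: beta[i][j] = prob. of generating f[i:] and e[j:] as phrase pairs.
            beta = [[0.0] * (n_e + 1) for _ in range(n_f + 1)]
            beta[n_f][n_e] = 1.0
            for i in range(n_f - 1, -1, -1):
                for j in range(n_e - 1, -1, -1):
                    for i2 in range(i + 1, min(n_f, i + max_len) + 1):
                        for j2 in range(j + 1, min(n_e, j + max_len) + 1):
                            beta[i][j] += prob(i, i2, j, j2) * beta[i2][j2]
            z = alpha[n_f][n_e]
            if z == 0.0:
                continue  # this sentence pair cannot be segmented with max_len
            # E-step: posterior expected count of each phrase pair in this sentence pair.
            for i, i2 in spans(n_f, max_len):
                for j, j2 in spans(n_e, max_len):
                    posterior = alpha[i][j] * prob(i, i2, j, j2) * beta[i2][j2] / z
                    counts[(tuple(f[i:i2]), tuple(e[j:j2]))] += posterior
        # M-step: renormalize expected counts into a joint distribution.
        total = sum(counts.values())
        if total == 0.0:
            break
        t = {pair: counts[pair] / total for pair in pairs}
    return t
```

On a toy corpus such as `das haus / the house`, `das buch / the book`, `ein buch / a book`, the whole-sentence phrase pairs absorb almost all of the probability mass within a few iterations, because one long phrase pair contributes a single probability factor where a segmentation into shorter pairs multiplies several. This is a small instance of the over-fitting behavior discussed by DeNero et al. (2006).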
Phrase-based model EM is the main subject of 10 publications, 9 of which are discussed here.
Publications
The popular joint phrase model was proposed by
Marcu, Daniel and Wong, Daniel (2002):
A Phrase-Based, Joint Probability Model for Statistical Machine Translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
@inproceedings{Marcu:2002,
author = {Marcu, Daniel and Wong, Daniel},
title = {A Phrase-Based, Joint Probability Model for Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/W/W02/W02-1018.pdf},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {July},
address = {Philadelphia},
publisher = {Association for Computational Linguistics},
pages = {133--139},
year = 2002
}
Marcu and Wong (2002). The joint model may be improved by constraining it with alignment points from the intersection of IBM Model alignments
Birch, Alexandra and Callison-Burch, Chris and Osborne, Miles and Koehn, Philipp (2006):
Constraining the Phrase-Based, Joint Probability Statistical Translation Model, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{birch-EtAl:2006:WMT,
author = {Birch, Alexandra and Callison-Burch, Chris and Osborne, Miles and Koehn, Philipp},
title = {Constraining the Phrase-Based, Joint Probability Statistical Translation Model},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {154--157},
url = {http://www.aclweb.org/anthology/W/W06/W06-3123},
year = 2006
}
(Birch et al., 2006a;
Alexandra Birch and Chris Callison-Burch and Miles Osborne (2006):
Constraining the Phrase-Based, Joint Probability Statistical Translation Model, 5th Conference of the Association for Machine Translation in the Americas (AMTA)
@InProceedings{Birch:2006:AMTA,
author = {Alexandra Birch and Chris Callison-Burch and Miles Osborne},
title = {Constraining the Phrase-Based, Joint Probability Statistical Translation Model},
url = {http://acl.ldc.upenn.edu/W/W06/W06-3123.pdf},
googlescholar = {5660312688171842847},
booktitle = {5th Conference of the Association for Machine Translation in the Americas (AMTA)},
month = {August},
address = {Boston, Massachusetts},
year = 2006
}
Birch et al., 2006b) or by not strictly requiring a unique phrase alignment
Moore, Robert C. and Quirk, Chris (2007):
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{moore-quirk:2007:WMT,
author = {Moore, Robert C. and Quirk, Chris},
title = {An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {112--119},
url = {http://www.aclweb.org/anthology/W/W07/W07-0215},
year = 2007
}
(Moore and Quirk, 2007).
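The alignment-point constraint of Birch et al. can be pictured as a filter on the phrase pairs the joint model is allowed to consider. The Python sketch below assumes that both directional IBM Model alignments are given as sets of (source index, target index) points in the same orientation, keeps their intersection, and admits a source span and a target span only if no intersection point links a word inside one span to a word outside the other. The helper names are hypothetical, and the exact criterion used by Birch et al. may differ in its details.

```python
def intersect(src_to_tgt, tgt_to_src):
    """Intersection of two directional alignments given as sets of (i, j) points."""
    return set(src_to_tgt) & set(tgt_to_src)


def consistent(points, f_start, f_end, e_start, e_end):
    """True if source span [f_start, f_end) and target span [e_start, e_end)
    violate no alignment point."""
    for si, tj in points:
        # A point may not link a word inside one span to a word outside the other.
        if (f_start <= si < f_end) != (e_start <= tj < e_end):
            return False
    return True


def allowed_pairs(n_src, n_tgt, points, max_len):
    """All (source span, target span) pairs of up to max_len words that respect the points."""
    src_spans = [(i, i2) for i in range(n_src)
                 for i2 in range(i + 1, min(n_src, i + max_len) + 1)]
    tgt_spans = [(j, j2) for j in range(n_tgt)
                 for j2 in range(j + 1, min(n_tgt, j + max_len) + 1)]
    return [(s, t) for s in src_spans for t in tgt_spans
            if consistent(points, *s, *t)]
```

For `das haus / the house` with intersection points {(0, 0), (1, 1)}, only `das / the`, `haus / house`, and the whole sentence pair survive, so the EM search space shrinks accordingly.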
DeNero, John and Gillick, Dan and Zhang, James and Klein, Dan (2006):
Why Generative Phrase Models Underperform Surface Heuristics, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{denero-EtAl:2006:WMT,
author = {DeNero, John and Gillick, Dan and Zhang, James and Klein, Dan},
title = {Why Generative Phrase Models Underperform Surface Heuristics},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {31--38},
url = {http://www.aclweb.org/anthology/W/W06/W06-3105},
year = 2006
}
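DeNero et al. (2006) point to problems with EM training of phrase models that use conditional probabilities: the hidden segmentation variable gives the model extra capacity to over-fit the training data.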
Cherry, Colin and Lin, Dekang (2007):
Inversion Transduction Grammar for Joint Phrasal Translation Modeling, Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation
@InProceedings{cherry-lin:2007:SSST,
author = {Cherry, Colin and Lin, Dekang},
title = {Inversion Transduction Grammar for Joint Phrasal Translation Modeling},
booktitle = {Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {17--24},
url = {http://www.aclweb.org/anthology/W/W07/W07-0403},
year = 2007
}
Cherry and Lin (2007) show that the ITG constraint helps the joint phrase model approach, partly by enabling a faster algorithm with fewer search errors. The phrase alignment problem is NP-complete
DeNero, John and Klein, Dan (2008):
The Complexity of Phrase Alignment Problems, Proceedings of ACL-08: HLT, Short Papers
@InProceedings{denero-klein:2008:ACLShort,
author = {DeNero, John and Klein, Dan},
title = {The Complexity of Phrase Alignment Problems},
booktitle = {Proceedings of ACL-08: HLT, Short Papers},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {25--28},
url = {http://www.aclweb.org/anthology/P/P08/P08-2007},
year = 2008
}
(DeNero and Klein, 2008).
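The ITG constraint restricts how phrases may be reordered: a reordering is admissible only if it can be built by repeatedly combining two adjacent blocks, either in order (straight) or swapped (inverted). The Python sketch below checks this for a phrase-level permutation with a shift-reduce stack. It only illustrates the constraint and is not the training procedure of Cherry and Lin (2007); the function name `is_itg_permutation` is hypothetical.

```python
def is_itg_permutation(perm):
    """Return True if perm (a reordering of 0..n-1) can be built by binary ITG merges.

    Each merge joins two adjacent blocks whose covered ranges are contiguous,
    either in the same order (straight) or swapped (inverted).
    """
    stack = []  # each entry is the (lo, hi) range covered by one block
    for x in perm:
        stack.append((x, x))
        # Greedily merge the two topmost blocks while their ranges are contiguous.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1
```

The two permutations this rejects among four phrases, 1 3 0 2 and 2 0 3 1, are the "inside-out" patterns that no binary ITG derivation can produce.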
Wuebker, Joern and Mauser, Arne and Ney, Hermann (2010):
Training Phrase Translation Models with Leaving-One-Out, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
@InProceedings{wuebker-mauser-ney:2010:ACL,
author = {Wuebker, Joern and Mauser, Arne and Ney, Hermann},
title = {Training Phrase Translation Models with Leaving-One-Out},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {475--484},
url = {http://www.aclweb.org/anthology/P10-1049},
year = 2010
}
Wuebker et al. (2010) use leaving-one-out to overcome over-fitting when re-aligning the training data with a model that was estimated from that same data. They use the resulting alignment to re-estimate translation probabilities (similar to one iteration of EM). Using only the best derivation from the forced alignment drastically reduces the size of the phrase table but hurts performance
Germán Sanchis-Trilles and Daniel Ortiz-Martínez and Jesús González-Rubio and Jorge González and Francisco Casacuberta (2011):
Bilingual segmentation for phrasetable pruning in Statistical Machine Translation, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{eamt11:Sanchis-Trilles,
author = {Germ{\'a}n Sanchis-Trilles and Daniel Ortiz-Mart{\'i}nez and Jes{\'u}s Gonz{\'a}lez-Rubio and Jorge Gonz{\'a}lez and Francisco Casacuberta},
title = {Bilingual segmentation for phrasetable pruning in Statistical Machine Translation},
url = {http://www.mt-archive.info/EAMT-2011-Sanchis-Trilles.pdf},
googlescholar = {4525003709044849218},
pages = {257--264},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
address = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
year = 2011
}
(Sanchis-Trilles et al., 2011).
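Leaving-one-out can be sketched as follows: when a sentence pair is re-aligned, the phrase counts that this sentence pair itself contributed are subtracted before computing p(f | e), so a phrase pair seen only in that sentence gets no support from it. The Python sketch below assumes counts are kept in `Counter` objects keyed by (source phrase, target phrase); the `floor` value for pairs left without support is a hypothetical stand-in for whatever penalty the authors actually use, and all function names are hypothetical. `reestimate` shows the relative-frequency re-estimation from best derivations, in which pairs absent from every best derivation drop out of the table.

```python
from collections import Counter


def relative_freq(counts):
    """Relative-frequency estimate p(f | e) = N(f, e) / N(e)."""
    marginal = Counter()
    for (f, e), c in counts.items():
        marginal[e] += c
    return {(f, e): c / marginal[e] for (f, e), c in counts.items()}


def leave_one_out_prob(total, own, f, e, floor=1e-6):
    """p(f | e) with the counts `own` of the current sentence pair removed.

    `total` and `own` are Counters keyed by (source phrase, target phrase).
    `floor` is a hypothetical stand-in for how unsupported pairs are penalized.
    """
    joint = total[(f, e)] - own[(f, e)]
    marginal = sum(c for (_, e2), c in total.items() if e2 == e)
    marginal -= sum(c for (_, e2), c in own.items() if e2 == e)
    if joint <= 0 or marginal <= 0:
        return floor  # the pair occurs only in the held-out sentence pair
    return joint / marginal


def reestimate(best_derivations):
    """Re-estimate p(f | e) from the phrase pairs in each sentence's best derivation."""
    return relative_freq(Counter(pair for d in best_derivations for pair in d))
```

With total counts N(haus, house) = 2 and N(hause, house) = 1, the plain estimate of p(haus | house) is 2/3, but when re-aligning the sentence that contains the only `hause`, leaving-one-out gives p(haus | house) = 1.0 and sends `hause / house` to the floor.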
New Publications
Mylonakis, Markos and Sima'an, Khalil (2008):
Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
@InProceedings{mylonakis-simaan:2008:EMNLP,
author = {Mylonakis, Markos and Sima'an, Khalil},
title = {Phrase Translation Probabilities with {ITG} Priors and Smoothing as Learning Objective},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {630--639},
url = {http://www.aclweb.org/anthology/D08-1066},
year = 2008
}
Mylonakis and Sima'an (2008)