Phrase Based Models and Example Based Translation
A precursor to statistical machine translation, example-based translation also attempts to construct translations by re-using parts of translations in a pre-existing parallel corpus. The main points of difference are a less developed (or non-existent) probabilistic model and the goal of re-using chunks that are as large as possible.
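The following minimal Python sketch makes the "largest chunks" preference concrete with greedy longest-match reuse over a toy example base. The `EXAMPLES` dictionary and the `greedy_chunk_translate` function are hypothetical illustrations, not taken from any of the systems cited here. Real EBMT systems also have to adapt and recombine the retrieved target fragments, and this sketch skips that step.

```python
# Hypothetical toy example base: source chunks (word tuples) -> target chunks.
EXAMPLES = {
    ("the", "red", "house"): "la maison rouge",
    ("the",): "la",
    ("red",): "rouge",
    ("house",): "maison",
    ("is", "big"): "est grande",
    ("is",): "est",
}


def greedy_chunk_translate(sentence, examples, max_len=5):
    """Cover the input left to right, always taking the longest stored chunk.

    Returns the list of (source_chunk, target_chunk) pairs used. Words that no
    stored example covers are copied through unchanged.
    """
    words = sentence.split()
    i = 0
    used = []
    while i < len(words):
        # Try the longest span first, shrinking until a stored example matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in examples:
                used.append((chunk, examples[chunk]))
                i += n
                break
        else:
            # No example covers this word: pass it through untranslated.
            used.append(((words[i],), words[i]))
            i += 1
    return used
```

For instance, `greedy_chunk_translate("the red house is big", EXAMPLES)` reuses two stored chunks, and joining their target sides gives "la maison rouge est grande". A phrase-based SMT decoder would instead weigh many competing segmentations under a probabilistic model rather than committing greedily to the longest match.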
Phrase Based Vs EBMT is the main subject of 12 publications. Eight of them are discussed here.
Publications
Phrase-based SMT is related to example-based machine translation
Harold L. Somers (1999):
Review Article: Example-based Machine Translation, Machine Translation
mentioned in Other Approaches and Phrase Based Vs EBMT
@Article{EBMT,
author = {Harold L. Somers},
title = {Review Article: Example-based Machine Translation},
journal = {Machine Translation},
volume = {14},
pages = {113--157},
year = 1999
}
(Somers, 1999). Some recent systems blur the distinction between the two fields
Groves, Declan and Way, Andy (2005):
Hybrid Example-Based SMT: the Best of Both Worlds?, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{groves-way:2005:WPT,
author = {Groves, Declan and Way, Andy},
title = {Hybrid Example-Based {SMT}: the Best of Both Worlds?},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {183--190},
url = {http://www.aclweb.org/anthology/W/W05/W05-0833},
year = 2005
}
(Groves and Way, 2005;
Michael Paul and Takao Doi and Young-Sook Hwang and Kenji Imamura and Hideo Okuma and Eiichiro Sumita (2005):
Nobody is Perfect: ATR's Hybrid Approach to Spoken Language Translation, Proc. of the International Workshop on Spoken Language Translation
@InProceedings{Paul:2005:iwslt,
author = {Michael Paul and Takao Doi and Young-Sook Hwang and Kenji Imamura and Hideo Okuma and Eiichiro Sumita},
title = {Nobody is Perfect: {ATR}'s Hybrid Approach to Spoken Language Translation},
url = {http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_05/papers/slt5\_045.pdf},
googlescholar = {1484657609402106476},
booktitle = {Proc. of the International Workshop on Spoken Language Translation},
location = {Pittsburgh, PA, USA},
month = {October},
year = 2005
}
Paul et al., 2005;
Tinsley, John and Ma, Yanjun and Ozdowska, Sylwia and Way, Andy (2008):
MaTrEx: The DCU MT System for WMT 2008, Proceedings of the Third Workshop on Statistical Machine Translation
mentioned in Research Groups and Phrase Based Vs EBMT
@InProceedings{tinsley-EtAl:2008:WMT,
author = {Tinsley, John and Ma, Yanjun and Ozdowska, Sylwia and Way, Andy},
title = {{MaTrEx}: The {DCU} {MT} System for {WMT} 2008},
booktitle = {Proceedings of the Third Workshop on Statistical Machine Translation},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {171--174},
url = {http://www.aclweb.org/anthology/W/W08/W08-0326},
year = 2008
}
Tinsley et al., 2008). Various combinations of methods from SMT and EBMT are explored by
Declan Groves and Andy Way (2006):
Hybridity in MT. Experiments on the Europarl Corpus, Proceedings of the 11th Conference of the European Association for Machine Translation (EAMT)
@InProceedings{Groves:2006:EAMT,
author = {Declan Groves and Andy Way},
title = {Hybridity in {MT}. {E}xperiments on the {E}uroparl Corpus},
url = {http://doras.dcu.ie/15277/1/GrovesWay\_eamt\_06.pdf},
googlescholar = {10148756506593718247},
booktitle = {Proceedings of the 11th Conference of the European Association for Machine Translation (EAMT)},
month = {June},
address = {Oslo, Norway},
year = 2006
}
Groves and Way (2006). Statistical machine translation models may be used to select the best translation from several example-based systems
Paul, Michael and Sumita, Eiichiro (2006):
Exploiting Variant Corpora for Machine Translation, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
@InProceedings{paul-sumita:2006:HLT-NAACL06-Short,
author = {Paul, Michael and Sumita, Eiichiro},
title = {Exploiting Variant Corpora for Machine Translation},
booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers},
month = {June},
address = {New York City, USA},
publisher = {Association for Computational Linguistics},
pages = {113--116},
url = {http://www.aclweb.org/anthology/N/N06/N06-2029},
year = 2006
}
(Paul and Sumita, 2006). Along these lines, phrase-based models may be improved by dynamically constructing translations for unknown phrases from similar phrases that differ in a word or two, inserting lexical translations for the mismatched words
He, Zhongjun and Liu, Qun and Lin, Shouxun (2008):
Partial Matching Strategy for Phrase-based Statistical Machine Translation, Proceedings of ACL-08: HLT, Short Papers
@InProceedings{he-liu-lin:2008:ACLShort,
author = {He, Zhongjun and Liu, Qun and Lin, Shouxun},
title = {Partial Matching Strategy for Phrase-based Statistical Machine Translation},
booktitle = {Proceedings of ACL-08: HLT, Short Papers},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {161--164},
url = {http://www.aclweb.org/anthology/P/P08/P08-2041},
year = 2008
}
(He et al., 2008).
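The partial-matching idea can be illustrated with a simplified Python sketch. It assumes (hypothetically) that each stored phrase pair carries a one-to-one word alignment and that an unknown phrase differs from a stored one only by substitutions at the same positions. The toy `PHRASE_TABLE`, `LEXICON` and `partial_match` names are invented for illustration and do not reproduce the exact procedure of He et al. (2008).

```python
# Hypothetical toy data: each source phrase maps to (target words, alignment),
# where the alignment sends source positions to target positions one-to-one.
PHRASE_TABLE = {
    ("the", "red", "car"): (["la", "voiture", "rouge"], {0: 0, 1: 2, 2: 1}),
}
# Hypothetical word-level (lexical) translation table.
LEXICON = {"blue": "bleue", "green": "verte"}


def partial_match(phrase, phrase_table, lexicon, max_mismatch=1):
    """Build a translation for an unseen phrase from a near-identical stored one.

    Only same-length phrases differing by substitution in at most
    `max_mismatch` positions are considered; the target word aligned to each
    mismatched source word is replaced by that word's lexical translation.
    """
    for known, (target, align) in phrase_table.items():
        if len(known) != len(phrase):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(known, phrase)) if a != b]
        # Exact matches are skipped (already in the table); too many
        # mismatches make the stored phrase an unreliable template.
        if not diffs or len(diffs) > max_mismatch:
            continue
        # Every mismatched word needs an alignment point and a lexical entry.
        if not all(i in align and phrase[i] in lexicon for i in diffs):
            continue
        new_target = list(target)
        for i in diffs:
            new_target[align[i]] = lexicon[phrase[i]]
        return " ".join(new_target)
    return None
```

Here `partial_match(("the", "blue", "car"), PHRASE_TABLE, LEXICON)` returns "la voiture bleue": the stored phrase "the red car" differs in one word, so the French word aligned to "red" is swapped for the lexical translation of "blue".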
Similar convergence takes place when statistical machine translation is combined with translation memory, for instance by looking for similar sentences in the training data and replacing the mismatched parts with translations chosen by statistical methods
Sanjika Hewavitharana and Stephan Vogel and Alex Waibel (2005):
Augmenting a statistical translation system with a translation memory, Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT)
@InProceedings{Hewavitharana:2005:EAMT,
author = {Sanjika Hewavitharana and Stephan Vogel and Alex Waibel},
title = {Augmenting a statistical translation system with a translation memory},
url = {http://www.mt-archive.info/EAMT-2005-Hewavitharana.pdf},
googlescholar = {3811412998135090861},
booktitle = {Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT)},
month = {May},
address = {Budapest},
year = 2005
}
(Hewavitharana et al., 2005).
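The retrieval step of the translation-memory approach can be sketched in Python as a fuzzy lookup. Word-level similarity is computed here with the standard-library `difflib.SequenceMatcher` ratio, a stand-in chosen for brevity rather than the matching criterion of Hewavitharana et al. (2005). The toy memory `TM`, the `fuzzy_lookup` name and the 0.7 threshold are likewise assumptions. The function returns the best match together with the mismatched spans, which are the parts a statistical system would still have to translate.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source sentence, target sentence) pairs.
TM = [
    ("i would like a room for two nights", "je voudrais une chambre pour deux nuits"),
    ("where is the station", "où est la gare"),
]


def fuzzy_lookup(sentence, memory, threshold=0.7):
    """Return (best TM pair, similarity, mismatched spans), or None.

    Similarity is difflib's ratio over word lists. Each mismatch is a pair
    (span in the stored source, span in the input) that differs.
    """
    words = sentence.split()
    best, best_score = None, 0.0
    for src, tgt in memory:
        score = SequenceMatcher(None, src.split(), words).ratio()
        if score > best_score:
            best, best_score = (src, tgt), score
    if best is None or best_score < threshold:
        return None
    src_words = best[0].split()
    matcher = SequenceMatcher(None, src_words, words)
    # Keep only the non-identical regions (replace/insert/delete opcodes).
    mismatches = [
        (" ".join(src_words[i1:i2]), " ".join(words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return best, best_score, mismatches
```

For the input "i would like a room for three nights", the first memory entry matches with a ratio of 0.875 and the only mismatch is ("two", "three"), so only that word would need a statistical translation to be substituted into the stored target sentence.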
Benchmarks
Discussion
Related Topics
New Publications
Jin'ichi Murakami and Takuya Nishimura and Masato Tokuhisa (2010):
Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt10:EC:tottori,
author = {Jin'ichi Murakami and Takuya Nishimura and Masato Tokuhisa},
title = {{Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation}},
url = {http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_10/papers/slta\_175.pdf},
googlescholar = {775342118208494131},
editor = {Marcello Federico and Ian Lane and Michael Paul and Fran\c{c}ois Yvon},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
pages = {175--182},
location = {Paris, France},
year = 2010
}
Murakami et al. (2010)
Ma, Yanjun and He, Yifan and Way, Andy and van Genabith, Josef (2011):
Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{ma-EtAl:2011:ACL-HLT2011,
author = {Ma, Yanjun and He, Yifan and Way, Andy and van Genabith, Josef},
title = {Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {1239--1248},
url = {http://www.aclweb.org/anthology/P11-1124},
year = 2011
}
Ma et al. (2011)
Jin'ichi Murakami and Masato Tokuhisa and Satoru Ikehara (2009):
Statistical machine translation adding pattern-based machine translation in Chinese-English translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{IWSLT:2009:Murakami,
author = {Jin'ichi Murakami and Masato Tokuhisa and Satoru Ikehara},
title = {Statistical machine translation adding pattern-based machine translation in {C}hinese-{E}nglish translation},
url = {http://www.mt-archive.info/IWSLT-2009-Murakami.pdf},
googlescholar = {5733501705653561224},
pages = {107--112},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Tokyo, Japan},
month = {December},
year = 2009
}
Murakami et al. (2009)
Dekai Wu (2005):
MT model space: statistical versus compositional versus example-based machine translation, Machine Translation
@article{MTJ:2005:Wu,
author = {Dekai Wu},
title = {{MT} model space: statistical versus compositional versus example-based machine translation},
url = {http://www.cs.ust.hk/~dekai/library/WU\_Dekai/Wu\_MT\_2006.pdf},
googlescholar = {8417758514152746949},
pages = {213--227},
journal = {Machine Translation},
volume = {19},
number = {3--4},
month = {December},
year = 2005
}
Wu (2005)