Factored Translation Models
Factored translation models use a richer representation of words, a vector of factors, instead of the typical encoding of words as unique tokens.
Factored Translation Models is the main subject of 17 publications. 8 are discussed here.
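To make the idea of a word as a vector of factors concrete, here is a minimal sketch. The factor names (surface, lemma, pos), the pipe-separated token layout, and the helper parse_factored are assumptions chosen for illustration; they are not taken from any of the publications listed here.

```python
from typing import List, NamedTuple


class FactoredWord(NamedTuple):
    """One word represented as a vector of factors (hypothetical factor set)."""
    surface: str
    lemma: str
    pos: str


def parse_factored(sentence: str, sep: str = "|") -> List[FactoredWord]:
    """Parse tokens of the assumed form surface|lemma|pos into factored words."""
    words = []
    for token in sentence.split():
        parts = token.split(sep)
        if len(parts) != 3:
            raise ValueError(f"expected 3 factors, got {len(parts)} in {token!r}")
        words.append(FactoredWord(*parts))
    return words


sentence = "the|the|DT houses|house|NNS are|be|VBP small|small|JJ"
words = parse_factored(sentence)

# Each factor can be read off as its own sequence, e.g. for a lemma-level
# or part-of-speech-level model.
lemmas = [w.lemma for w in words]
tags = [w.pos for w in words]
```

With this representation, a model can condition on any single factor sequence (such as tags) or on combinations of factors, rather than on the surface token alone.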
Publications
Factored translation models
Koehn, Philipp and Hoang, Hieu (2007):
Factored Translation Models, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
mentioned in Factored Translation Models and Morphological Language Models
@InProceedings{koehn-hoang:2007:EMNLP-CoNLL2007,
author = {Koehn, Philipp and Hoang, Hieu},
title = {Factored Translation Models},
booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages = {868--876},
url = {
http://www.aclweb.org/anthology/D/D07/D07-1091},
year = 2007
}
(Koehn and Hoang, 2007) allow the integration of syntactic features into the translation and reordering models. Earlier work had already augmented phrase translation tables with part-of-speech information
Lioma, Christina and Ounis, Iadh (2005):
Deploying Part-of-Speech Patterns to Enhance Statistical Phrase-Based Machine Translation Resources, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{lioma-ounis:2005:WPT,
author = {Lioma, Christina and Ounis, Iadh},
title = {Deploying Part-of-Speech Patterns to Enhance Statistical Phrase-Based Machine Translation Resources},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {163--166},
url = {
http://www.aclweb.org/anthology/W/W05/W05-0830},
year = 2005
}
(Lioma and Ounis, 2005). The approach has been shown to be successful for integrating part-of-speech tags, word class factors
Wade Shen and Richard Zens and Nicola Bertoldi and Marcello Federico (2006):
The JHU workshop 2006 IWSLT system, Proc. of the International Workshop on Spoken Language Translation
@inproceedings{Shen:2006:IWSLT,
author = {Wade Shen and Richard Zens and Nicola Bertoldi and Marcello Federico},
title = {The {JHU} workshop 2006 {IWSLT} system},
url = {
http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_06/papers/slt6\_059.pdf},
googlescholar = {12775791387894450852},
month = {November},
booktitle = {Proc. of the International Workshop on Spoken Language Translation},
address = {Kyoto, Japan},
year = 2006
}
(Shen et al., 2006), CCG super-tags
Birch, Alexandra and Osborne, Miles and Koehn, Philipp (2007):
CCG Supertags in Factored Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{birch-osborne-koehn:2007:WMT,
author = {Birch, Alexandra and Osborne, Miles and Koehn, Philipp},
title = {{CCG} Supertags in Factored Statistical Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {9--16},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0202},
year = 2007
}
(Birch et al., 2007), and morphological tags
Badr, Ibrahim and Zbib, Rabih and Glass, James (2008):
Segmentation for English-to-Arabic Statistical Machine Translation, Proceedings of ACL-08: HLT, Short Papers
mentioned in Generating Rich Morphology and Factored Translation Models
@InProceedings{badr-zbib-glass:2008:ACLShort,
author = {Badr, Ibrahim and Zbib, Rabih and Glass, James},
title = {Segmentation for English-to-Arabic Statistical Machine Translation},
booktitle = {Proceedings of ACL-08: HLT, Short Papers},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {153--156},
url = {
http://www.aclweb.org/anthology/P/P08/P08-2039},
year = 2008
}
(Badr et al., 2008) for language modeling. As a complementary approach, the source side may be enriched with additional markup, for instance to better predict the right inflections in a morphologically richer output language
Avramidis, Eleftherios and Koehn, Philipp (2008):
Enriching Morphologically Poor Languages for Statistical Machine Translation, Proceedings of ACL-08: HLT
@InProceedings{avramidis-koehn:2008:ACLMain,
author = {Avramidis, Eleftherios and Koehn, Philipp},
title = {Enriching Morphologically Poor Languages for Statistical Machine Translation},
booktitle = {Proceedings of ACL-08: HLT},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {763--770},
url = {
http://www.aclweb.org/anthology/P/P08/P08-1087},
year = 2008
}
(Avramidis and Koehn, 2008). More complex factored models for translating morphology have been explored for English–Czech translation
Bojar, Ondřej (2007):
English-to-Czech Factored Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{bojar:2007:WMT,
author = {Bojar, Ond\v{r}ej},
title = {{English-to-Czech} Factored Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {232--239},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0235},
year = 2007
}
(Bojar, 2007).
Additional factors may also be considered in the language model.
Niehues, Jan and Herrmann, Teresa and Vogel, Stephan and Waibel, Alex (2011):
Wider Context by Using Bilingual Language Models in Machine Translation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{niehues-EtAl:2011:WMT,
author = {Niehues, Jan and Herrmann, Teresa and Vogel, Stephan and Waibel, Alex},
title = {Wider Context by Using Bilingual Language Models in Machine Translation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {198--206},
url = {
http://www.aclweb.org/anthology/W11-2124},
year = 2011
}
Niehues et al. (2011) include aligned source words in the target word representation, which enables so-called bilingual language models.
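The construction of such bilingual tokens can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Niehues et al. (2011): the function name bilingual_tokens, the underscore joiner, and the choice to leave unaligned target words as plain tokens are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def bilingual_tokens(
    source: List[str],
    target: List[str],
    alignment: Iterable[Tuple[int, int]],
    joiner: str = "_",
) -> List[str]:
    """Attach to each target word the source words aligned to it.

    alignment holds (source_index, target_index) pairs. Target words with
    no aligned source word are kept unchanged (an assumption of this sketch).
    """
    aligned = defaultdict(list)
    # Sorting keeps the attached source words in source order.
    for s, t in sorted(alignment):
        aligned[t].append(source[s])
    return [joiner.join([word] + aligned[t]) for t, word in enumerate(target)]


source = ["ich", "gehe", "nicht"]
target = ["I", "do", "not", "go"]
alignment = [(0, 0), (2, 2), (1, 3)]

tokens = bilingual_tokens(source, target, alignment)
```

A standard n-gram language model trained over such token sequences then sees source context alongside each target word, which is the sense in which the model is bilingual.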
Benchmarks
Discussion
Related Topics
New Publications
Huet, Stéphane and Manishina, Elena and Lefèvre, Fabrice (2013):
Factored Machine Translation Systems for Russian-English, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{huet-manishina-lefevre:2013:WMT,
author = {Huet, St\'{e}phane and Manishina, Elena and Lef\`{e}vre, Fabrice},
title = {Factored Machine Translation Systems for {Russian-English}},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {154--157},
url = {
http://www.aclweb.org/anthology/W13-2218},
year = 2013
}
Huet et al. (2013)
Thoudam Doren Singh and Sivaji Bandyopadhyay (2010):
Statistical Machine Translation of English-Manipuri using Morpho-syntactic and Semantic Information, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Doren,
author = {Thoudam Doren Singh and Sivaji Bandyopadhyay},
title = {Statistical Machine Translation of {English-Manipuri} using Morpho-syntactic and Semantic Information},
url = {
http://www.mt-archive.info/AMTA-2010-Doren.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Singh and Bandyopadhyay (2010)
Wang, Rui and Osenova, Petya and Simov, Kiril (2012):
Linguistically-Augmented Bulgarian-to-English Statistical Machine Translation Model, Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
@InProceedings{wang-osenova-simov:2012:ESIRMT-HyTra2012,
author = {Wang, Rui and Osenova, Petya and Simov, Kiril},
title = {Linguistically-Augmented Bulgarian-to-English Statistical Machine Translation Model},
booktitle = {Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)},
month = {April},
address = {Avignon, France},
publisher = {Association for Computational Linguistics},
pages = {119--128},
url = {
http://www.aclweb.org/anthology/W12-0116},
year = 2012
}
Wang et al. (2012)
Bojar, Ondřej and Jawaid, Bushra and Kamran, Amir (2012):
Probes in a Taxonomy of Factored Phrase-Based Models, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{bojar-jawaid-kamran:2012:WMT,
author = {Bojar, Ond\v{r}ej and Jawaid, Bushra and Kamran, Amir},
title = {Probes in a Taxonomy of Factored Phrase-Based Models},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {299--306},
url = {
http://www.aclweb.org/anthology/W12-3135},
year = 2012
}
Bojar et al. (2012)
Philipp Koehn and Barry Haddow (2012):
Interpolated Backoff for Factored Translation Models, Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2012-Koehn,
author = {Philipp Koehn and Barry Haddow},
title = {Interpolated Backoff for Factored Translation Models},
url = {
http://www.mt-archive.info/AMTA-2012-Koehn.pdf},
booktitle = {Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {San Diego, California},
year = 2012
}
Koehn and Haddow (2012)
Mauro Cettolo and Marcello Federico and Daniele Pighin and Nicola Bertoldi (2008):
Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models, Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{amta08:Cettolo,
author = {Mauro Cettolo and Marcello Federico and Daniele Pighin and Nicola Bertoldi},
title = {Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models},
url = {
http://www.mt-archive.info/AMTA-2008-Cettolo.pdf},
googlescholar = {9381342520987147632},
pages = {56--64},
booktitle = {Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Waikiki, Hawaii},
year = 2008
}
Cettolo et al. (2008)
Yeniterzi, Reyyan and Oflazer, Kemal (2010):
Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
@InProceedings{yeniterzi-oflazer:2010:ACL,
author = {Yeniterzi, Reyyan and Oflazer, Kemal},
title = {Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from {English} to Turkish},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {454--464},
url = {
http://www.aclweb.org/anthology/P10-1047},
year = 2010
}
Yeniterzi and Oflazer (2010)
Yvette Graham and Josef van Genabith (2010):
Factor Templates for Factored Machine Translation Models, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt10:TP:graham,
author = {Yvette Graham and Josef {van Genabith}},
title = {{Factor Templates for Factored Machine Translation Models}},
url = {
http://www.mt-archive.info/IWSLT-2010-Graham.pdf},
googlescholar = {5544633306257077972},
editor = {Marcello Federico and Ian Lane and Michael Paul and Fran\c{c}ois Yvon},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
pages = {275--282},
location = {Paris, France},
year = 2010
}
Graham and van Genabith (2010)
Rishøj, Christian and Søgaard, Anders (2011):
Factored Translation with Unsupervised Word Clusters, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{rishoj-sogaard:2011:WMT,
author = {Rish{\o}j, Christian and S{\o}gaard, Anders},
title = {Factored Translation with Unsupervised Word Clusters},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {447--451},
url = {
http://www.aclweb.org/anthology/W11-2155},
year = 2011
}
Rishøj and Søgaard (2011)