Factored Translation Models

Factored translation models use a richer representation of words, a vector of factors, instead of the typical encoding of each word as a unique token.
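As a minimal sketch of what "a vector of factors" can look like in practice, the Python snippet below reads tokens written in the pipe-delimited style used for factored corpora in the Moses toolkit (surface|lemma|POS). The three-factor layout, the FactoredWord type, and the parse_factored_sentence helper are illustrative assumptions, not part of any system cited on this page.

```python
from typing import NamedTuple


class FactoredWord(NamedTuple):
    """One word as a vector of factors (hypothetical three-factor layout)."""
    surface: str  # the word form as it appears in the text
    lemma: str    # base form shared by inflected variants
    pos: str      # part-of-speech tag


def parse_factored_sentence(line: str, delimiter: str = "|") -> list[FactoredWord]:
    """Parse a whitespace-tokenized sentence whose tokens carry factors
    joined by `delimiter`, e.g. 'houses|house|NNS'."""
    words = []
    for token in line.split():
        factors = token.split(delimiter)
        # Every token must carry exactly one value per factor.
        if len(factors) != len(FactoredWord._fields):
            raise ValueError(f"expected {len(FactoredWord._fields)} factors, got {token!r}")
        words.append(FactoredWord(*factors))
    return words


sentence = parse_factored_sentence("the|the|DT houses|house|NNS are|be|VBP small|small|JJ")
for word in sentence:
    print(word.surface, word.lemma, word.pos)
```

In this representation the surface forms "houses" and "house" would share the lemma factor house, so statistics gathered for one form can inform the other, which is the kind of generalization a unique-token encoding cannot offer.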

Factored translation models are the main subject of 17 publications, 8 of which are discussed here.

Publications

Koehn, Philipp and Hoang, Hieu (2007): Factored Translation Models, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Lioma, Christina and Ounis, Iadh (2005): Deploying Part-of-Speech Patterns to Enhance Statistical Phrase-Based Machine Translation Resources, Proceedings of the ACL Workshop on Building and Using Parallel Texts

Shen, Wade and Zens, Richard and Bertoldi, Nicola and Federico, Marcello (2006): The JHU workshop 2006 IWSLT system, Proceedings of the International Workshop on Spoken Language Translation

Birch, Alexandra and Osborne, Miles and Koehn, Philipp (2007): CCG Supertags in Factored Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation

Badr, Ibrahim and Zbib, Rabih and Glass, James (2008): Segmentation for English-to-Arabic Statistical Machine Translation, Proceedings of ACL-08: HLT, Short Papers

Avramidis, Eleftherios and Koehn, Philipp (2008): Enriching Morphologically Poor Languages for Statistical Machine Translation, Proceedings of ACL-08: HLT

Bojar, Ondřej (2007): English-to-Czech Factored Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation

Factored translation models (Koehn and Hoang, 2007) allow the integration of syntactic features into the translation and reordering models. Earlier work had already augmented phrase translation tables with part-of-speech information (Lioma and Ounis, 2005). The approach has been shown to be successful for integrating part-of-speech tags, word class factors (Shen et al., 2006), CCG supertags (Birch et al., 2007), and morphological tags (Badr et al., 2008) for language modeling. Complementarily, the source side may be enriched with additional markup, for instance to better predict the right inflections in a morphologically richer output language (Avramidis and Koehn, 2008). More complex factored models for translating morphology have been explored for English–Czech translation (Bojar, 2007).
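To illustrate what a language model over an additional factor means, here is a small bigram model trained on part-of-speech sequences instead of surface words. This is a minimal sketch under stated assumptions: add-one smoothing stands in for the stronger smoothing methods used in practice, the tag sequences are toy data rather than data from any cited paper, and the helper names (train_bigram_lm, bigram_prob, log_prob) are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_bigram_lm(sequences):
    """Count histories and bigrams over factor sequences (here: POS tags)."""
    histories, bigrams, vocab = Counter(), Counter(), {BOS, EOS}
    for seq in sequences:
        padded = [BOS, *seq, EOS]
        vocab.update(seq)
        histories.update(padded[:-1])                # tokens that have a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
    return histories, bigrams, vocab


def bigram_prob(prev, cur, model):
    """Add-one (Laplace) smoothed P(cur | prev)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab))


def log_prob(seq, model):
    """Log-probability of a whole factor sequence, including the end marker."""
    padded = [BOS, *seq, EOS]
    return sum(math.log(bigram_prob(p, c, model)) for p, c in zip(padded[:-1], padded[1:]))


pos_corpus = [
    ["DT", "NNS", "VBP", "JJ"],
    ["DT", "NN", "VBZ", "JJ"],
    ["PRP", "VBP", "DT", "NNS"],
]
model = train_bigram_lm(pos_corpus)
# A plausible tag order should score higher than a scrambled one.
print(log_prob(["DT", "NNS", "VBP", "JJ"], model) > log_prob(["JJ", "VBP", "NNS", "DT"], model))  # True
```

Because a tag set is far smaller than a word vocabulary, such a model encounters far fewer unseen n-grams, which is one motivation for modeling factors beyond the surface form.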

Additional factors may also be considered in the language model.

Niehues, Jan and Herrmann, Teresa and Vogel, Stephan and Waibel, Alex (2011): Wider Context by Using Bilingual Language Models in Machine Translation, Proceedings of the Sixth Workshop on Statistical Machine Translation

Niehues et al. (2011) include aligned source words in the target word representation, which enables so-called bilingual language models.
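A rough sketch of how such bilingual tokens might be built, assuming word alignments are given as (source index, target index) pairs: each target word is joined with the source words aligned to it, and the resulting token sequence could then be used to train an ordinary n-gram language model. The "_" separator, keeping unaligned target words unchanged, and the bilingual_tokens name are assumptions for illustration, not the exact scheme of Niehues et al. (2011).

```python
def bilingual_tokens(source, target, alignment):
    """Attach to each target word the source words aligned to it.

    alignment: iterable of (source_index, target_index) pairs.
    Unaligned target words are kept as-is (an assumption of this sketch).
    """
    aligned = {j: [] for j in range(len(target))}
    # Sorting keeps multiple aligned source words in source order.
    for i, j in sorted(alignment):
        aligned[j].append(source[i])
    tokens = []
    for j, word in enumerate(target):
        if aligned[j]:
            tokens.append(word + "_" + "_".join(aligned[j]))
        else:
            tokens.append(word)
    return tokens


print(bilingual_tokens(["das", "Haus", "ist", "klein"],
                       ["the", "house", "is", "small"],
                       [(0, 0), (1, 1), (2, 2), (3, 3)]))
# ['the_das', 'house_Haus', 'is_ist', 'small_klein']
print(bilingual_tokens(["ich", "habe", "gesehen"], ["I", "saw"],
                       [(0, 0), (1, 1), (2, 1)]))
# ['I_ich', 'saw_habe_gesehen']
```

The second call shows the point of the representation: the token saw_habe_gesehen carries source context that a target-only language model would not see.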

Benchmarks

Discussion

New Publications

Huet, Stéphane and Manishina, Elena and Lefèvre, Fabrice (2013): Factored Machine Translation Systems for Russian-English, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{huet-manishina-lefevre:2013:WMT,
author = {Huet, St\'{e}phane and Manishina, Elena and Lef\`{e}vre, Fabrice},
title = {Factored Machine Translation Systems for {Russian-English}},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {154--157},
url = {http://www.aclweb.org/anthology/W13-2218},
year = 2013
}
Huet et al. (2013)
Thoudam Doren Singh and Sivaji Bandyopadhyay (2010): Statistical Machine Translation of English-Manipuri using Morpho-syntactic and Semantic Information, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Doren,
author = {Thoudam Doren Singh and Sivaji Bandyopadhyay},
title = {Statistical Machine Translation of {English-Manipuri} using Morpho-syntactic and Semantic Information},
url = {http://www.mt-archive.info/AMTA-2010-Doren.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Singh and Bandyopadhyay (2010)
Wang, Rui and Osenova, Petya and Simov, Kiril (2012): Linguistically-Augmented Bulgarian-to-English Statistical Machine Translation Model, Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
@InProceedings{wang-osenova-simov:2012:ESIRMT-HyTra2012,
author = {Wang, Rui and Osenova, Petya and Simov, Kiril},
title = {Linguistically-Augmented {Bulgarian-to-English} Statistical Machine Translation Model},
booktitle = {Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)},
month = {April},
address = {Avignon, France},
publisher = {Association for Computational Linguistics},
pages = {119--128},
url = {http://www.aclweb.org/anthology/W12-0116},
year = 2012
}
Wang et al. (2012)
Bojar, Ondřej and Jawaid, Bushra and Kamran, Amir (2012): Probes in a Taxonomy of Factored Phrase-Based Models, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{bojar-jawaid-kamran:2012:WMT,
author = {Bojar, Ond\v{r}ej and Jawaid, Bushra and Kamran, Amir},
title = {Probes in a Taxonomy of Factored Phrase-Based Models},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {299--306},
url = {http://www.aclweb.org/anthology/W12-3135},
year = 2012
}
Bojar et al. (2012)
Philipp Koehn and Barry Haddow (2012): Interpolated Backoff for Factored Translation Models, Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2012-Koehn,
author = {Philipp Koehn and Barry Haddow},
title = {Interpolated Backoff for Factored Translation Models},
url = {http://www.mt-archive.info/AMTA-2012-Koehn.pdf},
booktitle = {Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {San Diego, California},
year = 2012
}
Koehn and Haddow (2012)
Mauro Cettolo and Marcello Federico and Daniele Pighin and Nicola Bertoldi (2008): Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models, Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{amta08:Cettolo,
author = {Mauro Cettolo and Marcello Federico and Daniele Pighin and Nicola Bertoldi},
title = {Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models},
url = {http://www.mt-archive.info/AMTA-2008-Cettolo.pdf},
googlescholar = {9381342520987147632},
pages = {56--64},
booktitle = {Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Waikiki, Hawaii},
year = 2008
}
Cettolo et al. (2008)
Yeniterzi, Reyyan and Oflazer, Kemal (2010): Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
@InProceedings{yeniterzi-oflazer:2010:ACL,
author = {Yeniterzi, Reyyan and Oflazer, Kemal},
title = {Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from {English} to {Turkish}},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {454--464},
url = {http://www.aclweb.org/anthology/P10-1047},
year = 2010
}
Yeniterzi and Oflazer (2010)
Yvette Graham and Josef van Genabith (2010): Factor Templates for Factored Machine Translation Models, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt10:TP:graham,
author = {Yvette Graham and Josef {van Genabith}},
title = {{Factor Templates for Factored Machine Translation Models}},
url = {http://www.mt-archive.info/IWSLT-2010-Graham.pdf},
googlescholar = {5544633306257077972},
editor = {Marcello Federico and Ian Lane and Michael Paul and Fran\c{c}ois Yvon},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
pages = {275--282},
location = {Paris, France},
year = 2010
}
Graham and van Genabith (2010)
Rishøj, Christian and Søgaard, Anders (2011): Factored Translation with Unsupervised Word Clusters, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{rishoj-sogaard:2011:WMT,
author = {Rish{\o}j, Christian and S{\o}gaard, Anders},
title = {Factored Translation with Unsupervised Word Clusters},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {447--451},
url = {http://www.aclweb.org/anthology/W11-2155},
year = 2011
}
Rishøj and Søgaard (2011)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
