Suffix Array Translation Models

Large translation models take a long time to train and often exceed the available working memory of current machines. Storing the word-aligned parallel corpus in a suffix array and retrieving translation options on demand offers an alternative.
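A minimal sketch of the lookup step on a toy tokenized corpus may help make this concrete. The function names build_suffix_array and find_occurrences are illustrative assumptions and are not taken from any of the systems cited on this page. The naive construction shown here is only suitable for small examples.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction by sorting whole suffixes; fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def find_occurrences(tokens: Sequence[str],
                     suffix_array: List[int],
                     phrase: Sequence[str]) -> List[int]:
    """Return all corpus positions where `phrase` starts, in ascending order.

    All suffixes that begin with the phrase form one contiguous block of the
    suffix array, so two binary searches find the block boundaries.
    """
    phrase = tuple(phrase)
    n = len(phrase)

    def prefix(k: int) -> tuple:
        # First n tokens of the k-th suffix in sorted order.
        start = suffix_array[k]
        return tuple(tokens[start:start + n])

    # Lower bound: first suffix whose prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    corpus = "the house is small . the house is big .".split()
    sa = build_suffix_array(corpus)
    print(find_occurrences(corpus, sa, ["the", "house"]))  # prints [0, 5]
```

Only the token sequence and one integer per token need to be kept in memory. Phrases of any length can then be located at query time instead of being enumerated in advance.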

Suffix Arrays is the main subject of 11 publications. 7 are discussed here.


Publications

The translation table may be represented in a suffix array, as proposed for a searchable translation memory (Callison-Burch et al., 2005a) and integrated into the decoder (Zhang and Vogel, 2005). Callison-Burch et al. (2005b) propose a suffix-array-based data structure to keep corpora in memory and extract phrase translations on the fly. Suffix arrays may also be used to quickly learn phrase alignments from a parallel corpus without the use of a word alignment (McNamee and Mayfield, 2006). Related to this is the idea of prefix data structures for the translation table, which allow quicker access and make it possible to store the model on disk for on-demand retrieval of applicable translation options (Zens and Ney, 2007). Hierarchical phrase-based models may also be stored in such a way (Lopez, 2007), which allows for much bigger models (Lopez, 2008).

Chris Callison-Burch and Colin Bannard and Josh Schroeder (2005a): A compact data structure for searchable translation memories, Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT)

Chris Callison-Burch and Colin Bannard and Josh Schroeder (2005b): Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)

Ying Zhang and Stephan Vogel (2005): An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora, Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT)

Paul McNamee and James Mayfield (2006): Translation of Multiword Expressions Using Parallel Suffix Arrays, 5th Conference of the Association for Machine Translation in the Americas (AMTA)

Richard Zens and Hermann Ney (2007): Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Adam Lopez (2007): Hierarchical Phrase-Based Translation with Suffix Arrays, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Adam Lopez (2008): Tera-Scale Translation Models via Pattern Matching, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
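As a rough illustration of the on-demand extraction that the publications in this section build on, the following sketch collects translation options for a source phrase from a word-aligned toy corpus at query time. It is a simplified version under stated assumptions, not the method of any particular paper: the names extract_target_span and translation_options are hypothetical, a linear scan stands in for the suffix array lookup, only the minimal consistent target span is extracted, and probabilities are plain relative frequencies.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links
SentencePair = Tuple[List[str], List[str], Alignment]


def extract_target_span(alignment: Alignment,
                        start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return the minimal target span [lo, hi) for source span [start, end).

    Returns None if no source word in the span is aligned, or if a target
    word inside the span is linked to a source word outside it (the usual
    consistency condition for phrase extraction).
    """
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    lo, hi = min(targets), max(targets) + 1
    for s, t in alignment:
        if lo <= t < hi and not (start <= s < end):
            return None
    return lo, hi


def translation_options(corpus: List[SentencePair],
                        phrase: List[str]) -> Dict[str, float]:
    """Collect target phrases for `phrase` and score them by relative frequency.

    The linear scan over the corpus is an assumption made for brevity; a
    suffix array would supply the matching positions directly.
    """
    counts: Counter = Counter()
    n = len(phrase)
    for src, tgt, alignment in corpus:
        for i in range(len(src) - n + 1):
            if src[i:i + n] == phrase:
                span = extract_target_span(alignment, i, i + n)
                if span is not None:
                    counts[" ".join(tgt[span[0]:span[1]])] += 1
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}


if __name__ == "__main__":
    toy_corpus: List[SentencePair] = [
        ("das haus ist klein".split(), "the house is small".split(),
         {(0, 0), (1, 1), (2, 2), (3, 3)}),
        ("das haus ist alt".split(), "the house is old".split(),
         {(0, 0), (1, 1), (2, 2), (3, 3)}),
        ("das alte haus".split(), "the old building".split(),
         {(0, 0), (1, 1), (2, 2)}),
    ]
    print(translation_options(toy_corpus, ["haus"]))
```

Because nothing is precomputed per phrase, the cost of extraction moves from training time to query time.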

Benchmarks

Discussion

New Publications

Ulrich Germann (2015): Sampling Phrase Tables for the Moses Statistical Machine Translation System, The Prague Bulletin of Mathematical Linguistics
@article{pbml-104-Germann,
author = {Ulrich Germann},
title = {Sampling Phrase Tables for the Moses Statistical Machine Translation System},
pages = {39--50},
journal = {The Prague Bulletin of Mathematical Linguistics},
url = {http://ufal.mff.cuni.cz/pbml/104/art-germann.pdf},
volume = {104},
month = {October},
year = 2015
}
Michael Denkowski and Alon Lavie and Isabel Lacruz and Chris Dyer (2014): Real time adaptive machine translation: cdec and TransCenter, Proceedings of the Third workshop on post-editing technology and practice (WPTP-3)
@inproceedings{AMTA-2014-W2-Denkowski,
author = {Michael Denkowski and Alon Lavie and Isabel Lacruz and Chris Dyer},
title = {Real time adaptive machine translation: cdec and TransCenter},
pages = {123},
url = {http://www.mt-archive.info/10/AMTA-2014-W2-Denkowski.pdf},
booktitle = {Proceedings of the Third workshop on post-editing technology and practice (WPTP-3)},
location = {Vancouver, BC, Canada},
year = 2014
}
Ulrich Germann (2014): Dynamic phrase tables for machine translation in an interactive post-editing scenario, Proceedings of the Workshop on interactive and adaptive machine translation
@inproceedings{AMTA-2014-W1-Germann,
author = {Ulrich Germann},
title = {Dynamic phrase tables for machine translation in an interactive post-editing scenario},
pages = {20--31},
url = {http://www.mt-archive.info/10/AMTA-2014-W1-Germann.pdf},
booktitle = {Proceedings of the Workshop on interactive and adaptive machine translation},
location = {Vancouver, BC, Canada},
year = 2014
}
Cromieres, Fabien and Kurohashi, Sadao (2011): Efficient retrieval of tree translation examples for Syntax-Based Machine Translation, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
@InProceedings{cromieres-kurohashi:2011:EMNLP,
author = {Cromieres, Fabien and Kurohashi, Sadao},
title = {Efficient retrieval of tree translation examples for Syntax-Based Machine Translation},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {508--518},
url = {http://www.aclweb.org/anthology/D11-1047},
year = 2011
}

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
