Alternative Architectures
While the attentional sequence-to-sequence model is currently the dominant architecture for neural machine translation, other architectures have been explored.
Alternative architectures are the main subject of 44 publications, 14 of which are discussed here.
Publications
Kalchbrenner, Nal and Blunsom, Phil (2013):
Recurrent Continuous Translation Models, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
@InProceedings{kalchbrenner-blunsom:2013:EMNLP,
author = {Kalchbrenner, Nal and Blunsom, Phil},
title = {Recurrent Continuous Translation Models},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {1700--1709},
url = {http://www.aclweb.org/anthology/D13-1176},
year = 2013
}
Kalchbrenner and Blunsom (2013) build a comprehensive machine translation model by first encoding the source sentence with a convolutional neural network, and then generating the target sentence by reversing the process. A refinement of this was proposed by
Jonas Gehring and Michael Auli and David Grangier and Denis Yarats and Yann N. Dauphin (2017):
Convolutional Sequence to Sequence Learning, CoRR
@article{DBLP:journals/corr/GehringAGYD17,
author = {Jonas Gehring and Michael Auli and David Grangier and Denis Yarats and Yann N. Dauphin},
title = {Convolutional Sequence to Sequence Learning},
journal = {CoRR},
volume = {abs/1705.03122},
url = {http://arxiv.org/abs/1705.03122},
timestamp = {Wed, 07 Jun 2017 14:41:12 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/GehringAGYD17},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2017
}
Gehring et al. (2017), who use multiple convolutional layers in the encoder and the decoder that do not reduce the length of the encoded sequence but incorporate wider context with each layer.
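The following sketch illustrates the core idea of such a length-preserving convolutional encoder layer (a minimal sketch assuming PyTorch; the gated linear unit and residual connection follow the general design of the paper, but all hyperparameters are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderLayer(nn.Module):
    def __init__(self, dim=512, kernel_size=3):
        super().__init__()
        # 2*dim output channels feed a gated linear unit (GLU); symmetric padding
        # keeps the output as long as the input, so layers can be stacked freely.
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size, padding=(kernel_size - 1) // 2)

    def forward(self, x):                     # x: (batch, length, dim)
        residual = x
        y = self.conv(x.transpose(1, 2))      # (batch, 2*dim, length)
        y = F.glu(y, dim=1).transpose(1, 2)   # gate, back to (batch, length, dim)
        return y + residual                   # residual connection

# Each added layer widens the receptive field by (kernel_size - 1) tokens
# without shortening the encoded sequence.
encoder = nn.Sequential(*(ConvEncoderLayer() for _ in range(6)))
states = encoder(torch.randn(2, 10, 512))     # still (2, 10, 512)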
Self-Attention (Transformer)
Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Łukasz and Polosukhin, Illia (2017):
Attention is All you Need, Advances in Neural Information Processing Systems 30
@incollection{NIPS2017-7181,
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, \L ukasz and Polosukhin, Illia},
title = {Attention is All you Need},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages = {5998--6008},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf},
year = 2017
}
Vaswani et al. (2017) replace the recurrent neural networks used in attentional sequence-to-sequence models with multiple self-attention layers, in both the encoder and the decoder, an architecture they call the Transformer.
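At the heart of each such layer is scaled dot-product attention, in which every position attends to every other position of the same sequence. A minimal single-head sketch (the actual Transformer uses multiple heads, residual connections, layer normalization, and position-wise feed-forward sublayers):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query projection
        self.k = nn.Linear(dim, dim)   # key projection
        self.v = nn.Linear(dim, dim)   # value projection

    def forward(self, x, mask=None):   # x: (batch, length, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        if mask is not None:           # e.g. a causal mask in the decoder
            scores = scores.masked_fill(mask == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ v   # weighted sum of the values

attention = SelfAttention()
out = attention(torch.randn(2, 10, 512))       # (2, 10, 512)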
Chen, Mia Xu and Firat, Orhan and Bapna, Ankur and Johnson, Melvin and Macherey, Wolfgang and Foster, George and Jones, Llion and Schuster, Mike and Shazeer, Noam and Parmar, Niki and Vaswani, Ashish and Uszkoreit, Jakob and Kaiser, Lukasz and Chen, Zhifeng and Wu, Yonghui and Hughes, Macduff (2018):
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1008,
author = {Chen, Mia Xu and Firat, Orhan and Bapna, Ankur and Johnson, Melvin and Macherey, Wolfgang and Foster, George and Jones, Llion and Schuster, Mike and Shazeer, Noam and Parmar, Niki and Vaswani, Ashish and Uszkoreit, Jakob and Kaiser, Lukasz and Chen, Zhifeng and Wu, Yonghui and Hughes, Macduff},
title = {The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {76--86},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1008},
year = 2018
}
Chen et al. (2018) compare different combinations of Transformer and recurrent neural network components in the encoder and decoder, report that many of the claimed quality gains are due to a handful of training tricks, and obtain their best results with a Transformer encoder and an RNN decoder.
Emelin, Denis and Titov, Ivan and Sennrich, Rico (2019):
Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts, Proceedings of the Fourth Conference on Machine Translation
@InProceedings{emelin-titov-sennrich:2019:WMT,
author = {Emelin, Denis and Titov, Ivan and Sennrich, Rico},
title = {Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {102--115},
url = {http://www.aclweb.org/anthology/W19-5211},
year = 2019
}
Emelin et al. (2019) identify a representation bottleneck in the self-attention layers: they have to carry through lexical features, which prevents them from focusing on more complex features. As a remedy, they add shortcut connections from the initial embedding layer to each of the self-attention layers, in both the encoder and the decoder.
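A rough sketch of such a shortcut, simplified here to a learned gate that mixes the raw embeddings back into each layer's input (Emelin et al. gate the shortcut into the keys and values of the attention computation, which is not modeled in this sketch):

import torch
import torch.nn as nn

class LexicalShortcut(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, hidden, embedded):   # both (batch, length, dim)
        # sigmoid gate decides, per position and feature, how much of the
        # original word embedding to carry into this layer's input
        g = torch.sigmoid(self.gate(torch.cat([hidden, embedded], dim=-1)))
        return g * hidden + (1 - g) * embedded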
Dehghani et al. (2019) propose a variant, called the Universal Transformer, that does not use a fixed number of processing layers but an arbitrarily long loop over a single processing layer.
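A sketch of that idea, with a fixed number of iterations standing in for the adaptive halting mechanism also explored in the paper (assumes a recent PyTorch with nn.TransformerEncoderLayer):

import torch
import torch.nn as nn

# one shared block whose parameters are reused in every iteration,
# instead of a stack of distinct layers
shared_block = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

def universal_encode(x, steps=6):
    for _ in range(steps):
        x = shared_block(x)
    return x

out = universal_encode(torch.randn(2, 10, 512))   # (2, 10, 512)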
Deeper Transformer Models
Naively deepening transformer models by simply increasing the number of encoder and decoder blocks leads to worse and sometimes catastrophic results.
Wu, Lijun and Wang, Yiren and Xia, Yingce and Tian, Fei and Gao, Fei and Qin, Tao and Lai, Jianhuang and Liu, Tie-Yan (2019):
Depth Growing for Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{wu-etal-2019-depth,
author = {Wu, Lijun and Wang, Yiren and Xia, Yingce and Tian, Fei and Gao, Fei and Qin, Tao and Lai, Jianhuang and Liu, Tie-Yan},
title = {Depth Growing for Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1558},
pages = {5558--5563},
year = 2019
}
Wu et al. (2019) first train a model with n transformer blocks, then keep these parameters fixed and add m additional blocks on top.
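A sketch of the growing step, assuming the trained blocks are available as a PyTorch module list (names are illustrative, not taken from the paper's code):

import torch.nn as nn

def grow_encoder(trained_blocks: nn.ModuleList, m: int, d_model: int = 512) -> nn.ModuleList:
    # freeze the n blocks of the converged model ...
    for block in trained_blocks:
        for p in block.parameters():
            p.requires_grad = False
    # ... and stack m freshly initialized blocks on top; only these are trained
    new_blocks = [nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
                  for _ in range(m)]
    return nn.ModuleList(list(trained_blocks) + new_blocks)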
Bapna, Ankur and Chen, Mia and Firat, Orhan and Cao, Yuan and Wu, Yonghui (2018):
Training Deeper Neural Machine Translation Models with Transparent Attention, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1338,
author = {Bapna, Ankur and Chen, Mia and Firat, Orhan and Cao, Yuan and Wu, Yonghui},
title = {Training Deeper Neural Machine Translation Models with Transparent Attention},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1338},
pages = {3028--3033},
year = 2018
}
Bapna et al. (2018) argue that the signal from earlier encoder layers may be lost and therefore connect all encoder layers to the attention computation of the decoder.
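A simplified sketch of this transparent attention: a learned softmax-weighted combination of all encoder layer outputs replaces the top layer as the memory the decoder attends to (Bapna et al. learn separate combination weights per decoder layer, which is omitted here):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransparentCombination(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # one scalar weight per encoder layer, normalized with a softmax
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):               # list of (batch, length, dim)
        w = F.softmax(self.weights, dim=0)
        stacked = torch.stack(layer_outputs, dim=0) # (layers, batch, length, dim)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)

combine = TransparentCombination(num_layers=6)
memory = combine([torch.randn(2, 10, 512) for _ in range(6)])   # (2, 10, 512)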
Wang, Qiang and Li, Bei and Xiao, Tong and Zhu, Jingbo and Li, Changliang and Wong, Derek F. and Chao, Lidia S. (2019):
Learning Deep Transformer Models for Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{wang-etal-2019-learning,
author = {Wang, Qiang and Li, Bei and Xiao, Tong and Zhu, Jingbo and Li, Changliang and Wong, Derek F. and Chao, Lidia S.},
title = {Learning Deep Transformer Models for Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1176},
pages = {1810--1822},
year = 2019
}
Wang et al. (2019) successfully train deep transformer models with up to 30 layers by relocating the normalization step to the beginning of the block and by adding residual connections to all previous layers, not just the directly preceding one.
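A sketch of both ingredients, the pre-norm placement and a learned combination of all previous layer outputs (a simplification of the paper's dynamic linear combination of layers; blocks and weight initialization are illustrative):

import torch
import torch.nn as nn

class DenseResidualStack(nn.Module):
    def __init__(self, blocks, dim=512):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in blocks)
        # weights[i, j]: contribution of layer j's output to the input of block i;
        # row i only uses its first i+1 entries
        self.weights = nn.Parameter(torch.ones(len(blocks), len(blocks) + 1))

    def forward(self, x):
        outputs = [x]
        for i, (block, norm) in enumerate(zip(self.blocks, self.norms)):
            w = self.weights[i, : len(outputs)]
            combined = sum(w[j] * outputs[j] for j in range(len(outputs)))
            outputs.append(block(norm(combined)))   # normalize *before* the block
        return outputs[-1]

stack = DenseResidualStack(
    [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(6)])
deep_states = stack(torch.randn(2, 10, 512))        # (2, 10, 512)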
Document Context
Maruf, Sameen and Martins, André F. T. and Haffari, Gholamreza (2018):
Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations, Proceedings of the Third Conference on Machine Translation: Research Papers
@inproceedings{W18-6311,
author = {Maruf, Sameen and Martins, Andr{\'e} F. T. and Haffari, Gholamreza},
title = {Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6311},
pages = {101--112},
year = 2018
}
Maruf et al. (2018) consider the entire source document as context when translating a sentence. Attention is computed over all input sentences and the sentences are weighted accordingly.
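A minimal sketch of this kind of document-level attention over sentence representations (purely illustrative; Maruf et al. integrate such context into both the encoder and the decoder of their conversation model):

import torch
import torch.nn.functional as F

def document_context(query, sentence_reps):
    # query: (dim,) state of the sentence currently being translated
    # sentence_reps: (num_sentences, dim), one vector per sentence of the document
    scores = sentence_reps @ query / query.size(0) ** 0.5
    weights = F.softmax(scores, dim=0)   # how much each sentence contributes
    return weights @ sentence_reps       # document context vector, shape (dim,)

context = document_context(torch.randn(512), torch.randn(20, 512))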
Miculicich, Lesly and Ram, Dhananjay and Pappas, Nikolaos and Henderson, James (2018):
Document-Level Neural Machine Translation with Hierarchical Attention Networks, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1325,
author = {Miculicich, Lesly and Ram, Dhananjay and Pappas, Nikolaos and Henderson, James},
title = {Document-Level Neural Machine Translation with Hierarchical Attention Networks},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1325},
pages = {2947--2954},
year = 2018
}
Miculicich et al. (2018) extend this work with hierarchical attention, which first computes attention over sentences and then over words. Due to computational cost, this is limited to a window of surrounding sentences.
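A sketch of the two levels, shown here for a single query vector and a window of context sentences (illustrative only; the paper integrates this into a multi-head attention module of the Transformer):

import torch
import torch.nn.functional as F

def hierarchical_context(query, sent_summaries, word_states):
    # query: (dim,); sent_summaries: (S, dim); word_states: (S, T, dim)
    sent_w = F.softmax(sent_summaries @ query, dim=0)            # which sentences matter
    word_w = F.softmax(word_states @ query, dim=1)               # which words, per sentence
    per_sentence = (word_w.unsqueeze(-1) * word_states).sum(1)   # (S, dim)
    return (sent_w.unsqueeze(-1) * per_sentence).sum(0)          # (dim,)

ctx = hierarchical_context(torch.randn(512), torch.randn(3, 512), torch.randn(3, 15, 512))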
Maruf, Sameen and Martins, André F. T. and Haffari, Gholamreza (2019):
Selective Attention for Context-aware Neural Machine Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
@inproceedings{maruf-etal-2019-selective,
author = {Maruf, Sameen and Martins, Andr{\'e} F. T. and Haffari, Gholamreza},
title = {Selective Attention for Context-aware Neural Machine Translation},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1313},
pages = {3092--3102},
year = 2019
}
Maruf et al. (2019) also use hierarchical attention but compute sentence-level attention over the entire document and select the most relevant sentences before extending attention to the word level. A gate distinguishes between words in the source sentence and words in the context sentences.
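A sketch of the selection step, using a hard top-k cut-off as a stand-in for the paper's sparse attention (the gate between source and context words is not shown):

import torch
import torch.nn.functional as F

def select_context_sentences(query, sent_summaries, k=3):
    # query: (dim,); sent_summaries: (num_sentences, dim)
    scores = sent_summaries @ query        # relevance of each document sentence
    k = min(k, scores.size(0))
    top_scores, top_idx = scores.topk(k)
    weights = F.softmax(top_scores, dim=0)
    # word-level attention is then restricted to the words of these sentences
    return top_idx, weights

idx, w = select_context_sentences(torch.randn(512), torch.randn(40, 512))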
Junczys-Dowmunt, Marcin (2019):
Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation, Proceedings of the Fourth Conference on Machine Translation (Shared Task Papers)
@InProceedings{junczysdowmunt:2019:WMT,
author = {Junczys-Dowmunt, Marcin},
title = {Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation},
booktitle = {Proceedings of the Fourth Conference on Machine Translation (Shared Task Papers)},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
year = 2019
}
Junczys-Dowmunt (2019) translates entire source documents (up to 1000 words) at a time by concatenating all input sentences, showing significant improvements.
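The data side of this approach is simple, as the following sketch of grouping consecutive sentences into chunks of at most 1000 words suggests (illustrative; sentence-boundary markers and the exact limit follow the paper only approximately):

def make_document_chunks(sentences, max_words=1000):
    """Group consecutive sentences into chunks of at most max_words words,
    which are then translated as single long inputs."""
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks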
New Publications
Self-Attention
Xu, Mingzhou and Wong, Derek F. and Yang, Baosong and Zhang, Yue and Chao, Lidia S. (2019):
Leveraging Local and Global Patterns for Self-Attention Networks, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{xu-etal-2019-leveraging,
author = {Xu, Mingzhou and Wong, Derek F. and Yang, Baosong and Zhang, Yue and Chao, Lidia S.},
title = {Leveraging Local and Global Patterns for Self-Attention Networks},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1295},
pages = {3069--3075},
year = 2019
}
Xu et al. (2019)
Miculicich Werlen, Lesly and Pappas, Nikolaos and Ram, Dhananjay and Popescu-Belis, Andrei (2018):
Self-Attentive Residual Decoder for Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1124,
author = {Miculicich Werlen, Lesly and Pappas, Nikolaos and Ram, Dhananjay and Popescu-Belis, Andrei},
title = {Self-Attentive Residual Decoder for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1366--1379},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-1124},
year = 2018
}
Miculicich Werlen et al. (2018)
Transformer
Hao, Jie and Wang, Xing and Yang, Baosong and Wang, Longyue and Zhang, Jinfeng and Tu, Zhaopeng (2019):
Modeling Recurrence for Transformer, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
@inproceedings{hao-etal-2019-modeling,
author = {Hao, Jie and Wang, Xing and Yang, Baosong and Wang, Longyue and Zhang, Jinfeng and Tu, Zhaopeng},
title = {Modeling Recurrence for Transformer},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1122},
pages = {1198--1207},
year = 2019
}
Hao et al. (2019) - recurrence
Hideya Mino and Andrew Finch and Eiichiro Sumita (2017):
A Target Attention Model for Neural Machine Translation, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Mino,
author = {Hideya Mino and Andrew Finch and Eiichiro Sumita},
title = {A Target Attention Model for Neural Machine Translation},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Mino et al. (2017) - target attention
Zhang, Biao and Xiong, Deyi and Su, Jinsong (2018):
Accelerating Neural Transformer via an Average Attention Network, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1166,
author = {Zhang, Biao and Xiong, Deyi and Su, Jinsong},
title = {Accelerating Neural Transformer via an Average Attention Network},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1789--1798},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1166},
year = 2018
}
Zhang et al. (2018) - average attention
Multi-Layer Fusion
Wang, Qiang and Li, Fuxue and Xiao, Tong and Li, Yanyang and Li, Yinqiao and Zhu, Jingbo (2018):
Multi-layer Representation Fusion for Neural Machine Translation, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1255,
author = {Wang, Qiang and Li, Fuxue and Xiao, Tong and Li, Yanyang and Li, Yinqiao and Zhu, Jingbo},
title = {Multi-layer Representation Fusion for Neural Machine Translation},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/C18-1255},
pages = {3015--3026},
year = 2018
}
Wang et al. (2018)
Weakly Recurrent
Mattia A. Di Gangi and Marcello Federico (2018):
Deep Neural Machine Translation with Weakly-Recurrent Units, Proceedings of the 21st Annual Conference of the European Association for Machine Translation
@inproceedings{eamt18-DiGangi,
author = {Mattia A. Di~Gangi and Marcello Federico},
title = {Deep Neural Machine Translation with Weakly-Recurrent Units},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
year = 2018
}
Di Gangi and Federico (2018)
Weight Tying in Embeddings
Pappas, Nikolaos and Miculicich, Lesly and Henderson, James (2018):
Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers
@inproceedings{W18-6308,
author = {Pappas, Nikolaos and Miculicich, Lesly and Henderson, James},
title = {Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6308},
pages = {73--83},
year = 2018
}
Pappas et al. (2018)
Kuang, Shaohui and Li, Junhui and Branco, António and Luo, Weihua and Xiong, Deyi (2018):
Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1164,
author = {Kuang, Shaohui and Li, Junhui and Branco, Ant{\'o}nio and Luo, Weihua and Xiong, Deyi},
title = {Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1767--1776},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1164},
year = 2018
}
Kuang et al. (2018)
Non-Autoregressive
Jiatao Gu and James Bradbury and Caiming Xiong and Victor O.K. Li and Richard Socher (2018):
Non-Autoregressive Neural Machine Translation, International Conference on Learning Representations
@inproceedings{gu2018nonautoregressive,
author = {Jiatao Gu and James Bradbury and Caiming Xiong and Victor O.K. Li and Richard Socher},
title = {Non-Autoregressive Neural Machine Translation},
booktitle = {International Conference on Learning Representations},
url = {https://openreview.net/forum?id=B1l8BtlCb},
year = 2018
}
Gu et al. (2018)
Wei, Bingzhen and Wang, Mingxuan and Zhou, Hao and Lin, Junyang and Sun, Xu (2019):
Imitation Learning for Non-Autoregressive Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{wei-etal-2019-imitation,
author = {Wei, Bingzhen and Wang, Mingxuan and Zhou, Hao and Lin, Junyang and Sun, Xu},
title = {Imitation Learning for Non-Autoregressive Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1125},
pages = {1304--1312},
year = 2019
}
Wei et al. (2019)
Wang, Chunqi and Zhang, Ji and Chen, Haiqing (2018):
Semi-Autoregressive Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1044,
author = {Wang, Chunqi and Zhang, Ji and Chen, Haiqing},
title = {Semi-Autoregressive Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1044},
pages = {479--488},
year = 2018
}
Wang et al. (2018)
Libovický, Jindřich and Helcl, Jindřich (2018):
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1336,
author = {Libovick{\'y}, Jind{\v{r}}ich and Helcl, Jind{\v{r}}ich},
title = {End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1336},
pages = {3016--3021},
year = 2018
}
Libovický and Helcl (2018)
Phrase Model
Po-Sen Huang and Chong Wang and Sitao Huang and Dengyong Zhou and Li Deng (2018):
Towards Neural Phrase-based Machine Translation, International Conference on Learning Representations
@inproceedings{huang2018towards,
author = {Po-Sen Huang and Chong Wang and Sitao Huang and Dengyong Zhou and Li Deng},
title = {Towards Neural Phrase-based Machine Translation},
booktitle = {International Conference on Learning Representations},
url = {https://openreview.net/forum?id=HktJec1RZ},
year = 2018
}
Huang et al. (2018)
Convolutional
Lukasz Kaiser and Aidan N. Gomez and Francois Chollet (2018):
Depthwise Separable Convolutions for Neural Machine Translation, International Conference on Learning Representations
@inproceedings{kaiser2018depthwise,
author = {Lukasz Kaiser and Aidan N. Gomez and Francois Chollet},
title = {Depthwise Separable Convolutions for Neural Machine Translation},
booktitle = {International Conference on Learning Representations},
url = {https://openreview.net/forum?id=S1jBcueAb},
year = 2018
}
Kaiser et al. (2018)
Neural Hidden Markov
Wang, Weiyue and Zhu, Derui and Alkhouli, Tamer and Gan, Zixuan and Ney, Hermann (2018):
Neural Hidden Markov Model for Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2060,
author = {Wang, Weiyue and Zhu, Derui and Alkhouli, Tamer and Gan, Zixuan and Ney, Hermann},
title = {Neural Hidden Markov Model for Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {377--382},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2060},
year = 2018
}
Wang et al. (2018)
Modelling Past and Future
Zheng, Zaixiang and Zhou, Hao and Huang, Shujian and Mou, Lili and Dai, Xinyu and Chen, Jiajun and Tu, Zhaopeng (2018):
Modeling Past and Future for Neural Machine Translation, Transactions of the Association for Computational Linguistics
@Article{Q18-1011,
author = {Zheng, Zaixiang and Zhou, Hao and Huang, Shujian and Mou, Lili and Dai, Xinyu and Chen, Jiajun and Tu, Zhaopeng},
title = {Modeling Past and Future for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
pages = {145--157},
url = {http://aclweb.org/anthology/Q18-1011},
year = 2018
}
Zheng et al. (2018)
Two-Dimensional
Bahar, Parnia and Brix, Christopher and Ney, Hermann (2018):
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1335,
author = {Bahar, Parnia and Brix, Christopher and Ney, Hermann},
title = {Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1335},
pages = {3009--3015},
year = 2018
}
Bahar et al. (2018)
Gated Memory
Cao, Qian and Xiong, Deyi (2018):
Encoding Gated Translation Memory into Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1340,
author = {Cao, Qian and Xiong, Deyi},
title = {Encoding Gated Translation Memory into Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1340},
pages = {3042--3047},
year = 2018
}
Cao and Xiong (2018)
Exploiting Deep Representations
Dou, Zi-Yi and Tu, Zhaopeng and Wang, Xing and Shi, Shuming and Zhang, Tong (2018):
Exploiting Deep Representations for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1457,
author = {Dou, Zi-Yi and Tu, Zhaopeng and Wang, Xing and Shi, Shuming and Zhang, Tong},
title = {Exploiting Deep Representations for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1457},
pages = {4253--4262},
year = 2018
}
Dou et al. (2018)
Addition/Subtraction
Zhang, Biao and Xiong, Deyi and Su, Jinsong and Lin, Qian and Zhang, Huiji (2018):
Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1459,
author = {Zhang, Biao and Xiong, Deyi and Su, Jinsong and Lin, Qian and Zhang, Huiji},
title = {Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1459},
pages = {4273--4283},
year = 2018
}
Zhang et al. (2018)
Document-Level
Laura Jehl and Stefan Riezler (2018):
Document-Level Information as Side Constraints for Improved Neural Patent Translation, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA2018-Jehl,
author = {Laura Jehl and Stefan Riezler},
title = {Document-Level Information as Side Constraints for Improved Neural Patent Translation},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
year = 2018
}
Jehl and Riezler (2018)
Zhang, Jiacheng and Luan, Huanbo and Sun, Maosong and Zhai, Feifei and Xu, Jingfang and Zhang, Min and Liu, Yang (2018):
Improving the Transformer Translation Model with Document-Level Context, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1049,
author = {Zhang, Jiacheng and Luan, Huanbo and Sun, Maosong and Zhai, Feifei and Xu, Jingfang and Zhang, Min and Liu, Yang},
title = {Improving the Transformer Translation Model with Document-Level Context},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1049},
pages = {533--542},
year = 2018
}
Zhang et al. (2018)
Voita, Elena and Sennrich, Rico and Titov, Ivan (2019):
When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{voita-etal-2019-good,
author = {Voita, Elena and Sennrich, Rico and Titov, Ivan},
title = {When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1116},
pages = {1198--1212},
year = 2019
}
Voita et al. (2019)
Kuang, Shaohui and Xiong, Deyi (2018):
Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1051,
author = {Kuang, Shaohui and Xiong, Deyi},
title = {Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/C18-1051},
pages = {607--617},
year = 2018
}
Kuang and Xiong (2018)
Wang, Mingxuan and Xie, Jun and Tan, Zhixing and Su, Jinsong and Xiong, Deyi and Bian, Chao (2018):
Neural Machine Translation with Decoding History Enhanced Attention, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1124,
author = {Wang, Mingxuan and Xie, Jun and Tan, Zhixing and Su, Jinsong and Xiong, Deyi and Bian, Chao},
title = {Neural Machine Translation with Decoding History Enhanced Attention},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/C18-1124},
pages = {1464--1473},
year = 2018
}
Wang et al. (2018)
Tu, Zhaopeng and Liu, Yang and Shi, Shuming and Zhang, Tong (2018):
Learning to Remember Translation History with a Continuous Cache, Transactions of the Association for Computational Linguistics
@article{Q18-1029,
author = {Tu, Zhaopeng and Liu, Yang and Shi, Shuming and Zhang, Tong},
title = {Learning to Remember Translation History with a Continuous Cache},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
url = {https://www.aclweb.org/anthology/Q18-1029},
pages = {407--420},
year = 2018
}
Tu et al. (2018)
Maruf, Sameen and Haffari, Gholamreza (2018):
Document Context Neural Machine Translation with Memory Networks, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1118,
author = {Maruf, Sameen and Haffari, Gholamreza},
title = {Document Context Neural Machine Translation with Memory Networks},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1275--1284},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1118},
year = 2018
}
Maruf and Haffari (2018)
Sentence-Level Context
Wang, Xing and Tu, Zhaopeng and Wang, Longyue and Shi, Shuming (2019):
Exploiting Sentential Context for Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{wang-etal-2019-exploiting,
author = {Wang, Xing and Tu, Zhaopeng and Wang, Longyue and Shi, Shuming},
title = {Exploiting Sentential Context for Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1624},
pages = {6197--6203},
year = 2019
}
Wang et al. (2019)
End-to-end
Pouget-Abadie, Jean and Bahdanau, Dzmitry and van Merrienboer, Bart and Cho, Kyunghyun and Bengio, Yoshua (2014):
Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{pougetabadie-EtAl:2014:SSST-8,
author = {Pouget-Abadie, Jean and Bahdanau, Dzmitry and van Merrienboer, Bart and Cho, Kyunghyun and Bengio, Yoshua},
title = {Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation},
booktitle = {Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {78--85},
url = {http://www.aclweb.org/anthology/W14-4009},
year = 2014
}
Pouget-Abadie et al. (2014)
Felix Hill and Kyunghyun Cho and Sébastien Jean and Coline Devin and Yoshua Bengio (2014):
Embedding Word Similarity with Neural Machine Translation, CoRR
@article{DBLP:journals/corr/HillCJDB14a,
author = {Felix Hill and Kyunghyun Cho and S{\'{e}}bastien Jean and Coline Devin and Yoshua Bengio},
title = {Embedding Word Similarity with Neural Machine Translation},
journal = {CoRR},
volume = {abs/1412.6448},
url = {http://arxiv.org/abs/1412.6448},
timestamp = {Thu, 01 Jan 2015 19:51:08 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/HillCJDB14a},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2014
}
Hill et al. (2014)