Attention Model
The currently dominant model in neural machine translation is the sequence-to-sequence model with attention.
The attention model is the main subject of 31 publications, 8 of which are discussed here.
Publications
The attention model has its roots in a sequence-to-sequence model.
Cho, Kyunghyun and van Merrienboer, Bart and Bahdanau, Dzmitry and Bengio, Yoshua (2014):
On the Properties of Neural Machine Translation: Encoder--Decoder Approaches, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{cho-EtAl:2014:SSST-8,
author = {Cho, Kyunghyun and van Merrienboer, Bart and Bahdanau, Dzmitry and Bengio, Yoshua},
title = {On the Properties of Neural Machine Translation: Encoder--Decoder Approaches},
booktitle = {Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {103--111},
url = {http://www.aclweb.org/anthology/W14-4012},
year = 2014
}
Cho et al. (2014) use recurrent neural networks for the approach.
Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. (2014):
Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems 27
@incollection{NIPS2014-5346,
author = {Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V.},
title = {Sequence to Sequence Learning with Neural Networks},
booktitle = {Advances in Neural Information Processing Systems 27},
editor = {Z. Ghahramani and M. Welling and C. Cortes and N.D. Lawrence and K.Q. Weinberger},
pages = {3104--3112},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf},
year = 2014
}
Sutskever et al. (2014) use an LSTM (long short-term memory) network and reverse the order of the source sentence before it is fed to the encoder.
The seminal work by
Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio (2015):
Neural Machine Translation by Jointly Learning to Align and Translate, ICLR
@inproceedings{bahdanau:ICLR:2015,
author = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
title = {Neural Machine Translation by Jointly Learning to Align and Translate},
booktitle = {ICLR},
url = {http://arxiv.org/pdf/1409.0473v6.pdf},
year = 2015
}
Bahdanau et al. (2015) add an alignment model (the so-called "attention mechanism") to link generated output words to source words, which includes conditioning on the hidden state that produced the preceding target word. Source words are represented by the two hidden states of recurrent neural networks that process the source sentence left-to-right and right-to-left.
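The additive scoring scheme described above can be sketched as follows (a NumPy sketch; the weight names and dimensions are illustrative assumptions, not the paper's exact hyperparameters):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, Wa, Ua, va):
    """Bahdanau-style additive attention.

    s_prev : (d,)     decoder state that produced the previous target word
    H      : (n, 2d)  annotations: concatenated forward/backward encoder states
    Returns the context vector (2d,) and the attention weights (n,).
    """
    # alignment score for source position i: e_i = v^T tanh(W s_{t-1} + U h_i)
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = softmax(scores)   # normalized attention weights over source words
    context = alpha @ H       # weighted sum of source annotations
    return context, alpha

# toy run with state size d=4 and n=3 source positions
rng = np.random.default_rng(0)
d, n = 4, 3
Wa = rng.normal(size=(d, d))
Ua = rng.normal(size=(d, 2 * d))
va = rng.normal(size=d)
H = rng.normal(size=(n, 2 * d))
s = rng.normal(size=d)
context, alpha = additive_attention(s, H, Wa, Ua, va)
```

The weights sum to one, so the context vector is a convex combination of the source annotations.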
Luong, Thang and Pham, Hieu and Manning, Christopher D. (2015):
Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{luong-pham-manning:2015:EMNLP,
author = {Luong, Thang and Pham, Hieu and Manning, Christopher D.},
title = {Effective Approaches to Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1412--1421},
url = {http://aclweb.org/anthology/D15-1166},
year = 2015
}
Luong et al. (2015) propose variants of the attention mechanism (which they call the "global" attention model) and also a hard-constrained attention model (the "local" attention model), which is restricted to a Gaussian distribution around a specific input word.
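The Gaussian restriction of the local attention model can be sketched as follows (illustrative sketch: the window size and the convention sigma = window / 2 are assumptions in the spirit of the paper, not its exact configuration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_attention(scores, p_t, window=2):
    """Local attention: reweight raw alignment scores by a Gaussian
    centred on a predicted source position p_t, so that attention is
    concentrated in a window around one input word."""
    positions = np.arange(len(scores))
    sigma = window / 2.0
    gauss = np.exp(-((positions - p_t) ** 2) / (2 * sigma ** 2))
    # note: the Gaussian-damped weights are not renormalized
    return softmax(scores) * gauss

# with uniform raw scores, the weights peak at the predicted position
weights = local_attention(np.zeros(7), p_t=3.0, window=2)
```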
To explicitly model the trade-off between source context (the input words) and target context (the already produced target words),
Zhaopeng Tu and Yang Liu and Zhengdong Lu and Xiaohua Liu and Hang Li (2016):
Context Gates for Neural Machine Translation, CoRR
@article{DBLP:journals/corr/TuLLLL16a,
author = {Zhaopeng Tu and Yang Liu and Zhengdong Lu and Xiaohua Liu and Hang Li},
title = {Context Gates for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1608.06043},
url = {http://arxiv.org/abs/1608.06043},
timestamp = {Fri, 02 Sep 2016 17:46:24 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/TuLLLL16a},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2016
}
Tu et al. (2016) introduce an interpolation weight (called a "context gate") that scales the impact of (a) the source context state and (b) the previous hidden state and the last output word when predicting the next hidden state in the decoder.
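The interpolation can be sketched as below (a minimal sketch: the weight names, the sigmoid gate, and the tanh state update are illustrative assumptions standing in for the full GRU-based decoder):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_decoder_update(s_prev, y_prev, c_t, W, U, C, Wz, Uz, Cz):
    """Context-gate sketch: a learned gate z_t (elementwise in (0, 1))
    scales the source context c_t, while (1 - z_t) scales the target-side
    inputs (previous decoder state s_{t-1} and last word embedding y_{t-1})
    when computing the next decoder state."""
    z = sigmoid(Wz @ s_prev + Uz @ y_prev + Cz @ c_t)
    s_next = np.tanh((1 - z) * (W @ s_prev + U @ y_prev) + z * (C @ c_t))
    return s_next, z

# toy run with state size d=4
rng = np.random.default_rng(1)
d = 4
W, U, C, Wz, Uz, Cz = (rng.normal(size=(d, d)) for _ in range(6))
s_prev, y_prev, c_t = (rng.normal(size=d) for _ in range(3))
s_next, z = gated_decoder_update(s_prev, y_prev, c_t, W, U, C, Wz, Uz, Cz)
```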
Deep Models:
There are several ways to add layers to the encoder and the decoder of the neural translation model.
Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin Johnson and Xiaobing Liu and Lukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto Kazawa and Keith Stevens and George Kurian and Nishant Patil and Wei Wang and Cliff Young and Jason Smith and Jason Riesa and Alex Rudnick and Oriol Vinyals and Greg Corrado and Macduff Hughes and Jeffrey Dean (2016):
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, CoRR
@article{DBLP:journals/corr/WuSCLNMKCGMKSJL16,
author = {Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin Johnson and Xiaobing Liu and Lukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto Kazawa and Keith Stevens and George Kurian and Nishant Patil and Wei Wang and Cliff Young and Jason Smith and Jason Riesa and Alex Rudnick and Oriol Vinyals and Greg Corrado and Macduff Hughes and Jeffrey Dean},
title = {Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation},
journal = {CoRR},
volume = {abs/1609.08144},
url = {http://arxiv.org/abs/1609.08144.pdf},
timestamp = {Mon, 03 Oct 2016 17:51:10 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/WuSCLNMKCGMKSJL16},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2016
}
Wu et al. (2016) first use traditional bidirectional recurrent neural networks to compute input word representations and then refine them with several stacked recurrent layers.
Zhou, Jie and Cao, Ying and Wang, Xuguang and Li, Peng and Xu, Wei (2016):
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, Transactions of the Association for Computational Linguistics
@article{TACL865,
author = {Zhou, Jie and Cao, Ying and Wang, Xuguang and Li, Peng and Xu, Wei },
title = {Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {4},
issn = {2307-387X},
url = {https://transacl.org/ojs/index.php/tacl/article/view/863},
pages = {371--383},
year = 2016
}
Zhou et al. (2016) alternate between forward and backward recurrent layers.
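The alternation between directions can be sketched as follows (a sketch under simplifying assumptions: plain tanh RNN cells, and the fast-forward/residual connections of the paper are omitted):

```python
import numpy as np

def simple_rnn(xs, Wx, Wh, reverse=False):
    """Minimal tanh RNN over a sequence xs of shape (n, d); outputs are
    returned in original time order even when run right-to-left."""
    if reverse:
        xs = xs[::-1]
    h = np.zeros(Wh.shape[0])
    out = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    out = np.array(out)
    return out[::-1] if reverse else out

def alternating_stack(xs, layers):
    """Alternating deep encoder: odd-numbered layers run left-to-right,
    even-numbered layers right-to-left, each consuming the previous
    layer's outputs."""
    h = xs
    for i, (Wx, Wh) in enumerate(layers):
        h = simple_rnn(h, Wx, Wh, reverse=(i % 2 == 1))
    return h

# toy run: a 4-layer stack over a 5-word sequence with state size 3
rng = np.random.default_rng(2)
n, d = 5, 3
layers = [(rng.normal(size=(d, d)), rng.normal(size=(d, d))) for _ in range(4)]
xs = rng.normal(size=(n, d))
hs = alternating_stack(xs, layers)
```

Each position's final representation thus mixes left and right context without requiring two full networks per layer, as a bidirectional stack would.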
Miceli Barone, Antonio Valerio and Helcl, Jindřich and Sennrich, Rico and Haddow, Barry and Birch, Alexandra (2017):
Deep architectures for Neural Machine Translation, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper
@InProceedings{micelibarone-EtAl:2017:WMT,
author = {Miceli Barone, Antonio Valerio and Helcl, Jind\v{r}ich and Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
title = {Deep architectures for Neural Machine Translation},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {99--107},
url = {http://www.aclweb.org/anthology/W17-4710},
year = 2017
}
Barone et al. (2017) show good results with 4 stacks and 2 deep transitions each for encoder and decoder, as well as alternating networks for the encoder. Many variations (the use of skip connections, the choice of LSTM vs. GRU, the number of layers of each type) still need to be explored empirically for various data conditions.
Benchmarks
Discussion
Related Topics
New Publications
Indurthi, Sathish Reddy and Chung, Insoo and Kim, Sangha (2019):
Look Harder: A Neural Machine Translation Model with Hard Attention, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{indurthi-etal-2019-look,
author = {Indurthi, Sathish Reddy and Chung, Insoo and Kim, Sangha},
title = {Look Harder: A Neural Machine Translation Model with Hard Attention},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1290},
pages = {3037--3043},
year = 2019
}
Indurthi et al. (2019)
Mino, Hideya and Utiyama, Masao and Sumita, Eiichiro and Tokunaga, Takenobu (2017):
Key-value Attention Mechanism for Neural Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
@inproceedings{mino-etal-2017-key,
author = {Mino, Hideya and Utiyama, Masao and Sumita, Eiichiro and Tokunaga, Takenobu},
title = {Key-value Attention Mechanism for Neural Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {https://www.aclweb.org/anthology/I17-2049},
pages = {290--295},
year = 2017
}
Mino et al. (2017)
Samee Ibraheem and Nicholas Altieri and John DeNero (2017):
Learning an Interactive Attention Policy for Neural Machine Translation, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Ibraheem,
author = {Samee Ibraheem and Nicholas Altieri and John DeNero},
title = {Learning an Interactive Attention Policy for Neural Machine Translation},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Ibraheem et al. (2017)
Matïss Rikters and Mark Fishel (2017):
Confidence through Attention, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Rikters,
author = {Mat{\"i}ss Rikters and Mark Fishel},
title = {Confidence through Attention},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Rikters and Fishel (2017)
Li, Xintong and Liu, Lemao and Tu, Zhaopeng and Shi, Shuming and Meng, Max (2018):
Target Foresight Based Attention for Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1125,
author = {Li, Xintong and Liu, Lemao and Tu, Zhaopeng and Shi, Shuming and Meng, Max},
title = {Target Foresight Based Attention for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1380--1390},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-1125},
year = 2018
}
Li et al. (2018)
Malaviya, Chaitanya and Ferreira, Pedro and Martins, André F. T. (2018):
Sparse and Constrained Attention for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2059,
author = {Malaviya, Chaitanya and Ferreira, Pedro and Martins, Andr{\'e} F. T.},
title = {Sparse and Constrained Attention for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {370--376},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2059},
year = 2018
}
Malaviya et al. (2018)
Shankar, Shiv and Garg, Siddhant and Sarawagi, Sunita (2018):
Surprisingly Easy Hard-Attention for Sequence to Sequence Learning, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1065,
author = {Shankar, Shiv and Garg, Siddhant and Sarawagi, Sunita},
title = {Surprisingly Easy Hard-Attention for Sequence to Sequence Learning},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1065},
pages = {640--645},
year = 2018
}
Shankar et al. (2018)
Lin, Junyang and Sun, Xu and Ren, Xuancheng and Li, Muyu and Su, Qi (2018):
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1331,
author = {Lin, Junyang and Sun, Xu and Ren, Xuancheng and Li, Muyu and Su, Qi},
title = {Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1331},
pages = {2985--2990},
year = 2018
}
Lin et al. (2018)
Yang, Baosong and Wong, Derek F. and Xiao, Tong and Chao, Lidia S. and Zhu, Jingbo (2017):
Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1151,
author = {Yang, Baosong and Wong, Derek F. and Xiao, Tong and Chao, Lidia S. and Zhu, Jingbo},
title = {Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1443--1452},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1151},
year = 2017
}
Yang et al. (2017)
Zhang, Jinchao and Wang, Mingxuan and Liu, Qun and Zhou, Jie (2017):
Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{zhang-EtAl:2017:Long3,
author = {Zhang, Jinchao and Wang, Mingxuan and Liu, Qun and Zhou, Jie},
title = {Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1524--1534},
url = {http://aclweb.org/anthology/P17-1140},
year = 2017
}
Zhang et al. (2017)
Yu, Lei and Buys, Jan and Blunsom, Phil (2016):
Online Segment to Segment Neural Transduction, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{yu-buys-blunsom:2016:EMNLP2016,
author = {Yu, Lei and Buys, Jan and Blunsom, Phil},
title = {Online Segment to Segment Neural Transduction},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1307--1316},
url = {https://aclweb.org/anthology/D16-1138},
year = 2016
}
Yu et al. (2016)
Huang, Po-Yao and Liu, Frederick and Shiang, Sz-Rung and Oh, Jean and Dyer, Chris (2016):
Attention-based Multimodal Neural Machine Translation, Proceedings of the First Conference on Machine Translation
@InProceedings{huang-EtAl:2016:WMT,
author = {Huang, Po-Yao and Liu, Frederick and Shiang, Sz-Rung and Oh, Jean and Dyer, Chris},
title = {Attention-based Multimodal Neural Machine Translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {639--645},
url = {http://www.aclweb.org/anthology/W/W16/W16-2360},
year = 2016
}
Huang et al. (2016)
Mi, Haitao and Sankaran, Baskaran and Wang, Zhiguo and Ittycheriah, Abe (2016):
Coverage Embedding Models for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{mi-EtAl:2016:EMNLP2016,
author = {Mi, Haitao and Sankaran, Baskaran and Wang, Zhiguo and Ittycheriah, Abe},
title = {Coverage Embedding Models for Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {955--960},
url = {https://aclweb.org/anthology/D16-1096},
year = 2016
}
Mi et al. (2016)
Calixto, Iacer and Stein, Daniel and Matusov, Evgeny and Lohar, Pintu and Castilho, Sheila and Way, Andy (2017):
Using Images to Improve Machine-Translating E-Commerce Product Listings., Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{calixto-EtAl:2017:EACLshort,
author = {Calixto, Iacer and Stein, Daniel and Matusov, Evgeny and Lohar, Pintu and Castilho, Sheila and Way, Andy},
title = {Using Images to Improve Machine-Translating E-Commerce Product Listings.},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {637--643},
url = {http://www.aclweb.org/anthology/E17-2101},
year = 2017
}
Calixto et al. (2017)
Press, Ofir and Wolf, Lior (2017):
Using the Output Embedding to Improve Language Models, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{press-wolf:2017:EACLshort,
author = {Press, Ofir and Wolf, Lior},
title = {Using the Output Embedding to Improve Language Models},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {157--163},
url = {http://www.aclweb.org/anthology/E17-2025},
year = 2017
}
Press and Wolf (2017)
Yang, Zichao and Hu, Zhiting and Deng, Yuntian and Dyer, Chris and Smola, Alex (2017):
Neural Machine Translation with Recurrent Attention Modeling, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{yang-EtAl:2017:EACLshort1,
author = {Yang, Zichao and Hu, Zhiting and Deng, Yuntian and Dyer, Chris and Smola, Alex},
title = {Neural Machine Translation with Recurrent Attention Modeling},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {383--387},
url = {http://www.aclweb.org/anthology/E17-2061},
year = 2017
}
Yang et al. (2017)
Advanced Modelling
Tu, Zhaopeng and Liu, Yang and Lu, Zhengdong and Liu, Xiaohua and Li, Hang (2017):
Context Gates for Neural Machine Translation, Transactions of the Association for Computational Linguistics
@article{TACL948,
author = {Tu, Zhaopeng and Liu, Yang and Lu, Zhengdong and Liu, Xiaohua and Li, Hang },
title = {Context Gates for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
issn = {2307-387X},
url = {https://transacl.org/ojs/index.php/tacl/article/view/948},
pages = {87--99},
year = 2017
}
Tu et al. (2017)
Gehring, Jonas and Auli, Michael and Grangier, David and Dauphin, Yann (2017):
A Convolutional Encoder Model for Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{gehring-EtAl:2017:Long,
author = {Gehring, Jonas and Auli, Michael and Grangier, David and Dauphin, Yann},
title = {A Convolutional Encoder Model for Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {123--135},
url = {http://aclweb.org/anthology/P17-1012},
year = 2017
}
Gehring et al. (2017)
Oda, Yusuke and Arthur, Philip and Neubig, Graham and Yoshino, Koichiro and Nakamura, Satoshi (2017):
Neural Machine Translation via Binary Code Prediction, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{oda-EtAl:2017:Long,
author = {Oda, Yusuke and Arthur, Philip and Neubig, Graham and Yoshino, Koichiro and Nakamura, Satoshi},
title = {Neural Machine Translation via Binary Code Prediction},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {850--860},
url = {http://aclweb.org/anthology/P17-1079},
year = 2017
}
Oda et al. (2017)
Wang, Mingxuan and Lu, Zhengdong and Zhou, Jie and Liu, Qun (2017):
Deep Neural Machine Translation with Linear Associative Unit, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{wang-EtAl:2017:Long1,
author = {Wang, Mingxuan and Lu, Zhengdong and Zhou, Jie and Liu, Qun},
title = {Deep Neural Machine Translation with Linear Associative Unit},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {136--145},
url = {http://aclweb.org/anthology/P17-1013},
year = 2017
}
Wang et al. (2017)
Wang, Mingxuan and Lu, Zhengdong and Li, Hang and Liu, Qun (2016):
Memory-enhanced Decoder for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{wang-EtAl:2016:EMNLP20161,
author = {Wang, Mingxuan and Lu, Zhengdong and Li, Hang and Liu, Qun},
title = {Memory-enhanced Decoder for Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {278--286},
url = {https://aclweb.org/anthology/D16-1027},
year = 2016
}
Wang et al. (2016)
Sountsov, Pavel and Sarawagi, Sunita (2016):
Length bias in Encoder Decoder Models and a Case for Global Conditioning, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{sountsov-sarawagi:2016:EMNLP2016,
author = {Sountsov, Pavel and Sarawagi, Sunita},
title = {Length bias in Encoder Decoder Models and a Case for Global Conditioning},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1516--1525},
url = {https://aclweb.org/anthology/D16-1158},
year = 2016
}
Sountsov and Sarawagi (2016)
Shu, Raphael and Miura, Akiva (2016):
Residual Stacking of RNNs for Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
@InProceedings{shu-miura:2016:WAT2016,
author = {Shu, Raphael and Miura, Akiva},
title = {Residual Stacking of RNNs for Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {223--229},
url = {http://aclweb.org/anthology/W16-4623},
year = 2016
}
Shu and Miura (2016)