Coverage
A common source of error in neural machine translation is dropping or double-translating words in the input sentence. Explicit models of word coverage address this problem.
Coverage is the main subject of 11 publications; 8 of them are discussed here.
Publications
Wenhu Chen and Evgeny Matusov and Shahram Khadivi and Jan-Thorsten Peter (2016):
Guided Alignment Training for Topic-Aware Neural Machine Translation, CoRR
@article{DBLP:journals/corr/ChenMKP16,
author = {Wenhu Chen and Evgeny Matusov and Shahram Khadivi and Jan{-}Thorsten Peter},
title = {Guided Alignment Training for Topic-Aware Neural Machine Translation},
journal = {CoRR},
volume = {abs/1607.01628},
url = {https://arxiv.org/pdf/1607.01628.pdf},
year = 2016
}
Chen et al. (2016);
Liu, Lemao and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro (2016):
Neural Machine Translation with Supervised Attention, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{liu-EtAl:2016:COLING,
author = {Liu, Lemao and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro},
title = {Neural Machine Translation with Supervised Attention},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3093--3102},
url = {http://aclweb.org/anthology/C16-1291},
year = 2016
}
Liu et al. (2016) add supervised word alignment information (obtained with traditional statistical word alignment methods) to training. They augment the objective function to also optimize the match between the attention weights and the given alignments.
To better model coverage,
Tu, Zhaopeng and Lu, Zhengdong and Liu, Yang and Liu, Xiaohua and Li, Hang (2016):
Modeling Coverage for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{tu-EtAl:2016:P16-1,
author = {Tu, Zhaopeng and Lu, Zhengdong and Liu, Yang and Liu, Xiaohua and Li, Hang},
title = {Modeling Coverage for Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {76--85},
url = {http://www.aclweb.org/anthology/P16-1008},
year = 2016
}
Tu et al. (2016) add coverage states for each input word by either (a) summing up attention values, scaled by a fertility value predicted from the input word in context, or (b) learning a coverage update function as a feed-forward neural network layer. This coverage state is added as additional conditioning context for the prediction of the attention state.
Feng, Shi and Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming and Zhu, Kenny Q. (2016):
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{feng-EtAl:2016:COLING3,
author = {Feng, Shi and Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming and Zhu, Kenny Q.},
title = {Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3082--3092},
url = {http://aclweb.org/anthology/C16-1290},
year = 2016
}
Feng et al. (2016) additionally condition the prediction of the attention state on the previous context state, and introduce a coverage state (initialized with the sum of the source word embeddings) that aims to subtract covered words at each step. Similarly,
Meng, Fandong and Lu, Zhengdong and Li, Hang and Liu, Qun (2016):
Interactive Attention for Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{meng-EtAl:2016:COLING,
author = {Meng, Fandong and Lu, Zhengdong and Li, Hang and Liu, Qun},
title = {Interactive Attention for Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {2174--2185},
url = {http://aclweb.org/anthology/C16-1205},
year = 2016
}
Meng et al. (2016) maintain two separate sets of hidden states, one keeping track of source coverage and one keeping track of the produced output.
Cohn, Trevor and Hoang, Cong Duy Vu and Vymolova, Ekaterina and Yao, Kaisheng and Dyer, Chris and Haffari, Gholamreza (2016):
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{cohn-EtAl:2016:N16-1,
author = {Cohn, Trevor and Hoang, Cong Duy Vu and Vymolova, Ekaterina and Yao, Kaisheng and Dyer, Chris and Haffari, Gholamreza},
title = {Incorporating Structural Alignment Biases into an Attentional Neural Translation Model},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {876--885},
url = {http://www.aclweb.org/anthology/N16-1102},
year = 2016
}
Cohn et al. (2016) add a number of biases inspired by traditional statistical machine translation models to capture coverage, fertility, and alignment. They condition the prediction of the attention state on absolute word positions, the attention state of the previous output word within a limited window, and coverage (summed attention values) over a limited window. They also add a fertility model and include coverage in the training objective.
Alkhouli, Tamer and Bretschner, Gabriel and Peter, Jan-Thorsten and Hethnawi, Mohammed and Guta, Andreas and Ney, Hermann (2016):
Alignment-Based Neural Machine Translation, Proceedings of the First Conference on Machine Translation
@InProceedings{alkhouli-EtAl:2016:WMT,
author = {Alkhouli, Tamer and Bretschner, Gabriel and Peter, Jan-Thorsten and Hethnawi, Mohammed and Guta, Andreas and Ney, Hermann},
title = {Alignment-Based Neural Machine Translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {54--65},
url = {http://www.aclweb.org/anthology/W/W16/W16-2206},
year = 2016
}
Alkhouli et al. (2016) propose to integrate an alignment model, similar to those of word-based statistical machine translation, into a basic sequence-to-sequence translation model. The alignment model is trained externally with traditional word alignment methods; it informs the prediction of which input word to translate next, and the lexical translation decision is based on that word.
Alkhouli, Tamer and Ney, Hermann (2017):
Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper
@InProceedings{alkhouli-ney:2017:WMT,
author = {Alkhouli, Tamer and Ney, Hermann},
title = {Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {108--117},
url = {http://www.aclweb.org/anthology/W17-4711},
year = 2017
}
Alkhouli and Ney (2017) combine such an alignment model with the more traditional attention model, showing improvements.
New Publications
Yang, Zonghan and Cheng, Yong and Liu, Yang and Sun, Maosong (2019):
Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach, Proceedings of the 57th Conference of the Association for Computational Linguistics
@inproceedings{yang-etal-2019-reducing,
author = {Yang, Zonghan and Cheng, Yong and Liu, Yang and Sun, Maosong},
title = {Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {July},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1623},
pages = {6191--6196},
year = 2019
}
Yang et al. (2019)
Li, Yanyang and Xiao, Tong and Li, Yinqiao and Wang, Qiang and Xu, Changming and Zhu, Jingbo (2018):
A Simple and Effective Approach to Coverage-Aware Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2047,
author = {Li, Yanyang and Xiao, Tong and Li, Yinqiao and Wang, Qiang and Xu, Changming and Zhu, Jingbo},
title = {A Simple and Effective Approach to Coverage-Aware Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {292--297},
address = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2047},
year = 2018
}
Li et al. (2018)
Supervised Attention
Mi, Haitao and Wang, Zhiguo and Ittycheriah, Abe (2016):
Supervised Attentions for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{mi-wang-ittycheriah:2016:EMNLP2016,
author = {Mi, Haitao and Wang, Zhiguo and Ittycheriah, Abe},
title = {Supervised Attentions for Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {2283--2288},
url = {https://aclweb.org/anthology/D16-1249},
year = 2016
}
Mi et al. (2016)