Multi-Lingual, Multi-Modal, Multi-Task
Training machine translation models on multiple language pairs leads to better generalization and helps low-resource language pairs. Moreover, the input to machine translation may be enriched with information from other modalities, such as images or speech. Finally, machine translation may be just one task of an integrated neural network that also performs other language processing tasks.
Multilingual Multimodal Multitask is the main subject of 71 publications, of which 42 are discussed here.
Publications
Multi-language training:
Zoph, Barret and Yuret, Deniz and May, Jonathan and Knight, Kevin (2016):
Transfer Learning for Low-Resource Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{zoph-EtAl:2016:EMNLP2016,
author = {Zoph, Barret and Yuret, Deniz and May, Jonathan and Knight, Kevin},
title = {Transfer Learning for Low-Resource Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1568--1575},
url = {https://aclweb.org/anthology/D16-1163},
year = 2016
}
Zoph et al. (2016) first train on a high-resource language pair and then adapt the resulting model to a targeted low-resource language pair, showing gains over training only on the low-resource language pair.
Nguyen, Toan Q. and Chiang, David (2017):
Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

@InProceedings{nguyen-chiang:2017:I17-2,
author = {Nguyen, Toan Q. and Chiang, David},
title = {Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {November},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
pages = {296--301},
url = {http://www.aclweb.org/anthology/I17-2050},
year = 2017
}
Nguyen and Chiang (2017) show better results when merging the vocabularies of the different input languages.
Thanh-Le Ha and Jan Niehues and Alex Waibel (2016):
Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{IWSLT-2016-Ha,
author = {Thanh-Le Ha and Jan Niehues and Alex Waibel},
title = {Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Seattle, Washington, USA},
url = {http://workshop2016.iwslt.org/downloads/IWSLT_2016_paper_5.pdf},
month = {December},
year = 2016
}
Ha et al. (2016) prefix each input word with a language identifier (e.g., @en@dog, @de@Hund) and add monolingual data, both as source and target.
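A minimal sketch of this kind of per-word language coding, in plain Python (the function name is a hypothetical placeholder):

def add_language_prefix(tokens, lang):
    """Prefix every token with a language identifier, e.g. @en@dog."""
    return ["@{}@{}".format(lang, tok) for tok in tokens]

print(add_language_prefix(["the", "dog"], "en"))  # ['@en@the', '@en@dog']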
Thanh-Le Ha and Jan Niehues and Alex Waibel (2017):
Effective Strategies in Zero-Shot Neural Machine Translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{IWSLT2017:Ha,
author = {Thanh-Le Ha and Jan Niehues and Alex Waibel},
title = {Effective Strategies in Zero-Shot Neural Machine Translation},
url = {http://workshop2017.iwslt.org/downloads/P06-Paper.pdf},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Tokyo, Japan},
year = 2017
}
Ha et al. (2017) observe that translation in multi-language systems with multiple target languages may switch into the wrong language. They limit word predictions to words that exist in the desired target language, and also add source-side language-identifying word factors.
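The vocabulary restriction can be pictured as masking the output scores before the softmax. A sketch in PyTorch, with an invented toy vocabulary:

import torch

def mask_to_language(logits, allowed_ids):
    # Words outside the desired target language get probability zero.
    mask = torch.full_like(logits, float("-inf"))
    mask[..., allowed_ids] = 0.0
    return logits + mask

logits = torch.randn(1, 10)            # scores over a toy 10-word vocabulary
german_ids = torch.tensor([2, 5, 7])   # ids of the German words (invented)
probs = torch.softmax(mask_to_language(logits, german_ids), dim=-1)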
Lakew, Surafel Melaku and Cettolo, Mauro and Federico, Marcello (2018):
A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation, Proceedings of the 27th International Conference on Computational Linguistics

@inproceedings{C18-1054,
author = {Lakew, Surafel Melaku and Cettolo, Mauro and Federico, Marcello},
title = {A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/C18-1054},
pages = {641--652},
year = 2018
}
Lakew et al. (2018) show that Transformer models perform better for multi-language pair training than previous models based on recurrent neural networks.
Lakew, Surafel Melaku and Erofeeva, Aliia and Federico, Marcello (2018):
Neural Machine Translation into Language Varieties, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6316,
author = {Lakew, Surafel Melaku and Erofeeva, Aliia and Federico, Marcello},
title = {Neural Machine Translation into Language Varieties},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6316},
pages = {156--164},
year = 2018
}
Lakew et al. (2018) build one-to-many translation models for language varieties, i.e., closely related dialects such as Brazilian and European Portuguese or Croatian and Serbian. This requires language variety identification to separate out the training data.
Surafel Melaku Lakew and Aliia Erofeeva and Matteo Negri and Marcello Federico and Marco Turchi (2018):
Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{iwslt18-Transfer-Melaku,
author = {Surafel Melaku Lakew and Aliia Erofeeva and Matteo Negri and Marcello Federico and Marco Turchi},
title = {Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
url = {https://arxiv.org/pdf/1811.01137.pdf},
year = 2018
}
Lakew et al. (2018) start with a model trained on a high-resource language pair and then incrementally add low-resource language pairs, including new vocabulary items. They show much faster training convergence and slight quality gains over joint training.
Neubig, Graham and Hu, Junjie (2018):
Rapid Adaptation of Neural Machine Translation to New Languages, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1103,
author = {Neubig, Graham and Hu, Junjie},
title = {Rapid Adaptation of Neural Machine Translation to New Languages},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1103},
pages = {875--880},
year = 2018
}
Neubig and Hu (2018) train a many-to-one model for 58 language pairs and fine-tune it towards each of them.
Aharoni, Roee and Johnson, Melvin and Firat, Orhan (2019):
Massively Multilingual Neural Machine Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{aharoni-etal-2019-massively,
author = {Aharoni, Roee and Johnson, Melvin and Firat, Orhan},
title = {Massively Multilingual Neural Machine Translation},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1388},
pages = {3874--3884},
year = 2019
}
Aharoni et al. (2019) scale up multi-language training to 103 languages, training on language pairs with English on either side and measuring average translation performance from English and into English. They show that many-to-many systems improve over many-to-one systems when translating into English, but not over one-to-many systems when translating from English. They also see degradation when combining more than 5 languages.
Murthy, Rudra and Kunchukuttan, Anoop and Bhattacharyya, Pushpak (2019):
Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{murthy-etal-2019-addressing,
author = {Murthy, Rudra and Kunchukuttan, Anoop and Bhattacharyya, Pushpak},
title = {Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1387},
pages = {3868--3873},
year = 2019
}
Murthy et al. (2019) identify a problem when the targeted language pair in the multi-language setup is low-resource and has a different word order than the other language pairs. They propose to pre-order the input to match the word order of the dominant language.
Zero-Shot:
Johnson, Melvin and Schuster, Mike and Le, Quoc and Krikun, Maxim and Wu, Yonghui and Chen, Zhifeng and Thorat, Nikhil and Viegas, Fernanda and Wattenberg, Martin and Corrado, Greg and Hughes, Macduff and Dean, Jeffrey (2017):
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, Transactions of the Association for Computational Linguistics

@article{TACL1081,
author = {Johnson, Melvin and Schuster, Mike and Le, Quoc and Krikun, Maxim and Wu, Yonghui and Chen, Zhifeng and Thorat, Nikhil and Viegas, Fernanda and Wattenberg, Martin and Corrado, Greg and Hughes, Macduff and Dean, Jeffrey},
title = {Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
issn = {2307-387X},
url = {https://transacl.org/ojs/index.php/tacl/article/view/1081},
pages = {339--351},
year = 2017
}
Johnson et al. (2017) explore how well a single canonical neural translation model is able to learn to translate between multiple input and output languages, by simultaneously training on parallel corpora for several language pairs. They show small benefits for several input languages with the same output language, and mixed results for translating into multiple output languages (the desired output language is indicated by an additional input token). The most interesting result is the ability of such a model to translate in language directions for which no parallel corpus was provided ("zero-shot"), demonstrating that some interlingual meaning representation is learned, although less well than with traditional pivot methods.
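The multilingual setup itself is just a data transformation: an artificial token such as <2es> is prepended to the source sentence to signal the desired output language. A minimal sketch:

def tag_for_target(source_tokens, tgt_lang):
    # Prepend a token telling the shared model which language to produce.
    return ["<2{}>".format(tgt_lang)] + source_tokens

# Zero-shot: ask for Portuguese->Spanish even if only pt-en and en-es
# parallel data were seen during training.
print(tag_for_target(["Olá", "mundo"], "es"))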
Giulia Mattoni and Pat Nagle and Carlos Collantes and Dimitar Shterionov (2017):
Zero-Shot Translation for Low-Resource Indian Languages, Machine Translation Summit XVI

@inproceedings{mtsummit2017:Mattoni,
author = {Giulia Mattoni and Pat Nagle and Carlos Collantes and Dimitar Shterionov},
title = {Zero-Shot Translation for Low-Resource {Indian} Languages},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Mattoni et al. (2017) explore zero-shot training for Indian languages with sparse training data, achieving limited success.
Al-Shedivat, Maruan and Parikh, Ankur (2019):
Consistency by Agreement in Zero-Shot Neural Machine Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{al-shedivat-parikh-2019-consistency,
author = {Al-Shedivat, Maruan and Parikh, Ankur},
title = {Consistency by Agreement in Zero-Shot Neural Machine Translation},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1121},
pages = {1184--1197},
year = 2019
}
Al-Shedivat and Parikh (2019) extend the training objective of zero-shot training in the scenario of English-X parallel corpora, so that, given an English-French sentence pair, the Russian translations produced from the English side and from the French side agree.
Multi-Language Training with Language-Specific Components:
There have been a few suggestions to alter the model for multi-language pair training.
Dong, Daxiang and Wu, Hua and He, Wei and Yu, Dianhai and Wang, Haifeng (2015):
Multi-Task Learning for Multiple Language Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@inproceedings{dong-etal-2015-multi,
author = {Dong, Daxiang and Wu, Hua and He, Wei and Yu, Dianhai and Wang, Haifeng},
title = {Multi-Task Learning for Multiple Language Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {jul},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P15-1166},
doi = {10.3115/v1/P15-1166},
pages = {1723--1732},
year = 2015
}
Dong et al. (2015) use different decoders for each target language.
Firat, Orhan and Cho, Kyunghyun and Bengio, Yoshua (2016):
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

@InProceedings{firat-cho-bengio:2016:N16-1,
author = {Firat, Orhan and Cho, Kyunghyun and Bengio, Yoshua},
title = {Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {866--875},
url = {http://www.aclweb.org/anthology/N16-1101},
year = 2016
}
Firat et al. (2016) support multi-language input and output by training language-specific encoders and decoders and a shared attention mechanism.
Firat, Orhan and Sankaran, Baskaran and Al-Onaizan, Yaser and Yarman Vural, Fatos T. and Cho, Kyunghyun (2016):
Zero-Resource Translation with Multi-Lingual Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{firat-EtAl:2016:EMNLP2016,
author = {Firat, Orhan and Sankaran, Baskaran and Al-Onaizan, Yaser and Yarman Vural, Fatos T. and Cho, Kyunghyun},
title = {Zero-Resource Translation with Multi-Lingual Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {268--277},
url = {https://aclweb.org/anthology/D16-1026},
year = 2016
}
Firat et al. (2016) evaluate how well this model works for zero-shot translation.
Lu, Yichao and Keung, Phillip and Ladhak, Faisal and Bhardwaj, Vikas and Zhang, Shaonan and Sun, Jason (2018):
A neural interlingua for multilingual machine translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6309,
author = {Lu, Yichao and Keung, Phillip and Ladhak, Faisal and Bhardwaj, Vikas and Zhang, Shaonan and Sun, Jason},
title = {A neural interlingua for multilingual machine translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6309},
pages = {84--92},
year = 2018
}
Lu et al. (2018) add an interlingua layer between language-specific encoders and decoders that is shared across all language pairs.
Conversely,
Blackwood, Graeme and Ballesteros, Miguel and Ward, Todd (2018):
Multilingual Neural Machine Translation with Task-Specific Attention, Proceedings of the 27th International Conference on Computational Linguistics

@inproceedings{C18-1263,
author = {Blackwood, Graeme and Ballesteros, Miguel and Ward, Todd},
title = {Multilingual Neural Machine Translation with Task-Specific Attention},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/C18-1263},
pages = {3112--3122},
year = 2018
}
Blackwood et al. (2018) use shared encoders and decoders but language-pair specific attention.
Sachan, Devendra and Neubig, Graham (2018):
Parameter Sharing Methods for Multilingual Self-Attentional Translation Models, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6327,
author = {Sachan, Devendra and Neubig, Graham},
title = {Parameter Sharing Methods for Multilingual Self-Attentional Translation Models},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6327},
pages = {261--271},
year = 2018
}
Sachan and Neubig (2018) investigate which parameters in a Transformer model should be shared during one-to-many training and find that partial sharing of components outperforms no sharing or full sharing, although the best configuration depends on the languages involved.
Wang, Yining and Zhang, Jiajun and Zhai, Feifei and Xu, Jingfang and Zong, Chengqing (2018):
Three Strategies to Improve One-to-Many Multilingual Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1326,
author = {Wang, Yining and Zhang, Jiajun and Zhai, Feifei and Xu, Jingfang and Zong, Chengqing},
title = {Three Strategies to Improve One-to-Many Multilingual Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1326},
pages = {2955--2960},
year = 2018
}
Wang et al. (2018) add language-dependent positional embeddings and split the decoder state into a general and language-dependent part.
Platanios, Emmanouil Antonios and Sachan, Mrinmaya and Neubig, Graham and Mitchell, Tom (2018):
Contextual Parameter Generation for Universal Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1039,
author = {Platanios, Emmanouil Antonios and Sachan, Mrinmaya and Neubig, Graham and Mitchell, Tom},
title = {Contextual Parameter Generation for Universal Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1039},
pages = {425--435},
year = 2018
}
Platanios et al. (2018) generate the language-pair-specific parameters of the encoder and decoder with a parameter generator that takes embeddings of the input and output language identifiers as input.
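A much-simplified sketch of such a parameter generator, here for a single linear layer with illustrative dimensions (all names are hypothetical):

import torch
import torch.nn as nn

class ContextualLinear(nn.Module):
    """Linear layer whose weight matrix is generated from a language embedding."""
    def __init__(self, num_langs, lang_dim, d_in, d_out):
        super().__init__()
        self.lang_emb = nn.Embedding(num_langs, lang_dim)
        self.generator = nn.Linear(lang_dim, d_in * d_out)
        self.d_in, self.d_out = d_in, d_out

    def forward(self, x, lang_id):
        # The generator emits a flattened weight matrix for this language.
        w = self.generator(self.lang_emb(lang_id)).view(self.d_out, self.d_in)
        return x @ w.t()

layer = ContextualLinear(num_langs=4, lang_dim=8, d_in=16, d_out=16)
out = layer(torch.randn(2, 16), torch.tensor(1))   # parameters for language 1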
Gu, Jiatao and Wang, Yong and Chen, Yun and Li, Victor O. K. and Cho, Kyunghyun (2018):
Meta-Learning for Low-Resource Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1398,
author = {Gu, Jiatao and Wang, Yong and Chen, Yun and Li, Victor O. K. and Cho, Kyunghyun},
title = {Meta-Learning for Low-Resource Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1398},
pages = {3622--3631},
year = 2018
}
Gu et al. (2018) frame the multi-language training setup as meta-learning, which they define as either learning a policy for updating model parameters or learning a good parameter initialization for fast adaptation. Their approach falls under the second definition and is similar to multi-language training with adaptation via fine-tuning, except that the first training phase is optimized towards parameters that can be quickly adapted.
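The paper builds on MAML; the sketch below instead shows the simpler first-order (Reptile-style) variant of the same idea: adapt a copy of the model on each auxiliary language pair for a few steps, then nudge the shared initialization towards the adapted weights. All names are placeholders.

import copy
import torch

def meta_step(model, task_losses, inner_lr=0.01, meta_lr=0.1):
    init = {n: p.clone() for n, p in model.named_parameters()}
    for loss_fn in task_losses:              # one loss function per language pair
        clone = copy.deepcopy(model)
        opt = torch.optim.SGD(clone.parameters(), lr=inner_lr)
        for _ in range(3):                   # a few inner adaptation steps
            opt.zero_grad()
            loss_fn(clone).backward()
            opt.step()
        with torch.no_grad():                # move initialization toward adapted weights
            for (n, p), q in zip(model.named_parameters(), clone.parameters()):
                p += meta_lr * (q - init[n])

model = torch.nn.Linear(8, 8)                # stands in for a full NMT model
tasks = [lambda m: m(torch.randn(4, 8)).pow(2).mean() for _ in range(2)]
meta_step(model, tasks)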
Gu, Jiatao and Hassan, Hany and Devlin, Jacob and Li, Victor O.K. (2018):
Universal Neural Machine Translation for Extremely Low Resource Languages, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

@InProceedings{N18-1032,
author = {Gu, Jiatao and Hassan, Hany and Devlin, Jacob and Li, Victor O.K.},
title = {Universal Neural Machine Translation for Extremely Low Resource Languages},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {344--354},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-1032},
year = 2018
}
Gu et al. (2018) focus on the problem of word representation in multi-lingual training. They map the tokens of every language into a universal embedding space, aided by monolingual data.
Xinyi Wang and Hieu Pham and Philip Arthur and Graham Neubig (2019):
Multilingual Neural Machine Translation With Soft Decoupled Encoding, International Conference on Learning Representations (ICLR)

@inproceedings{iclr-multilingual-soft-2019,
author = {Xinyi Wang and Hieu Pham and Philip Arthur and Graham Neubig},
title = {Multilingual Neural Machine Translation With Soft Decoupled Encoding},
booktitle = {International Conference on Learning Representations (ICLR)},
url = {https://openreview.net/pdf?id=Skeke3C5Fm},
year = 2019
}
Wang et al. (2019) have the same goal in mind and use language-specific and language-independent character-based word representations that map into a shared word embedding space. This is done for the input words of a 58-languages-to-English translation model.
Xu Tan and Yi Ren and Di He and Tao Qin and Zhou Zhao and Tie-Yan Liu (2019):
Multilingual Neural Machine Translation with Knowledge Distillation, International Conference on Learning Representations (ICLR)

@inproceedings{iclr-multilingual-knowledge-2019,
author = {Xu Tan and Yi Ren and Di He and Tao Qin and Zhou Zhao and Tie-Yan Liu},
title = {Multilingual Neural Machine Translation with Knowledge Distillation},
booktitle = {International Conference on Learning Representations (ICLR)},
url = {https://openreview.net/pdf?id=S1gUsoR9YX},
year = 2019
}
Tan et al. (2019) change the training objective for multi-language training: in addition to matching the training data for the language pairs, an additional objective is to match the predictions of a "teacher" model that was trained on the corresponding single-language-pair data.
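A sketch of such a distillation term in PyTorch (the mixing weight alpha and the exact formulation are illustrative, not the paper's):

import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, gold_ids, alpha=0.5):
    nll = F.cross_entropy(student_logits, gold_ids)        # match the references
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),   # match the teacher
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")
    return (1 - alpha) * nll + alpha * kd

student = torch.randn(4, 100, requires_grad=True)   # 4 positions, vocab of 100
teacher = torch.randn(4, 100)
gold = torch.randint(0, 100, (4,))
loss = distill_loss(student, teacher, gold)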
Malaviya, Chaitanya and Neubig, Graham and Littell, Patrick (2017):
Learning Language Representations for Typology Prediction, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1268,
author = {Malaviya, Chaitanya and Neubig, Graham and Littell, Patrick},
title = {Learning Language Representations for Typology Prediction},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {2529--2535},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1268},
year = 2017
}
Malaviya et al. (2017) use the embedding associated with the language indicator token in massively multi-language models to predict typological properties of a language.
Ren, Shuo and Chen, Wenhu and Liu, Shujie and Li, Mu and Zhou, Ming and Ma, Shuai (2018):
Triangular Architecture for Rare Language Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{P18-1006,
author = {Ren, Shuo and Chen, Wenhu and Liu, Shujie and Li, Mu and Zhou, Ming and Ma, Shuai},
title = {Triangular Architecture for Rare Language Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {56--65},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1006},
year = 2018
}
Ren et al. (2018) address the challenge of pivot translation (training an X-Z model via a third language Y with large corpora X-Y and Y-Z) in a neural model by setting up training objectives that match translation through the pivot path with direct translation, as well as other paths in this language triangle.
Multiple Inputs:
Zoph, Barret and Knight, Kevin (2016):
Multi-Source Neural Translation, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

@InProceedings{zoph-knight:2016:N16-1,
author = {Zoph, Barret and Knight, Kevin},
title = {Multi-Source Neural Translation},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {30--34},
url = {http://www.aclweb.org/anthology/N16-1004},
year = 2016
}
Zoph and Knight (2016) augment a translation model to consume two meaning-equivalent sentences in different languages as input.
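A sketch of the basic combination of the two encoders' final states to initialize the decoder (dimensions illustrative; the paper additionally combines attention over both sources):

import torch
import torch.nn as nn

class MultiSourceInit(nn.Module):
    """Combine final hidden states of two encoders to initialize the decoder."""
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(2 * d, d)

    def forward(self, h_src1, h_src2):
        return torch.tanh(self.proj(torch.cat([h_src1, h_src2], dim=-1)))

combine = MultiSourceInit(d=512)
h_fr = torch.randn(1, 512)   # hypothetical French encoder state
h_de = torch.randn(1, 512)   # hypothetical German encoder state
decoder_init = combine(h_fr, h_de)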
Zhou, Long and Hu, Wenpeng and Zhang, Jiajun and Zong, Chengqing (2017):
Neural System Combination for Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{zhou-EtAl:2017:Short1,
author = {Zhou, Long and Hu, Wenpeng and Zhang, Jiajun and Zong, Chengqing},
title = {Neural System Combination for Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {378--384},
url = {http://aclweb.org/anthology/P17-2060},
year = 2017
}
Zhou et al. (2017) apply this idea to the task of system combination, i.e., obtaining a consensus translation from multiple machine translation outputs.
Garmash, Ekaterina and Monz, Christof (2016):
Ensemble Learning for Multi-Source Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{garmash-monz:2016:COLING,
author = {Garmash, Ekaterina and Monz, Christof},
title = {Ensemble Learning for Multi-Source Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {1409--1418},
url = {http://aclweb.org/anthology/C16-1133},
year = 2016
}
Garmash and Monz (2016) train multiple single-language-pair systems, feed each one the corresponding meaning-equivalent input sentence, and combine the models' predictions in an ensemble during decoding.
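A sketch of the combination step: each model scores the next word from its own input, and the distributions are averaged (the toy "models" below are plain linear layers standing in for full NMT systems):

import torch

def ensemble_next_word(models, inputs):
    # Average the next-word distributions of the per-language models.
    probs = [torch.softmax(m(x), dim=-1) for m, x in zip(models, inputs)]
    return torch.stack(probs).mean(dim=0)

vocab = 50
models = [torch.nn.Linear(16, vocab) for _ in range(3)]
inputs = [torch.randn(1, 16) for _ in range(3)]   # meaning-equivalent inputs
print(ensemble_next_word(models, inputs).shape)   # torch.Size([1, 50])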
Nishimura, Yuta and Sudoh, Katsuhito and Neubig, Graham and Nakamura, Satoshi (2018):
Multi-Source Neural Machine Translation with Missing Data, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

@InProceedings{W18-2711,
author = {Nishimura, Yuta and Sudoh, Katsuhito and Neubig, Graham and Nakamura, Satoshi},
title = {Multi-Source Neural Machine Translation with Missing Data},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {92--99},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/W18-2711},
year = 2018
}
Nishimura et al. (2018) explore how a multi-source model works when the input for some languages is missing. In their experiments, the multi-encoder approach more often outperforms the ensemble.
Yuta Nishimura and Katsuhito Sudoh and Graham Neubig and Satoshi Nakamura (2018):
Multi-Source Neural Machine Translation with Data Augmentation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{iwslt18-Nishimura-Multi-Source,
author = {Yuta Nishimura and Katsuhito Sudoh and Graham Neubig and Satoshi Nakamura},
title = {Multi-Source Neural Machine Translation with Data Augmentation},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
url = {https://arxiv.org/pdf/1810.06826.pdf},
year = 2018
}
Nishimura et al. (2018) fill in the missing sentences in the training data with (multi-source) back-translation.
Raj Dabre and Fabien Cromieres and Sadao Kurohashi (2017):
Enabling Multi-Source Neural Machine Translation by Concatenating Source Sentences in Multiple Languages, Machine Translation Summit XVI

@inproceedings{mtsummit2017:Dabre,
author = {Raj Dabre and Fabien Cromieres and Sadao Kurohashi},
title = {Enabling Multi-Source Neural Machine Translation by Concatenating Source Sentences in Multiple Languages},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
url = {https://arxiv.org/pdf/1702.06135.pdf},
year = 2017
}
Dabre et al. (2017) concatenate the input sentences, and also use training data in the same format (which requires intersecting overlapping parallel corpora).
Pre-trained word embeddings:
Mattia Antonino Di Gangi and Marcello Federico (2017):
Monolingual Embeddings for Low Resourced Neural Machine Translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{IWSLT2017:DiGangi,
author = {Mattia Antonino Di~Gangi and Marcello Federico},
title = {Monolingual Embeddings for Low Resourced Neural Machine Translation},
url = {http://workshop2017.iwslt.org/downloads/P05-Paper.pdf},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Tokyo, Japan},
year = 2017
}
Di Gangi and Federico (2017) do not observe improvement when using monolingual word embeddings in a gated network that trains additional word embeddings purely on parallel data.
Abdou, Mostafa and Gloncak, Vladan and Bojar, Ondřej (2017):
Variable Mini-Batch Sizing and Pre-Trained Embeddings, Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers

@InProceedings{abdou-gloncak-bojar:2017:WMT,
author = {Abdou, Mostafa and Gloncak, Vladan and Bojar, Ond\v{r}ej},
title = {Variable Mini-Batch Sizing and Pre-Trained Embeddings},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {680--686},
url = {http://www.aclweb.org/anthology/W17-4780},
year = 2017
}
Abdou et al. (2017) report worse performance with pre-trained word embeddings on a WMT news translation task. They argue, as
Felix Hill and Kyunghyun Cho and Sébastien Jean and Coline Devin and Yoshua Bengio (2014):
Embedding Word Similarity with Neural Machine Translation, CoRR

@article{DBLP:journals/corr/HillCJDB14a,
author = {Felix Hill and Kyunghyun Cho and S{\'{e}}bastien Jean and Coline Devin and Yoshua Bengio},
title = {Embedding Word Similarity with Neural Machine Translation},
journal = {CoRR},
volume = {abs/1412.6448},
url = {http://arxiv.org/abs/1412.6448},
year = 2014
}
Hill et al. (2014);
Hill, Felix and Cho, Kyunghyun and Jean, Sébastien and Bengio, Yoshua (2017):
The Representational Geometry of Word Meanings Acquired by Neural Machine Translation Models, Machine Translation

@article{Hill:2017:RGW:3127662.3127714,
author = {Hill, Felix and Cho, Kyunghyun and Jean, S{\'e}bastien and Bengio, Yoshua},
title = {The Representational Geometry of Word Meanings Acquired by Neural Machine Translation Models},
journal = {Machine Translation},
issue_date = {June 2017},
volume = {31},
number = {1-2},
month = {jun},
issn = {0922-6567},
pages = {3--18},
numpages = {16},
url = {https://doi.org/10.1007/s10590-017-9194-2},
doi = {10.1007/s10590-017-9194-2},
acmid = {3127714},
publisher = {Kluwer Academic Publishers},
address = {Hingham, MA, USA},
keywords = {Machine translation, Representation, Word embeddings},
year = 2017
}
Hill et al. (2017) did previously, that neural machine translation requires word embeddings based on the semantic similarity of words (teacher and professor) rather than other kinds of relatedness (teacher and student), and demonstrate that word embeddings trained for translation score better on standard semantic similarity tasks.
Mikel Artetxe and Gorka Labaka and Eneko Agirre and Kyunghyun Cho (2018):
Unsupervised Neural Machine Translation, International Conference on Learning Representations

@inproceedings{artetxe2018unsupervised,
author = {Mikel Artetxe and Gorka Labaka and Eneko Agirre and Kyunghyun Cho},
title = {Unsupervised Neural Machine Translation},
booktitle = {International Conference on Learning Representations},
url = {https://openreview.net/forum?id=Sy2ogebAW},
year = 2018
}
Artetxe et al. (2018) use monolingually trained word embeddings in a neural machine translation system, without using any parallel corpus.
Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham (2018):
When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2084,
author = {Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham},
title = {When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {529--535},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-2084},
year = 2018
}
Qi et al. (2018) do show gains with pre-trained word embeddings in low-resource conditions, but the benefits decrease with larger data sizes.
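A minimal sketch of the common recipe: copy pre-trained vectors into the embedding layer where available and keep the random initialization elsewhere (vocabulary and vectors are invented):

import torch
import torch.nn as nn

vocab = ["the", "dog", "barks"]
pretrained = {"the": torch.randn(300), "dog": torch.randn(300)}

emb = nn.Embedding(len(vocab), 300)
with torch.no_grad():
    for i, word in enumerate(vocab):
        if word in pretrained:        # "barks" keeps its random vector
            emb.weight[i] = pretrained[word]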
Multi-task training:
Niehues, Jan and Cho, Eunah (2017):
Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper

@InProceedings{niehues-cho:2017:WMT,
author = {Niehues, Jan and Cho, Eunah},
title = {Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {80--89},
url = {http://www.aclweb.org/anthology/W17-4708},
year = 2017
}
Niehues and Cho (2017) tackle multiple tasks (translation, part-of-speech tagging, and named entity recognition) with shared components of a sequence-to-sequence model, showing that training on several tasks improves performance on each individual task.
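A structural sketch of such sharing, with one shared encoder and task-specific output layers (sizes are illustrative; the translation head would feed a full decoder in practice):

import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, vocab, d, n_pos, n_ner):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)   # shared across tasks
        self.heads = nn.ModuleDict({
            "translation": nn.Linear(d, vocab),
            "pos": nn.Linear(d, n_pos),
            "ner": nn.Linear(d, n_ner),
        })

    def forward(self, ids, task):
        h, _ = self.encoder(self.emb(ids))
        return self.heads[task](h)

model = SharedEncoderMultiTask(vocab=1000, d=64, n_pos=17, n_ner=9)
out = model(torch.randint(0, 1000, (2, 5)), task="pos")   # shape (2, 5, 17)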
Zaremoodi, Poorya and Haffari, Gholamreza (2018):
Neural Machine Translation for Bilingually Scarce Scenarios: a Deep Multi-Task Learning Approach, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

@InProceedings{N18-1123,
author = {Zaremoodi, Poorya and Haffari, Gholamreza},
title = {Neural Machine Translation for Bilingually Scarce Scenarios: a Deep Multi-Task Learning Approach},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1356--1365},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-1123},
year = 2018
}
Zaremoodi and Haffari (2018) refine this approach with adversarial training that enforces task-independent representations in intermediate layers, and apply it to joint training with syntactic and semantic parsing.
Li, Guanlin and Liu, Lemao and Li, Xintong and Zhu, Conghui and Zhao, Tiejun and Shi, Shuming (2019):
Understanding and Improving Hidden Representations for Neural Machine Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{li-etal-2019-understanding,
author = {Li, Guanlin and Liu, Lemao and Li, Xintong and Zhu, Conghui and Zhao, Tiejun and Shi, Shuming},
title = {{U}nderstanding and {I}mproving {H}idden {R}epresentations for {N}eural {M}achine {T}ranslation},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1046},
pages = {466--477},
year = 2019
}
Li et al. (2019) add the prediction of hierarchical word classes, obtained by Brown clustering, as auxiliary tasks. In the first layer of the decoder of a Transformer model, the coarsest word classes are predicted, and later layers predict more fine-grained word classes. The authors argue that this increases the generalization ability of intermediate representations and show improvements in translation quality.
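A sketch of such layer-wise auxiliary losses, with invented class inventories standing in for the Brown-clustering hierarchy:

import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64
class_sizes = [10, 100, 1000]                       # coarse -> fine classes
heads = nn.ModuleList([nn.Linear(d, n) for n in class_sizes])

# Hypothetical hidden states of three decoder layers (batch 2, length 5).
hidden = [torch.randn(2, 5, d) for _ in class_sizes]
targets = [torch.randint(0, n, (2, 5)) for n in class_sizes]

aux_loss = sum(
    F.cross_entropy(head(h).reshape(-1, n), t.reshape(-1))
    for head, h, t, n in zip(heads, hidden, targets, class_sizes))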
Benchmarks
Discussion
Related Topics
New Publications
Pham, Ngoc-Quan and Niehues, Jan and Ha, Thanh-Le and Waibel, Alex (2019):
Improving Zero-shot Translation with Language-Independent Constraints, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{pham-EtAl:2019:WMT,
author = {Pham, Ngoc-Quan and Niehues, Jan and Ha, Thanh-Le and Waibel, Alex},
title = {Improving Zero-shot Translation with Language-Independent Constraints},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {13--23},
url = {http://www.aclweb.org/anthology/W19-5202},
year = 2019
}
Pham et al. (2019)
Calixto, Iacer and Rios, Miguel and Aziz, Wilker (2019):
Latent Variable Model for Multi-modal Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{calixto-etal-2019-latent,
author = {Calixto, Iacer and Rios, Miguel and Aziz, Wilker},
title = {Latent Variable Model for Multi-modal Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1642},
pages = {6392--6405},
year = 2019
}
Calixto et al. (2019)
Chen, Xilun and Awadallah, Ahmed Hassan and Hassan, Hany and Wang, Wei and Cardie, Claire (2019):
Multi-Source Cross-Lingual Model Transfer: Learning What to Share, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{chen-etal-2019-multi-source,
author = {Chen, Xilun and Awadallah, Ahmed Hassan and Hassan, Hany and Wang, Wei and Cardie, Claire},
title = {Multi-Source Cross-Lingual Model Transfer: Learning What to Share},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1299},
pages = {3098--3112},
year = 2019
}
Chen et al. (2019)
Ive, Julia and Madhyastha, Pranava and Specia, Lucia (2019):
Distilling Translations with Visual Awareness, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{ive-etal-2019-distilling,
author = {Ive, Julia and Madhyastha, Pranava and Specia, Lucia},
title = {Distilling Translations with Visual Awareness},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1653},
pages = {6525--6538},
year = 2019
}
Ive et al. (2019)
Kim, Yunsu and Gao, Yingbo and Ney, Hermann (2019):
Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{kim-etal-2019-effective,
author = {Kim, Yunsu and Gao, Yingbo and Ney, Hermann},
title = {Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1120},
pages = {1246--1257},
year = 2019
}
Kim et al. (2019)
Leng, Yichong and Tan, Xu and Qin, Tao and Li, Xiang-Yang and Liu, Tie-Yan (2019):
Unsupervised Pivot Translation for Distant Languages, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{leng-etal-2019-unsupervised,
author = {Leng, Yichong and Tan, Xu and Qin, Tao and Li, Xiang-Yang and Liu, Tie-Yan},
title = {Unsupervised Pivot Translation for Distant Languages},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1017},
pages = {175--183},
year = 2019
}
Leng et al. (2019)
Liu, Hairong and Ma, Mingbo and Huang, Liang and Xiong, Hao and He, Zhongjun (2019):
Robust Neural Machine Translation with Joint Textual and Phonetic Embedding, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{liu-etal-2019-robust,
author = {Liu, Hairong and Ma, Mingbo and Huang, Liang and Xiong, Hao and He, Zhongjun},
title = {Robust Neural Machine Translation with Joint Textual and Phonetic Embedding},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1291},
pages = {3044--3049},
year = 2019
}
Liu et al. (2019)
Sen, Sukanta and Gupta, Kamal Kumar and Ekbal, Asif and Bhattacharyya, Pushpak (2019):
Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{sen-etal-2019-multilingual,
author = {Sen, Sukanta and Gupta, Kamal Kumar and Ekbal, Asif and Bhattacharyya, Pushpak},
title = {Multilingual Unsupervised {NMT} using Shared Encoder and Language-Specific Decoders},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1297},
pages = {3083--3089},
year = 2019
}
Sen et al. (2019)
Wang, Yining and Zhou, Long and Zhang, Jiajun and Zhai, Feifei and Xu, Jingfang and Zong, Chengqing (2019):
A Compact and Language-Sensitive Multilingual Translation Method, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{wang-etal-2019-compact,
author = {Wang, Yining and Zhou, Long and Zhang, Jiajun and Zhai, Feifei and Xu, Jingfang and Zong, Chengqing},
title = {A Compact and Language-Sensitive Multilingual Translation Method},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1117},
pages = {1213--1223},
year = 2019
}
Wang et al. (2019)
Wang, Xinyi and Neubig, Graham (2019):
Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{wang-neubig-2019-target,
author = {Wang, Xinyi and Neubig, Graham},
title = {Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1583},
pages = {5823--5828},
year = 2019
}
Wang and Neubig (2019)
Domhan, Tobias and Hieber, Felix (2017):
Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1158,
author = {Domhan, Tobias and Hieber, Felix},
title = {Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1501--1506},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1158},
year = 2017
}
Domhan and Hieber (2017)
Multi-Lingual
Surafel Melaku Lakew and Quintino Francesco Lotito and Matteo Negri and Marco Turchi and Marcello Federico (2017):
Improving Zero-Shot Translation of Low-Resource Languages, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

@inproceedings{IWSLT2017:Lakew,
author = {Surafel Melaku Lakew and Quintino Francesco Lotito and Matteo Negri and Marco Turchi and Marcello Federico},
title = {Improving Zero-Shot Translation of Low-Resource Languages},
url = {http://workshop2017.iwslt.org/downloads/O03-3-Paper.pdf},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Tokyo, Japan},
year = 2017
}
Lakew et al. (2017)
Zhou, Zhong and Sperber, Matthias and Waibel, Alex (2018):
Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6324,
author = {Zhou, Zhong and Sperber, Matthias and Waibel, Alex},
title = {Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6324},
pages = {232--243},
year = 2018
}
Zhou et al. (2018)
Gu, Jiatao and Wang, Yong and Cho, Kyunghyun and Li, Victor O.K. (2019):
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{gu-etal-2019-improved,
author = {Gu, Jiatao and Wang, Yong and Cho, Kyunghyun and Li, Victor O.K.},
title = {Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1121},
pages = {1258--1268},
year = 2019
}
Gu et al. (2019)
Multi-Source, Multi-Target
Libovický, Jindřich and Helcl, Jindřich and Mareček, David (2018):
Input Combination Strategies for Multi-Source Transformer Decoder, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6326,
author = {Libovick{\'y}, Jind{\v{r}}ich and Helcl, Jind{\v{r}}ich and Mare{\v{c}}ek, David},
title = {Input Combination Strategies for Multi-Source Transformer Decoder},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-6326},
pages = {253--260},
year = 2018
}
Libovický et al. (2018)
Multi-modal (speech, vision)
Caglayan, Ozan and Aransa, Walid and Wang, Yaxing and Masana, Marc and García-Martínez, Mercedes and Bougares, Fethi and Barrault, Loïc and van de Weijer, Joost (2016):
Does Multimodality Help Human and Machine for Translation and Image Captioning?, Proceedings of the First Conference on Machine Translation

@InProceedings{caglayan-EtAl:2016:WMT,
author = {Caglayan, Ozan and Aransa, Walid and Wang, Yaxing and Masana, Marc and Garc\'{i}a-Mart\'{i}nez, Mercedes and Bougares, Fethi and Barrault, Lo\"{i}c and van de Weijer, Joost},
title = {Does Multimodality Help Human and Machine for Translation and Image Captioning?},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {627--633},
url = {http://www.aclweb.org/anthology/W/W16/W16-2358},
year = 2016
}
Caglayan et al. (2016)
Lala, Chiraag and Madhyastha, Pranava and Specia, Lucia (2019):
Grounded Word Sense Translation, Proceedings of the Second Workshop on Shortcomings in Vision and Language

@inproceedings{lala-etal-2019-grounded,
author = {Lala, Chiraag and Madhyastha, Pranava and Specia, Lucia},
title = {Grounded Word Sense Translation},
booktitle = {Proceedings of the Second Workshop on Shortcomings in Vision and Language},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W19-1808},
pages = {78--85},
year = 2019
}
Lala et al. (2019)
Singhal, Karan and Raman, Karthik and ten Cate, Balder (2019):
Learning Multilingual Word Embeddings Using Image-Text Data, Proceedings of the Second Workshop on Shortcomings in Vision and Language

@inproceedings{singhal-etal-2019-learning,
author = {Singhal, Karan and Raman, Karthik and ten Cate, Balder},
title = {Learning Multilingual Word Embeddings Using Image-Text Data},
booktitle = {Proceedings of the Second Workshop on Shortcomings in Vision and Language},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W19-1807},
pages = {68--77},
year = 2019
}
Singhal et al. (2019)
Dutta Chowdhury, Koel and Hasanuzzaman, Mohammed and Liu, Qun (2018):
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data, Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

@inproceedings{W18-3405,
author = {Dutta Chowdhury, Koel and Hasanuzzaman, Mohammed and Liu, Qun},
title = {Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data},
booktitle = {Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP},
month = {jul},
address = {Melbourne},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W18-3405},
pages = {33--42},
year = 2018
}
Chowdhury et al. (2018)
Shigehiko Schamoni and Julian Hitschler and Stefan Riezler (2018):
A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)

@inproceedings{AMTA2018-Schamoni,
author = {Shigehiko Schamoni and Julian Hitschler and Stefan Riezler},
title = {A Dataset and Reranking Method for Multimodal {MT} of User-Generated Image Captions},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
year = 2018
}
Schamoni et al. (2018)
Shah, Kashif and Wang, Josiah and Specia, Lucia (2016):
SHEF-Multimodal: Grounding Machine Translation on Images, Proceedings of the First Conference on Machine Translation

@InProceedings{shah-wang-specia:2016:WMT,
author = {Shah, Kashif and Wang, Josiah and Specia, Lucia},
title = {SHEF-Multimodal: Grounding Machine Translation on Images},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {660--665},
url = {
http://www.aclweb.org/anthology/W/W16/W16-2363},
year = 2016
}
Shah et al. (2016)
Elliott, Desmond and Kádár, Ákos (2017):
Imagination Improves Multimodal Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@inproceedings{elliott-kadar-2017-imagination,
author = {Elliott, Desmond and K{\'a}d{\'a}r, {\'A}kos},
title = {Imagination Improves Multimodal Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {
https://www.aclweb.org/anthology/I17-1014},
pages = {130--141},
year = 2017
}
Elliott and Kádár (2017)
Delbrouck, Jean-Benoit and Dupont, Stéphane (2017):
An empirical study on the effectiveness of images in Multimodal Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1096,
author = {Delbrouck, Jean-Benoit and Dupont, St{\'e}phane},
title = {An empirical study on the effectiveness of images in Multimodal Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {921--930},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1096},
year = 2017
}
Delbrouck and Dupont (2017)
Calixto, Iacer and Liu, Qun (2017):
Incorporating Global Visual Features into Attention-based Neural Machine Translation., Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1106,
author = {Calixto, Iacer and Liu, Qun},
title = {Incorporating Global Visual Features into Attention-based Neural Machine Translation.},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1003--1014},
location = {Copenhagen, Denmark},
url = {http://aclweb.org/anthology/D17-1106},
year = 2017
}
Calixto and Liu (2017)
Hitschler, Julian and Schamoni, Shigehiko and Riezler, Stefan (2016):
Multimodal Pivots for Image Caption Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{hitschler-schamoni-riezler:2016:P16-1,
author = {Hitschler, Julian and Schamoni, Shigehiko and Riezler, Stefan},
title = {Multimodal Pivots for Image Caption Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {2399--2409},
url = {http://www.aclweb.org/anthology/P16-1227},
year = 2016
}
Hitschler et al. (2016)
Calixto, Iacer and Liu, Qun and Campbell, Nick (2017):
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{calixto-liu-campbell:2017:Long,
author = {Calixto, Iacer and Liu, Qun and Campbell, Nick},
title = {Doubly-Attentive Decoder for Multi-modal Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1913--1924},
url = {http://aclweb.org/anthology/P17-1175},
year = 2017
}
Calixto et al. (2017)
Hewitt, John and Ippolito, Daphne and Callahan, Brendan and Kriz, Reno and Wijaya, Derry Tanti and Callison-Burch, Chris (2018):
Learning Translations via Images with a Massively Multilingual Image Dataset, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{P18-1239,
author = {Hewitt, John and Ippolito, Daphne and Callahan, Brendan and Kriz, Reno and Wijaya, Derry Tanti and Callison-Burch, Chris},
title = {Learning Translations via Images with a Massively Multilingual Image Dataset},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {2566--2576},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-1239},
year = 2018
}
Hewitt et al. (2018)
Zhou, Mingyang and Cheng, Runxiang and Lee, Yong Jae and Yu, Zhou (2018):
A Visual Attention Grounding Neural Model for Multimodal Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1400,
author = {Zhou, Mingyang and Cheng, Runxiang and Lee, Yong Jae and Yu, Zhou},
title = {A Visual Attention Grounding Neural Model for Multimodal Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1400},
pages = {3643--3653},
year = 2018
}
Zhou et al. (2018)
Multi-Task
Kiperwasser, Eliyahu and Ballesteros, Miguel (2018):
Scheduled Multi-Task Learning: From Syntax to Translation, Transactions of the Association for Computational Linguistics

@article{Q18-1017,
author = {Kiperwasser, Eliyahu and Ballesteros, Miguel},
title = {Scheduled Multi-Task Learning: From Syntax to Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
url = {https://www.aclweb.org/anthology/Q18-1017},
pages = {225--240},
year = 2018
}
Kiperwasser and Ballesteros (2018)