Inference
The task of applying a trained model to generate a translation is called inference in machine learning, or more commonly decoding in machine translation. This problem is typically solved by beam search.
Inference is the main subject of 65 publications. 56 are discussed here.
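For reference in the refinements discussed below, here is a minimal beam search sketch in Python. The next_word_logprobs interface and the toy model are stand-in assumptions for illustration, not the API of any particular toolkit.

def beam_search(next_word_logprobs, beam_size=5, max_len=20, eos="</s>"):
    # Each hypothesis is a (log probability, output words) pair.
    beam = [(0.0, [])]
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, words in beam:
            for word, logprob in next_word_logprobs(words).items():
                candidates.append((score + logprob, words + [word]))
        # Keep only the beam_size best expansions; set finished ones aside.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = []
        for score, words in candidates[:beam_size]:
            (completed if words[-1] == eos else beam).append((score, words))
        if not beam:
            break
    return max(completed + beam, key=lambda h: h[0])

# Toy model standing in for the NMT decoder: prefers "hello world </s>".
def toy_model(prefix):
    table = [{"hello": -0.1, "world": -3.0, "</s>": -4.0},
             {"world": -0.2, "hello": -3.0, "</s>": -2.0},
             {"</s>": -0.1, "world": -2.5, "hello": -3.0}]
    return table[min(len(prefix), 2)]

print(beam_search(toy_model))  # -> (-0.4, ['hello', 'world', '</s>'])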
Publications
Beam Search Refinements
Xiaoguang Hu and Wei Li and Xiang Lan and Hua Wu and Haifeng Wang (2015):
Improved beam search with constrained softmax for NMT, Machine Translation Summit XV

@inproceedings{MTS2015-Hu,
author = {Xiaoguang Hu and Wei Li and Xiang Lan and Hua Wu and Haifeng Wang},
title = {Improved beam search with constrained softmax for NMT},
url = {
http://www.mt-archive.info/15/MTS-2015-Hu.pdf},
pages = {297--309},
booktitle = {Machine Translation Summit XV},
year = 2015
}
Hu et al. (2015) modify search in two ways. Instead of expanding all hypotheses in a stack of maximum size N, only the single best hypothesis from any stack is expanded at each step. To avoid expanding only the shortest hypotheses, a brevity penalty is introduced. Similarly,
Shu, Raphael and Nakayama, Hideki (2018):
Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{P18-2054,
author = {Shu, Raphael and Nakayama, Hideki},
title = {Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {339--344},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-2054},
year = 2018
}
Shu and Nakayama (2018) do not generate a fixed number of hypotheses for each partial translation length, but instead organize hypotheses in a single priority queue regardless of their length. They also use a length penalty (called progress penalty) and a prediction of the output length.
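A minimal sketch of such a single-queue, best-first search, reusing the next_word_logprobs interface assumed above. Note that without the progress penalty and the predicted output length, this search is biased toward short hypotheses, which is exactly what those additions counteract.

import heapq

def best_first_search(next_word_logprobs, expansions=100, eos="</s>"):
    # A single priority queue holds hypotheses of all lengths; heapq is a
    # min-heap, so log probabilities are negated.
    queue = [(0.0, [])]
    while queue and expansions > 0:
        neg_score, words = heapq.heappop(queue)
        if words and words[-1] == eos:
            return -neg_score, words  # best completed hypothesis so far
        expansions -= 1
        for word, logprob in next_word_logprobs(words).items():
            heapq.heappush(queue, (neg_score - logprob, words + [word]))
    return None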
Freitag, Markus and Al-Onaizan, Yaser (2017):
Beam Search Strategies for Neural Machine Translation, Proceedings of the First Workshop on Neural Machine Translation

@InProceedings{freitag-alonaizan:2017:NMT,
author = {Freitag, Markus and Al-Onaizan, Yaser},
title = {Beam Search Strategies for Neural Machine Translation},
booktitle = {Proceedings of the First Workshop on Neural Machine Translation},
month = {August},
address = {Vancouver},
publisher = {Association for Computational Linguistics},
pages = {56--60},
url = {
http://www.aclweb.org/anthology/W17-3207},
year = 2017
}
Freitag and Al-Onaizan (2017) introduce threshold pruning to neural machine translation: hypotheses whose score falls below a certain fraction of the best score are discarded, yielding faster decoding while maintaining quality.
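A sketch of such relative threshold pruning: scores are log probabilities, so a fraction of the best probability becomes an additive cutoff. The threshold value is illustrative; the paper evaluates several pruning variants.

import math

def prune_relative(candidates, threshold=0.1):
    # Drop candidates whose probability is below a fraction of the best
    # candidate's; in log space the ratio becomes an additive cutoff.
    best = max(score for score, _ in candidates)
    cutoff = best + math.log(threshold)
    return [(s, w) for s, w in candidates if s >= cutoff]

cands = [(-0.1, ["the"]), (-2.0, ["a"]), (-5.0, ["an"])]
print(prune_relative(cands))  # the -5.0 hypothesis falls below the cutoff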
Zhang, Zhisong and Wang, Rui and Utiyama, Masao and Sumita, Eiichiro and Zhao, Hai (2018):
Exploring Recombination for Efficient Decoding of Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1511,
author = {Zhang, Zhisong and Wang, Rui and Utiyama, Masao and Sumita, Eiichiro and Zhao, Hai},
title = {Exploring Recombination for Efficient Decoding of Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1511},
pages = {4785--4790},
year = 2018
}
Zhang et al. (2018) explore recombination, a well-known technique in statistical machine translation decoding. They merge hypotheses that share the most recent output words and that are of similar length, thus speeding up decoding and reaching higher quality with fixed beam size.
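A hypothetical sketch of this recombination step; a faithful implementation would also need to decide which decoder state the merged hypothesis keeps.

def recombine(candidates, ngram=2):
    # Among hypotheses of the same length that agree on their most recent
    # output words, keep only the best scoring one. (Full equivalence would
    # compare decoder states; last n-gram plus length is an approximation.)
    merged = {}
    for score, words in candidates:
        key = (len(words), tuple(words[-ngram:]))
        if key not in merged or score > merged[key][0]:
            merged[key] = (score, words)
    return list(merged.values())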
Zhang, Wen and Huang, Liang and Feng, Yang and Shen, Lei and Liu, Qun (2018):
Speeding Up Neural Machine Translation Decoding by Cube Pruning, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1460,
author = {Zhang, Wen and Huang, Liang and Feng, Yang and Shen, Lei and Liu, Qun},
title = {Speeding Up Neural Machine Translation Decoding by Cube Pruning},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1460},
pages = {4284--4294},
year = 2018
}
Zhang et al. (2018) apply the idea of cube pruning to neural model decoding by grouping together hypotheses with the same last output word into so-called "sub-cubes". States are expanded sequentially, starting with the highest scoring hypothesis from the highest scoring sub-cube, thus obtaining probabilities for subsequent hypotheses. For some hypotheses, states are not expanded when it is unlikely that promising new hypotheses would be generated from them.
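A much simplified sketch of the lazy, heap-driven expansion that underlies cube pruning. It precomputes each hypothesis's sorted expansion list for clarity; the savings reported in the paper come from expanding decoder states lazily.

import heapq

def cube_next_beam(beam, next_word_logprobs, k):
    # Lazy k-best extraction over the (hypothesis, next word) grid: rather
    # than scoring every pair, pop the best corner and push its neighbors.
    # The grid is only approximately monotone, which is the usual cube
    # pruning approximation.
    hyps = sorted(beam, key=lambda h: h[0], reverse=True)
    exps = [sorted(((lp, w) for w, lp in next_word_logprobs(words).items()),
                   reverse=True) for _, words in hyps]
    heap = [(-(hyps[0][0] + exps[0][0][0]), 0, 0)]
    seen, out = {(0, 0)}, []
    while heap and len(out) < k:
        _, i, j = heapq.heappop(heap)
        score, words = hyps[i]
        lp, word = exps[i][j]
        out.append((score + lp, words + [word]))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyps) and nj < len(exps[ni]) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(hyps[ni][0] + exps[ni][nj][0]), ni, nj))
    return out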
One problem with beam search is that larger beam sizes lead to earlier generation of the end-of-sentence symbol and thus shorter translations.
Kikuchi, Yuta and Neubig, Graham and Sasano, Ryohei and Takamura, Hiroya and Okumura, Manabu (2016):
Controlling Output Length in Neural Encoder-Decoders, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{kikuchi-EtAl:2016:EMNLP2016,
author = {Kikuchi, Yuta and Neubig, Graham and Sasano, Ryohei and Takamura, Hiroya and Okumura, Manabu},
title = {Controlling Output Length in Neural Encoder-Decoders},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1328--1338},
url = {
https://aclweb.org/anthology/D16-1140},
year = 2016
}
Kikuchi et al. (2016) force the decoder to produce translations within a pre-specified length range by ignoring completed hypotheses outside that range. They also add a length embedding as an additional input feature to the decoder state progression.
Wei He and Zhongjun He and Hua Wu and Haifeng Wang (2016):
Improved Neural Machine Translation with SMT Features, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence

@inproceedings{He-aaai16,
author = {Wei He and Zhongjun He and Hua Wu and Haifeng Wang},
title = {Improved Neural Machine Translation with {SMT} Features},
booktitle = {Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
pages = {151-157},
year = 2016
}
He et al. (2016) add a word bonus for each generated word (they also propose to add lexical translation probabilities and an n-gram language model).
Murray, Kenton and Chiang, David (2018):
Correcting Length Bias in Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6322,
author = {Murray, Kenton and Chiang, David},
title = {Correcting Length Bias in Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6322},
pages = {212--223},
year = 2018
}
Murray and Chiang (2018) learn the optimal value for this word bonus.
Huang, Liang and Zhao, Kai and Ma, Mingbo (2017):
When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@inproceedings{huang-etal-2017-finish,
author = {Huang, Liang and Zhao, Kai and Ma, Mingbo},
title = {When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size)},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
month = {sep},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D17-1227},
doi = {10.18653/v1/D17-1227},
pages = {2134--2139},
year = 2017
}
Huang et al. (2017) add a bounded word reward that boosts hypothesis length up to an expected optimal length.
Yang, Yilin and Huang, Liang and Ma, Mingbo (2018):
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1342,
author = {Yang, Yilin and Huang, Liang and Ma, Mingbo},
title = {Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1342},
pages = {3054--3059},
year = 2018
}
Yang et al. (2018) refine this reward and also change the stopping criteria for beam search so that sufficiently many long translations are generated.
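The following sketch illustrates scoring with such a word reward. The reward value is a placeholder: Murray and Chiang (2018) learn it rather than fix it, and Huang et al. (2017) bound it at an expected optimal length.

def rescore(logprob, length, word_reward=0.2, expected_length=None):
    # A constant per-word bonus counteracts the bias toward short output;
    # bounding it at an expected optimal length follows Huang et al. (2017):
    # words beyond that length earn no further bonus.
    rewarded = length if expected_length is None else min(length, expected_length)
    return logprob + word_reward * rewarded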
Stochastic Search
Monte-Carlo decoding was used by
Ott, Myle and Auli, Michael and Grangier, David and Ranzato, Marc'Aurelio (2018):
Analyzing Uncertainty in Neural Machine Translation, Proceedings of the 35th International Conference on Machine Learning
@InProceedings{pmlr-v80-ott18a,
author = {Ott, Myle and Auli, Michael and Grangier, David and Ranzato, Marc'Aurelio},
title = {Analyzing Uncertainty in Neural Machine Translation},
booktitle = {Proceedings of the 35th International Conference on Machine Learning},
pages = {3956--3965},
editor = {Dy, Jennifer and Krause, Andreas},
volume = {80},
series = {Proceedings of Machine Learning Research},
address = {Stockholmsmässan, Stockholm Sweden},
month = {10--15 Jul},
publisher = {PMLR},
url = {
http://proceedings.mlr.press/v80/ott18a/ott18a.pdf},
year = 2018
}
Ott et al. (2018) to analyze the search space and by
Edunov, Sergey and Ott, Myle and Auli, Michael and Grangier, David (2018):
Understanding Back-Translation at Scale, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1045,
author = {Edunov, Sergey and Ott, Myle and Auli, Michael and Grangier, David},
title = {Understanding Back-Translation at Scale},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1045},
pages = {489--500},
year = 2018
}
Edunov et al. (2018) for back-translation.
Greedy Search
Kyunghyun Cho (2016):
Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model, CoRR

@article{DBLP:journals/corr/Cho16,
author = {Kyunghyun Cho},
title = {Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model},
journal = {CoRR},
volume = {abs/1605.03835},
url = {
http://arxiv.org/abs/1605.03835},
archiveprefix = {arXiv},
eprint = {1605.03835},
timestamp = {Mon, 13 Aug 2018 16:48:55 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/Cho16},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2016
}
Cho (2016) proposes a variant of greedy decoding where noise is added to the hidden state of the decoder. Multiple passes are performed with different random noise, and the translation with the highest probability under the noise-free model is picked.
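A self-contained toy sketch of this procedure; the linear "decoder", its dimensions, and the vocabulary are invented for illustration.

import numpy as np

VOCAB, DIM, EOS = 5, 8, 0
W = np.random.default_rng(1).normal(size=(DIM, VOCAB))  # toy output layer

def step(state, prev_word):
    # Toy recurrent decoder step, standing in for a real NMT decoder.
    state = np.tanh(state + 0.1 * prev_word)
    logits = state @ W
    return state, logits - np.log(np.exp(logits).sum())

def greedy_decode(noise_std=0.0, rng=None, max_len=10):
    state, prev, out = np.zeros(DIM), 0, []
    for _ in range(max_len):
        if rng is not None:  # perturb the hidden state, as in Cho (2016)
            state = state + rng.normal(0.0, noise_std, DIM)
        state, logprobs = step(state, prev)
        prev = int(np.argmax(logprobs))
        out.append(prev)
        if prev == EOS:
            break
    return out

def score(words):
    # Log probability of a candidate under the noise-free model.
    state, prev, total = np.zeros(DIM), 0, 0.0
    for w in words:
        state, logprobs = step(state, prev)
        total += logprobs[w]
        prev = w
    return total

rng = np.random.default_rng(0)
candidates = [greedy_decode(noise_std=0.5, rng=rng) for _ in range(10)]
best = max(candidates, key=score)  # picked by the non-noisy model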
Gu, Jiatao and Cho, Kyunghyun and Li, Victor O.K. (2017):
Trainable Greedy Decoding for Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1209,
author = {Gu, Jiatao and Cho, Kyunghyun and Li, Victor O.K.},
title = {Trainable Greedy Decoding for Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1958--1968},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1210},
year = 2017
}
Gu et al. (2017) build on this idea to develop a trainable greedy decoding method. Instead of a noise term, they learn an adjustment term that is optimized on sentence-level translation quality (as measured by BLEU) using reinforcement learning.
Fast Decoding
Devlin, Jacob (2017):
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1299,
author = {Devlin, Jacob},
title = {Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {2810--2815},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1300},
year = 2017
}
Devlin (2017) obtains speed-ups with pre-computation and the use of 16-bit floating-point operations.
Zhang, Wen and Huang, Liang and Feng, Yang and Shen, Lei and Liu, Qun (2018):
Speeding Up Neural Machine Translation Decoding by Cube Pruning, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1460,
author = {Zhang, Wen and Huang, Liang and Feng, Yang and Shen, Lei and Liu, Qun},
title = {Speeding Up Neural Machine Translation Decoding by Cube Pruning},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1460},
pages = {4284--4294},
year = 2018
}
Zhang et al. (2018) remove the normalization in the softmax output word prediction, after adjusting the training objective to perform self-normalization.
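A sketch of why this pays off at inference time. Self-normalized training (commonly realized with a penalty on the squared log partition function) drives the normalizer toward 1, so the raw logit approximates the log probability.

import numpy as np

def logprob_exact(logits, word):
    # Standard scoring: needs the normalization sum over the full vocabulary.
    return logits[word] - np.log(np.exp(logits).sum())

def logprob_selfnorm(logits, word):
    # Self-normalized scoring: training pushed the log partition function
    # toward zero, so the sum over the vocabulary can be skipped.
    return logits[word]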
Hoang, Hieu and Dwojak, Tomasz and Krislauks, Rihards and Torregrosa, Daniel and Heafield, Kenneth (2018):
Fast Neural Machine Translation Implementation, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

@InProceedings{W18-2714,
author = {Hoang, Hieu and Dwojak, Tomasz and Krislauks, Rihards and Torregrosa, Daniel and Heafield, Kenneth},
title = {Fast Neural Machine Translation Implementation},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {116--121},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/W18-2714},
year = 2018
}
Hoang et al. (2018) speed up decoding by batching several input sentences, refining k-best extraction with specialized GPU kernel functions, and using 16-bit floating-point operations.
Iglesias, Gonzalo and Tambellini, William and Gispert, Adrià and Hasler, Eva and Byrne, Bill (2018):
Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

@InProceedings{N18-3013,
author = {Iglesias, Gonzalo and Tambellini, William and Gispert, Adri{\`a} and Hasler, Eva and Byrne, Bill},
title = {Accelerating {NMT} Batched Beam Decoding with LMBR Posteriors for Deployment},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)},
publisher = {Association for Computational Linguistics},
pages = {106--113},
location = {New Orleans - Louisiana},
url = {
http://aclweb.org/anthology/N18-3013},
year = 2018
}
Iglesias et al. (2018) also show improvements with such batching.
Argueta, Arturo and Chiang, David (2019):
Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{argueta-chiang-2019-accelerating,
author = {Argueta, Arturo and Chiang, David},
title = {Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1626},
pages = {6215--6224},
year = 2019
}
Argueta and Chiang (2019) fuse the softmax and k-best extraction computation.
Senellart, Jean and Zhang, Dakun and WANG, Bo and KLEIN, Guillaume and Ramatchandirin, Jean-Pierre and Crego, Josep and Rush, Alexander (2018):
OpenNMT System Description for WNMT 2018: 800 words/sec on a single-core CPU, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

@InProceedings{W18-2715,
author = {Senellart, Jean and Zhang, Dakun and WANG, Bo and KLEIN, Guillaume and Ramatchandirin, Jean-Pierre and Crego, Josep and Rush, Alexander},
title = {OpenNMT System Description for WNMT 2018: 800 words/sec on a single-core CPU},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {122--128},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/W18-2715},
year = 2018
}
Senellart et al. (2018) build a smaller model with knowledge distillation that allows faster decoding.
Limiting Hypothesis Generation
Xiaoguang Hu and Wei Li and Xiang Lan and Hua Wu and Haifeng Wang (2015):
Improved beam search with constrained softmax for NMT, Machine Translation Summit XV

@inproceedings{MTS2015-Hu,
author = {Xiaoguang Hu and Wei Li and Xiang Lan and Hua Wu and Haifeng Wang},
title = {Improved beam search with constrained softmax for NMT},
url = {
http://www.mt-archive.info/15/MTS-2015-Hu.pdf},
pages = {297--309},
booktitle = {Machine Translation Summit XV},
year = 2015
}
Hu et al. (2015) limit the computation of translation probabilities to words that are in the phrase table of a traditional phrase-based model for the input sentence, leading to several-fold speed-ups at little loss in quality. Extending this work,
Mi, Haitao and Wang, Zhiguo and Ittycheriah, Abe (2016):
Vocabulary Manipulation for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{mi-wang-ittycheriah:2016:P16-2,
author = {Mi, Haitao and Wang, Zhiguo and Ittycheriah, Abe},
title = {Vocabulary Manipulation for Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {124--129},
url = {
http://anthology.aclweb.org/P16-2021},
year = 2016
}
Mi et al. (2016) also include top word translations and the most frequent words in the vocabulary filter used for the prediction computation.
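A sketch of such per-sentence vocabulary filtering; the lexicon, the frequent-word list, and the cutoff are assumed placeholders.

import numpy as np

def build_candidates(source_words, lexicon, frequent_ids, top_k=1000):
    # Per-sentence candidate vocabulary: translations of the source words
    # from an external lexicon, plus the overall most frequent words.
    cands = set(frequent_ids[:top_k])
    for w in source_words:
        cands.update(lexicon.get(w, ()))
    return sorted(cands)

def filtered_prediction(logits, candidate_ids):
    # Softmax over the candidate set only; in a real system the output
    # layer's matrix product is also restricted to these rows, which is
    # where the speed-up comes from.
    sub = logits[candidate_ids]
    probs = np.exp(sub - sub.max())
    return dict(zip(candidate_ids, probs / probs.sum()))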
Shi, Xing and Knight, Kevin (2017):
Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{shi-knight:2017:Short,
author = {Shi, Xing and Knight, Kevin},
title = {Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {574--579},
url = {
http://aclweb.org/anthology/P17-2091},
year = 2017
}
Shi and Knight (2017) use dictionaries obtained from statistical alignment models and (unsuccessfully) Locality Sensitive Hashing to speed up decoding.
Limiting Search Space
Compared to statistical machine translation, neural machine translation may be less adequate, even if more fluent. In other words, the translation may diverge from the input in various ways, such as not translating part of the sentence or generating unrelated output words.
Zhang, Jingyi and Utiyama, Masao and Sumita, Eiichro and Neubig, Graham and Nakamura, Satoshi (2017):
Improving Neural Machine Translation through Phrase-based Forced Decoding, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@InProceedings{zhang-EtAl:2017:I17-12,
author = {Zhang, Jingyi and Utiyama, Masao and Sumita, Eiichro and Neubig, Graham and Nakamura, Satoshi},
title = {Improving Neural Machine Translation through Phrase-based Forced Decoding},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {November},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
pages = {152--162},
url = {
http://www.aclweb.org/anthology/I17-1016},
year = 2017
}
Zhang et al. (2017) propose to limit the search space of the neural decoder to the search graph generated by a phrase-based system.
Khayrallah, Huda and Kumar, Gaurav and Duh, Kevin and Post, Matt and Koehn, Philipp (2017):
Neural Lattice Search for Domain Adaptation in Machine Translation, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

@InProceedings{khayrallah-EtAl:2017:I17-2,
author = {Khayrallah, Huda and Kumar, Gaurav and Duh, Kevin and Post, Matt and Koehn, Philipp},
title = {Neural Lattice Search for Domain Adaptation in Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {November},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
pages = {20--25},
url = {
http://www.aclweb.org/anthology/I17-2004},
year = 2017
}
Khayrallah et al. (2017) extend this to the search lattice.
Reranking
Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex (2017):
Analyzing Neural MT Search and Model Performance, Proceedings of the First Workshop on Neural Machine Translation

@InProceedings{niehues-EtAl:2017:NMT,
author = {Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex},
title = {Analyzing Neural {MT} Search and Model Performance},
booktitle = {Proceedings of the First Workshop on Neural Machine Translation},
month = {August},
address = {Vancouver},
publisher = {Association for Computational Linguistics},
pages = {11--17},
url = {
http://www.aclweb.org/anthology/W17-3202},
year = 2017
}
Niehues et al. (2017) explore the search space considered during decoding. While they find that decoding makes very few search errors, better translation results could be obtained by picking other translations considered during beam search. Similarly,
Frédéric Blain and Lucia Specia and Pranava Madhyastha (2017):
Exploring Hypotheses Spaces in Neural Machine Translation, Machine Translation Summit XVI

@inproceedings{mtsummit2017:Blain,
author = {Fr{\'e}d{\'e}ric Blain and Lucia Specia and Pranava Madhyastha},
title = {Exploring Hypotheses Spaces in Neural Machine Translation},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
url = {
http://www.doc.ic.ac.uk/~pshantha/papers/mtsummit17.pdf},
year = 2017
}
Blain et al. (2017) observe that very large beam sizes hurt 1-best decoding but generate higher scoring translations in the n-best list.
Liu, Lemao and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro (2016):
Agreement on Target-bidirectional Neural Machine Translation, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

@InProceedings{liu-EtAl:2016:N16-11,
author = {Liu, Lemao and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro},
title = {Agreement on Target-bidirectional Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {411--416},
url = {
http://www.aclweb.org/anthology/N16-1046},
year = 2016
}
Liu et al. (2016) rerank the n-best list by training a model that generates the output starting with the last word of the sentence, called right-to-left decoding. Their approach was successfully used by
Sennrich, Rico and Haddow, Barry and Birch, Alexandra (2016):
Edinburgh Neural Machine Translation Systems for WMT 16, Proceedings of the First Conference on Machine Translation
@InProceedings{sennrich-haddow-birch:2016:WMT,
author = {Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
title = {Edinburgh Neural Machine Translation Systems for WMT 16},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {371--376},
url = {
http://www.aclweb.org/anthology/W/W16/W16-2323},
year = 2016
}
Sennrich et al. (2016) in their winning system in the WMT 2016 shared task.
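A sketch of this kind of reranking; reverse_score stands in for the right-to-left model, and the interpolation weight is illustrative.

def rerank_right_to_left(nbest, reverse_score, weight=0.5):
    # Interpolate the left-to-right model score with the score a
    # right-to-left model assigns to the reversed output.
    def combined(hyp):
        score, words = hyp
        return (1 - weight) * score + weight * reverse_score(words[::-1])
    return max(nbest, key=combined)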
Hoang, Cong Duy Vu and Haffari, Gholamreza and Cohn, Trevor (2017):
Towards Decoding as Continuous Optimisation in Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{D17-1014,
author = {Hoang, Cong Duy Vu and Haffari, Gholamreza and Cohn, Trevor},
title = {Towards Decoding as Continuous Optimisation in Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {146--156},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1014},
year = 2017
}
Hoang et al. (2017) propose reranking with a model trained in the inverse translation direction and a language model.
Jiwei Li and Dan Jurafsky (2016):
Mutual Information and Diverse Decoding Improve Neural Machine Translation, CoRR

@article{DBLP:journals/corr/LiJ16,
author = {Jiwei Li and Dan Jurafsky},
title = {Mutual Information and Diverse Decoding Improve Neural Machine Translation},
journal = {CoRR},
volume = {abs/1601.00372},
url = {
http://arxiv.org/abs/1601.00372},
archiveprefix = {arXiv},
eprint = {1601.00372},
year = 2016
}
Li and Jurafsky (2016) generate more diverse n-best lists by adding a bias term to penalize too many expansions of a single hypothesis. In a refinement,
Jiwei Li and Will Monroe and Dan Jurafsky (2016):
A Simple, Fast Diverse Decoding Algorithm for Neural Generation, CoRR

@article{DBLP:journals/corr/LiMJ16,
author = {Jiwei Li and Will Monroe and Dan Jurafsky},
title = {A Simple, Fast Diverse Decoding Algorithm for Neural Generation},
journal = {CoRR},
volume = {abs/1611.08562},
url = {
http://arxiv.org/abs/1611.08562},
archiveprefix = {arXiv},
eprint = {1611.08562},
timestamp = {Mon, 13 Aug 2018 16:48:46 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/LiMJ16},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2016
}
Li et al. (2016) learn the diversity rate with reinforcement learning, using as reward the generation of n-best lists that yield better translation quality after reranking.
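A sketch of the diversity-penalized expansion step; the diversity rate is fixed here, whereas the refinement just described learns it.

def diverse_expansions(beam, next_word_logprobs, diversity_rate=0.5):
    # Penalize the k-th best expansion of the same parent hypothesis by
    # k * diversity_rate, so no single parent dominates the next beam.
    candidates = []
    for score, words in beam:
        ranked = sorted(next_word_logprobs(words).items(),
                        key=lambda kv: kv[1], reverse=True)
        for rank, (word, logprob) in enumerate(ranked):
            candidates.append((score + logprob - diversity_rate * rank,
                               words + [word]))
    return candidates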
Stahlberg, Felix and de Gispert, Adrià and Hasler, Eva and Byrne, Bill (2017):
Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

@InProceedings{stahlberg-EtAl:2017:EACLshort,
author = {Stahlberg, Felix and de Gispert, Adri\`{a} and Hasler, Eva and Byrne, Bill},
title = {Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {362--368},
url = {
http://www.aclweb.org/anthology/E17-2058},
year = 2017
}
Stahlberg et al. (2017) use minimum Bayes risk to rerank decoding lattices. This method also allows the combination of SMT and NMT search graphs.
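A sketch of minimum Bayes risk reranking over a plain n-best list; similarity stands in for a sentence-level metric such as BLEU, and the lattice-based formulation of the paper is more involved.

import math

def mbr_rerank(nbest, similarity):
    # Pick the hypothesis with the highest expected similarity to all
    # candidates, weighted by their normalized model probabilities.
    top = max(score for score, _ in nbest)
    weights = [math.exp(score - top) for score, _ in nbest]
    total = sum(weights)
    def expected_gain(words):
        return sum(w * similarity(words, other)
                   for w, (_, other) in zip(weights, nbest)) / total
    return max(nbest, key=lambda h: expected_gain(h[1]))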
Iglesias, Gonzalo and Tambellini, William and Gispert, Adrià and Hasler, Eva and Byrne, Bill (2018):
Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

@InProceedings{N18-3013,
author = {Iglesias, Gonzalo and Tambellini, William and Gispert, Adri{\`a} and Hasler, Eva and Byrne, Bill},
title = {Accelerating {NMT} Batched Beam Decoding with LMBR Posteriors for Deployment},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)},
publisher = {Association for Computational Linguistics},
pages = {106--113},
location = {New Orleans - Louisiana},
url = {
http://aclweb.org/anthology/N18-3013},
year = 2018
}
Iglesias et al. (2018) show gains for minimum Bayes risk decoding for the Transformer model.
Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex (2016):
Pre-Translation for Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{niehues-EtAl:2016:COLING,
author = {Niehues, Jan and Cho, Eunah and Ha, Thanh-Le and Waibel, Alex},
title = {Pre-Translation for Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {1828--1836},
url = {
http://aclweb.org/anthology/C16-1172},
year = 2016
}
Niehues et al. (2016) attach the phrase-based translation to the input sentence and feed that into a neural machine translation decoder.
Geng, Xinwei and Feng, Xiaocheng and Qin, Bing and Liu, Ting (2018):
Adaptive Multi-pass Decoder for Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1048,
author = {Geng, Xinwei and Feng, Xiaocheng and Qin, Bing and Liu, Ting},
title = {Adaptive Multi-pass Decoder for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1048},
pages = {523--532},
year = 2018
}
Geng et al. (2018) extend this idea to multi-pass decoding. The output of a regular decoding pass is then used as additional input to a second decoding pass. This process is iterated for a fixed number of steps, or stopped based on the decision of a so-called policy network.
Zhou, Long and Hu, Wenpeng and Zhang, Jiajun and Zong, Chengqing (2017):
Neural System Combination for Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{zhou-EtAl:2017:Short1,
author = {Zhou, Long and Hu, Wenpeng and Zhang, Jiajun and Zong, Chengqing},
title = {Neural System Combination for Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {378--384},
url = {
http://aclweb.org/anthology/P17-2060},
year = 2017
}
Zhou et al. (2017) propose a system combination method for the outputs of different translation systems (such as NMT and variants of SMT) that takes the form of multi-source decoding: multiple encoders, one for each system's output, feed into a single decoder that produces the consensus output.
Decoding Constraints
In practical deployment of machine translation, there is often a need to override model predictions with pre-specified word or phrase translations, for instance to enforce required terminology or to support external components.
Chatterjee, Rajen and Negri, Matteo and Turchi, Marco and Federico, Marcello and Specia, Lucia and Blain, Frédéric (2017):
Guiding Neural Machine Translation Decoding with External Knowledge, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper

@InProceedings{chatterjee-EtAl:2017:WMT1,
author = {Chatterjee, Rajen and Negri, Matteo and Turchi, Marco and Federico, Marcello and Specia, Lucia and Blain, Fr\'{e}d\'{e}ric},
title = {Guiding Neural Machine Translation Decoding with External Knowledge},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {157--168},
url = {
http://www.aclweb.org/anthology/W17-4716},
year = 2017
}
Chatterjee et al. (2017) allow the specification of pre-defined translations for certain input words and modify the decoder to use them, based on input word attention.
Hokamp, Chris and Liu, Qun (2017):
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{hokamp-liu:2017:Long,
author = {Hokamp, Chris and Liu, Qun},
title = {Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1535--1546},
url = {
http://aclweb.org/anthology/P17-1141},
year = 2017
}
Hokamp and Liu (2017) modify the decoding algorithm to force the decoder to produce certain specified output strings. Each time one of these output strings is produced, hypotheses are placed into a different beam, and final translations are picked from the beam containing hypotheses that produced all specified output. Related to this idea,
Anderson, Peter and Fernando, Basura and Johnson, Mark and Gould, Stephen (2017):
Guided Open Vocabulary Image Captioning with Constrained Beam Search, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

@InProceedings{anderson-EtAl:2017:EMNLP2017,
author = {Anderson, Peter and Fernando, Basura and Johnson, Mark and Gould, Stephen},
title = {Guided Open Vocabulary Image Captioning with Constrained Beam Search},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {936--945},
url = {
https://www.aclweb.org/anthology/D17-1098},
year = 2017
}
Anderson et al. (2017) mark hypotheses with states in a finite-state machine that indicate the subset of constraints (pre-specified translations) that have been satisfied.
Hasler, Eva and Gispert, Adrià and Iglesias, Gonzalo and Byrne, Bill (2018):
Neural Machine Translation Decoding with Terminology Constraints, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2081,
author = {Hasler, Eva and Gispert, Adri{\`a} and Iglesias, Gonzalo and Byrne, Bill},
title = {Neural Machine Translation Decoding with Terminology Constraints},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {506--512},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-2081},
year = 2018
}
Hasler et al. (2018) refine this approach by using a linear (not exponential) number of constraint satisfaction states, and also remove attention from words whose constraints have been satisfied.
Post, Matt and Vilar, David (2018):
Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

@InProceedings{N18-1119,
author = {Post, Matt and Vilar, David},
title = {Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1314--1324},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-1119},
year = 2018
}
Post and Vilar (2018) split the beam into sub-beams instead of duplicating beams, preventing an increase in decoding time for sentences with such constraints.
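A sketch of the beam "banks" shared by these constrained decoding approaches: hypotheses are grouped by the number of constraints they satisfy. The equal per-bank split here is a crude stand-in for the dynamic allocation of Post and Vilar (2018).

def constrained_beam_step(candidates, constraints, beam_size):
    # Group candidates into banks by the number of satisfied constraints
    # and give each bank a share of the beam; the final translation is
    # read from the bank where all constraints are satisfied.
    banks = {}
    for score, words in candidates:
        met = sum(contains(words, c) for c in constraints)
        banks.setdefault(met, []).append((score, words))
    share = max(1, beam_size // (len(constraints) + 1))
    beam = []
    for met in sorted(banks):
        beam.extend(sorted(banks[met], reverse=True)[:share])
    return beam

def contains(words, constraint):
    # A constraint (a token sequence) is satisfied if it appears in the output.
    n = len(constraint)
    return any(words[i:i + n] == constraint for i in range(len(words) - n + 1))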
Hu, J. Edward and Khayrallah, Huda and Culkin, Ryan and Xia, Patrick and Chen, Tongfei and Post, Matt and Van Durme, Benjamin (2019):
Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{hu-etal-2019-improved,
author = {Hu, J. Edward and Khayrallah, Huda and Culkin, Ryan and Xia, Patrick and Chen, Tongfei and Post, Matt and Van Durme, Benjamin},
title = {Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1090},
pages = {839--850},
year = 2019
}
Hu et al. (2019) extend this work with a trie structure to encode constraints, thus improving the handling of constraints that start with the same words, and also improve batching.
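A minimal sketch of such a constraint trie:

class ConstraintTrie:
    # Constraints that start with the same words share trie nodes, so the
    # partial-match state carried by a hypothesis stays compact.
    def __init__(self):
        self.children = {}
        self.is_constraint_end = False

    def add(self, tokens):
        node = self
        for t in tokens:
            node = node.children.setdefault(t, ConstraintTrie())
        node.is_constraint_end = True

trie = ConstraintTrie()
trie.add(["neural", "machine", "translation"])
trie.add(["neural", "network"])  # shares the "neural" node with the above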
Song, Kai and Zhang, Yue and Yu, Heng and Luo, Weihua and Wang, Kun and Zhang, Min (2019):
Code-Switching for Enhancing NMT with Pre-Specified Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{song-etal-2019-code,
author = {Song, Kai and Zhang, Yue and Yu, Heng and Luo, Weihua and Wang, Kun and Zhang, Min},
title = {Code-Switching for Enhancing {NMT} with Pre-Specified Translation},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1044},
pages = {449--459},
year = 2019
}
Song et al. (2019) replace the words with their specified translations in the input and aid the translation of such code-switched data with a pointer network that handles the copying of the specified translations.
Dinu, Georgiana and Mathur, Prashant and Federico, Marcello and Al-Onaizan, Yaser (2019):
Training Neural Machine Translation to Apply Terminology Constraints, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{dinu-etal-2019-training,
author = {Dinu, Georgiana and Mathur, Prashant and Federico, Marcello and Al-Onaizan, Yaser},
title = {Training Neural Machine Translation to Apply Terminology Constraints},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1294},
pages = {3063--3068},
year = 2019
}
Dinu et al. (2019) also present the specified translations as input, but in addition to the original source words, and using a source factor to label input tokens according to the three classes: regular input word, input word with specified translation, and specified translation.
Simultaneous Translation
Integrating speech recognition and machine translation for the real-time translation of spoken language requires decoding algorithms that operate on an incoming stream of input words and produce translations for them before the input sentence is complete, as far as that is possible.
Harsh Satija and Joelle Pineau (2016):
Simultaneous Machine Translation using Deep Reinforcement Learning, Abstraction in Reinforcement Learning (ICML Workshop)

@inproceedings{Satija-ICML-2016,
author = {Harsh Satija and Joelle Pineau},
title = {Simultaneous Machine Translation using Deep Reinforcement Learning},
booktitle = {Abstraction in Reinforcement Learning (ICML Workshop)},
url = {
http://docs.wixstatic.com/ugd/3195dc\_538b63de8e2644b782db920c55f74650.pdf},
year = 2016
}
Satija and Pineau (2016) propose using reinforcement learning to learn the trade-off between waiting for input and producing output.
Kyunghyun Cho and Masha Esipova (2016):
Can neural machine translation do simultaneous translation?, CoRR

@article{DBLP:journals/corr/ChoE16,
author = {Kyunghyun Cho and Masha Esipova},
title = {Can neural machine translation do simultaneous translation?},
journal = {CoRR},
volume = {abs/1606.02012},
url = {
http://arxiv.org/abs/1606.02012},
archiveprefix = {arXiv},
eprint = {1606.02012},
timestamp = {Mon, 13 Aug 2018 16:47:35 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/ChoE16},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2016
}
Cho and Esipova (2016) frame the problem as predicting a sequence of read and write actions, i.e., reading an additional input word and writing out an output word.
Gu, Jiatao and Neubig, Graham and Cho, Kyunghyun and Li, Victor O.K. (2017):
Learning to Translate in Real-time with Neural Machine Translation, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

@InProceedings{gu-EtAl:2017:EACLlong,
author = {Gu, Jiatao and Neubig, Graham and Cho, Kyunghyun and Li, Victor O.K.},
title = {Learning to Translate in Real-time with Neural Machine Translation},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {1053--1062},
url = {
http://www.aclweb.org/anthology/E17-1099},
year = 2017
}
Gu et al. (2017) optimize the decoding algorithm with reinforcement learning based on this framework.
Alinejad, Ashkan and Siahbani, Maryam and Sarkar, Anoop (2018):
Prediction Improves Simultaneous Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1337,
author = {Alinejad, Ashkan and Siahbani, Maryam and Sarkar, Anoop},
title = {Prediction Improves Simultaneous Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1337},
pages = {3022--3027},
year = 2018
}
Alinejad et al. (2018) refine this with a prediction operation that predicts the next input words.
Dalvi, Fahim and Durrani, Nadir and Sajjad, Hassan and Vogel, Stephan (2018):
Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2079,
author = {Dalvi, Fahim and Durrani, Nadir and Sajjad, Hassan and Vogel, Stephan},
title = {Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {493--499},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-2079},
year = 2018
}
Dalvi et al. (2018) propose a simpler static read-and-write approach that reads a certain number of input words ahead. Similarly,
Ma, Mingbo and Huang, Liang and Xiong, Hao and Zheng, Renjie and Liu, Kaibo and Zheng, Baigong and Zhang, Chuanqiang and He, Zhongjun and Liu, Hairong and Li, Xing and Wu, Hua and Wang, Haifeng (2019):
STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{ma-etal-2019-stacl,
author = {Ma, Mingbo and Huang, Liang and Xiong, Hao and Zheng, Renjie and Liu, Kaibo and Zheng, Baigong and Zhang, Chuanqiang and He, Zhongjun and Liu, Hairong and Li, Xing and Wu, Hua and Wang, Haifeng},
title = {STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1289},
pages = {3025--3036},
year = 2019
}
Ma et al. (2019) use a wait-k strategy that reads a fixed number of words ahead and train a prefix-to-prefix translation model. They argue that their model learns to anticipate missing content.
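A sketch of the wait-k read/write schedule; extend_target stands in for a prefix-to-prefix model that appends one target word given the source words read so far.

def wait_k_translate(source_stream, extend_target, k=3, eos="</s>"):
    # Read k source words before writing the first target word, then
    # alternate READ and WRITE; once the source is exhausted, keep writing.
    source, target = [], []
    for word in source_stream:
        source.append(word)                               # READ
        if len(source) >= k and (not target or target[-1] != eos):
            target.append(extend_target(source, target))  # WRITE
    while not target or target[-1] != eos:
        target.append(extend_target(source, target))
        if len(target) > 2 * len(source):                 # safety stop for the sketch
            break
    return target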
Arivazhagan, Naveen and Cherry, Colin and Macherey, Wolfgang and Chiu, Chung-Cheng and Yavuz, Semih and Pang, Ruoming and Li, Wei and Raffel, Colin (2019):
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{arivazhagan-etal-2019-monotonic,
author = {Arivazhagan, Naveen and Cherry, Colin and Macherey, Wolfgang and Chiu, Chung-Cheng and Yavuz, Semih and Pang, Ruoming and Li, Wei and Raffel, Colin},
title = {Monotonic Infinite Lookback Attention for Simultaneous Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1126},
pages = {1313--1323},
year = 2019
}
Arivazhagan et al. (2019) integrate the learning of the size of the look-ahead window into the attention mechanism. Their training objective takes both prediction accuracy and look-ahead penalty into account. Similarly,
Zheng, Baigong and Zheng, Renjie and Ma, Mingbo and Huang, Liang (2019):
Simultaneous Translation with Flexible Policy via Restricted Imitation Learning, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{zheng-etal-2019-simultaneous,
author = {Zheng, Baigong and Zheng, Renjie and Ma, Mingbo and Huang, Liang},
title = {Simultaneous Translation with Flexible Policy via Restricted Imitation Learning},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1582},
pages = {5816--5822},
year = 2019
}
Zheng et al. (2019) also train an end-to-end model that learns translation predictions and look-ahead (i.e., read) operations at the same time. For training, action sequences are generated from the training data with different look-ahead window sizes.
Lattice Decoding
In the case of offline speech translation, i.e., for a stored audio file without any real-time requirements, a tighter integration of the speech recognition component and the machine translation component may be attempted. A common strategy is to expose the full search graph of the speech recognition system in the form of a word lattice, a method that also works for preserving ambiguity for word segmentation, morphological analysis, or differing byte pair encoding vocabularies.
Zhang, Pei and Ge, Niyu and Chen, Boxing and Fan, Kai (2019):
Lattice Transformer for Speech Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{zhang-etal-2019-lattice,
author = {Zhang, Pei and Ge, Niyu and Chen, Boxing and Fan, Kai},
title = {Lattice Transformer for Speech Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1649},
pages = {6475--6484},
year = 2019
}
Zhang et al. (2019) propose a novel attention mechanism over lattices. For any given node, it excludes attention to lattice nodes that cannot occur on the same path, and it also incorporates the probabilities of the lattice nodes.
Interactive Translation Prediction
Another special decoding scenario is interactive translation by machines and humans. In this setup, the machine translation system offers suggestions for word translations, one word at a time, which the human translator either accepts or modifies. Either way, the machine translation system has to propose extensions to the current partial translation.
Rebecca Knowles and Philipp Koehn (2016):
Neural Interactive Translation Prediction, Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA)

@inproceedings{neural-interactive-translation-2016,
author = {Rebecca Knowles and Philipp Koehn},
title = {Neural Interactive Translation Prediction},
booktitle = {Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA)},
year = 2016
}
Knowles and Koehn (2016) show that neural methods make better predictions than traditional statistical methods based on search lattices.
Wuebker, Joern and Green, Spence and DeNero, John and Hasan, Sasa and Luong, Minh-Thang (2016):
Models and Inference for Prefix-Constrained Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{wuebker-EtAl:2016:P16-1,
author = {Wuebker, Joern and Green, Spence and DeNero, John and Hasan, Sasa and Luong, Minh-Thang},
title = {Models and Inference for Prefix-Constrained Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {66--75},
url = {
http://www.aclweb.org/anthology/P16-1007},
year = 2016
}
Wuebker et al. (2016);
Peris, Álvaro and Domingo, Miguel and Casacuberta, Francisco (2017):
Interactive Neural Machine Translation, Computer Speech & Language
@article{Peris:2017:INM,
author = {Peris, {\'A}lvaro and Domingo, Miguel and Casacuberta, Francisco},
title = {Interactive Neural Machine Translation},
journal = {Computer Speech \& Language},
issue\_date = {September 2017},
volume = {45},
number = {C},
month = {sep},
issn = {0885-2308},
pages = {201--220},
numpages = {20},
url = {
https://doi.org/10.1016/j.csl.2016.12.003},
doi = {10.1016/j.csl.2016.12.003},
acmid = {3103744},
publisher = {Academic Press Ltd.},
address = {London, UK, UK},
keywords = {Interactive-predictive machine translation, Neural machine translation, Recurrent neural networks},
year = 2017
}
Peris et al. (2017) also suggest force-decoding the given partial translation and letting the model make subsequent predictions.
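A sketch of this force-decode-and-continue loop, reusing the next_word_logprobs interface assumed in the beam search sketch above:

def complete(prefix, next_word_logprobs, max_len=50, eos="</s>"):
    # Force-decode the tokens the translator has accepted, then let the
    # model greedily continue from that state.
    words = list(prefix)
    while len(words) < max_len and (not words or words[-1] != eos):
        logprobs = next_word_logprobs(words)
        words.append(max(logprobs, key=logprobs.get))
    return words[len(prefix):]  # the suggested continuation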
Knowles, Rebecca and Sanchez-Torron, Marina and Koehn, Philipp (2019):
A user study of neural interactive translation prediction, Machine Translation

@Article{Knowles2019,
author = {Knowles, Rebecca and Sanchez-Torron, Marina and Koehn, Philipp},
title = {A user study of neural interactive translation prediction},
journal = {Machine Translation},
month = {Jun},
day = {01},
volume = {33},
number = {1},
pages = {135--154},
issn = {1573-0573},
doi = {10.1007/s10590-019-09235-8},
url = {
https://doi.org/10.1007/s10590-019-09235-8},
year = 2019
}
Knowles et al. (2019) carry out a study with professional translators, showing that interactive translation prediction allows some of them to translate faster.
Peris, Álvaro and Casacuberta, Francisco (2019):
A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks, Proceedings of the 57th Conference of the Association for Computational Linguistics: System Demonstrations

@inproceedings{peris-casacuberta-2019-neural,
author = {Peris, {\'A}lvaro and Casacuberta, Francisco},
title = {A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics: System Demonstrations},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-3014},
pages = {81--86},
year = 2019
}
Peris and Casacuberta (2019) extend this technology to other sequence-to-sequence tasks, such as image captioning.
Benchmarks
Discussion
Related Topics
New Publications
Matusov, Evgeny and Wilken, Patrick and Georgakopoulou, Yota (2019):
Customizing Neural Machine Translation for Subtitling, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{matusov-wilken-georgakopoulou:2019:WMT,
author = {Matusov, Evgeny and Wilken, Patrick and Georgakopoulou, Yota},
title = {Customizing Neural Machine Translation for Subtitling},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {82--93},
url = {
http://www.aclweb.org/anthology/W19-5209},
year = 2019
}
Matusov et al. (2019)
Saboo, Ashutosh and Baumann, Timo (2019):
Integration of Dubbing Constraints into Machine Translation, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{saboo-baumann:2019:WMT,
author = {Saboo, Ashutosh and Baumann, Timo},
title = {Integration of Dubbing Constraints into Machine Translation},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {94--101},
url = {
http://www.aclweb.org/anthology/W19-5210},
year = 2019
}
Saboo and Baumann (2019)
Zhang, Xiaowei and Chen, Wei and Wang, Feng and Xu, Shuang and Xu, Bo (2017):
Towards Compact and Fast Neural Machine Translation Using a Combined Method, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1154,
author = {Zhang, Xiaowei and Chen, Wei and Wang, Feng and Xu, Shuang and Xu, Bo},
title = {Towards Compact and Fast Neural Machine Translation Using a Combined Method},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1476--1482},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1154},
year = 2017
}
Zhang et al. (2017)
Other
Lin, Junyang and Sun, Xu and Ren, Xuancheng and Ma, Shuming and Su, Jinsong and Su, Qi (2018):
Deconvolution-Based Global Decoding for Neural Machine Translation, Proceedings of the 27th International Conference on Computational Linguistics

@inproceedings{C18-1276,
author = {Lin, Junyang and Sun, Xu and Ren, Xuancheng and Ma, Shuming and Su, Jinsong and Su, Qi},
title = {Deconvolution-Based Global Decoding for Neural Machine Translation},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/C18-1276},
pages = {3260--3271},
year = 2018
}
Lin et al. (2018)
Ma, Shuming and SUN, Xu and Wang, Yizhong and Lin, Junyang (2018):
Bag-of-Words as Target for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{P18-2053,
author = {Ma, Shuming and SUN, Xu and Wang, Yizhong and Lin, Junyang},
title = {Bag-of-Words as Target for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {332--338},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-2053},
year = 2018
}
Ma et al. (2018)
He, Di and Lu, Hanqing and Xia, Yingce and Qin, Tao and Wang, Liwei and Liu, Tieyan (2017):
Decoding with Value Networks for Neural Machine Translation, Advances in Neural Information Processing Systems 30

@incollection{NIPS2017-6622,
author = {He, Di and Lu, Hanqing and Xia, Yingce and Qin, Tao and Wang, Liwei and Liu, Tieyan},
title = {Decoding with Value Networks for Neural Machine Translation},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages = {178--187},
publisher = {Curran Associates, Inc.},
url = {
http://papers.nips.cc/paper/6622-decoding-with-value-networks-for-neural-machine-translation.pdf},
year = 2017
}
He et al. (2017)
Stahlberg, Felix and Byrne, Bill (2017):
Unfolding and Shrinking Neural Machine Translation Ensembles, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1207,
author = {Stahlberg, Felix and Byrne, Bill},
title = {Unfolding and Shrinking Neural Machine Translation Ensembles},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1936--1946},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1207},
year = 2017
}
Stahlberg and Byrne (2017)
Sampling
Schulz, Philip and Aziz, Wilker and Cohn, Trevor (2018):
A Stochastic Decoder for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{P18-1115,
author = {Schulz, Philip and Aziz, Wilker and Cohn, Trevor},
title = {A Stochastic Decoder for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1243--1252},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-1115},
year = 2018
}
Schulz et al. (2018)
Lattice
Xiao, Fengshun and Li, Jiangtong and Zhao, Hai and Wang, Rui and Chen, Kehai (2019):
Lattice-Based Transformer Encoder for Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{xiao-etal-2019-lattice,
author = {Xiao, Fengshun and Li, Jiangtong and Zhao, Hai and Wang, Rui and Chen, Kehai},
title = {Lattice-Based Transformer Encoder for Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1298},
pages = {3090--3097},
year = 2019
}
Xiao et al. (2019)