Analysis and Visualization
Neural machine translation models operate on high-dimensional representations at every stage of processing. Their abilities and failures are hard to determine from their millions of parameters. To better understand the behavior of neural machine translation models, researchers have compared their performance to phrase-based systems, explored the linguistic abilities of the models, and developed methods to visualize their processing.
Analysis And Visualization is the main subject of 85 publications. 62 are discussed here.
Publications
Quality of Machine Translation Output:
With the advent of neural machine translation and its better quality in terms of automatic metrics such as BLEU and human ranking of translation quality
Bojar, Ondřej and Chatterjee, Rajen and Federmann, Christian and Graham, Yvette and Haddow, Barry and Huck, Matthias and Jimeno Yepes, Antonio and Koehn, Philipp and Logacheva, Varvara and Monz, Christof and Negri, Matteo and Neveol, Aurelie and Neves, Mariana and Popel, Martin and Post, Matt and Rubino, Raphael and Scarton, Carolina and Specia, Lucia and Turchi, Marco and Verspoor, Karin and Zampieri, Marcos (2016):
Findings of the 2016 Conference on Machine Translation, Proceedings of the First Conference on Machine Translation
mentioned in Evaluation Campaigns and Analysis And Visualization
@InProceedings{bojar-EtAl:2016:WMT1,
author = {Bojar, Ond\v{r}ej and Chatterjee, Rajen and Federmann, Christian and Graham, Yvette and Haddow, Barry and Huck, Matthias and Jimeno Yepes, Antonio and Koehn, Philipp and Logacheva, Varvara and Monz, Christof and Negri, Matteo and Neveol, Aurelie and Neves, Mariana and Popel, Martin and Post, Matt and Rubino, Raphael and Scarton, Carolina and Specia, Lucia and Turchi, Marco and Verspoor, Karin and Zampieri, Marcos},
title = {Findings of the 2016 Conference on Machine Translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {131--198},
url = {
http://www.aclweb.org/anthology/W/W16/W16-2301},
year = 2016
}
(Bojar et al., 2016), researchers and users of machine translation were initially interested in a more fine-grained assessment of the differences between these two technologies.
Bentivogli, Luisa and Bisazza, Arianna and Cettolo, Mauro and Federico, Marcello (2016):
Neural versus Phrase-Based Machine Translation Quality: a Case Study, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{bentivogli-EtAl:2016:EMNLP2016,
author = {Bentivogli, Luisa and Bisazza, Arianna and Cettolo, Mauro and Federico, Marcello},
title = {Neural versus Phrase-Based Machine Translation Quality: a Case Study},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {257--267},
url = {
https://aclweb.org/anthology/D16-1025},
year = 2016
}
Bentivogli et al. (2016);
Luisa Bentivogli and Arianna Bisazza and Mauro Cettolo and Marcello Federico (2018):
Neural versus phrase-based MT quality: An in-depth analysis on English--German and English--French, Computer Speech and Language

@article{BENTIVOGLI201852,
author = {Luisa Bentivogli and Arianna Bisazza and Mauro Cettolo and Marcello Federico},
title = {Neural versus phrase-based MT quality: An in-depth analysis on English--German and English--French},
journal = {Computer Speech and Language},
volume = {49},
pages = {52--70},
issn = {0885-2308},
doi = {
https://doi.org/10.1016/j.csl.2017.11.004},
url = {
http://www.sciencedirect.com/science/article/pii/S0885230817301079},
keywords = {Machine translation (MT), Neural MT, Phrase-based MT, Evaluation},
year = 2018
}
Bentivogli et al. (2018) considered different automatically assessed linguistic categories when comparing the performance of neural vs. statistical machine translation systems for English-German.
Filip Klubička and Antonio Toral and Víctor M. Sánchez-Cartagena (2017):
Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation, The Prague Bulletin of Mathematical Linguistics

@article{klubicka-toral-sanchez-cartagena:2017,
author = {Filip Klubi\v{c}ka and Antonio Toral and V\'{i}ctor M. S\'{a}nchez-Cartagena},
title = {{Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation}},
journal = {The Prague Bulletin of Mathematical Linguistics},
month = {June},
volume = {108},
pages = {121--132},
doi = {10.1515/pralin-2017-0014},
issn = {0032-6585},
url = {
https://ufal.mff.cuni.cz/pbml/108/art-klubicka-toral-sanchez-cartagena.pdf},
year = 2017
}
Klubička et al. (2017) use multidimensional quality metrics (MQM) for a manual error analysis to compare two statistical and one neural system for English-Croatian.
Aljoscha Burchardt and Vivien Macketanz and Jon Dehdari and Georg Heigold and Jan-Thorsten Peter and Philip Williams (2017):
A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines, The Prague Bulletin of Mathematical Linguistics

@article{burchardt-macketanz-dehdari-heigold-peter-williams:2017,
author = {Aljoscha Burchardt and Vivien Macketanz and Jon Dehdari and Georg Heigold and Jan-Thorsten Peter and Philip Williams},
title = {{A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines}},
journal = {The Prague Bulletin of Mathematical Linguistics},
month = {June},
volume = {108},
pages = {159--170},
doi = {10.1515/pralin-2017-0017},
issn = {0032-6585},
url = {
https://ufal.mff.cuni.cz/pbml/108/art-burchardt-macketanz-dehdari-heigold-peter-williams.pdf},
year = 2017
}
Burchardt et al. (2017) pose difficult linguistic challenges to assess several statistical, neural, and rule-based systems for German-English and English-German, showing better performance for the rule-based system for verb tense and valency, but better performance for the neural system for many other categories such as handling of composition, function words, multi-word expressions, and subordination.
Kim Harris and Lucia Specia and Aljoscha Burchardt (2017):
Feature-Rich NMT and SMT Post-Edited Corpora for Productivity and Evaluation Tasks with a Subset of MQM-Annotated Data, Machine Translation Summit XVI

@inproceedings{mtsummit2017:Harris,
author = {Kim Harris and Lucia Specia and Aljoscha Burchardt},
title = {Feature-Rich {NMT} and {SMT} Post-Edited Corpora for Productivity and Evaluation Tasks with a Subset of {MQM}-Annotated Data},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
Harris et al. (2017) extend this analysis to English-Latvian and English-Czech.
Maja Popović (2017):
Comparing Language Related Issues for NMT and PBMT between German and English, The Prague Bulletin of Mathematical Linguistics

@article{popovic:2017,
author = {Maja Popovi\'{c}},
title = {{Comparing Language Related Issues for NMT and PBMT between German and English}},
journal = {The Prague Bulletin of Mathematical Linguistics},
month = {June},
volume = {108},
pages = {209--220},
doi = {10.1515/pralin-2017-0021},
issn = {0032-6585},
url = {
https://ufal.mff.cuni.cz/pbml/108/art-popovic.pdf},
year = 2017
}
Popović (2017) uses a similar manual annotation of different linguistic error categories to compare a neural and a statistical system for German-English and English-German.
Shantipriya Parida and Ondřej Bojar (2018):
Translating Short Segments with NMT: A Case Study in English-to-Hindi, Proceedings of the 21st Annual Conference of the European Association for Machine Translation

@inproceedings{eamt18-Parida,
author = {Shantipriya Parida and Ond\v{r}ej Bojar},
title = {Translating Short Segments with NMT: A Case Study in English-to-Hindi},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
url = {
https://rua.ua.es/dspace/bitstream/10045/76083/1/EAMT2018-Proceedings\_25.pdf},
year = 2018
}
Parida and Bojar (2018) compare a phrase-based statistical model, a recurrent neural translation model and a transformer model for the task of translation of short English-to-Hindi segments, with the transformer model coming out on top.
Toral, Antonio and Sánchez-Cartagena, Víctor M. (2017):
A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

@InProceedings{toral-sanchezcartagena:2017:EACLlong,
author = {Toral, Antonio and S\'{a}nchez-Cartagena, V\'{i}ctor M.},
title = {A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {1063--1073},
url = {
http://www.aclweb.org/anthology/E17-1100},
year = 2017
}
Toral and Sánchez-Cartagena (2017) compared different broad aspects such as fluency and reordering for nine language directions.
Sheila Castilho and Joss Moorkens and Federico Gaspari and Iacer Calixto and John Tinsley and Andy Way (2017):
Is Neural Machine Translation the New State of the Art?, The Prague Bulletin of Mathematical Linguistics

@article{castilho-moorkens-gaspari-tinsley-calixto-way:2017,
author = {Sheila Castilho and Joss Moorkens and Federico Gaspari and Iacer Calixto and John Tinsley and Andy Way},
title = {{Is Neural Machine Translation the New State of the Art?}},
journal = {The Prague Bulletin of Mathematical Linguistics},
month = {June},
volume = {108},
pages = {109--120},
doi = {10.1515/pralin-2017-0013},
issn = {0032-6585},
url = {
https://ufal.mff.cuni.cz/pbml/108/art-castilho-moorkens-gaspari-tinsley-calixto-way.pdf},
year = 2017
}
Castilho et al. (2017) use automatic scores when comparing neural and statistical machine translation for different domains (e-commerce, patents, educational content), showing better performance for the neural systems except for patent abstracts and e-commerce. They followed this up
Sheila Castilho and Joss Moorkens and Federico Gaspari and Rico Sennrich and Vilelmini Sosoni and Panayota Georgakopoulou and Pintu Lohar and Andy Way and Antonio Valerio Miceli Barone and Maria Gialama (2017):
A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators, Machine Translation Summit XVI

@inproceedings{mtsummit2017:Castilho2,
author = {Sheila Castilho and Joss Moorkens and Federico Gaspari and Rico Sennrich and Vilelmini Sosoni and Panayota Georgakopoulou and Pintu Lohar and Andy Way and Antonio Valerio Miceli Barone and Maria Gialama},
title = {A Comparative Quality Evaluation of {PBSMT} and {NMT} using Professional Translators},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
}
(Castilho et al., 2017) with a more detailed human assessment of linguistic aspects for the educational content. They find better performance for the neural model across categories such as inflectional morphology, word order, omission, addition, and mistranslation for four languages.
Cohn-Gordon, Reuben and Goodman, Noah (2019):
Lost in Machine Translation: A Method to Reduce Meaning Loss, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{cohn-gordon-goodman-2019-lost,
author = {Cohn-Gordon, Reuben and Goodman, Noah},
title = {Lost in Machine Translation: A Method to Reduce Meaning Loss},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1042},
pages = {437--441},
year = 2019
}
Cohn-Gordon and Goodman (2019) examine how sentences are translated that are ambiguous in one language due to underspecification.
Addressing the use of machine translation,
Marianna J. Martindale and Marine Carpuat (2018):
Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)

@inproceedings{AMTA2018-Martindale,
author = {Marianna J. Martindale and Marine Carpuat},
title = {Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
url = {
https://arxiv.org/pdf/1802.06041.pdf},
year = 2018
}
Martindale and Carpuat (2018) highlight that the typically fluent output of neural machine translation systems may lead to an unwarranted level of trust. They show that exposure to bad translations reduces users' trust, but more so for disfluent than for misleading translations.
Sheila Castilho and Ana Guerberof (2018):
Reading Comprehension of Machine Translation Output: What Makes for a Better Read?, Proceedings of the 21st Annual Conference of the European Association for Machine Translation

@inproceedings{eamt18-Castilho,
author = {Sheila Castilho and Ana Guerberof},
title = {Reading Comprehension of Machine Translation Output: What Makes for a Better Read?},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
url = {
https://rua.ua.es/dspace/bitstream/10045/76032/1/EAMT2018-Proceedings\_10.pdf},
year = 2018
}
Castilho and Guerberof (2018) carry out a task-based comparative evaluation between a neural and a statistical machine translation system for three language pairs. The human evaluators read the translations and answered questions about the content, allowing measurement of reading speed and answer correctness, as well as solicitation of feedback.
The claim of human parity for Chinese-English news translation
Hany Hassan and Anthony Aue and Chang Chen and Vishal Chowdhary and Jonathan Clark and Christian Federmann and Xuedong Huang and Marcin Junczys-Dowmunt and William Lewis and Mu Li and Shujie Liu and Tie-Yan Liu and Renqian Luo and Arul Menezes and Tao Qin and Frank Seide and Xu Tan and Fei Tian and Lijun Wu and Shuangzhi Wu and Yingce Xia and Dongdong Zhang and Zhirui Zhang and Ming Zhou (2018):
Achieving Human Parity on Automatic Chinese to English News Translation, CoRR

@article{DBLP:journals/corr/abs-1803-05567,
author = {Hany Hassan and Anthony Aue and Chang Chen and Vishal Chowdhary and Jonathan Clark and Christian Federmann and Xuedong Huang and Marcin Junczys{-}Dowmunt and William Lewis and Mu Li and Shujie Liu and Tie{-}Yan Liu and Renqian Luo and Arul Menezes and Tao Qin and Frank Seide and Xu Tan and Fei Tian and Lijun Wu and Shuangzhi Wu and Yingce Xia and Dongdong Zhang and Zhirui Zhang and Ming Zhou},
title = {Achieving Human Parity on Automatic Chinese to English News Translation},
journal = {CoRR},
volume = {abs/1803.05567},
url = {
http://arxiv.org/abs/1803.05567},
archiveprefix = {arXiv},
eprint = {1803.05567},
timestamp = {Mon, 13 Aug 2018 16:47:23 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/abs-1803-05567},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2018
}
(Hassan et al., 2018) has triggered a number of responses.
Toral, Antonio and Castilho, Sheila and Hu, Ke and Way, Andy (2018):
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6312,
author = {Toral, Antonio and Castilho, Sheila and Hu, Ke and Way, Andy},
title = {Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6312},
pages = {113--123},
year = 2018
}
Toral et al. (2018) call this claim into question by examining the impact of test sets created in the reverse direction (translated from the target side to the source side, opposite to the machine translation direction) and of the skill of the human evaluators.
Läubli, Samuel and Sennrich, Rico and Volk, Martin (2018):
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1512,
author = {L{\"a}ubli, Samuel and Sennrich, Rico and Volk, Martin},
title = {Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1512},
pages = {4791--4796},
year = 2018
}
Läubli et al. (2018) present results that show annotators gave machine translation higher scores on adequacy than human translations, but only on the sentence level, not the document level, and also that human translations are ranked higher in terms of fluency.
Targeted Test Sets:
Isabelle, Pierre and Cherry, Colin and Foster, George (2017):
A Challenge Set Approach to Evaluating Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
mentioned in Manual Metrics and Analysis And Visualization
@InProceedings{D17-1262,
author = {Isabelle, Pierre and Cherry, Colin and Foster, George},
title = {A Challenge Set Approach to Evaluating Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {2476--2486},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1262},
year = 2017
}
Isabelle et al. (2017) present a challenge set of manually crafted French sentences covering a number of linguistic categories that pose hard problems for translation, such as long-distance agreement or preservation of polarity.
Sennrich, Rico (2017):
How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

@InProceedings{sennrich:2017:EACLshort,
author = {Sennrich, Rico},
title = {How Grammatical is Character-level Neural Machine Translation? Assessing {MT} Quality with Contrastive Translation Pairs},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {376--382},
url = {
http://www.aclweb.org/anthology/E17-2060},
year = 2017
}
Sennrich (2017) developed an automatic method to detect specific morphosyntactic errors. First, a test set is created by taking sentence pairs and modifying the target sentence to exhibit a specific type of error, such as a wrong gender for a determiner, a wrong verb particle, or a wrong transliteration. Then a neural translation model is evaluated by how often it scores the correct translation higher than the faulty translations. The paper compares byte-pair encoding against character-based models for rare and unknown words.
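The scoring procedure behind such contrastive evaluation is simple enough to sketch. The snippet below is a minimal illustration, assuming a model_score(src, tgt) function that returns the model's log-probability of a target sentence given the source (for example via forced decoding); the determiner-gender contrast shown is made up for illustration, not taken from the cited paper.

```python
def evaluate_contrastive(model_score, triples):
    """triples: (source, correct_target, corrupted_target) tuples."""
    wins = 0
    for src, good, bad in triples:
        # The model passes a contrast if it prefers the correct translation.
        if model_score(src, good) > model_score(src, bad):
            wins += 1
    return wins / len(triples)

# Hypothetical determiner-gender contrast for English-German:
triples = [("the house is old",
            "das Haus ist alt",    # correct neuter determiner
            "die Haus ist alt")]   # corrupted gender
# accuracy = evaluate_contrastive(my_model.score, triples)
```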
Rios, Annette and Mascarell, Laura and Sennrich, Rico (2017):
Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper
mentioned in Linguistic Annotation and Analysis And Visualization
@InProceedings{riosgonzales-mascarell-sennrich:2017:WMT,
author = {Rios, Annette and Mascarell, Laura and Sennrich, Rico},
title = {Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {11--19},
url = {
http://www.aclweb.org/anthology/W17-4702},
year = 2017
}
Rios et al. (2017) use this method to create contrastive translation pairs to address the problem of translating ambiguous nouns.
Burlot, Franck and Yvon, François (2017):
Evaluating the morphological competence of Machine Translation Systems, Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper

@InProceedings{burlot-yvon:2017:WMT,
author = {Burlot, Franck and Yvon, Fran\c{c}ois},
title = {Evaluating the morphological competence of Machine Translation Systems},
booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper},
month = {September},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
pages = {43--55},
url = {
http://www.aclweb.org/anthology/W17-4705},
year = 2017
}
Burlot and Yvon (2017) use it to create a test set for selecting the correct morphological variant in a morphologically rich target language, Latvian.
Müller, Mathias and Rios, Annette and Voita, Elena and Sennrich, Rico (2018):
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6307,
author = {M{\"u}ller, Mathias and Rios, Annette and Voita, Elena and Sennrich, Rico},
title = {A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6307},
pages = {61--72},
year = 2018
}
Müller et al. (2018) created a test set to evaluate the translation of pronouns, although
Guillou, Liane and Hardmeier, Christian (2018):
Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1513,
author = {Guillou, Liane and Hardmeier, Christian},
title = {Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1513},
pages = {4797--4802},
year = 2018
}
Guillou and Hardmeier (2018) point out that automatic evaluation of pronoun translation is tricky and may not correlate well with human judgment.
Yutong Shao and Rico Sennrich and Bonnie Webber and Federico Fancellu (2018):
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

@InProceedings{LREC2018-SHAO18.381,
author = {Yutong Shao and Rico Sennrich and Bonnie Webber and Federico Fancellu},
title = {Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method},
booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
month = {May 7-12, 2018},
address = {Miyazaki, Japan},
publisher = {European Language Resources Association (ELRA)},
isbn = {979-10-95546-00-9},
language = {english},
url = {
https://arxiv.org/pdf/1711.07646.pdf},
year = 2018
}
Shao et al. (2018) propose to evaluate the translation of idioms with a blacklist method: if words that are part of a literal translation of the idiomatic phrase occur in the output, it is flagged as incorrect.
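As a rough sketch of the blacklist idea (with made-up blacklists and translations, not data from the cited paper), the check amounts to scanning each output for per-sentence lists of forbidden literal words:

```python
def flag_literal_translations(outputs, blacklists):
    """outputs: (sentence_id, translation) pairs; blacklists: id -> forbidden words."""
    flagged = []
    for sent_id, hyp in outputs:
        bad_words = blacklists.get(sent_id, set())
        # Flag the sentence if any word of the literal translation shows up.
        if any(tok in bad_words for tok in hyp.lower().split()):
            flagged.append(sent_id)
    return flagged

# Hypothetical example: "raining cats and dogs" rendered literally in German.
blacklists = {1: {"katzen", "hunde"}}
outputs = [(1, "es regnet katzen und hunde")]
print(flag_literal_translations(outputs, blacklists))  # -> [1]
```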
Visualization:
It is common to plot word embeddings
van der Maaten, L.J.P. and Hinton, Geoffrey (2008):
Visualizing Data Using t-SNE, Journal of Machine Learning Research

@article{word-embedding-viz,
author = {van der Maaten, L.J.P. and Hinton, Geoffrey},
title = {Visualizing Data Using t-SNE},
month = {November},
url = {
http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf},
journal = {Journal of Machine Learning Research},
volume = {9},
pages = {2579--2605},
year = 2008
}
(van der Maaten and Hinton, 2008) or attention weights
Koehn, Philipp and Knowles, Rebecca (2017):
Six Challenges for Neural Machine Translation, Proceedings of the First Workshop on Neural Machine Translation

@InProceedings{koehn-knowles:2017:NMT,
author = {Koehn, Philipp and Knowles, Rebecca},
title = {Six Challenges for Neural Machine Translation},
booktitle = {Proceedings of the First Workshop on Neural Machine Translation},
month = {August},
address = {Vancouver},
publisher = {Association for Computational Linguistics},
pages = {28--39},
url = {
http://www.aclweb.org/anthology/W17-3204},
year = 2017
}
(Koehn and Knowles, 2017;
Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, \L ukasz and Polosukhin, Illia (2017):
Attention is All you Need, Advances in Neural Information Processing Systems 30
mentioned in Alternative Architectures and Analysis And Visualization
@incollection{NIPS2017-7181,
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, \L ukasz and Polosukhin, Illia},
title = {Attention is All you Need},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages = {5998--6008},
publisher = {Curran Associates, Inc.},
url = {
http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf},
year = 2017
}
Vaswani et al., 2017) for inspection of parameters and model states.
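A minimal sketch of these two standard plots is given below, using scikit-learn and matplotlib with random arrays standing in for a real embedding matrix and real attention weights; only the plotting recipe itself is the point.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# 1) Project word embeddings into 2D with t-SNE and label a few points.
embeddings = np.random.randn(200, 512)            # placeholder embedding matrix
words = [f"w{i}" for i in range(200)]             # placeholder vocabulary
points = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
plt.figure()
plt.scatter(points[:, 0], points[:, 1], s=5)
for i in range(0, len(words), 20):                # annotate every 20th word
    plt.annotate(words[i], points[i])

# 2) Plot attention weights as a heatmap (target positions vs. source positions).
attention = np.random.dirichlet(np.ones(6), size=5)   # each row sums to 1
plt.figure()
plt.imshow(attention, cmap="gray_r")
plt.xlabel("source position")
plt.ylabel("target position")
plt.show()
```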
Rebecca Marvin and Philipp Koehn (2018):
Exploring Word Sense Disambiguation Abilities of Neural Machine Translation Systems, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)

@inproceedings{AMTA2018-Marvin,
author = {Rebecca Marvin and Philipp Koehn},
title = {Exploring Word Sense Disambiguation Abilities of Neural Machine Translation Systems},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
year = 2018
}
Marvin and Koehn (2018) plot embedding states for words marked with their senses.
Ghader, Hamidreza and Monz, Christof (2017):
What does Attention in Neural Machine Translation Pay Attention to?, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@InProceedings{ghader-monz:2017:I17-1,
author = {Ghader, Hamidreza and Monz, Christof},
title = {What does Attention in Neural Machine Translation Pay Attention to?},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {November},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
pages = {30--39},
url = {
http://www.aclweb.org/anthology/I17-1004},
year = 2017
}
Ghader and Monz (2017) examine attention states more closely, comparing them to traditional word alignments.
Tran, Ke and Bisazza, Arianna and Monz, Christof (2016):
Recurrent Memory Networks for Language Modeling, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

@InProceedings{tran-bisazza-monz:2016:N16-1,
author = {Tran, Ke and Bisazza, Arianna and Monz, Christof},
title = {Recurrent Memory Networks for Language Modeling},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {321--331},
url = {
http://www.aclweb.org/anthology/N16-1036},
year = 2016
}
Tran et al. (2016) integrate an attention mechanism into a language model and show which previous words had the most influence on predictions of the next word.
Stahlberg, Felix and Saunders, Danielle and Byrne, Bill (2018):
An Operation Sequence Model for Explainable Neural Machine Translation, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5420,
author = {Stahlberg, Felix and Saunders, Danielle and Byrne, Bill},
title = {An Operation Sequence Model for Explainable Neural Machine Translation},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5420},
pages = {175--186},
year = 2018
}
Stahlberg et al. (2018) add markup to the target side of the parallel corpus, and hence to the output of the translation model, that flags translation decisions.
Lee, Jaesong and Shin, Joong-Hwi and Kim, Jun-Seok (2017):
Interactive Visualization and Manipulation of Attention-based Neural Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

@InProceedings{D17-2021,
author = {Lee, Jaesong and Shin, Joong-Hwi and Kim, Jun-Seok},
title = {Interactive Visualization and Manipulation of Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
publisher = {Association for Computational Linguistics},
pages = {121--126},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-2021},
year = 2017
}
Lee et al. (2017) developed an interactive tool that allows exploration of the behavior of beam search.
Strobelt, Hendrik and Gehrmann, Sebastian and Behrisch, Michael and Perer, Adam and Pfister, Hanspeter and Rush, Alexander M (2019):
Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models, IEEE transactions on visualization and computer graphics

@article{strobelt2019s,
author = {Strobelt, Hendrik and Gehrmann, Sebastian and Behrisch, Michael and Perer, Adam and Pfister, Hanspeter and Rush, Alexander M},
title = {Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models},
journal = {IEEE transactions on visualization and computer graphics},
volume = {25},
number = {1},
pages = {353--363},
publisher = {IEEE},
year = 2019
}
Strobelt et al. (2019) present the more comprehensive tool Seq2seq-Vis, which also allows plotting encoder and decoder states and comparing them to neighboring states seen during training.
Neubig, Graham and Dou, Zi-Yi and Hu, Junjie and Michel, Paul and Pruthi, Danish and Wang, Xinyi (2019):
compare-mt: A Tool for Holistic Comparison of Language Generation Systems, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

@inproceedings{neubig-etal-2019-compare,
author = {Neubig, Graham and Dou, Zi-Yi and Hu, Junjie and Michel, Paul and Pruthi, Danish and Wang, Xinyi},
title = {compare-mt: A Tool for Holistic Comparison of Language Generation Systems},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-4007},
pages = {35--41},
year = 2019
}
Neubig et al. (2019) present the tool compare-mt that allows more fine-grained error analysis by comparing the output of two systems in terms of automatic scores, break-downs by word frequency, part-of-speech tags, and others, as well as identification of source words with strongly divergent translation quality.
Schwarzenberg, Robert and Harbecke, David and Macketanz, Vivien and Avramidis, Eleftherios and Möller, Sebastian (2019):
Train, Sort, Explain: Learning to Diagnose Translation Models, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

@inproceedings{schwarzenberg-etal-2019-train,
author = {Schwarzenberg, Robert and Harbecke, David and Macketanz, Vivien and Avramidis, Eleftherios and M{\"o}ller, Sebastian},
title = {Train, Sort, Explain: Learning to Diagnose Translation Models},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-4006},
pages = {29--34},
year = 2019
}
Schwarzenberg et al. (2019) train a classifier using a convolutional neural network to distinguish between human and machine translations and use the contribution of word-based features to identify words that drive this decision.
Predicting Properties from Internal Representations:
To probe intermediate representations such as encoder and decoder states, a common strategy is to use them as input to a classifier that predicts specific, mostly linguistic, properties.
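A minimal sketch of such a probing classifier, with random arrays standing in for extracted encoder states (one vector per token) and their aligned part-of-speech labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

states = np.random.randn(5000, 512)                         # placeholder encoder states
pos_tags = np.random.choice(["NOUN", "VERB", "DET"], 5000)  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(states, pos_tags, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Higher held-out accuracy suggests the states encode the probed property.
print("probe accuracy:", probe.score(X_test, y_test))
```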
Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James (2017):
What do Neural Machine Translation Models Learn about Morphology?, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{belinkov-EtAl:2017:Long,
author = {Belinkov, Yonatan and Durrani, Nadir and Dalvi, Fahim and Sajjad, Hassan and Glass, James},
title = {What do Neural Machine Translation Models Learn about Morphology?},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {861--872},
url = {
http://aclweb.org/anthology/P17-1080},
year = 2017
}
Belinkov et al. (2017) predict the part of speech and morphological features of words linked to encoder and decoder states, showing better performance of character-based models, but not much difference for deeper layers.
Belinkov, Yonatan and Màrquez, Lluís and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James (2017):
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@inproceedings{belinkov-etal-2017-evaluating,
author = {Belinkov, Yonatan and M{\`a}rquez, Llu{\'\i}s and Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Glass, James},
title = {Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {
https://www.aclweb.org/anthology/I17-1001},
pages = {1--10},
year = 2017
}
Belinkov et al. (2017) also consider semantic properties.
Shi, Xing and Padhi, Inkit and Knight, Kevin (2016):
Does String-Based Neural MT Learn Source Syntax?, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{shi-padhi-knight:2016:EMNLP2016,
author = {Shi, Xing and Padhi, Inkit and Knight, Kevin},
title = {Does String-Based Neural {MT} Learn Source Syntax?},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1526--1534},
url = {
https://aclweb.org/anthology/D16-1159},
year = 2016
}
Shi et al. (2016) find that basic syntactic properties are learned by translation models.
Poliak, Adam and Belinkov, Yonatan and Glass, James and Van Durme, Benjamin (2018):
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2082,
author = {Poliak, Adam and Belinkov, Yonatan and Glass, James and Van Durme, Benjamin},
title = {On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {513--523},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-2082},
year = 2018
}
Poliak et al. (2018) probe if sentence embeddings (the first and last state of the RNN encoder) have sufficient semantic information to serve as input to semantic entailment tasks.
Raganato, Alessandro and Tiedemann, Jörg (2018):
An Analysis of Encoder Representations in Transformer-Based Machine Translation, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5431,
author = {Raganato, Alessandro and Tiedemann, J{\"o}rg},
title = {An Analysis of Encoder Representations in Transformer-Based Machine Translation},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5431},
pages = {287--297},
year = 2018
}
Raganato and Tiedemann (2018) assess the encoder states of the transformer model. They develop four probing tasks (part-of-speech tagging, chunking, named entity recognition, and semantic dependency) and find that the earlier layers contain more syntactic information (e.g., part-of-speech tags) while the later layers contain more semantic information (e.g., semantic dependencies).
Tang, Gongbo and Sennrich, Rico and Nivre, Joakim (2018):
An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6304,
author = {Tang, Gongbo and Sennrich, Rico and Nivre, Joakim},
title = {An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6304},
pages = {26--35},
year = 2018
}
Tang et al. (2018) examine the role of the attention mechanism when handling ambiguous nouns. Contrary to their intuition, the decoder pays more attention to the word itself instead of context words in the case of ambiguous nouns compared to nouns in general. This is the case both for RNN-based and transformer-based translation models. They suspect that word sense disambiguation already takes place in the encoder.
A number of studies of internal representations focus just on language modeling.
Linzen, Tal and Dupoux, Emmanuel and Goldberg, Yoav (2016):
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies, Transactions of the Association for Computational Linguistics

@article{linzen-etal-2016-assessing,
author = {Linzen, Tal and Dupoux, Emmanuel and Goldberg, Yoav},
title = {Assessing the Ability of {LSTM}s to Learn Syntax-Sensitive Dependencies},
journal = {Transactions of the Association for Computational Linguistics},
volume = {4},
url = {
https://www.aclweb.org/anthology/Q16-1037},
doi = {10.1162/tacl\_a\_00115},
pages = {521--535},
year = 2016
}
Linzen et al. (2016) propose the task of subject-verb agreement, especially when interrupted by other nouns, as a challenge to sequence models that have to preserve agreement information.
Gulordava, Kristina and Bojanowski, Piotr and Grave, Edouard and Linzen, Tal and Baroni, Marco (2018):
Colorless Green Recurrent Networks Dream Hierarchically, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

@inproceedings{gulordava-etal-2018-colorless,
author = {Gulordava, Kristina and Bojanowski, Piotr and Grave, Edouard and Linzen, Tal and Baroni, Marco},
title = {Colorless Green Recurrent Networks Dream Hierarchically},
booktitle = {Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
month = {jun},
address = {New Orleans, Louisiana},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N18-1108},
doi = {10.18653/v1/N18-1108},
pages = {1195--1205},
year = 2018
}
Gulordava et al. (2018) extend this idea into several other hierarchical language problems.
Giulianelli, Mario and Harding, Jack and Mohnert, Florian and Hupkes, Dieuwke and Zuidema, Willem (2018):
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5426,
author = {Giulianelli, Mario and Harding, Jack and Mohnert, Florian and Hupkes, Dieuwke and Zuidema, Willem},
title = {Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5426},
pages = {240--248},
year = 2018
}
Giulianelli et al. (2018) build classifiers to predict verb agreement information from the internal states at different layers of an LSTM language model. They go a step further and demonstrate that adjusting these states based on insights gained from the classifiers allows the model to make better agreement decisions.
Tran, Ke and Bisazza, Arianna and Monz, Christof (2018):
The Importance of Being Recurrent for Modeling Hierarchical Structure, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1503,
author = {Tran, Ke and Bisazza, Arianna and Monz, Christof},
title = {The Importance of Being Recurrent for Modeling Hierarchical Structure},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1503},
pages = {4731--4736},
year = 2018
}
Tran et al. (2018) compare how well fully attentional (transformer) models fare against recurrent neural networks on decisions that depend on hierarchical structure. Their experiments show that recurrent neural networks perform better at tasks such as subject-verb agreement when the subject and verb are separated by recursive phrases.
Zhang, Kelly and Bowman, Samuel (2018):
Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5448,
author = {Zhang, Kelly and Bowman, Samuel},
title = {Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5448},
pages = {359--361},
year = 2018
}
Zhang and Bowman (2018) show that states obtained from bidirectional language models are better at part-of-speech tagging and supertagging tasks than the encoder states of a neural translation model.
Dhar, Prajit and Bisazza, Arianna (2018):
Does Syntactic Knowledge in Multilingual Language Models Transfer Across Languages?, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5453,
author = {Dhar, Prajit and Bisazza, Arianna},
title = {Does Syntactic Knowledge in Multilingual Language Models Transfer Across Languages?},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5453},
pages = {374--377},
year = 2018
}
Dhar and Bisazza (2018) explore whether multilingual language model training leads to more general syntactic knowledge, but find only a small improvement on agreement tasks when completely separating the vocabularies.
Role of Individual Neurons:
Andrej Karpathy and Justin Johnson and Fei-Fei Li (2016):
Visualizing and Understanding Recurrent Networks, International Conference on Learning Representations (ICLR)

@inproceedings{DBLP:journals/corr/KarpathyJL15,
author = {Andrej Karpathy and Justin Johnson and Fei{-}Fei Li},
title = {Visualizing and Understanding Recurrent Networks},
url = {
https://arxiv.org/pdf/1506.02078},
booktitle = {International Conference on Learning Representations (ICLR)},
year = 2016
}
Karpathy et al. (2016) inspect individual neurons in a character-based language model and find single neurons that appear to keep track of position in the line (expecting a line break character), and the opening of brackets.
Shi, Xing and Knight, Kevin and Yuret, Deniz (2016):
Why Neural Translations are the Right Length, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{shi-knight-yuret:2016:EMNLP2016,
author = {Shi, Xing and Knight, Kevin and Yuret, Deniz},
title = {Why Neural Translations are the Right Length},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {2278--2282},
url = {
https://aclweb.org/anthology/D16-1248},
year = 2016
}
Shi et al. (2016) correlated activation values of specific nodes in the state of a simple LSTM encoder-decoder translation model (without attention) with the length of the output and discovered nodes that count the number of words to ensure proper output length.
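A minimal sketch of this kind of analysis, assuming decoder hidden states have been dumped per sentence (random arrays here): correlate each hidden unit's activation with the decoding time step and inspect the most strongly correlated units.

```python
import numpy as np

def length_correlations(state_sequences):
    """Correlate each hidden unit's activation with the decoding time step."""
    activations = np.concatenate(state_sequences)                       # (sum_T, hidden)
    steps = np.concatenate([np.arange(len(s)) for s in state_sequences])
    return np.array([np.corrcoef(activations[:, j], steps)[0, 1]
                     for j in range(activations.shape[1])])

# Placeholder data: 100 sentences with random "decoder states" of varying length.
state_sequences = [np.random.randn(np.random.randint(5, 30), 256) for _ in range(100)]
corrs = length_correlations(state_sequences)
print("most length-correlated units:", np.argsort(-np.abs(corrs))[:5])
```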
Tracing Decisions Back to Prior States:
Ding, Yanzhuo and Liu, Yang and Luan, Huanbo and Sun, Maosong (2017):
Visualizing and Understanding Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{ding-EtAl:2017:Long,
author = {Ding, Yanzhuo and Liu, Yang and Luan, Huanbo and Sun, Maosong},
title = {Visualizing and Understanding Neural Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {1150--1159},
url = {
http://aclweb.org/anthology/P17-1106},
year = 2017
}
Ding et al. (2017) propose to use layer-wise relevance feedback to measure which of the input states or intermediate states had the biggest influence on prediction decisions. Tackling the same problem,
Ding, Shuoyang and Xu, Hainan and Koehn, Philipp (2019):
Saliency-driven Word Alignment Interpretation for Neural Machine Translation, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{ding-xu-koehn:2019:WMT,
author = {Ding, Shuoyang and Xu, Hainan and Koehn, Philipp},
title = {Saliency-driven Word Alignment Interpretation for Neural Machine Translation},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {1--12},
url = {
http://www.aclweb.org/anthology/W19-5201},
year = 2019
}
Ding et al. (2019) propose to use saliency, a method that measures the impact of input states based on how much small changes in their values (as indicated by the gradients) impact prediction decisions.
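A rough PyTorch sketch of gradient-based saliency follows, with a placeholder model that maps source embeddings and target ids to per-position log-probabilities; the exact formulation in the cited paper may differ.

```python
import torch

def source_saliency(model, src_embeds, tgt_ids, position):
    """Saliency of each source token for the target prediction at `position`."""
    src_embeds = src_embeds.clone().detach().requires_grad_(True)
    log_probs = model(src_embeds, tgt_ids)       # placeholder: (tgt_len, vocab) log-probs
    log_probs[position, tgt_ids[position]].backward()
    # One score per source token: the norm of the gradient w.r.t. its embedding.
    return src_embeds.grad.norm(dim=-1)
```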
Xutai Ma and Ke Li and Philipp Koehn (2018):
An Analysis of Source Context Dependency in Neural Machine Translation, Proceedings of the 21st Annual Conference of the European Association for Machine Translation

@inproceedings{eamt18-Ma,
author = {Xutai Ma and Ke Li and Philipp Koehn},
title = {An Analysis of Source Context Dependency in Neural Machine Translation},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
year = 2018
}
Ma et al. (2018) examine the relative role of source context and prior decoder states on output word predictions.
Knowles, Rebecca and Koehn, Philipp (2018):
Context and Copying in Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1339,
author = {Knowles, Rebecca and Koehn, Philipp},
title = {Context and Copying in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1339},
pages = {3034--3041},
year = 2018
}
Knowles and Koehn (2018) explore what drives the model's decisions to copy input words such as names. They show the impact of both the context and properties of the word (such as capitalization).
Wallace, Eric and Feng, Shi and Boyd-Graber, Jordan (2018):
Interpreting Neural Networks with Nearest Neighbors, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5416,
author = {Wallace, Eric and Feng, Shi and Boyd-Graber, Jordan},
title = {Interpreting Neural Networks with Nearest Neighbors},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5416},
pages = {136--144},
year = 2018
}
Wallace et al. (2018) change the way predictions are made in neural models. Instead of a softmax prediction layer, final decoder states are compared to states during training, providing examples that explain the decision of the network.
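A minimal sketch of the nearest-neighbor lookup, with random arrays standing in for cached training-time decoder states and their associated output words:

```python
import numpy as np

def nearest_training_states(query_state, train_states, train_labels, k=5):
    """Output words of the k cached training states closest to the current state."""
    dists = np.linalg.norm(train_states - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    return [(train_labels[i], float(dists[i])) for i in nearest]

# Placeholder cache of decoder states seen during training and their output words.
train_states = np.random.randn(10000, 512)
train_labels = np.random.choice(["the", "a", "house", "dog"], 10000)
print(nearest_training_states(np.random.randn(512), train_states, train_labels))
```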
Challenges:
Koehn, Philipp and Knowles, Rebecca (2017):
Six Challenges for Neural Machine Translation, Proceedings of the First Workshop on Neural Machine Translation

@InProceedings{koehn-knowles:2017:NMT,
author = {Koehn, Philipp and Knowles, Rebecca},
title = {Six Challenges for Neural Machine Translation},
booktitle = {Proceedings of the First Workshop on Neural Machine Translation},
month = {August},
address = {Vancouver},
publisher = {Association for Computational Linguistics},
pages = {28--39},
url = {
http://www.aclweb.org/anthology/W17-3204},
year = 2017
}
Koehn and Knowles (2017) identify six challenges for neural machine translation, such as domain mismatch, low-resource conditions, and beam search decoding.
Khayrallah, Huda and Koehn, Philipp (2018):
On the Impact of Various Types of Noise on Neural Machine Translation, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
mentioned in Corpus Cleaning and Analysis And Visualization
@InProceedings{W18-2709,
author = {Khayrallah, Huda and Koehn, Philipp},
title = {On the Impact of Various Types of Noise on Neural Machine Translation},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {74--83},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/W18-2709},
year = 2018
}
Khayrallah and Koehn (2018) find that neural methods are more sensitive to noise than previous statistical methods, especially to untranslated source sentences. The copy noise problem was also identified by
Ott, Myle and Auli, Michael and Grangier, David and Ranzato, Marc'Aurelio (2018):
Analyzing Uncertainty in Neural Machine Translation, Proceedings of the 35th International Conference on Machine Learning
mentioned in Corpus Cleaning, Inference and Analysis And Visualization
@InProceedings{pmlr-v80-ott18a,
author = {Ott, Myle and Auli, Michael and Grangier, David and Ranzato, Marc'Aurelio},
title = {Analyzing Uncertainty in Neural Machine Translation},
booktitle = {Proceedings of the 35th International Conference on Machine Learning},
pages = {3956--3965},
editor = {Dy, Jennifer and Krause, Andreas},
volume = {80},
series = {Proceedings of Machine Learning Research},
address = {Stockholmsmässan, Stockholm Sweden},
month = {10--15 Jul},
publisher = {PMLR},
url = {
http://proceedings.mlr.press/v80/ott18a/ott18a.pdf},
year = 2018
}
Ott et al. (2018), who suggest several remedies.
Yonatan Belinkov and Yonatan Bisk (2018):
Synthetic and Natural Noise Both Break Neural Machine Translation, International Conference on Learning Representations
mentioned in Corpus Cleaning and Analysis And Visualization
@inproceedings{belinkov2018synthetic,
author = {Yonatan Belinkov and Yonatan Bisk},
title = {Synthetic and Natural Noise Both Break Neural Machine Translation},
booktitle = {International Conference on Learning Representations},
url = {
https://openreview.net/forum?id=BJ8vJebC-},
year = 2018
}
Belinkov and Bisk (2018) consider natural and synthetic noise in the spelling of words and develop their own character-based word embedding models to address it, while
Georg Heigold and Guenter Neumann and Josef van Genabith and Stalin Varanasi (2018):
How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)

@inproceedings{AMTA2018-Heigold,
author = {Georg Heigold and Guenter Neumann and Josef van~Genabith and Stalin Varanasi},
title = {How Robust Are Character-Based Word Embeddings in Tagging and {MT} Against Wrod Scramlbing or Randdm Nouse?},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
url = {
https://arxiv.org/pdf/1704.04441.pdf},
year = 2018
}
Heigold et al. (2018) show that character-based models are better than byte-pair-encoding-based models.
Michel, Paul and Li, Xian and Neubig, Graham and Pino, Juan (2019):
On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{michel-etal-2019-evaluation,
author = {Michel, Paul and Li, Xian and Neubig, Graham and Pino, Juan},
title = {On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1314},
pages = {3103--3114},
year = 2019
}
Michel et al. (2019) develop metrics aimed at finding minimal changes to the input that result in maximal changes in the output, so-called adversarial examples.
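As a rough illustration (not the metrics of the cited paper), a brute-force search over single adjacent-character swaps can already surface inputs whose translations change drastically; translate is a placeholder for an actual MT system and the overlap measure is deliberately naive.

```python
def token_overlap(a, b):
    """Very naive similarity: Jaccard overlap of the token sets."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / max(len(a | b), 1)

def most_damaging_swap(translate, source):
    """Try every adjacent-character swap and keep the most output-changing one."""
    base = translate(source)
    best_input, best_overlap = None, 1.0
    for i in range(len(source) - 1):
        perturbed = source[:i] + source[i + 1] + source[i] + source[i + 2:]
        overlap = token_overlap(base, translate(perturbed))
        if overlap < best_overlap:
            best_input, best_overlap = perturbed, overlap
    return best_input, best_overlap
```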
Michel, Paul and Neubig, Graham (2018):
MTNT: A Testbed for Machine Translation of Noisy Text, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1050,
author = {Michel, Paul and Neubig, Graham},
title = {MTNT: A Testbed for Machine Translation of Noisy Text},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1050},
pages = {543--553},
year = 2018
}
Michel and Neubig (2018) propose a test set of noisy text, derived from the web forum Reddit, featuring acronyms, misspellings, hashtags, emoticons, and exaggerated capitalization.
Marlies Van der Wees and Arianna Bisazza and Christof Monz (2018):
Evaluation of Machine Translation Performance Across Multiple Genres and Languages, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

@InProceedings{LREC2018-VAN-DER-WEES18.853,
author = {Marlies Van der Wees and Arianna Bisazza and Christof Monz},
title = {Evaluation of Machine Translation Performance Across Multiple Genres and Languages},
booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
month = {May 7-12, 2018},
address = {Miyazaki, Japan},
publisher = {European Language Resources Association (ELRA)},
isbn = {979-10-95546-00-9},
language = {english},
url = {
https://www.aclweb.org/anthology/L18-1604},
year = 2018
}
Van der Wees et al. (2018) propose a test set covering 4 different domains (news, colloquial, editorial, and speech) in 4 different languages (Arabic, Chinese, Bulgarian, Persian) as a challenge for research in adaptation.
Wei, Johnny and Pham, Khiem and O'Connor, Brendan and Dillon, Brian (2018):
Evaluating Grammaticality in Seq2seq Models with a Broad Coverage HPSG Grammar: A Case Study on Machine Translation, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{W18-5432,
author = {Wei, Johnny and Pham, Khiem and O{'}Connor, Brendan and Dillon, Brian},
title = {Evaluating Grammaticality in Seq2seq Models with a Broad Coverage HPSG Grammar: A Case Study on Machine Translation},
booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {nov},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-5432},
pages = {298--305},
year = 2018
}
Wei et al. (2018) examine whether the output of neural machine translation models is syntactically well formed by parsing it with a linguistically precise HPSG grammar. While they find that 93% of output sentences conform to the grammar, they also identify a number of constructions that pose challenges.
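A minimal sketch of this kind of analysis is given below: parse every output sentence, report coverage, and keep the rejected sentences for manual inspection. The parser hook `parses` is a stubbed-out placeholder (Wei et al. use a broad-coverage HPSG grammar), not a specific API.

# Sketch of a grammaticality-coverage analysis over MT output.
from typing import Callable, Iterable, List, Tuple

def grammar_check(outputs: Iterable[str],
                  parses: Callable[[str], bool]) -> Tuple[float, List[str]]:
    """Return parse coverage and the sentences the grammar rejects."""
    outputs = list(outputs)
    rejected = [s for s in outputs if not parses(s)]
    coverage = 1.0 - len(rejected) / len(outputs) if outputs else 0.0
    return coverage, rejected

def parses_with_hpsg(sentence: str) -> bool:
    # Placeholder: hook up a precise grammar/parser here.
    raise NotImplementedError("connect an HPSG parser")

# Hypothetical usage:
# coverage, rejected = grammar_check(mt_outputs, parses_with_hpsg)
# print(f"{coverage:.1%} of output sentences conform to the grammar")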
Benchmarks
Discussion
Related Topics
New Publications
Freitag, Markus and Caswell, Isaac and Roy, Scott (2019):
APE at Scale and Its Implications on MT Evaluation Biases, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{freitag-caswell-roy:2019:WMT,
author = {Freitag, Markus and Caswell, Isaac and Roy, Scott},
title = {APE at Scale and Its Implications on MT Evaluation Biases},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {34--44},
url = {
http://www.aclweb.org/anthology/W19-5204},
year = 2019
}
Freitag et al. (2019)
Hashimoto, Kazuma and Buschiazzo, Raffaella and Bradbury, James and Marshall, Teresa and Socher, Richard and Xiong, Caiming (2019):
A High-Quality Multilingual Dataset for Structured Documentation Translation, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{hashimoto-EtAl:2019:WMT,
author = {Hashimoto, Kazuma and Buschiazzo, Raffaella and Bradbury, James and Marshall, Teresa and Socher, Richard and Xiong, Caiming},
title = {A High-Quality Multilingual Dataset for Structured Documentation Translation},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {116--127},
url = {
http://www.aclweb.org/anthology/W19-5212},
year = 2019
}
Hashimoto et al. (2019)
Zhang, Mike and Toral, Antonio (2019):
The Effect of Translationese in Machine Translation Test Sets, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{zhang-toral:2019:WMT,
author = {Zhang, Mike and Toral, Antonio},
title = {The Effect of Translationese in Machine Translation Test Sets},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {73--81},
url = {
http://www.aclweb.org/anthology/W19-5208},
year = 2019
}
Zhang and Toral (2019)
Anastasopoulos, Antonios (2019):
An Analysis of Source-Side Grammatical Errors in NMT, Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{anastasopoulos-2019-analysis,
author = {Anastasopoulos, Antonios},
title = {An Analysis of Source-Side Grammatical Errors in {NMT}},
booktitle = {Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {aug},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W19-4822},
pages = {213--223},
year = 2019
}
Anastasopoulos (2019)
Clark, Kevin and Khandelwal, Urvashi and Levy, Omer and Manning, Christopher D. (2019):
What Does BERT Look at? An Analysis of BERT's Attention, Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

@inproceedings{clark-etal-2019-bert,
author = {Clark, Kevin and Khandelwal, Urvashi and Levy, Omer and Manning, Christopher D.},
title = {What Does {BERT} Look at? An Analysis of {BERT}{'}s Attention},
booktitle = {Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},
month = {aug},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W19-4828},
pages = {276--286},
year = 2019
}
Clark et al. (2019)
Li, Xintong and Li, Guanlin and Liu, Lemao and Meng, Max and Shi, Shuming (2019):
On the Word Alignment from Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{li-etal-2019-word,
author = {Li, Xintong and Li, Guanlin and Liu, Lemao and Meng, Max and Shi, Shuming},
title = {On the Word Alignment from Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1124},
pages = {1293--1303},
year = 2019
}
Li et al. (2019)
Voita, Elena and Talbot, David and Moiseev, Fedor and Sennrich, Rico and Titov, Ivan (2019):
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{voita-etal-2019-analyzing,
author = {Voita, Elena and Talbot, David and Moiseev, Fedor and Sennrich, Rico and Titov, Ivan},
title = {Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1580},
pages = {5797--5808},
year = 2019
}
Voita et al. (2019)
Kepler, Fabio and Trénous, Jonay and Treviso, Marcos and Vera, Miguel and Martins, André F. T. (2019):
OpenKiwi: An Open Source Framework for Quality Estimation, Proceedings of the 57th Conference of the Association for Computational Linguistics: System Demonstrations

@inproceedings{kepler-etal-2019-openkiwi,
author = {Kepler, Fabio and Tr{\'e}nous, Jonay and Treviso, Marcos and Vera, Miguel and Martins, Andr{\'e} F. T.},
title = {{O}pen{K}iwi: An Open Source Framework for Quality Estimation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics: System Demonstrations},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-3020},
pages = {117--122},
year = 2019
}
Kepler et al. (2019)
Mara Chinea-Rios and Germán Sanchis-Trilles and Francisco Casacuberta (2018):
Creating the best development corpus for Statistical Machine Translation systems, Proceedings of the 21st Annual Conference of the European Association for Machine Translation

@inproceedings{eamt18-Chinea-Rios,
author = {Mara Chinea-Rios and Germ{\'a}n Sanchis-Trilles and Francisco Casacuberta},
title = {Creating the best development corpus for Statistical Machine Translation systems},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
url = {
https://rua.ua.es/dspace/bitstream/10045/76033/1/EAMT2018-Proceedings\_12.pdf},
year = 2018
}
Chinea-Rios et al. (2018)
Grundkiewicz, Roman and Junczys-Dowmunt, Marcin (2018):
Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2046,
author = {Grundkiewicz, Roman and Junczys-Dowmunt, Marcin},
title = {Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {284--290},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-2046},
year = 2018
}
Grundkiewicz and Junczys-Dowmunt (2018)
Pham, Minh Quang and Crego, Josep and Senellart, Jean and Yvon, François (2018):
Fixing Translation Divergences in Parallel Corpora for Neural MT, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1328,
author = {Pham, Minh Quang and Crego, Josep and Senellart, Jean and Yvon, Fran{\c{c}}ois},
title = {Fixing Translation Divergences in Parallel Corpora for Neural MT},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1328},
pages = {2967--2973},
year = 2018
}
Pham et al. (2018)
Jauregi Unanue, Inigo and Zare Borzeshi, Ehsan and Piccardi, Massimo (2018):
A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

@InProceedings{W18-2702,
author = {Jauregi Unanue, Inigo and Zare Borzeshi, Ehsan and Piccardi, Massimo},
title = {A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {11--17},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/W18-2702},
year = 2018
}
Unanue et al. (2018)
Domhan, Tobias (2018):
How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{P18-1167,
author = {Domhan, Tobias},
title = {How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1799--1808},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-1167},
year = 2018
}
Domhan (2018)
Tang, Gongbo and Müller, Mathias and Rios, Annette and Sennrich, Rico (2018):
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1458,
author = {Tang, Gongbo and M{\"u}ller, Mathias and Rios, Annette and Sennrich, Rico},
title = {Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1458},
pages = {4263--4272},
year = 2018
}
Tang et al. (2018)
Alignment
Alkhouli, Tamer and Bretschner, Gabriel and Ney, Hermann (2018):
On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6318,
author = {Alkhouli, Tamer and Bretschner, Gabriel and Ney, Hermann},
title = {On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Belgium, Brussels},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6318},
pages = {177--185},
year = 2018
}
Alkhouli et al. (2018)
Linguistic Properties of Hidden Representations
Eger, Steffen and Hoenen, Armin and Mehler, Alexander (2016):
Language classification from bilingual word embedding graphs, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{eger-hoenen-mehler:2016:COLING,
author = {Eger, Steffen and Hoenen, Armin and Mehler, Alexander},
title = {Language classification from bilingual word embedding graphs},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3507--3518},
url = {
http://aclweb.org/anthology/C16-1331},
year = 2016
}
Eger et al. (2016)
Translation Quality
Rabinovich, Ella and Nisioi, Sergiu and Ordan, Noam and Wintner, Shuly (2016):
On the Similarities Between Native, Non-native and Translated Texts, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{rabinovich-EtAl:2016:P16-1,
author = {Rabinovich, Ella and Nisioi, Sergiu and Ordan, Noam and Wintner, Shuly},
title = {On the Similarities Between Native, Non-native and Translated Texts},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1870--1881},
url = {
http://www.aclweb.org/anthology/P16-1176},
year = 2016
}
Rabinovich et al. (2016)
Mara Chinea-Rios and \'Alvaro Peris and Francisco Casacuberta (2018):
Are Automatic Metrics Robust and Reliable in Specific Machine Translation Tasks?, Proceedings of the 21st Annual Conference of the European Association for Machine Translation

@inproceedings{eamt18-Chinea-Rios2,
author = {Mara Chinea-Rios and {\'A}lvaro Peris and Francisco Casacuberta},
title = {Are Automatic Metrics Robust and Reliable in Specific Machine Translation Tasks?},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
location = {Alicante, Spain},
url = {
https://rua.ua.es/dspace/bitstream/10045/76022/1/EAMT2018-Proceedings\_11.pdf},
year = 2018
}
Chinea-Rios et al. (2018)
Guta, Andreas and Alkhouli, Tamer and Peter, Jan-Thorsten and Wuebker, Joern and Ney, Hermann (2015):
A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

@InProceedings{guta-EtAl:2015:EMNLP,
author = {Guta, Andreas and Alkhouli, Tamer and Peter, Jan-Thorsten and Wuebker, Joern and Ney, Hermann},
title = {A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1401--1411},
url = {
http://aclweb.org/anthology/D15-1165},
year = 2015
}
Guta et al. (2015)
Evaluation Metrics
Shimanaka, Hiroki and Kajiwara, Tomoyuki and Komachi, Mamoru (2018):
Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

@InProceedings{N18-4015,
author = {Shimanaka, Hiroki and Kajiwara, Tomoyuki and Komachi, Mamoru},
title = {Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop},
publisher = {Association for Computational Linguistics},
pages = {106--111},
location = {New Orleans, Louisiana, USA},
url = {
http://aclweb.org/anthology/N18-4015},
year = 2018
}
Shimanaka et al. (2018)
Apidianaki, Marianna and Wisniewski, Guillaume and Cocos, Anne and Callison-Burch, Chris (2018):
Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2077,
author = {Apidianaki, Marianna and Wisniewski, Guillaume and Cocos, Anne and Callison-Burch, Chris},
title = {Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {480--485},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-2077},
year = 2018
}
Apidianaki et al. (2018)
Forcada, Mikel L. and Scarton, Carolina and Specia, Lucia and Haddow, Barry and Birch, Alexandra (2018):
Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting, Proceedings of the Third Conference on Machine Translation: Research Papers

@inproceedings{W18-6320,
author = {Forcada, Mikel L. and Scarton, Carolina and Specia, Lucia and Haddow, Barry and Birch, Alexandra},
title = {Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting},
booktitle = {P