Vocabulary
The large vocabularies of natural languages are a challenge for the vector space representations used in neural networks. Several strategies have been explored that either handle large vocabularies directly or resort to sub-word representations of words.
Vocabulary is the main subject of 47 publications. 22 are discussed here.
Publications
Special Handling of Rare Words:
A significant limitation of neural machine translation models is the computational burden of supporting very large vocabularies. To avoid this, the vocabulary may be reduced to a shortlist of, say, 20,000 words, and the remaining tokens are replaced with the unknown word token "UNK". To translate such an unknown word,
Luong, Thang and Sutskever, Ilya and Le, Quoc V. and Vinyals, Oriol and Zaremba, Wojciech (2015):
Addressing the Rare Word Problem in Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
@InProceedings{luong-EtAl:2015:ACL-IJCNLP,
author = {Luong, Thang and Sutskever, Ilya and Le, Quoc V. and Vinyals, Oriol and Zaremba, Wojciech},
title = {Addressing the Rare Word Problem in Neural Machine Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {11--19},
url = {
http://www.aclweb.org/anthology/P15-1002},
year = 2015
}
Luong et al. (2015);
Jean, Sébastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua (2015):
On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
@InProceedings{jean-EtAl:2015:ACL-IJCNLP,
author = {Jean, S\'{e}bastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua},
title = {On Using Very Large Target Vocabulary for Neural Machine Translation},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {1--10},
url = {
http://www.aclweb.org/anthology/P15-1001},
year = 2015
}
Jean et al. (2015) resort to a separate dictionary.
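The shortlist-and-back-off idea described above can be sketched as follows. This is a minimal illustration, not the implementation of either paper: the function names, the toy dictionary, and the assumption that a word alignment from output positions to source positions is available are all hypothetical.

```python
from collections import Counter

UNK = "UNK"

def build_shortlist(corpus, size):
    """Keep the `size` most frequent tokens of a tokenized corpus."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return {tok for tok, _ in counts.most_common(size)}

def apply_shortlist(sentence, shortlist):
    """Replace every token outside the shortlist with the UNK token."""
    return [tok if tok in shortlist else UNK for tok in sentence]

def replace_unk(source, output, alignment, dictionary):
    """Post-process a translation: each UNK is replaced by the dictionary
    translation of its aligned source word, or by the source word itself
    if the dictionary has no entry (assumed back-off, e.g. for names)."""
    result = []
    for i, tok in enumerate(output):
        if tok == UNK and i in alignment:
            src = source[alignment[i]]
            result.append(dictionary.get(src, src))
        else:
            result.append(tok)
    return result

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
shortlist = build_shortlist(corpus, 3)
reduced = apply_shortlist(["the", "dog", "ran"], shortlist)
fixed = replace_unk(
    source=["le", "chien", "Portland"],
    output=["the", UNK, UNK],
    alignment={0: 0, 1: 1, 2: 2},
    dictionary={"chien": "dog"},
)
```

In this toy setting, copying the source word when no dictionary entry exists is a design choice that tends to suit names and numbers.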
Arthur, Philip and Neubig, Graham and Nakamura, Satoshi (2016):
Incorporating Discrete Translation Lexicons into Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{arthur-neubig-nakamura:2016:EMNLP2016,
author = {Arthur, Philip and Neubig, Graham and Nakamura, Satoshi},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1557--1567},
url = {
https://aclweb.org/anthology/D16-1162},
year = 2016
}
Arthur et al. (2016) argue that neural translation models are worse for rare words and interpolate a traditional probabilistic bilingual dictionary with the prediction of the neural machine translation model. They use the attention mechanism to link each target word to a distribution of source words and weigh the word translations accordingly.
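As a rough sketch of this interpolation, the attention weights can be used to turn per-source-word dictionary distributions into a single lexicon distribution over target words, which is then mixed with the model prediction. The fixed interpolation weight `lam`, the function names, and the toy lexicon are assumptions made for illustration; they are not taken from the paper.

```python
def lexicon_probability(attention, source, lexicon, vocab):
    """p_lex(y) = sum over source positions j of attention[j] * lexicon[source[j]][y]."""
    p_lex = {y: 0.0 for y in vocab}
    for weight, src in zip(attention, source):
        for y, prob in lexicon.get(src, {}).items():
            if y in p_lex:
                p_lex[y] += weight * prob
    return p_lex

def interpolate(p_nmt, p_lex, lam):
    """Linear interpolation of the lexicon and model distributions.
    `lam` is a hypothetical fixed weight on the lexicon."""
    return {y: lam * p_lex.get(y, 0.0) + (1.0 - lam) * p_nmt[y] for y in p_nmt}

vocab = ["dog", "cat", "house"]
source = ["chien", "maison"]
attention = [0.75, 0.25]  # attention of the current target step over source words
lexicon = {"chien": {"dog": 1.0}, "maison": {"house": 1.0}}
p_nmt = {"dog": 0.5, "cat": 0.25, "house": 0.25}

p_lex = lexicon_probability(attention, source, lexicon, vocab)
p_final = interpolate(p_nmt, p_lex, lam=0.5)
```

Because both inputs are probability distributions over the same vocabulary, the interpolated result also sums to one.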
Source words such as names and numbers may also be directly copied into the target.
Gulcehre, Caglar and Ahn, Sungjin and Nallapati, Ramesh and Zhou, Bowen and Bengio, Yoshua (2016):
Pointing the Unknown Words, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{gulcehre-EtAl:2016:P16-1,
author = {Gulcehre, Caglar and Ahn, Sungjin and Nallapati, Ramesh and Zhou, Bowen and Bengio, Yoshua},
title = {Pointing the Unknown Words},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {140--149},
url = {
http://www.aclweb.org/anthology/P16-1014},
year = 2016
}
Gulcehre et al. (2016) use a so-called switching network to predict either a traditional translation operation or a copying operation aided by a softmax layer over the source sentence. They preprocess the training data to change some target words into word positions of copied source words. Similarly,
Gu, Jiatao and Lu, Zhengdong and Li, Hang and Li, Victor O.K. (2016):
Incorporating Copying Mechanism in Sequence-to-Sequence Learning, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{gu-EtAl:2016:P16-1,
author = {Gu, Jiatao and Lu, Zhengdong and Li, Hang and Li, Victor O.K.},
title = {Incorporating Copying Mechanism in Sequence-to-Sequence Learning},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1631--1640},
url = {
http://www.aclweb.org/anthology/P16-1154},
year = 2016
}
Gu et al. (2016) augment the word prediction step of the neural translation model to either translate a word or copy a source word. They observe that the attention mechanism is mostly driven by semantics and the language model in the case of word translation, but by location in case of copying.
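The choice between translating and copying can be illustrated with a simplified mixture of a generation distribution and a copy distribution over source positions. This sketch assumes a single scalar gate `p_switch` and hypothetical function names; the cited models compute these quantities inside the network and differ in detail.

```python
def mix_generate_and_copy(p_generate, copy_attention, source, p_switch):
    """p(y) = p_switch * p_generate(y)
            + (1 - p_switch) * (sum of copy weights on source positions holding y)."""
    mixed = {y: p_switch * prob for y, prob in p_generate.items()}
    for weight, tok in zip(copy_attention, source):
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_switch) * weight
    return mixed

source = ["Anna", "ging", "nach", "Oslo"]
p_generate = {"went": 0.5, "to": 0.5}   # softmax over the target shortlist
copy_attention = [0.25, 0.0, 0.0, 0.75]  # softmax over source positions
mixed = mix_generate_and_copy(p_generate, copy_attention, source, p_switch=0.5)
best = max(mixed, key=mixed.get)
```

Source words outside the target shortlist, such as the name in this toy example, receive probability mass only through the copy term.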
Subwords:
Sennrich, Rico and Haddow, Barry and Birch, Alexandra (2016):
Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{sennrich-haddow-birch:2016:P16-12,
author = {Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
title = {Neural Machine Translation of Rare Words with Subword Units},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1715--1725},
url = {
http://www.aclweb.org/anthology/P16-1162},
year = 2016
}
Sennrich et al. (2016) split up all words into sub-word units, using character n-gram models and a segmentation based on the byte pair encoding compression algorithm.
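The core of byte pair encoding for word segmentation is a loop that repeatedly merges the most frequent pair of adjacent symbols. The following is a minimal sketch of that loop under the assumption that the input is a dictionary of word counts; the function name `learn_bpe` and the end-of-word marker `</w>` are choices made here for illustration.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency dictionary."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, joining each occurrence of the best pair.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe({"hug": 10, "pug": 5, "hugs": 5}, num_merges=1)
```

The number of merge operations controls the size of the resulting subword vocabulary: few merges leave words close to character sequences, many merges restore frequent words as single units.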
M. Schuster and K. Nakajima (2012):
Japanese and Korean voice search, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
@INPROCEEDINGS{WordPiece,
author = {M. Schuster and K. Nakajima},
title = {Japanese and Korean voice search},
booktitle = {2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {5149--5152},
doi = {10.1109/ICASSP.2012.6289079},
issn = {2379-190X},
url = {
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6289079},
month = {March},
year = 2012
}
Schuster and Nakajima (2012) developed a similar method, originally for speech recognition, called word piece or sentence piece. It also starts by breaking up all words into character strings and then joins them together to obtain a lower-perplexity unigram language model trained on the data.
Kudo, Taku and Richardson, John (2018):
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
@inproceedings{D18-2012,
author = {Kudo, Taku and Richardson, John},
title = {SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-2012},
pages = {66--71},
year = 2018
}
Kudo and Richardson (2018) present a toolkit for the sentence piece method and describe it in more detail.
Kudo, Taku (2018):
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1007,
author = {Kudo, Taku},
title = {Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {66--75},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-1007},
year = 2018
}
Kudo (2018) proposes subword regularization, which samples different subword segmentations during training to allow for richer data to learn smaller subword units.
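Sampling a segmentation can be sketched for a toy setting in which every segmentation of a word is enumerated and one is drawn in proportion to its unigram probability. The exhaustive enumeration, the smoothing exponent `alpha`, the function names, and the piece probabilities are assumptions for illustration only; a practical system would need a more efficient sampling procedure.

```python
import random

def segmentations(word, pieces):
    """Enumerate all ways to split `word` into units from `pieces`."""
    if not word:
        return [[]]
    result = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in pieces:
            result.extend([head] + rest for rest in segmentations(word[end:], pieces))
    return result

def sample_segmentation(word, piece_probs, alpha=1.0, rng=random):
    """Draw one segmentation with probability proportional to
    (product of unigram piece probabilities) ** alpha."""
    candidates = segmentations(word, piece_probs)
    if not candidates:
        raise ValueError("word cannot be segmented with the given pieces")
    weights = []
    for candidate in candidates:
        score = 1.0
        for piece in candidate:
            score *= piece_probs[piece]
        weights.append(score ** alpha)
    return rng.choices(candidates, weights=weights, k=1)[0]

piece_probs = {"undo": 0.1, "un": 0.2, "do": 0.2,
               "u": 0.1, "n": 0.1, "d": 0.1, "o": 0.1}
candidates = segmentations("undo", piece_probs)
sampled = sample_segmentation("undo", piece_probs, alpha=0.5, rng=random.Random(0))
```

With a smaller `alpha` the distribution over segmentations becomes flatter, so the model sees rarer, more fine-grained splits more often during training.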
Morishita, Makoto and Suzuki, Jun and Nagata, Masaaki (2018):
Improving Neural Machine Translation by Incorporating Hierarchical Subword Features, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1052,
author = {Morishita, Makoto and Suzuki, Jun and Nagata, Masaaki},
title = {Improving Neural Machine Translation by Incorporating Hierarchical Subword Features},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/C18-1052},
pages = {618--629},
year = 2018
}
Morishita et al. (2018) combine different granularities of subword segmentation (using 16,000, 1,000, and 300 operations) for the input words and the output word conditioning, both in the model and during decoding, by summing up the different representations. A single subword from the large vocabulary may decompose into multiple subwords from the smaller vocabularies.
Duygu Ataman and Matteo Negri and Marco Turchi and Marcello Federico (2017):
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English, The Prague Bulletin of Mathematical Linguistics
@article{pbml-108-ataman,
author = {Duygu Ataman and Matteo Negri and Marco Turchi and Marcello Federico},
title = {Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from {Turkish} to {English}},
url = {
https://ufal.mff.cuni.cz/pbml/108/art-ataman-negri-turchi-federico.pdf},
pages = {331--342},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {108},
month = {June},
year = 2017
}
Ataman et al. (2017) propose a linguistically motivated vocabulary reduction method that models word formation as a sequence of stem and morphemes with a hidden Markov model, which can be optimized for a fixed target vocabulary size.
Duygu Ataman and Marcello Federico (2018):
An Evaluation of Two Vocabulary Reduction Methods for Neural Machine Translation, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA2018-Federico,
author = {Duygu Ataman and Marcello Federico},
title = {An Evaluation of Two Vocabulary Reduction Methods for Neural Machine Translation},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
url = {
http://www.aclweb.org/anthology/W18-1810},
location = {Boston, USA},
year = 2018
}
Ataman and Federico (2018) show that this method outperforms byte pair encoding for several morphologically rich language pairs.
Banerjee, Tamali and Bhattacharyya, Pushpak (2018):
Meaningless yet meaningful: Morphology grounded subword-level NMT, Proceedings of the Second Workshop on Subword/Character LEvel Models
@inproceedings{W18-1207,
author = {Banerjee, Tamali and Bhattacharyya, Pushpak},
title = {Meaningless yet meaningful: Morphology grounded subword-level NMT},
booktitle = {Proceedings of the Second Workshop on Subword/Character LEvel Models},
month = {jun},
address = {New Orleans},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-1207},
doi = {10.18653/v1/W18-1207},
pages = {55--60},
year = 2018
}
Banerjee and Bhattacharyya (2018) also note that morphologically inspired segmentation, as provided by a tool called Morfessor (Virpioja et al., 2013), sometimes gives better results than byte pair encoding, and that both methods combined may outperform either.
Nikolov, Nikola and Hu, Yuhuang and Tan, Mi Xue and Hahnloser, Richard H.R. (2018):
Character-level Chinese-English Translation through ASCII Encoding, Proceedings of the Third Conference on Machine Translation: Research Papers
@inproceedings{W18-6302,
author = {Nikolov, Nikola and Hu, Yuhuang and Tan, Mi Xue and Hahnloser, Richard H.R.},
title = {Character-level Chinese-English Translation through ASCII Encoding},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6302},
pages = {10--16},
year = 2018
}
Nikolov et al. (2018);
Zhang, Longtu and Komachi, Mamoru (2018):
Neural Machine Translation of Logographic Language Using Sub-character Level Information, Proceedings of the Third Conference on Machine Translation: Research Papers
@inproceedings{W18-6303,
author = {Zhang, Longtu and Komachi, Mamoru},
title = {Neural Machine Translation of Logographic Language Using Sub-character Level Information},
booktitle = {Proceedings of the Third Conference on Machine Translation: Research Papers},
month = {oct},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W18-6303},
pages = {17--25},
year = 2018
}
Zhang and Komachi (2018) extend the idea of splitting up words to logographic languages such as Chinese by allowing breaking up characters based on their romanized version or decomposition into strokes.
Character-Based Models:
Generating word representations from their character sequence was originally proposed for machine translation by
Costa-jussà, Marta R. and España-Bonet, Cristina and Madhyastha, Pranava and Escolano, Carlos and Fonollosa, José A. R. (2016):
The TALP--UPC Spanish--English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System, Proceedings of the First Conference on Machine Translation
@InProceedings{costajussa-EtAl:2016:WMT,
author = {Costa-juss\`{a}, Marta R. and Espa\~{n}a-Bonet, Cristina and Madhyastha, Pranava and Escolano, Carlos and Fonollosa, Jos\'{e} A. R.},
title = {The TALP--UPC Spanish--English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {463--468},
url = {
http://www.aclweb.org/anthology/W/W16/W16-2336},
year = 2016
}
Costa-jussà et al. (2016). They use a convolutional neural network to encode input words, but
Costa-jussà, Marta R. and Fonollosa, José A. R. (2016):
Character-based Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{costajussa-fonollosa:2016:P16-2,
author = {Costa-juss\`{a}, Marta R. and Fonollosa, Jos\'{e} A. R.},
title = {Character-based Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {357--361},
url = {
http://anthology.aclweb.org/P16-2058},
year = 2016
}
Costa-jussà and Fonollosa (2016) also show success with character-based language models in reranking machine translation.
Chung, Junyoung and Cho, Kyunghyun and Bengio, Yoshua (2016):
A Character-level Decoder without Explicit Segmentation for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{chung-cho-bengio:2016:P16-1,
author = {Chung, Junyoung and Cho, Kyunghyun and Bengio, Yoshua},
title = {A Character-level Decoder without Explicit Segmentation for Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1693--1703},
url = {
http://www.aclweb.org/anthology/P16-1160},
year = 2016
}
Chung et al. (2016) propose using a recurrent neural network to encode target words and also propose a bi-scale decoder where a fast layer outputs a character at a time, while a slow layer outputs a word at a time.
Duygu Ataman and Mattia Antonino Di Gangi and Marcello Federico (2018):
Compositional Source Word Representations for Neural Machine Translation, Proceedings of the 21st Annual Conference of the European Association for Machine Translation
@inproceedings{eamt18-Ataman,
author = {Duygu Ataman and Mattia Antonino Di Gangi and Marcello Federico},
title = {Compositional Source Word Representations for Neural Machine Translation},
booktitle = {Proceedings of the 21st Annual Conference of the European Association for Machine Translation},
url = {
https://arxiv.org/pdf/1805.02036.pdf},
location = {Alicante, Spain},
year = 2018
}
Ataman et al. (2018);
Ataman, Duygu and Federico, Marcello (2018):
Compositional Representation of Morphologically-Rich Input for Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2049,
author = {Ataman, Duygu and Federico, Marcello},
title = {Compositional Representation of Morphologically-Rich Input for Neural Machine Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {305--311},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-2049},
year = 2018
}
Ataman and Federico (2018) show good results with a recurrent neural network over character trigrams for input words but not output words.
Benchmarks
Discussion
Related Topics
New Publications
Durgar El-Kahlout, İlknur and Bektaş, Emre and Erdem, Naime Şeyma and Kaya, Hamza (2019):
Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System, Proceedings of the Fourth Arabic Natural Language Processing Workshop
@inproceedings{durgar-el-kahlout-etal-2019-translating,
author = {Durgar El-Kahlout, {\.I}lknur and Bekta{\c{s}}, Emre and Erdem, Naime {\c{S}}eyma and Kaya, Hamza},
title = {Translating Between Morphologically Rich Languages: An {A}rabic-to-{T}urkish Machine Translation System},
booktitle = {Proceedings of the Fourth Arabic Natural Language Processing Workshop},
month = {aug},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W19-4617},
pages = {158--166},
year = 2019
}
El-Kahlout et al. (2019)
Julia Kreutzer and Artem Sokolov (2018):
Learning to Segment Inputs for NMT Shows Preference for Character-Level Processing, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt18-Segment-Kreutzer,
author = {Julia Kreutzer and Artem Sokolov},
title = {Learning to Segment Inputs for NMT Shows Preference for Character-Level Processing},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2018
}
Kreutzer and Sokolov (2018)
Tang, Gongbo and Cap, Fabienne and Pettersson, Eva and Nivre, Joakim (2018):
An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1112,
author = {Tang, Gongbo and Cap, Fabienne and Pettersson, Eva and Nivre, Joakim},
title = {An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/C18-1112},
pages = {1320--1331},
year = 2018
}
Tang et al. (2018)
Ugawa, Arata and Tamura, Akihiro and Ninomiya, Takashi and Takamura, Hiroya and Okumura, Manabu (2018):
Neural Machine Translation Incorporating Named Entity, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1274,
author = {Ugawa, Arata and Tamura, Akihiro and Ninomiya, Takashi and Takamura, Hiroya and Okumura, Manabu},
title = {Neural Machine Translation Incorporating Named Entity},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/C18-1274},
pages = {3240--3250},
year = 2018
}
Ugawa et al. (2018)
Angli Liu and Katrin Kirchhoff (2018):
Context Models for OOV Word Translation in Low-Resource Languages, Annual Meeting of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA2018-Liu,
author = {Angli Liu and Katrin Kirchhoff},
title = {Context Models for OOV Word Translation in Low-Resource Languages},
booktitle = {Annual Meeting of the Association for Machine Translation in the Americas (AMTA)},
location = {Boston, USA},
year = 2018
}
Liu and Kirchhoff (2018)
Nguyen, Toan and Chiang, David (2018):
Improving Lexical Choice in Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1031,
author = {Nguyen, Toan and Chiang, David},
title = {Improving Lexical Choice in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {334--343},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-1031},
year = 2018
}
Nguyen and Chiang (2018)
Liu, Frederick and Lu, Han and Neubig, Graham (2018):
Handling Homographs in Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1121,
author = {Liu, Frederick and Lu, Han and Neubig, Graham},
title = {Handling Homographs in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {1336--1345},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-1121},
year = 2018
}
Liu et al. (2018)
Pham, Ngoc-Quan and Niehues, Jan and Waibel, Alex (2018):
Towards one-shot learning for rare-word translation with external experts, Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
@InProceedings{W18-2712,
author = {Pham, Ngoc-Quan and Niehues, Jan and Waibel, Alex},
title = {Towards one-shot learning for rare-word translation with external experts},
booktitle = {Proceedings of the 2nd Workshop on Neural Machine Translation and Generation},
publisher = {Association for Computational Linguistics},
pages = {100--109},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/W18-2712},
year = 2018
}
Pham et al. (2018)
Zhao, Yang and Zhang, Jiajun and He, Zhongjun and Zong, Chengqing and Wu, Hua (2018):
Addressing Troublesome Words in Neural Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1036,
author = {Zhao, Yang and Zhang, Jiajun and He, Zhongjun and Zong, Chengqing and Wu, Hua},
title = {Addressing Troublesome Words in Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1036},
pages = {391--400},
year = 2018
}
Zhao et al. (2018)
Character-Based Models
Lee, Jason and Cho, Kyunghyun and Hofmann, Thomas (2017):
Fully Character-Level Neural Machine Translation without Explicit Segmentation, Transactions of the Association for Computational Linguistics
@article{TACL1051,
author = {Lee, Jason and Cho, Kyunghyun and Hofmann, Thomas},
title = {Fully Character-Level Neural Machine Translation without Explicit Segmentation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
issn = {2307-387X},
url = {
https://transacl.org/ojs/index.php/tacl/article/view/1051},
pages = {365--378},
year = 2017
}
Lee et al. (2017)
Ebrahimi, Javid and Lowd, Daniel and Dou, Dejing (2018):
On Adversarial Examples for Character-Level Neural Machine Translation, Proceedings of the 27th International Conference on Computational Linguistics
@inproceedings{C18-1055,
author = {Ebrahimi, Javid and Lowd, Daniel and Dou, Dejing},
title = {On Adversarial Examples for Character-Level Neural Machine Translation},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/C18-1055},
pages = {653--663},
year = 2018
}
Ebrahimi et al. (2018)
Cherry, Colin and Foster, George and Bapna, Ankur and Firat, Orhan and Macherey, Wolfgang (2018):
Revisiting Character-Based Neural Machine Translation with Capacity and Compression, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
@inproceedings{D18-1461,
author = {Cherry, Colin and Foster, George and Bapna, Ankur and Firat, Orhan and Macherey, Wolfgang},
title = {Revisiting Character-Based Neural Machine Translation with Capacity and Compression},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1461},
pages = {4295--4305},
year = 2018
}
Cherry et al. (2018)
Passban, Peyman and Liu, Qun and Way, Andy (2018):
Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1006,
author = {Passban, Peyman and Liu, Qun and Way, Andy},
title = {Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {58--68},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-1006},
year = 2018
}
Passban et al. (2018)
Yang, Zhen and Chen, Wei and Wang, Feng and Xu, Bo (2016):
A Character-Aware Encoder for Neural Machine Translation, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{yang-EtAl:2016:COLING,
author = {Yang, Zhen and Chen, Wei and Wang, Feng and Xu, Bo},
title = {A Character-Aware Encoder for Neural Machine Translation},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3063--3070},
url = {
http://aclweb.org/anthology/C16-1288},
year = 2016
}
Yang et al. (2016)
Luong, Minh-Thang and Manning, Christopher D. (2016):
Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{luong-manning:2016:P16-1,
author = {Luong, Minh-Thang and Manning, Christopher D.},
title = {Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1054--1063},
url = {
http://www.aclweb.org/anthology/P16-1100},
year = 2016
}
Luong and Manning (2016)
Jason Lee and Kyunghyun Cho and Thomas Hofmann (2016):
Fully Character-Level Neural Machine Translation without Explicit Segmentation, CoRR
@article{DBLP:journals/corr/LeeCH16,
author = {Jason Lee and Kyunghyun Cho and Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit Segmentation},
journal = {CoRR},
volume = {abs/1610.03017},
url = {
http://arxiv.org/abs/1610.03017},
timestamp = {Wed, 02 Nov 2016 09:51:26 +0100},
biburl = {
http://dblp.uni-trier.de/rec/bib/journals/corr/LeeCH16},
bibsource = {dblp computer science bibliography,
http://dblp.org},
year = 2016
}
Lee et al. (2016)
Eriguchi, Akiko and Hashimoto, Kazuma and Tsuruoka, Yoshimasa (2016):
Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
@InProceedings{eriguchi-hashimoto-tsuruoka:2016:WAT2016,
author = {Eriguchi, Akiko and Hashimoto, Kazuma and Tsuruoka, Yoshimasa},
title = {Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {175--183},
url = {
http://aclweb.org/anthology/W16-4617},
year = 2016
}
Eriguchi et al. (2016)
Hybrid / Use of Translation Lexicons
Zi Long and Ryuichiro Kimura and Takehito Utsuro and Tomoharu Mitsuhashi and Mikio Yamamoto (2017):
Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy, Machine Translation Summit XVI
@inproceedings{mtsummit2017:Long,
author = {Zi Long and Ryuichiro Kimura and Takehito Utsuro and Tomoharu Mitsuhashi and Mikio Yamamoto},
title = {Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
url = {
https://arxiv.org/pdf/1704.04520.pdf},
year = 2017
}
Long et al. (2017)
Neubig, Graham (2016):
Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
mentioned in Training and Vocabulary
@InProceedings{neubig:2016:WAT2016,
author = {Neubig, Graham},
title = {Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {119--125},
url = {http://aclweb.org/anthology/W16-4610},
year = 2016
}
Neubig (2016)
Wang, Weiyue and Alkhouli, Tamer and Zhu, Derui and Ney, Hermann (2017):
Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{wang-EtAl:2017:Short1,
author = {Wang, Weiyue and Alkhouli, Tamer and Zhu, Derui and Ney, Hermann},
title = {Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {125--131},
url = {http://aclweb.org/anthology/P17-2020},
year = 2017
}
Wang et al. (2017)
Wang, Xing and Lu, Zhengdong and Tu, Zhaopeng and Li, Hang and Xiong, Deyi and Zhang, Min (2016):
Neural Machine Translation Advised by Statistical Machine Translation, arXiv preprint arXiv:1610.05150
@article{wang2016neural,
author = {Wang, Xing and Lu, Zhengdong and Tu, Zhaopeng and Li, Hang and Xiong, Deyi and Zhang, Min},
title = {Neural Machine Translation Advised by Statistical Machine Translation},
journal = {arXiv preprint arXiv:1610.05150},
url = {https://arxiv.org/pdf/1610.05150v2.pdf},
year = 2016
}
Wang et al. (2016)
Thang Luong and Ilya Sutskever and Quoc V. Le and Oriol Vinyals and Wojciech Zaremba (2014):
Addressing the Rare Word Problem in Neural Machine Translation, CoRR
@article{DBLP:journals/corr/LuongSLVZ14,
author = {Thang Luong and Ilya Sutskever and Quoc V. Le and Oriol Vinyals and Wojciech Zaremba},
title = {Addressing the Rare Word Problem in Neural Machine Translation},
journal = {CoRR},
volume = {abs/1410.8206},
url = {http://arxiv.org/abs/1410.8206},
timestamp = {Sun, 02 Nov 2014 11:25:59 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/LuongSLVZ14},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2014
}
Luong et al. (2014)
Sébastien Jean and Kyunghyun Cho and Roland Memisevic and Yoshua Bengio (2014):
On Using Very Large Target Vocabulary for Neural Machine Translation, CoRR
@article{DBLP:journals/corr/JeanCMB14,
author = {S{\'{e}}bastien Jean and Kyunghyun Cho and Roland Memisevic and Yoshua Bengio},
title = {On Using Very Large Target Vocabulary for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1412.2007},
url = {http://arxiv.org/abs/1412.2007},
timestamp = {Thu, 01 Jan 2015 19:51:08 +0100},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/JeanCMB14},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2014
}
Jean et al. (2014)
Hashimoto, Kazuma and Eriguchi, Akiko and Tsuruoka, Yoshimasa (2016):
Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
@InProceedings{hashimoto-eriguchi-tsuruoka:2016:WAT2016,
author = {Hashimoto, Kazuma and Eriguchi, Akiko and Tsuruoka, Yoshimasa},
title = {Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {75--83},
url = {http://aclweb.org/anthology/W16-4605},
year = 2016
}
Hashimoto et al. (2016)
Long, Zi and Utsuro, Takehito and Mitsuhashi, Tomoharu and Yamamoto, Mikio (2016):
Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation, Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
@InProceedings{long-EtAl:2016:WAT2016,
author = {Long, Zi and Utsuro, Takehito and Mitsuhashi, Tomoharu and Yamamoto, Mikio},
title = {Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Asian Translation (WAT2016)},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {47--57},
url = {http://aclweb.org/anthology/W16-4602},
year = 2016
}
Long et al. (2016)
Chitnis, Rohan and DeNero, John (2015):
Variable-Length Word Encodings for Neural Translation Models, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{chitnis-denero:2015:EMNLP,
author = {Chitnis, Rohan and DeNero, John},
title = {Variable-Length Word Encodings for Neural Translation Models},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {2088--2093},
url = {http://aclweb.org/anthology/D15-1249},
year = 2015
}
Chitnis and DeNero (2015)