Neural Language Models

Various neural network architectures have been applied to the basic task of language modelling, such as n-gram feed-forward models, recurrent neural networks, and convolutional neural networks.

Neural Language Models is the main subject of 31 publications; 15 are discussed here.


The first vanguard of neural network research tackled language models. A prominent reference for neural language models is Bengio et al. (2003), who implement an n-gram language model as a feed-forward neural network with the history words as input and the predicted word as output. Schwenk et al. (2006) introduce such language models to machine translation (also called "continuous space language models"), and use them in re-ranking, similar to the earlier work in speech recognition. Schwenk (2007) proposes a number of speed-ups. They made their implementation available as an open-source toolkit (Schwenk, 2010), which also supports training on a graphical processing unit (GPU) (Schwenk et al., 2012).
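The feed-forward architecture of Bengio et al. (2003) can be illustrated with a minimal sketch: the n history words are looked up in an embedding table, their vectors are concatenated, passed through a hidden layer, and projected to a softmax over the vocabulary. All dimensions and parameter values below are toy stand-ins, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, h, n = 20, 8, 16, 3   # vocab size, embedding dim, hidden dim, history length (toy values)

# Randomly initialized parameters, for illustration only
E = rng.normal(size=(V, d))          # word embedding table
W1 = rng.normal(size=(n * d, h))     # concatenated history -> hidden layer
W2 = rng.normal(size=(h, V))         # hidden layer -> output scores

def predict_next(history_ids):
    """Return a probability distribution over the next word given n history words."""
    x = np.concatenate([E[i] for i in history_ids])   # concatenate history embeddings
    hidden = np.tanh(x @ W1)
    scores = hidden @ W2
    exp = np.exp(scores - scores.max())               # softmax over the full vocabulary
    return exp / exp.sum()

p = predict_next([4, 7, 2])
print(p.shape)   # (20,) -- one probability per vocabulary word
```

The softmax over the full vocabulary in the last step is exactly the cost that the speed-ups discussed below aim to avoid.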
By first clustering words into classes and encoding words as a pair of class and word-in-class bits, Baltescu et al. (2014) reduce the computational complexity sufficiently to allow integration of the neural network language model into the decoder. Another way to reduce computational complexity and enable decoder integration is the use of noise contrastive estimation by Vaswani et al. (2013), which roughly self-normalizes the output scores of the model during training, hence removing the need to compute the values for all possible output words. Baltescu and Blunsom (2015) compare the two techniques - class-based word encoding with normalized scores vs. noise-contrastive estimation without normalized scores - and show that the latter gives better performance with much higher speed.
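The class-based factorization behind such speed-ups rests on the decomposition p(w | h) = p(class(w) | h) · p(w | class(w), h), which replaces one softmax over the full vocabulary with two much smaller ones. A toy sketch, with a made-up clustering of 12 words into 3 equal classes and a random hidden state:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 12 words partitioned into 3 classes of 4 words each (hypothetical clustering)
V, C, per_class, h = 12, 3, 4, 16
word_class = np.repeat(np.arange(C), per_class)   # class id of each word
hidden = rng.normal(size=h)                       # hidden state produced by the network

Wc = rng.normal(size=(h, C))                      # hidden -> class scores
Ww = rng.normal(size=(h, V))                      # hidden -> word-in-class scores

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def class_factored_prob(w):
    """p(w | h) = p(class(w) | h) * p(w | class(w), h):
    two small softmaxes instead of one over the full vocabulary."""
    c = word_class[w]
    p_class = softmax(hidden @ Wc)[c]
    members = np.where(word_class == c)[0]        # words sharing w's class
    in_class = softmax(hidden @ Ww[:, members])
    p_word = in_class[list(members).index(w)]
    return p_class * p_word

total = sum(class_factored_prob(w) for w in range(V))
print(round(float(total), 6))   # 1.0 -- still a proper distribution over the vocabulary
```

The per-prediction cost drops from O(|V|) to roughly O(|C| + |V|/|C|), which is what makes decoder integration practical.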
As another way to allow straightforward decoder integration, Wang et al. (2013) convert a continuous space language model for a short list of 8192 words into a traditional n-gram language model in ARPA (SRILM) format. Wang et al. (2014) present a method to merge (or "grow") a continuous space language model with a traditional n-gram language model, to take advantage of both better estimate for the words in the short list and the full coverage from the traditional model.
Finch et al. (2012) use a recurrent neural network language model to rescore n-best lists for a transliteration system. Sundermeyer et al. (2013) compare feed-forward with long short-term memory (LSTM) neural network language models, a variant of recurrent neural network language models, showing better performance for the latter in a speech recognition re-ranking task. Mikolov (2012) reports significant improvements when re-ranking n-best lists of machine translation systems with a recurrent neural network language model.
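N-best rescoring as used in these works is simple to sketch: each candidate keeps its decoder score, a neural language model score is added with an interpolation weight, and the list is re-sorted. The sentences, scores, weight, and the stand-in language model below are all invented for illustration.

```python
# Toy n-best list: (candidate translation, decoder score in log space)
nbest = [
    ("the house is small", -4.2),
    ("the house is little", -4.0),
    ("house the is small", -3.9),
]

def lm_score(sentence):
    # Stand-in for a recurrent neural LM; here just a toy penalty
    # for a word order the "model" has never seen
    return -10.0 if "house the" in sentence else -2.0

lm_weight = 0.5   # hypothetical interpolation weight, normally tuned on held-out data

rescored = sorted(nbest, key=lambda c: c[1] + lm_weight * lm_score(c[0]), reverse=True)
print(rescored[0][0])   # the ungrammatical candidate is demoted by the LM
```

Because the neural model only scores a fixed list of complete hypotheses, rescoring sidesteps the cost of querying it inside the decoder's search.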
Neural language models are typically not deep learning in the sense of using many hidden layers. However, Luong et al. (2015) show that having 3-4 hidden layers improves over having just the typical single layer.
Language Models in Neural Machine Translation: Traditional statistical machine translation models have a straightforward mechanism to integrate additional knowledge sources, such as a large out-of-domain language model. This is harder for end-to-end neural machine translation. Gülçehre et al. (2015) add a language model trained on additional monolingual data to a neural translation model, in the form of a recurrent neural network that runs in parallel. They compare the use of the language model in re-ranking (or, re-scoring) against deeper integration where a gated unit regulates the relative contribution of the language model and the translation model when predicting a word.
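The gated combination in the deeper-integration variant can be sketched as a convex mixture of the two models' next-word distributions, with a gate value between 0 and 1. In Gülçehre et al. (2015) the gate is computed from the models' hidden states; here it is a fixed toy value, and both distributions are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

V = 10
p_tm = softmax(rng.normal(size=V))   # translation model's next-word distribution (toy)
p_lm = softmax(rng.normal(size=V))   # external language model's distribution (toy)

# Gate regulating the LM's contribution; in the actual model this is
# predicted from the hidden states rather than fixed
g = 0.3
p = g * p_lm + (1 - g) * p_tm
print(round(float(p.sum()), 6))   # 1.0 -- the mixture is still a valid distribution
```

Because the gate is learned, the model can lean on the language model for fluent continuations and on the translation model when source-side information matters.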



Related Topics

New Publications

  • Herold et al. (2018)
  • Stahlberg et al. (2018)
  • Ter-Sarkisov et al. (2014)
  • Verwimp et al. (2017)
  • Pham et al. (2016)
  • Miyamoto and Cho (2016)
  • Neubig and Dyer (2016)
  • Niehues et al. (2016)
  • Chen et al. (2016)
  • Chen et al. (2016)
  • Devlin et al. (2015)
  • Aransa et al. (2015)
  • Auli and Gao (2014)
  • Niehues et al. (2014)
  • Niehues and Waibel (2012)
  • Alkhouli et al. (2015)
  • Wang et al. (2013)