Very Large Language Models
Since language models trained on vast monolingual corpora easily grow very large, handling such models efficiently is a constant research topic.
Very Large Language Models is the main subject of 15 publications, 10 of which are discussed here.
Publications
Very large language models that require more space than the available working memory may be distributed over a cluster of machines; at such scale, they may not need sophisticated smoothing methods
Brants, Thorsten and Popat, Ashok C. and Xu, Peng and Och, Franz Josef and Dean, Jeffrey (2007):
Large Language Models in Machine Translation, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
@InProceedings{brants-EtAl:2007:EMNLP-CoNLL2007,
author = {Brants, Thorsten and Popat, Ashok C. and Xu, Peng and Och, Franz Josef and Dean, Jeffrey},
title = {Large Language Models in Machine Translation},
booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages = {858--867},
url = {http://www.aclweb.org/anthology/D/D07/D07-1090},
year = 2007
}
(Brants et al., 2007).
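At this scale, Brants et al. dispense with sophisticated smoothing in favour of "stupid backoff", which scores n-grams with relative frequencies and a fixed backoff penalty. Below is a minimal sketch of that scoring rule; the plain in-memory dictionaries stand in for the paper's distributed count tables.

# Minimal sketch of stupid backoff (Brants et al., 2007). Scores are
# relative frequencies with a fixed penalty ALPHA for backing off;
# they are not normalized probabilities.

ALPHA = 0.4  # fixed backoff factor recommended in the paper

def stupid_backoff(ngram, counts, unigram_total):
    """Return the score S(w_n | w_1..w_{n-1}) for ngram = (w_1, ..., w_n)."""
    if len(ngram) == 1:
        return counts.get(ngram, 0) / unigram_total
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[ngram[:-1]]
    # back off to the shorter context with a fixed penalty
    return ALPHA * stupid_backoff(ngram[1:], counts, unigram_total)

# toy usage
counts = {("the",): 6, ("cat",): 2, ("the", "cat"): 2}
print(stupid_backoff(("the", "cat"), counts, unigram_total=8))  # 2/6
print(stupid_backoff(("a", "cat"), counts, unigram_total=8))    # 0.4 * 2/8

Alternatively, storing the language model on disk using memory mapping is an option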
Federico, Marcello and Cettolo, Mauro (2007):
Efficient Handling of N-gram Language Models for Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{federico-cettolo:2007:WMT,
author = {Federico, Marcello and Cettolo, Mauro},
title = {Efficient Handling of N-gram Language Models for Statistical Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {88--95},
url = {http://www.aclweb.org/anthology/W/W07/W07-0212},
year = 2007
}
(Federico and Cettolo, 2007). Methods for quantizing language model probabilities are presented by
Federico, Marcello and Bertoldi, Nicola (2006):
How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{federico-bertoldi:2006:WMT,
author = {Federico, Marcello and Bertoldi, Nicola},
title = {How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {94--101},
url = {http://www.aclweb.org/anthology/W/W06/W06-3113},
year = 2006
}
Federico and Bertoldi (2006), who also examine this for translation model probabilities.
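A minimal sketch of this kind of quantization: log-probabilities are partitioned into 2^k quantile bins, each represented by its mean, so every value is stored as a k-bit index into a shared codebook. The quantile binning here is a simple illustrative variant, not necessarily the exact scheme of the paper.

# k-bit quantization of log-probabilities by quantile binning:
# split the sorted values into 2^k bins of roughly equal size and
# represent each bin by its mean. With 8 bits, every probability
# costs one byte plus its share of the small shared codebook.
import numpy as np

def build_codebook(logprobs, bits=8):
    values = np.sort(np.asarray(logprobs))
    bins = np.array_split(values, 2 ** bits)
    return np.array([b.mean() for b in bins if len(b)])

def quantize(logprobs, codebook):
    """Map each value to the index of the nearest codeword."""
    idx = np.searchsorted(codebook, logprobs).clip(1, len(codebook) - 1)
    left, right = codebook[idx - 1], codebook[idx]
    return np.where(np.abs(logprobs - left) <= np.abs(right - logprobs),
                    idx - 1, idx).astype(np.uint8)

# toy usage
logprobs = np.random.uniform(-10.0, 0.0, size=100000)
cb = build_codebook(logprobs, bits=8)
codes = quantize(logprobs, cb)          # one byte per probability
restored = cb[codes]
print("max abs error:", np.abs(restored - logprobs).max())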
Heafield, Kenneth (2011):
KenLM: Faster and Smaller Language Model Queries, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{heafield:2011:WMT,
author = {Heafield, Kenneth},
title = {KenLM: Faster and Smaller Language Model Queries},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {187--197},
url = {http://www.aclweb.org/anthology/W11-2123},
year = 2011
}
Heafield (2011) introduces two data structures, a speed-optimized probing hash table and a memory-optimized trie, that enable compact storage and fast lookup, outperforming earlier toolkits in both respects.
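The following rough sketch illustrates the probing idea: an open-addressing hash table with linear probing that maps a 64-bit n-gram hash to its log-probability and backoff weight. KenLM implements this in C++ over carefully packed memory; this Python rendering only illustrates the lookup strategy.

# Linear-probing n-gram table in the spirit of KenLM's "probing"
# structure: one flat array, 64-bit n-gram hashes as keys, with
# probability and backoff stored inline.
import hashlib

class ProbingTable:
    def __init__(self, n_entries, multiplier=1.5):
        # oversize the array so probe chains stay short
        self.size = int(n_entries * multiplier) + 1
        self.slots = [None] * self.size

    @staticmethod
    def key(ngram):
        """64-bit hash of the n-gram (stand-in for KenLM's hashing)."""
        h = hashlib.blake2b(" ".join(ngram).encode(), digest_size=8)
        return int.from_bytes(h.digest(), "little")

    def insert(self, ngram, logprob, backoff):
        k = self.key(ngram)
        i = k % self.size
        while self.slots[i] is not None:        # linear probing
            i = (i + 1) % self.size
        self.slots[i] = (k, logprob, backoff)

    def lookup(self, ngram):
        k = self.key(ngram)
        i = k % self.size
        while self.slots[i] is not None:
            if self.slots[i][0] == k:
                return self.slots[i][1], self.slots[i][2]
            i = (i + 1) % self.size
        return None                             # n-gram not in the model

# toy usage
table = ProbingTable(n_entries=2)
table.insert(("the", "cat"), -1.2, -0.4)
print(table.lookup(("the", "cat")))   # (-1.2, -0.4)
print(table.lookup(("a", "dog")))     # None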
Alternatively, lossy data structures such as Bloom filters may be used to store very large language models efficiently (a concrete sketch follows after this paragraph)
Talbot, David and Osborne, Miles (2007):
Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
@InProceedings{talbot-osborne:2007:EMNLP-CoNLL2007,
author = {Talbot, David and Osborne, Miles},
title = {Smoothed {Bloom} Filter Language Models: Tera-Scale {LMs} on the Cheap},
booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages = {468--476},
url = {http://www.aclweb.org/anthology/D/D07/D07-1049},
year = 2007
}
(Talbot and Osborne, 2007a;
Talbot, David and Osborne, Miles (2007):
Randomised Language Modelling for Statistical Machine Translation, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{talbot-osborne:2007:ACLMain,
author = {Talbot, David and Osborne, Miles},
title = {Randomised Language Modelling for Statistical Machine Translation},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {512--519},
url = {http://www.aclweb.org/anthology/P/P07/P07-1065},
year = 2007
}
Talbot and Osborne, 2007b). Such randomized language models allow for incremental updating from an incoming stream of new training data
Levenberg, Abby and Osborne, Miles (2009):
Stream-based Randomised Language Models for SMT, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{levenberg-osborne:2009:EMNLP,
author = {Levenberg, Abby and Osborne, Miles},
title = {Stream-based Randomised Language Models for {SMT}},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {756--764},
url = {http://www.aclweb.org/anthology/D/D09/D09-1079},
year = 2009
}
(Levenberg and Osborne, 2009), or even multiple streams
Levenberg, Abby and Osborne, Miles and Matthews, David (2011):
Multiple-stream Language Models for Statistical Machine Translation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{levenberg-osborne-matthews:2011:WMT,
author = {Levenberg, Abby and Osborne, Miles and Matthews, David},
title = {Multiple-stream Language Models for Statistical Machine Translation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {177--186},
url = {http://www.aclweb.org/anthology/W11-2122},
year = 2011
}
(Levenberg et al., 2011).
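To make the Bloom filter approach concrete: Talbot and Osborne encode an n-gram's quantized log count c by inserting the n-gram tagged with each value 1..c into the filter, and retrieve it by probing tags upward until the first miss. Since insertion never disturbs existing entries, n-grams arriving on a stream can be folded in on the fly, which is the property the stream-based models exploit. The sketch below follows that idea; filter size, hash count, and log base are illustrative choices.

# Sketch of a log-frequency Bloom filter language model store.
# Errors are one-sided: counts may occasionally be overestimated,
# never underestimated.
import hashlib
import math

class LogFreqBloomLM:
    def __init__(self, n_bits=1 << 20, n_hashes=3, base=2.0):
        self.bits = bytearray(n_bits // 8)
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.base = base                  # log base for count quantization

    def _positions(self, item):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=bytes([i])).digest()
            yield int.from_bytes(h, "little") % self.n_bits

    def _set(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def _test(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(item))

    def add(self, ngram, count):
        """Insert the n-gram tagged 1..qc, qc its quantized log count."""
        qc = 1 + int(math.log(count, self.base))
        for c in range(1, qc + 1):
            self._set(f"{ngram}#{c}")

    def quantized_count(self, ngram):
        """Probe tags upward until the first miss."""
        c = 0
        while self._test(f"{ngram}#{c + 1}"):
            c += 1
        return self.base ** (c - 1) if c else 0

# toy usage; add() can be called as new data streams in
lm = LogFreqBloomLM()
lm.add("the cat", 5)
print(lm.quantized_count("the cat"))   # 4.0, lower edge of count 5's bin
print(lm.quantized_count("a dog"))     # 0 with high probability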
The use of very large language models is often restricted to a reranking stage
Olteanu, Marian and Suriyentrakorn, Pasin and Moldovan, Dan (2006):
Language Models and Reranking for Machine Translation, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{olteanu-suriyentrakorn-moldovan:2006:WMT,
author = {Olteanu, Marian and Suriyentrakorn, Pasin and Moldovan, Dan},
title = {Language Models and Reranking for Machine Translation},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {150--153},
url = {http://www.aclweb.org/anthology/W/W06/W06-3122},
year = 2006
}
(Olteanu et al., 2006).
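In such a setup, the decoder first produces an n-best list with a small language model, and each hypothesis is then rescored with the large one. A minimal sketch, where the scoring callback and the interpolation weight are assumptions for illustration:

# Minimal sketch of n-best reranking with a large language model:
# the decoder's model score is interpolated with a log-probability
# from the big LM, and the best rescored hypothesis is returned.

def rerank(nbest, large_lm_logprob, weight=0.5):
    """nbest: list of (hypothesis, decoder_score) pairs."""
    rescored = [
        (hyp, score + weight * large_lm_logprob(hyp))
        for hyp, score in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])[0]

# toy usage with a stand-in LM
toy_lm = {"the cat sat": -2.0, "cat the sat": -9.0}
nbest = [("cat the sat", 1.0), ("the cat sat", 0.8)]
print(rerank(nbest, lambda h: toy_lm[h]))   # "the cat sat"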
Benchmarks
Discussion
Related Topics
New Publications
Heafield, Kenneth and Pouzyrevsky, Ivan and Clark, Jonathan H. and Koehn, Philipp (2013):
Scalable Modified Kneser-Ney Language Model Estimation, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{heafield-EtAl:2013:Short,
author = {Heafield, Kenneth and Pouzyrevsky, Ivan and Clark, Jonathan H. and Koehn, Philipp},
title = {Scalable Modified Kneser-Ney Language Model Estimation},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {690--696},
url = {http://www.aclweb.org/anthology/P13-2121},
year = 2013
}
Heafield et al. (2013)
Yasuhara, Makoto and Tanaka, Toru and Norimatsu, Jun-ya and Yamamoto, Mikio (2013):
An Efficient Language Model Using Double-Array Structures, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
@InProceedings{yasuhara-EtAl:2013:EMNLP,
author = {Yasuhara, Makoto and Tanaka, Toru and Norimatsu, Jun-ya and Yamamoto, Mikio},
title = {An Efficient Language Model Using Double-Array Structures},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {222--232},
url = {http://www.aclweb.org/anthology/D13-1023},
year = 2013
}
Yasuhara et al. (2013)
Vaswani, Ashish and Zhao, Yinggong and Fossum, Victoria and Chiang, David (2013):
Decoding with Large-Scale Neural Language Models Improves Translation, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
@InProceedings{vaswani-EtAl:2013:EMNLP,
author = {Vaswani, Ashish and Zhao, Yinggong and Fossum, Victoria and Chiang, David},
title = {Decoding with Large-Scale Neural Language Models Improves Translation},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {1387--1392},
url = {http://www.aclweb.org/anthology/D13-1140},
year = 2013
}
Vaswani et al. (2013)
Dani Yogatama and Chong Wang and Bryan R. Routledge and Noah A. Smith and Eric P. Xing (2014):
Dynamic Language Models for Streaming Text, Transactions of the Association for Computational Linguistics (TACL)
@article{tacl14-Yogatama,
author = {Dani Yogatama and Chong Wang and Bryan R. Routledge and Noah A. Smith and Eric P. Xing},
title = {Dynamic Language Models for Streaming Text},
number = {2},
pages = {181--192},
url = {http://www.aclweb.org/anthology/Q/Q14/Q14-1015.pdf},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
year = 2014
}
Yogatama et al. (2014)
Guthrie, David and Hepple, Mark (2010):
Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
@InProceedings{guthrie-hepple:2010:EMNLP,
author = {Guthrie, David and Hepple, Mark},
title = {Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval},
booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Cambridge, MA},
publisher = {Association for Computational Linguistics},
pages = {262--272},
url = {http://www.aclweb.org/anthology/D/D10/D10-1026},
year = 2010
}
Guthrie and Hepple (2010)
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun (2011):
A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{tan-EtAl:2011:ACL-HLT2011,
author = {Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun},
title = {A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {201--210},
url = {http://www.aclweb.org/anthology/P11-1021},
year = 2011
}
Tan et al. (2011)