Very Large Language Models
Since language models trained on vast monolingual corpora easily become very large, a constant research topic is how to handle such large models efficiently.
Very Large Language Models is the main subject of 15 publications, 10 of which are discussed here.
Very large language models that require more space than the available working memory may be distributed over a cluster of machines. At that scale, sophisticated smoothing methods may not be needed (Brants et al., 2007). Alternatively, the language model may be stored on disk and accessed through memory mapping (Federico and Cettolo, 2007). Methods for quantizing language model probabilities are presented by Federico and Bertoldi (2006), who also apply quantization to translation model probabilities. Heafield (2011) introduces efficient data structures that enable compact storage and fast lookup, outperforming earlier language model implementations.
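The remark that huge models may not need sophisticated smoothing refers to the simple scheme introduced by Brants et al. (2007), commonly called "stupid backoff". The sketch below is a minimal illustration of that scheme. It uses the relative frequency of an n-gram when the n-gram was seen, and otherwise multiplies a fixed factor (0.4 here) into the score for a context one word shorter. The scores it returns are not normalized probabilities. The helper names `count_ngrams` and `stupid_backoff` and the toy corpus are hypothetical, and the in-memory `Counter` stands in for the distributed count tables used at real scale.

```python
from collections import Counter
from typing import Sequence, Tuple

Ngram = Tuple[str, ...]


def count_ngrams(tokens: Sequence[str], max_order: int) -> Counter:
    """Count every n-gram of order 1..max_order in a token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(counts: Counter, total: int, context: Ngram, word: str,
                   alpha: float = 0.4) -> float:
    """Relative frequency of the full n-gram if it was seen; otherwise back off
    to a context one word shorter and multiply by the fixed factor alpha.
    The result is a score, not a normalized probability."""
    ngram = context + (word,)
    if not context:
        # Base case: unigram relative frequency count(w) / N.
        return counts[ngram] / total
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    return alpha * stupid_backoff(counts, total, context[1:], word, alpha)


tokens = "the cat sat on the mat the cat ran".split()
counts = count_ngrams(tokens, max_order=3)
print(stupid_backoff(counts, len(tokens), ("the", "cat"), "sat"))  # prints 0.5
print(stupid_backoff(counts, len(tokens), ("the", "mat"), "sat"))  # backs off twice
```

Because no backoff weights have to be estimated, only raw counts need to be stored and looked up. This is what makes the scheme attractive when the counts are spread over many machines.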
Alternatively, lossy data structures such as Bloom filters may be used to store very large language models efficiently (Talbot and Osborne, 2007; Talbot and Osborne, 2007b). Such randomized language models allow for incremental updating from an incoming stream of new training data (Levenberg and Osborne, 2009), or even from multiple streams (Levenberg et al., 2011).
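To make the idea of a lossy, randomized language model concrete, the sketch below stores quantized n-gram counts in a plain Bloom filter. It is written in the spirit of a log-frequency encoding. Each count is mapped to a level of 1 + floor(log2(count)). The keys `ngram|1` through `ngram|level` are inserted, and a query probes successive levels until one tests negative. The base-2 quantization, the SHA-256-derived hash functions, the `|` separator and all class and function names are illustrative assumptions, not the exact design of the cited papers. The key property does carry over: the filter has no false negatives, so a stored level can be overestimated through false positives but never underestimated.

```python
import hashlib
from typing import Dict, Iterator


class BloomFilter:
    """Bit array probed by k hash functions: false positives are possible,
    false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Assumption: derive k hash functions by prefixing the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def log_level(count: int) -> int:
    """Quantize a count >= 1 to 1 + floor(log2(count))."""
    return count.bit_length()


def store_counts(bf: BloomFilter, counts: Dict[str, int]) -> None:
    # Insert "ngram|1", ..., "ngram|q", where q is the quantized count.
    for ngram, count in counts.items():
        for level in range(1, log_level(count) + 1):
            bf.add(f"{ngram}|{level}")


def query_level(bf: BloomFilter, ngram: str, max_level: int = 64) -> int:
    """Largest j such that ngram|1 .. ngram|j all test positive.
    0 means unseen; false positives can only raise the returned level."""
    level = 0
    while level < max_level and f"{ngram}|{level + 1}" in bf:
        level += 1
    return level


bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
store_counts(bf, {"the cat": 5, "cat sat": 1})
print(query_level(bf, "the cat"))  # at least 3, since a count of 5 maps to level 3
```

Because `add` only sets bits, new n-grams from an incoming stream can be inserted at any time. This sketch does not support deleting or decrementing entries.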
The use of very large language models is often restricted to a reranking stage (Olteanu et al., 2006).
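In a reranking stage, the large model only has to score the candidates in an n-best list produced by a first decoding pass, rather than every partial hypothesis explored during search. The sketch below assumes a log-linear setup: each hypothesis keeps its first-pass score, and the large model's log-probability is added with a tunable weight. `Hypothesis`, `rerank`, `lm_weight` and the toy scores are hypothetical. In practice the weight would be tuned on held-out data together with the other feature weights.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    base_score: float  # model score from the first decoding pass (log domain)


def rerank(nbest: List[Hypothesis],
           big_lm_logprob: Callable[[str], float],
           lm_weight: float) -> List[Hypothesis]:
    """Re-sort an n-best list by base_score + lm_weight * big_lm_logprob(text),
    best first. Only the n-best candidates are scored by the large model."""
    return sorted(nbest,
                  key=lambda h: h.base_score + lm_weight * big_lm_logprob(h.text),
                  reverse=True)


# Hypothetical log-probabilities standing in for a very large language model.
toy_lm = {"the house is small": -4.0, "the house is little": -6.5}
nbest = [Hypothesis("the house is little", -1.0),
         Hypothesis("the house is small", -1.3)]
best = rerank(nbest, lambda s: toy_lm[s], lm_weight=0.5)[0]
print(best.text)  # prints: the house is small
```

Here the large model overturns the first-pass ranking: -1.3 + 0.5 * -4.0 = -3.3 beats -1.0 + 0.5 * -6.5 = -4.25.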
- Heafield et al. (2013)
- Yasuhara et al. (2013)
- Vaswani et al. (2013)
- Yogatama et al. (2014)
- Guthrie and Hepple (2010)
- Tan et al. (2011)