Embeddings
Embeddings of words, phrases, sentences, and entire documents have several uses; one of them is to work towards interlingual representations of meaning.
Embeddings is the main subject of 26 publications. 10 are discussed here.
Publications
Word embeddings have become a common feature in current research in natural language processing.
Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg Corrado and Jeffrey Dean (2013):
Distributed Representations of Words and Phrases and their Compositionality, CoRR
@article{DBLP:journals/corr/MikolovSCCD13,
author = {Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg Corrado and Jeffrey Dean},
title = {Distributed Representations of Words and Phrases and their Compositionality},
journal = {CoRR},
volume = {abs/1310.4546},
url = {
http://arxiv.org/abs/1310.4546},
archiveprefix = {arXiv},
eprint = {1310.4546},
timestamp = {Mon, 13 Aug 2018 16:47:09 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/MikolovSCCD13},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2013
}
Mikolov et al. (2013) propose the skip-gram method to obtain these representations.
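As a rough illustration of the skip-gram idea, the model is trained to predict the words surrounding a given center word. The sketch below only shows how (center, context) training pairs might be extracted from a tokenized sentence. The function name `skipgram_pairs` and the `window` parameter are illustrative assumptions, and the sketch leaves out the training itself.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each token and its neighbours.

    `window` is the assumed maximum distance between center and context word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    for center, context in skipgram_pairs(["the", "cat", "sat"], window=1):
        print(center, "->", context)
```

A larger window yields more pairs per word and tends to capture broader topical context, while a smaller window stays closer to syntactic neighbours.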
Tomas Mikolov and Kai Chen and Greg Corrado and Jeffrey Dean (2013):
Efficient Estimation of Word Representations in Vector Space, CoRR
@article{DBLP:journals/corr/abs-1301-3781,
author = {Tomas Mikolov and Kai Chen and Greg Corrado and Jeffrey Dean},
title = {Efficient Estimation of Word Representations in Vector Space},
journal = {CoRR},
volume = {abs/1301.3781},
url = {
http://arxiv.org/abs/1301.3781},
archiveprefix = {arXiv},
eprint = {1301.3781},
timestamp = {Mon, 13 Aug 2018 16:48:33 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/abs-1301-3781},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2013
}
Mikolov et al. (2013) introduce efficient training methods for the skip-gram and continuous bag of words models, which are used in the very popular word2vec implementation and in publicly available word embedding sets for many languages.
Pennington, Jeffrey and Socher, Richard and Manning, Christopher (2014):
Glove: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{pennington-socher-manning:2014:EMNLP2014,
author = {Pennington, Jeffrey and Socher, Richard and Manning, Christopher},
title = {Glove: Global Vectors for Word Representation},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {1532--1543},
url = {
http://www.aclweb.org/anthology/D14-1162},
year = 2014
}
Pennington et al. (2014) train word embedding models on the co-occurrence statistics of a word over the entire corpus.
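To make "co-occurrence statistics over the entire corpus" concrete, the following minimal sketch counts how often two words appear within a fixed window of each other, summed over all sentences. The function name, the symmetric window, and the unweighted counts are simplifying assumptions for illustration. It does not reproduce the GloVe training objective.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word-word co-occurrences within `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right and add both directions to stay symmetric.
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts


if __name__ == "__main__":
    corpus = [["ice", "is", "cold"], ["snow", "is", "cold"]]
    counts = cooccurrence_counts(corpus, window=1)
    print(counts[("is", "cold")])
```

Unlike the skip-gram pairs, which are consumed one at a time during training, these counts are aggregated once for the whole corpus before any model is fit.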
Contextualized Word Embeddings
Peters, Matthew and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke (2018):
Deep Contextualized Word Representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
@InProceedings{N18-1202,
author = {Peters, Matthew and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke},
title = {Deep Contextualized Word Representations},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {2227--2237},
location = {New Orleans, Louisiana},
url = {
http://aclweb.org/anthology/N18-1202},
year = 2018
}
Peters et al. (2018) demonstrate that various natural language tasks can be improved by contextualizing word embeddings through bi-directional neural language model layers (called ELMo), just as is done in the encoders of machine translation models.
Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina (2019):
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
@inproceedings{devlin-etal-2019-bert,
author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
title = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1423},
pages = {4171--4186},
year = 2019
}
Devlin et al. (2019) show superior results with a method called BERT, which pre-trains word embeddings on masked language modeling and next sentence prediction tasks using the Transformer architecture.
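The masked language model task can be sketched as follows: some input tokens are hidden, and the model is trained to recover them from the remaining context. This is a simplified illustration under stated assumptions. The function name `mask_tokens`, the `seed` parameter, and masking whole whitespace tokens are choices made here. The procedure in the paper is more involved (for example, some selected tokens are replaced by random tokens or left unchanged), and next sentence prediction is not shown.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK with probability `mask_prob`.

    Returns the masked sequence and a dict {position: original token}
    holding the prediction targets.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=1)
    print(masked)
    print(targets)
```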
Zhilin Yang and Zihang Dai and Yiming Yang and Jaime G. Carbonell and Ruslan Salakhutdinov and Quoc V. Le (2019):
XLNet: Generalized Autoregressive Pretraining for Language Understanding, CoRR
@article{DBLP:journals/corr/abs-1906-08237,
author = {Zhilin Yang and Zihang Dai and Yiming Yang and Jaime G. Carbonell and Ruslan Salakhutdinov and Quoc V. Le},
title = {XLNet: Generalized Autoregressive Pretraining for Language Understanding},
journal = {CoRR},
volume = {abs/1906.08237},
url = {
http://arxiv.org/abs/1906.08237},
archiveprefix = {arXiv},
eprint = {1906.08237},
timestamp = {Mon, 24 Jun 2019 17:28:45 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/abs-1906-08237},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2019
}
Yang et al. (2019) refine the BERT model by predicting one masked word at a time, with permutation of the order of the masked words. They call their variant XLNet.
Using Pre-Trained Word Embeddings
Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye (2015):
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
mentioned in Embeddings and Multilingual Word Embeddings
@InProceedings{xing-EtAl:2015:NAACL-HLT,
author = {Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye},
title = {Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {1006--1011},
url = {
http://www.aclweb.org/anthology/N15-1104},
year = 2015
}
Xing et al. (2015) point out inconsistencies in the representation of word embeddings and the objective function for translation transforms between word embeddings, which they address with normalization.
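One way to see why normalization helps is that, once vectors have unit length, their inner product equals their cosine similarity, and an orthogonal transform (a rotation, in two dimensions) keeps them at unit length. The sketch below is a mathematical illustration of these two facts with toy 2-dimensional vectors. It is not the authors' training procedure, and all function names are assumptions made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def rotate(vec, angle):
    """Apply a 2-D rotation, the simplest example of an orthogonal transform."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return [c * x - s * y, s * x + c * y]


if __name__ == "__main__":
    u, v = [3.0, 4.0], [4.0, 3.0]
    print(cosine(u, v), dot(normalize(u), normalize(v)))
    rotated = rotate(normalize(u), 0.7)
    print(math.sqrt(dot(rotated, rotated)))
```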
Hirasawa, Tosho and Yamagishi, Hayahide and Matsumura, Yukio and Komachi, Mamoru (2019):
Multimodal Machine Translation with Embedding Prediction, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
@inproceedings{hirasawa-etal-2019-multimodal,
author = {Hirasawa, Tosho and Yamagishi, Hayahide and Matsumura, Yukio and Komachi, Mamoru},
title = {Multimodal Machine Translation with Embedding Prediction},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Student Research Workshop},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-3012},
pages = {86--91},
year = 2019
}
Hirasawa et al. (2019) de-bias word embeddings and show gains with pre-trained word embeddings in a low-resource setting.
Phrase Embeddings
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing (2014):
Bilingually-constrained Phrase Embeddings for Machine Translation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{zhang-EtAl:2014:P14-11,
author = {Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing},
title = {Bilingually-constrained Phrase Embeddings for Machine Translation},
booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {June},
address = {Baltimore, Maryland},
publisher = {Association for Computational Linguistics},
pages = {111--121},
url = {
http://www.aclweb.org/anthology/P14-1011},
year = 2014
}
Zhang et al. (2014) learn phrase embeddings using recursive neural networks and auto-encoders, together with a mapping between input and output phrases. They use these to add an additional score to the phrase translations and to filter the phrase table.
Hu, Baotian and Tu, Zhaopeng and Lu, Zhengdong and Li, Hang and Chen, Qingcai (2015):
Context-Dependent Translation Selection Using Convolutional Neural Network, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
@InProceedings{hu-EtAl:2015:ACL-IJCNLP2,
author = {Hu, Baotian and Tu, Zhaopeng and Lu, Zhengdong and Li, Hang and Chen, Qingcai},
title = {Context-Dependent Translation Selection Using Convolutional Neural Network},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {536--541},
url = {
http://www.aclweb.org/anthology/P15-2088},
year = 2015
}
Hu et al. (2015) use convolutional neural networks to encode the input and output phrase and pass the encodings to a matching component that computes their similarity. They include the full input sentence context and use a learning strategy called curriculum learning that first learns from the easy training examples and then from the harder ones.
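The ordering behind curriculum learning can be sketched as sorting the training examples by some difficulty score and splitting them into stages that are presented in turn. The difficulty measure used by the authors is not described here, so the sketch takes it as a caller-supplied function. Using phrase length in the usage example is purely an assumption for illustration, as are the names `curriculum_stages` and `num_stages`.

```python
def curriculum_stages(examples, difficulty, num_stages=2):
    """Split examples into stages ordered from easiest to hardest.

    `difficulty` is a hypothetical scoring function: lower means easier.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not examples:
        return []
    ordered = sorted(examples, key=difficulty)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


if __name__ == "__main__":
    phrases = ["a b c d", "a", "a b", "a b c"]
    # Assumed difficulty: number of words in the phrase.
    for stage in curriculum_stages(phrases, lambda p: len(p.split()), num_stages=2):
        print(stage)
```

A training loop would then iterate over the stages in order, so that the model sees the easier examples before the harder ones.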
Benchmarks
Discussion
Related Topics
New Publications
Jauregi Unanue, Inigo and Zare Borzeshi, Ehsan and Esmaili, Nazanin and Piccardi, Massimo (2019):
ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
@inproceedings{jauregi-unanue-etal-2019-rewe,
author = {Jauregi Unanue, Inigo and Zare Borzeshi, Ehsan and Esmaili, Nazanin and Piccardi, Massimo},
title = {{R}e{WE}: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/N19-1041},
pages = {430--436},
year = 2019
}
Jauregi Unanue et al. (2019)
McCann, Bryan and Bradbury, James and Xiong, Caiming and Socher, Richard (2017):
Learned in Translation: Contextualized Word Vectors, Advances in Neural Information Processing Systems 30
@incollection{NIPS2017-7209,
author = {McCann, Bryan and Bradbury, James and Xiong, Caiming and Socher, Richard},
title = {Learned in Translation: Contextualized Word Vectors},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages = {6294--6305},
publisher = {Curran Associates, Inc.},
url = {
http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf},
year = 2017
}
McCann et al. (2017)
Mrksic, Nikola and Vulić, Ivan and O Seaghdha, Diarmuid and Leviant, Ira and Reichart, Roi and Gasic, Milica and Korhonen, Anna and Young, Steve (2017):
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints, Transactions of the Association for Computational Linguistics
@article{TACL1171,
author = {Mrksic, Nikola and Vuli\'{c}, Ivan and O Seaghdha, Diarmuid and Leviant, Ira and Reichart, Roi and Gasic, Milica and Korhonen, Anna and Young, Steve},
title = {Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
keywords = {{}},
issn = {2307-387X},
url = {
https://transacl.org/ojs/index.php/tacl/article/view/1171},
pages = {309--324},
year = 2017
}
Mrksic et al. (2017)
Wieting, John and Gimpel, Kevin (2018):
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{P18-1042,
author = {Wieting, John and Gimpel, Kevin},
title = {ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
publisher = {Association for Computational Linguistics},
pages = {451--462},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-1042},
year = 2018
}
Wieting and Gimpel (2018)
Pilehvar, Mohammad Taher and Collier, Nigel (2017):
Inducing Embeddings for Rare and Unseen Words by Leveraging Lexical Resources, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{pilehvar-collier:2017:EACLshort,
author = {Pilehvar, Mohammad Taher and Collier, Nigel},
title = {Inducing Embeddings for Rare and Unseen Words by Leveraging Lexical Resources},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {388--393},
url = {
http://www.aclweb.org/anthology/E17-2062},
year = 2017
}
Pilehvar and Collier (2017)
Passban, Peyman and Liu, Qun and Way, Andy (2016):
Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{passban-liu-way:2016:COLING,
author = {Passban, Peyman and Liu, Qun and Way, Andy},
title = {Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {2582--2591},
url = {
http://aclweb.org/anthology/C16-1243},
year = 2016
}
Passban et al. (2016)
Sergienya, Irina and Schütze, Hinrich (2015):
Learning Better Embeddings for Rare Words Using Distributional Representations, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{sergienya-schutze:2015:EMNLP,
author = {Sergienya, Irina and Sch\"{u}tze, Hinrich},
title = {Learning Better Embeddings for Rare Words Using Distributional Representations},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {280--285},
url = {
http://aclweb.org/anthology/D15-1033},
year = 2015
}
Sergienya and Schütze (2015)
Köhn, Arne (2015):
What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{kohn:2015:EMNLP,
author = {K\"{o}hn, Arne},
title = {What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {2067--2073},
url = {
http://aclweb.org/anthology/D15-1246},
year = 2015
}
Köhn (2015)
Sachdeva, Kunal and Sharma, Dipti (2015):
Exploring the effect of semantic similarity for Phrase-based Machine Translation, Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality
@InProceedings{sachdeva-sharma:2015:CVSC,
author = {Sachdeva, Kunal and Sharma, Dipti},
title = {Exploring the effect of semantic similarity for Phrase-based Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {41--47},
url = {
http://www.aclweb.org/anthology/W15-4005},
year = 2015
}
Sachdeva and Sharma (2015)
Zhao, Kai and Hassan, Hany and Auli, Michael (2015):
Learning Translation Models from Monolingual Continuous Representations, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{zhao-hassan-auli:2015:NAACL-HLT,
author = {Zhao, Kai and Hassan, Hany and Auli, Michael},
title = {Learning Translation Models from Monolingual Continuous Representations},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {1527--1536},
url = {
http://www.aclweb.org/anthology/N15-1176},
year = 2015
}
Zhao et al. (2015)
Martinez Garcia, Eva and Tiedemann, Jörg and España-Bonet, Cristina and Màrquez, Lluís (2014):
Word's Vector Representations meet Machine Translation, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{martinezgarcia-EtAl:2014:SSST-8,
author = {Martinez Garcia, Eva and Tiedemann, J\"{o}rg and Espa\~{n}a-Bonet, Cristina and M\`{a}rquez, Llu\'{i}s},
title = {Word's Vector Representations meet Machine Translation},
booktitle = {Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {132--134},
url = {
http://www.aclweb.org/anthology/W14-4015},
year = 2014
}
Martinez Garcia et al. (2014)
Thanh-Le Ha and Jan Niehues and Alex Waibel (2014):
Lexical Translation Model Using A Deep Neural Network Architecture, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
mentioned in Lexical Choice, Context Features and Embeddings
@inproceedings{Ha:iwslt:2014,
author = {Thanh-Le Ha and Jan Niehues and Alex Waibel},
title = {Lexical Translation Model Using A Deep Neural Network Architecture},
pages = {223--229},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2014
}
Ha et al. (2014)
Gao, Jianfeng and He, Xiaodong and Yih, Wen-tau and Deng, Li (2014):
Learning Continuous Phrase Representations for Translation Modeling, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{gao-EtAl:2014:P14-1,
author = {Gao, Jianfeng and He, Xiaodong and Yih, Wen-tau and Deng, Li},
title = {Learning Continuous Phrase Representations for Translation Modeling},
booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {June},
address = {Baltimore, Maryland},
publisher = {Association for Computational Linguistics},
pages = {699--709},
url = {
http://www.aclweb.org/anthology/P14-1066},
year = 2014
}
Gao et al. (2014)
Cho, Kyunghyun and van Merrienboer, Bart and Gulcehre, Caglar and Bahdanau, Dzmitry and Bougares, Fethi and Schwenk, Holger and Bengio, Yoshua (2014):
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{cho-EtAl:2014:EMNLP2014,
author = {Cho, Kyunghyun and van Merrienboer, Bart and Gulcehre, Caglar and Bahdanau, Dzmitry and Bougares, Fethi and Schwenk, Holger and Bengio, Yoshua},
title = {Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {1724--1734},
url = {
http://www.aclweb.org/anthology/D14-1179},
year = 2014
}
Cho et al. (2014)
Levinboim, Tomer and Chiang, David (2015):
Supervised Phrase Table Triangulation with Neural Word Embeddings for Low-Resource Languages, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
@InProceedings{levinboim-chiang:2015:EMNLP,
author = {Levinboim, Tomer and Chiang, David},
title = {Supervised Phrase Table Triangulation with Neural Word Embeddings for Low-Resource Languages},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1079--1083},
url = {
http://aclweb.org/anthology/D15-1126},
year = 2015
}
Levinboim and Chiang (2015)
Alkhouli, Tamer and Guta, Andreas and Ney, Hermann (2014):
Vector Space Models for Phrase-based Machine Translation, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{alkhouli-guta-ney:2014:SSST-8,
author = {Alkhouli, Tamer and Guta, Andreas and Ney, Hermann},
title = {Vector Space Models for Phrase-based Machine Translation},
booktitle = {Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {1--10},
url = {
http://www.aclweb.org/anthology/W14-4001},
year = 2014
}
Alkhouli et al. (2014)