Multilingual Word Embeddings
Mapping between the word embedding spaces of different languages, or into a common word embedding space for all languages, creates a shared semantic space that reveals word correspondences across languages.
Multilingual Word Embeddings is the main subject of 60 publications. 44 are discussed here.
Publications
Sebastian Ruder and Ivan Vuli\'c and Anders Søgaard (2017):
A survey of cross-lingual embedding models, CoRR

@article{DBLP:journals/corr/Ruder17,
author = {Sebastian Ruder and Ivan Vuli{\'c} and Anders S{\o}gaard},
title = {A survey of cross-lingual embedding models},
journal = {CoRR},
volume = {abs/1706.04902},
url = {http://arxiv.org/abs/1706.04902},
archiveprefix = {arXiv},
eprint = {1706.04902},
timestamp = {Mon, 13 Aug 2018 16:48:29 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/Ruder17},
bibsource = {dblp computer science bibliography, https://dblp.org},
year = 2017
}
Ruder et al. (2017) give a comprehensive overview of work on cross-lingual word embeddings. It has long been observed that word representations obtained from distributional properties (i.e., how words are used in text) are similar across languages, but
Tomas Mikolov and Quoc V. Le and Ilya Sutskever (2013):
Exploiting Similarities among Languages for Machine Translation, CoRR

@article{DBLP:journals/corr/MikolovLS13,
author = {Tomas Mikolov and Quoc V. Le and Ilya Sutskever},
title = {Exploiting Similarities among Languages for Machine Translation},
journal = {CoRR},
volume = {abs/1309.4168},
url = {http://arxiv.org/abs/1309.4168},
timestamp = {Wed, 07 Jun 2017 14:40:03 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/MikolovLS13},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2013
}
Mikolov et al. (2013) were among the first to observe this for the word embeddings generated by neural models, and suggested that a simple linear transformation from word embeddings in one language to those in another may be used to translate words.
Aligning Embedding Spaces
Tomas Mikolov and Quoc V. Le and Ilya Sutskever (2013):
Exploiting Similarities among Languages for Machine Translation, CoRR

@article{DBLP:journals/corr/MikolovLS13,
author = {Tomas Mikolov and Quoc V. Le and Ilya Sutskever},
title = {Exploiting Similarities among Languages for Machine Translation},
journal = {CoRR},
volume = {abs/1309.4168},
url = {http://arxiv.org/abs/1309.4168},
timestamp = {Wed, 07 Jun 2017 14:40:03 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/MikolovLS13},
bibsource = {dblp computer science bibliography, http://dblp.org},
year = 2013
}
Mikolov et al. (2013) learn the linear mapping between pre-existing embedding spaces by minimizing the distance between a projected source word vector and a target word vector for a given seed lexicon.
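As a concrete sketch (with random stand-in vectors rather than real embeddings), the mapping can be estimated with ordinary least squares in NumPy, and a word then translated by nearest-neighbour search in the target space:

```python
import numpy as np

# Toy stand-ins for monolingual embeddings: row i of X (source) is paired
# with row i of Y (target) by the seed lexicon. Real vectors would come
# from a tool such as word2vec; here Y is an exact linear image of X.
rng = np.random.default_rng(0)
d = 4
Y = rng.normal(size=(10, d))                 # target-language vectors
M = rng.normal(size=(d, d))
X = Y @ np.linalg.inv(M)                     # source-language vectors

# Least-squares estimate of the mapping: argmin_W ||X W - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
proj = X @ W                                 # source vectors mapped over

def translate(i):
    """Translate source word i by cosine nearest neighbour in Y."""
    sims = (proj[i] @ Y.T) / (np.linalg.norm(proj[i]) * np.linalg.norm(Y, axis=1))
    return int(np.argmax(sims))
```

On real data the seed lexicon is small and noisy, so the projection only approximates the target vectors; translation quality then depends on how well nearest neighbours survive that approximation.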
Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye (2015):
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{xing-EtAl:2015:NAACL-HLT,
author = {Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye},
title = {Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {1006--1011},
url = {http://www.aclweb.org/anthology/N15-1104},
year = 2015
}
Xing et al. (2015) improve this method by requiring the mapping matrix to be orthogonal.
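The orthogonality-constrained mapping has a closed-form solution, the orthogonal Procrustes problem. A minimal sketch on synthetic data, where the target space is an exact rotation of the source space:

```python
import numpy as np

# Synthetic check: make the target space an exact rotation of the source.
rng = np.random.default_rng(1)
d = 5
X = rng.normal(size=(20, d))                  # seed-lexicon source vectors
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # a random orthogonal matrix
Y = X @ Q                                     # rotated copy as target vectors

# Orthogonal Procrustes: the W minimizing ||X W - Y||_F subject to
# W^T W = I is W = U V^T, where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt
```

An orthogonal W preserves vector lengths and inner products, so monolingual similarity structure is left intact by the mapping.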
Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko (2016):
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{artetxe-labaka-agirre:2016:EMNLP2016,
author = {Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
title = {Learning principled bilingual mappings of word embeddings while preserving monolingual invariance},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {2289--2294},
url = {https://aclweb.org/anthology/D16-1250},
year = 2016
}
Artetxe et al. (2016) refine this method further with mean centering.
Faruqui, Manaal and Dyer, Chris (2014):
Improving Vector Space Word Representations Using Multilingual Correlation, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

@InProceedings{faruqui-dyer:2014:EACL,
author = {Faruqui, Manaal and Dyer, Chris},
title = {Improving Vector Space Word Representations Using Multilingual Correlation},
booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Gothenburg, Sweden},
publisher = {Association for Computational Linguistics},
pages = {462--471},
url = {http://www.aclweb.org/anthology/E14-1049},
year = 2014
}
Faruqui and Dyer (2014) map monolingually generated word embeddings into a shared bilingual embedding space using canonical correlation analysis, maximizing the correlation of the two vectors of each word translation pair.
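A compact NumPy sketch of this idea (not Faruqui and Dyer's exact pipeline): compute CCA projections from the whitened cross-covariance of seed-paired embedding matrices, mapping both languages into a shared space. The `reg` ridge term is an implementation assumption for numerical stability.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-8):
    """Project two embedding matrices, whose rows are paired by a seed
    lexicon, into a shared k-dimensional space via CCA. `reg` is a small
    ridge term for numerical stability (a choice made here, not in the
    paper)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]       # source-side projection into the shared space
    Wy = Ky @ Vt.T[:, :k]    # target-side projection into the shared space
    return Wx, Wy, s[:k]     # s holds the canonical correlations
```

When the two spaces are linearly related, the leading canonical correlations approach 1, indicating that the seed pairs align almost perfectly in the shared space.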
Braune, Fabienne and Hangya, Viktor and Eder, Tobias and Fraser, Alexander (2018):
Evaluating bilingual word embeddings on the long tail, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

@InProceedings{N18-2030,
author = {Braune, Fabienne and Hangya, Viktor and Eder, Tobias and Fraser, Alexander},
title = {Evaluating bilingual word embeddings on the long tail},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {188--193},
location = {New Orleans, Louisiana},
url = {http://aclweb.org/anthology/N18-2030},
year = 2018
}
Braune et al. (2018) point out that the accuracy of the obtained bilingual lexicons is much lower for rare words, a problem that can be partly addressed with additional features, such as representations built on letter n-grams, and by taking orthographic distance into account when mapping words.
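Such orthographic signals can be sketched as follows; `orthographic_score` and its interpolation weight `alpha` are illustrative inventions, not the exact features of the paper:

```python
import numpy as np
from difflib import SequenceMatcher

def ngram_counts(word, n=3):
    """Bag of letter n-grams, with boundary markers around the word."""
    w = f"<{word}>"
    grams = [w[i:i + n] for i in range(len(w) - n + 1)]
    return {g: grams.count(g) for g in set(grams)}

def ngram_cosine(a, b, n=3):
    """Cosine similarity between the letter-n-gram count vectors."""
    va, vb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(c * vb.get(g, 0) for g, c in va.items())
    norm = np.sqrt(sum(c * c for c in va.values())) * np.sqrt(sum(c * c for c in vb.values()))
    return dot / norm

def orthographic_score(src, tgt, alpha=0.5):
    """Blend n-gram overlap with string similarity. `alpha` is an
    illustrative interpolation weight, not a value from the paper."""
    return alpha * ngram_cosine(src, tgt) + (1 - alpha) * SequenceMatcher(None, src, tgt).ratio()
```

For cognate-rich language pairs, such a score ranks orthographically close candidates (e.g. "nation" vs. "nazione") above unrelated ones even when distributional evidence for a rare word is weak.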
Heyman, Geert and Verreet, Bregt and Vuli\'c, Ivan and Moens, Marie-Francine (2019):
Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{heyman-etal-2019-learning,
author = {Heyman, Geert and Verreet, Bregt and Vuli{\'c}, Ivan and Moens, Marie-Francine},
title = {Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1188},
pages = {1890--1902},
year = 2019
}
Heyman et al. (2019) learn a linear transform between embedding spaces based on an automatically generated seed lexicon and show improvements by incrementally adding languages and matching the spaces of newly added languages to all previous languages (multi-hub).
Alqaisi, Taghreed and O'Keefe, Simon (2019):
En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects, Proceedings of the Fourth Arabic Natural Language Processing Workshop

@inproceedings{alqaisi-okeefe-2019-en,
author = {Alqaisi, Taghreed and O{'}Keefe, Simon},
title = {En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects},
booktitle = {Proceedings of the Fourth Arabic Natural Language Processing Workshop},
month = {aug},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W19-4611},
pages = {97--107},
year = 2019
}
Alqaisi and O'Keefe (2019) consider the problem of morphologically rich languages, using Arabic as an example, and demonstrate the importance of morphological analysis and word splitting.
Seed Lexicon
Supervised and semi-supervised approaches to map embedding spaces require a seed lexicon of word translation pairs. These are most commonly generated with traditional statistical methods from parallel corpora
Faruqui, Manaal and Dyer, Chris (2014):
Improving Vector Space Word Representations Using Multilingual Correlation, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

@InProceedings{faruqui-dyer:2014:EACL,
author = {Faruqui, Manaal and Dyer, Chris},
title = {Improving Vector Space Word Representations Using Multilingual Correlation},
booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Gothenburg, Sweden},
publisher = {Association for Computational Linguistics},
pages = {462--471},
url = {http://www.aclweb.org/anthology/E14-1049},
year = 2014
}
(Faruqui and Dyer, 2014).
Yehezkel Lubin, Noa and Goldberger, Jacob and Goldberg, Yoav (2019):
Aligning Vector-spaces with Noisy Supervised Lexicon, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{yehezkel-lubin-etal-2019-aligning,
author = {Yehezkel Lubin, Noa and Goldberger, Jacob and Goldberg, Yoav},
title = {Aligning Vector-spaces with Noisy Supervised Lexicon},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1045},
pages = {460--465},
year = 2019
}
Lubin et al. (2019) address the problem of noisy word pairs in such automatically generated lexicons: they show that noisy pairs cause significant harm, and develop a method that learns the noise level and identifies the noisy pairs.
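Lubin et al. model the noise explicitly; a much cruder residual-based filter, sketched below, conveys the intuition that noisy pairs are the ones a fitted mapping reconstructs worst:

```python
import numpy as np

def filter_noisy_pairs(X, Y, keep=0.8, iters=3):
    """Crude stand-in for lexicon cleaning: repeatedly fit a least-squares
    mapping on the currently kept pairs and retain only the fraction it
    reconstructs best. This is a residual-based heuristic, not the noise
    model of Lubin et al."""
    idx = np.arange(len(X))
    for _ in range(iters):
        W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        resid = np.linalg.norm(X[idx] @ W - Y[idx], axis=1)
        order = np.argsort(resid)
        idx = idx[order[: max(1, int(keep * len(idx)))]]
    return np.sort(idx)          # indices of pairs judged clean
```

Because outliers inflate their own residuals far more than they distort the fitted mapping, a few rounds of trimming typically isolate the corrupted entries.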
Søgaard, Anders and Ruder, Sebastian and Vuli\'c, Ivan (2018):
On the Limitations of Unsupervised Bilingual Dictionary Induction, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@inproceedings{sogaard-etal-2018-limitations,
author = {S{\o}gaard, Anders and Ruder, Sebastian and Vuli{\'c}, Ivan},
title = {On the Limitations of Unsupervised Bilingual Dictionary Induction},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {jul},
address = {Melbourne, Australia},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P18-1072},
doi = {10.18653/v1/P18-1072},
pages = {778--788},
year = 2018
}
Søgaard et al. (2018) use identically spelled words in both languages as seeds.
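Extracting such a seed lexicon amounts to intersecting the two vocabularies; in practice a frequency cutoff would filter both sides first. The tiny vocabularies below are made up for illustration:

```python
# A seed lexicon from identically spelled words: intersect the two
# vocabularies. This works best for related languages that share proper
# names, numbers, and cognates.
vocab_en = {"berlin", "computer", "house", "the", "2010"}
vocab_de = {"berlin", "computer", "haus", "die", "2010"}
seed_lexicon = sorted(vocab_en & vocab_de)
# → ['2010', 'berlin', 'computer']
```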
Shi, Weijia and Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei (2019):
Learning Bilingual Word Embeddings Using Lexical Definitions, Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

@inproceedings{shi-etal-2019-learning,
author = {Shi, Weijia and Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei},
title = {Learning Bilingual Word Embeddings Using Lexical Definitions},
booktitle = {Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)},
month = {aug},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W19-4316},
pages = {142--147},
year = 2019
}
Shi et al. (2019) use off-the-shelf bilingual dictionaries and detail how such human-targeted dictionary definitions need to be preprocessed.
Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko (2017):
Learning bilingual word embeddings with (almost) no bilingual data, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@inproceedings{artetxe-etal-2017-learning,
author = {Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
title = {Learning bilingual word embeddings with (almost) no bilingual data},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {jul},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P17-1042},
doi = {10.18653/v1/P17-1042},
pages = {451--462},
year = 2017
}
Artetxe et al. (2017) reduce the need for large seed dictionaries by starting with just 25 entries and iteratively increasing the dictionary based on the obtained mappings. Making do with weaker supervision,
Gouws, Stephan and Bengio, Yoshua and Corrado, Greg (2015):
BilBOWA: Fast Bilingual Distributed Representations Without Word Alignments, Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37

@inproceedings{Gouws:2015:BFB:3045118.3045199,
author = {Gouws, Stephan and Bengio, Yoshua and Corrado, Greg},
title = {BilBOWA: Fast Bilingual Distributed Representations Without Word Alignments},
booktitle = {Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37},
series = {ICML'15},
location = {Lille, France},
pages = {748--756},
numpages = {9},
url = {http://arxiv.org/pdf/1410.2455.pdf},
acmid = {3045199},
publisher = {JMLR.org},
year = 2015
}
Gouws et al. (2015) learn directly from sentence pairs by predicting words in a target sentence from words in a source sentence.
Coulmance, Jocelyn and Marty, Jean-Marc and Wenzek, Guillaume and Benhalloum, Amine (2015):
Trans-gram, Fast Cross-lingual Word-embeddings, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

@InProceedings{coulmance-EtAl:2015:EMNLP,
author = {Coulmance, Jocelyn and Marty, Jean-Marc and Wenzek, Guillaume and Benhalloum, Amine},
title = {Trans-gram, Fast Cross-lingual Word-embeddings},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1109--1113},
url = {http://aclweb.org/anthology/D15-1131},
year = 2015
}
Coulmance et al. (2015) explore a variant of this idea.
Vulić, Ivan and Moens, Marie-Francine (2015):
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

@InProceedings{vulic-moens:2015:ACL-IJCNLP,
author = {Vuli\'{c}, Ivan and Moens, Marie-Francine},
title = {Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {719--725},
url = {http://www.aclweb.org/anthology/P15-2118},
year = 2015
}
Vulić and Moens (2015) use aligned Wikipedia document pairs, aiming to predict words in mixed-language documents.
Zhou, Chunting and Ma, Xuezhe and Wang, Di and Neubig, Graham (2019):
Density Matching for Bilingual Word Embedding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{zhou-etal-2019-density,
author = {Zhou, Chunting and Ma, Xuezhe and Wang, Di and Neubig, Graham},
title = {Density Matching for Bilingual Word Embedding},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1161},
pages = {1588--1598},
year = 2019
}
Zhou et al. (2019) use identically spelled words as seeds.
Vulić, Ivan and Korhonen, Anna (2016):
On the Role of Seed Lexicons in Learning Bilingual Word Embeddings, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@InProceedings{vulic-korhonen:2016:P16-1,
author = {Vuli\'{c}, Ivan and Korhonen, Anna},
title = {On the Role of Seed Lexicons in Learning Bilingual Word Embeddings},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {247--257},
url = {http://www.aclweb.org/anthology/P16-1024},
year = 2016
}
Vulić and Korhonen (2016) compare different types and sizes of seed lexicons.
Unsupervised Methods
Miceli Barone, Antonio Valerio (2016):
Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders, Proceedings of the 1st Workshop on Representation Learning for NLP

@inproceedings{miceli-barone-2016-towards,
author = {Miceli Barone, Antonio Valerio},
title = {Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders},
booktitle = {Proceedings of the 1st Workshop on Representation Learning for {NLP}},
month = {aug},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W16-1614},
doi = {10.18653/v1/W16-1614},
pages = {121--126},
year = 2016
}
Barone (2016) suggests using auto-encoders and adversarial training to learn a mapping between monolingual embedding spaces without any parallel data or any other bilingual signal, but does not report any results.
Zhang, Meng and Liu, Yang and Luan, Huanbo and Sun, Maosong (2017):
Adversarial Training for Unsupervised Bilingual Lexicon Induction, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@inproceedings{zhang-etal-2017-adversarial,
author = {Zhang, Meng and Liu, Yang and Luan, Huanbo and Sun, Maosong},
title = {Adversarial Training for Unsupervised Bilingual Lexicon Induction},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {jul},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P17-1179},
doi = {10.18653/v1/P17-1179},
pages = {1959--1970},
year = 2017
}
Zhang et al. (2017) demonstrate the effectiveness of this idea, exploring both unidirectional and bidirectional mappings.
Alexis Conneau and Guillaume Lample and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou (2018):
Word translation without parallel data, International Conference on Learning Representations

@inproceedings{lample2018word,
author = {Alexis Conneau and Guillaume Lample and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou},
title = {Word translation without parallel data},
booktitle = {International Conference on Learning Representations},
url = {https://openreview.net/pdf?id=H196sainb},
year = 2018
}
Conneau et al. (2018) add a fine-tuning step based on a synthetic dictionary of high-confidence word pairs, achieving vastly better results.
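The refinement step can be sketched as follows, using plain cosine mutual nearest neighbours in place of the paper's CSLS criterion: pairs on which the current mapping agrees in both directions form a synthetic dictionary, on which an orthogonal mapping is re-fit by Procrustes.

```python
import numpy as np

def refine(X, Y, W):
    """One refinement step, sketched: extract mutual nearest neighbours
    under the current mapping W as a synthetic dictionary, then re-fit W
    by orthogonal Procrustes on those high-confidence pairs. (The paper
    ranks candidates with CSLS; plain cosine similarity is used here.)"""
    Xm = X @ W
    Xm = Xm / np.linalg.norm(Xm, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = Xm @ Yn.T
    s2t = sims.argmax(axis=1)   # best target word for each source word
    t2s = sims.argmax(axis=0)   # best source word for each target word
    # Keep only mutual nearest neighbours as the synthetic dictionary.
    src = np.array([i for i in range(len(X)) if t2s[s2t[i]] == i])
    tgt = s2t[src]
    U, _, Vt = np.linalg.svd(X[src].T @ Y[tgt])
    return U @ Vt               # refined orthogonal mapping
```

Iterating this step lets a roughly correct adversarial initialization converge toward the mapping a supervised seed lexicon would have produced.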
Mohiuddin, Tasnim and Joty, Shafiq (2019):
Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle Consistency and Improved Training, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{mohiuddin-joty-2019-revisiting,
author = {Mohiuddin, Tasnim and Joty, Shafiq},
title = {Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle Consistency and Improved Training},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1386},
pages = {3857--3867},
year = 2019
}
Mohiuddin and Joty (2019) extend this approach into a symmetric setup that learns mappings in both directions, with a discriminator for each language (a CycleGAN), and a reconstruction loss as a component of the training objective.
Xu, Ruochen and Yang, Yiming and Otani, Naoki and Wu, Yuexin (2018):
Unsupervised Cross-lingual Transfer of Word Embedding Spaces, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{xu-etal-2018-unsupervised-cross,
author = {Xu, Ruochen and Yang, Yiming and Otani, Naoki and Wu, Yuexin},
title = {Unsupervised Cross-lingual Transfer of Word Embedding Spaces},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
month = {October--November},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1268},
doi = {10.18653/v1/D18-1268},
pages = {2465--2474},
year = 2018
}
Xu et al. (2018) propose a similar method, using Sinkhorn distance.
Chen, Xilun and Cardie, Claire (2018):
Unsupervised Multilingual Word Embeddings, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1024,
author = {Chen, Xilun and Cardie, Claire},
title = {Unsupervised Multilingual Word Embeddings},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1024},
pages = {261--270},
year = 2018
}
Chen and Cardie (2018) extend the adversarial training approach to more than two languages.
Instead of using adversarial training,
Zhang, Meng and Liu, Yang and Luan, Huanbo and Sun, Maosong (2017):
Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1206,
author = {Zhang, Meng and Liu, Yang and Luan, Huanbo and Sun, Maosong},
title = {Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {1924--1935},
location = {Copenhagen, Denmark},
url = {http://www.aclweb.org/anthology/D17-1207},
year = 2017
}
Zhang et al. (2017) measure the difference between the two embedding spaces with earth mover's distance, defined as the sum of distances of how far each word vector has to be moved towards the nearest vector in the other language's embedding space.
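For the special case of uniform word weights and equal vocabulary sizes, earth mover's distance reduces to a minimum-cost perfect matching, which SciPy's assignment solver handles directly; general word-frequency weights would need a full optimal-transport solver. A toy sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))          # "source" word vectors
Y = X[rng.permutation(6)]            # "target" vectors: a shuffled copy

# Pairwise Euclidean distances between every source and target vector.
cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

# Minimum-cost perfect matching = EMD under uniform weights.
rows, cols = linear_sum_assignment(cost)
emd = cost[rows, cols].mean()        # 0 here, since Y merely permutes X
```

In the bilingual setting the mapping W is then optimized so that this matching cost between projected source vectors and target vectors shrinks.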
Hoshen, Yedid and Wolf, Lior (2018):
Non-Adversarial Unsupervised Word Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1043,
author = {Hoshen, Yedid and Wolf, Lior},
title = {Non-Adversarial Unsupervised Word Translation},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1043},
pages = {469--478},
year = 2018
}
Hoshen and Wolf (2018) follow the same intuition but first reduce the dimensionality of the word vectors with principal component analysis (PCA) and initially align the spaces along the resulting axes. Their iterative algorithm then moves the projections of word vectors toward the closest target-side vectors in the projected space.
Alvarez-Melis, David and Jaakkola, Tommi (2018):
Gromov-Wasserstein Alignment of Word Embedding Spaces, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{alvarez-melis-jaakkola-2018-gromov,
author = {Alvarez-Melis, David and Jaakkola, Tommi},
title = {{G}romov-{W}asserstein Alignment of Word Embedding Spaces},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
month = {October--November},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1214},
doi = {10.18653/v1/D18-1214},
pages = {1881--1890},
year = 2018
}
Alvarez-Melis and Jaakkola (2018) draw parallels between this approach and Optimal Transport. In their method, they minimize the distance between a projected vector and all target-side vectors, measured by the L2 norm.
Jean Alaux and Edouard Grave and Marco Cuturi and Armand Joulin (2019):
Unsupervised Hyper-alignment for Multilingual Word Embeddings, International Conference on Learning Representations (ICLR)

@inproceedings{iclr-multilingual-embeddings-2019,
author = {Jean Alaux and Edouard Grave and Marco Cuturi and Armand Joulin},
title = {Unsupervised Hyper-alignment for Multilingual Word Embeddings},
booktitle = {International Conference on Learning Representations (ICLR)},
url = {http://arxiv.org/pdf/1811.01124.pdf},
year = 2019
}
Alaux et al. (2019) extend this to more than two languages, by mapping all languages into a common space and matching the word embedding distributions of any two languages at a time.
Mukherjee, Tanmoy and Yamada, Makoto and Hospedales, Timothy (2018):
Learning Unsupervised Word Translations Without Adversaries, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1063,
author = {Mukherjee, Tanmoy and Yamada, Makoto and Hospedales, Timothy},
title = {Learning Unsupervised Word Translations Without Adversaries},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D18-1063},
pages = {627--632},
year = 2018
}
Mukherjee et al. (2018) use squared-loss mutual information (SMI) as the optimization measure to match the monolingual distributions.
Zhou, Chunting and Ma, Xuezhe and Wang, Di and Neubig, Graham (2019):
Density Matching for Bilingual Word Embedding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

@inproceedings{zhou-etal-2019-density,
author = {Zhou, Chunting and Ma, Xuezhe and Wang, Di and Neubig, Graham},
title = {Density Matching for Bilingual Word Embedding},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
month = {jun},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/N19-1161},
pages = {1588--1598},
year = 2019
}
Zhou et al. (2019) first learn a density distribution over each of the monolingual word embedding spaces using Gaussian mixture models, and then map the spaces onto each other so that word vectors land in regions of similar density, as measured by KL divergence.
Instead of learning a mapping between embedding spaces,
Marie, Benjamin and Fujita, Atsushi (2019):
Unsupervised Joint Training of Bilingual Word Embeddings, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{marie-fujita-2019-unsupervised-joint,
author = {Marie, Benjamin and Fujita, Atsushi},
title = {Unsupervised Joint Training of Bilingual Word Embeddings},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1312},
pages = {3224--3230},
year = 2019
}
Marie and Fujita (2019) learn a joint embedding space for multiple languages using a skip-gram model trained on mixed-language text. Their method is bootstrapped with
unsupervised statistical machine translation.
Wada, Takashi and Iwata, Tomoharu and Matsumoto, Yuji (2019):
Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{wada-etal-2019-unsupervised,
author = {Wada, Takashi and Iwata, Tomoharu and Matsumoto, Yuji},
title = {Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/P19-1300},
pages = {3113--3124},
year = 2019
}
Wada et al. (2019) train a multi-lingual bidirectional language model with language-specific embeddings but shared state progression parameters. The resulting word embeddings are in a common space, with words close to their translations.
Properties of the Mapping
Methods that operate on fixed monolingual embedding spaces often learn a linear mapping between them, commonly under the assumption that the mapping is orthogonal.
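A quick empirical check of this assumption: fit an unconstrained linear map on a seed lexicon and measure how far it is from orthogonal. The helper below is an illustrative diagnostic, not a published metric:

```python
import numpy as np

def departure_from_orthogonality(X, Y):
    """Fit an unconstrained least-squares map W on seed pairs (rows of X
    and Y) and report ||W^T W - I||_F, which is 0 iff W is orthogonal.
    Large values suggest a purely rotational mapping will fit poorly."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.linalg.norm(W.T @ W - np.eye(W.shape[0]))
```

For typologically close language pairs this value tends to stay small, which is one way to motivate the orthogonality constraint; for distant pairs it grows, as the works below observe.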
Nakashole, Ndapa and Flauger, Raphael (2018):
Characterizing Departures from Linearity in Word Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

@InProceedings{P18-2036,
author = {Nakashole, Ndapa and Flauger, Raphael},
title = {Characterizing Departures from Linearity in Word Translation},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {221--227},
location = {Melbourne, Australia},
url = {http://aclweb.org/anthology/P18-2036},
year = 2018
}
Nakashole and Flauger (2018) show that this assumption is less accurate when distant languages are involved.
Søgaard, Anders and Ruder, Sebastian and Vuli\'c, Ivan (2018):
On the Limitations of Unsupervised Bilingual Dictionary Induction, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

@inproceedings{sogaard-etal-2018-limitations,
author = {S{\o}gaard, Anders and Ruder, Sebastian and Vuli{\'c}, Ivan},
title = {On the Limitations of Unsupervised Bilingual Dictionary Induction},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {jul},
address = {Melbourne, Australia},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P18-1072},
doi = {10.18653/v1/P18-1072},
pages = {778--788},
year = 2018
}
Søgaard et al. (2018) reach the same conclusion for linguistically distant language pairs, using an eigenvector-based similarity metric. They also note that the method works less well when the monolingual corpora are not drawn from the same domain or when different methods are used to train the monolingual word embeddings.
Nakashole, Ndapa (2018):
NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1047,
author = {Nakashole, Ndapa},
title = {NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1047},
pages = {512--522},
year = 2018
}
Nakashole (2018) proposes to use linear mappings that are local to neighborhoods of words.
Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye (2015):
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{xing-EtAl:2015:NAACL-HLT,
author = {Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye},
title = {Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {1006--1011},
url = {
http://www.aclweb.org/anthology/N15-1104},
year = 2015
}
Xing et al. (2015) argue that the mapping matrix should be orthogonal and show improvements when enforcing this constraint.
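Under the orthogonality constraint, the optimal map has a closed-form solution via the SVD (the orthogonal Procrustes problem). A minimal sketch with synthetic data, offered as an illustration of the constraint rather than a reproduction of any cited system:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    # Orthogonal W minimizing ||X W - Y||_F: take the SVD of X^T Y
    # and drop the singular values.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal matrix
Y = X @ Q

W = orthogonal_procrustes(X, Y)
assert np.allclose(W.T @ W, np.eye(5), atol=1e-8)  # W is orthogonal
assert np.allclose(X @ W, Y, atol=1e-8)            # and recovers Q
```

An orthogonal map preserves dot products and vector norms, so monolingual similarity structure is left intact by the transformation.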
Patra, Barun and Moniz, Joel Ruben Antony and Garg, Sarthak and Gormley, Matthew R. and Neubig, Graham (2019):
Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{patra-etal-2019-bilingual,
author = {Patra, Barun and Moniz, Joel Ruben Antony and Garg, Sarthak and Gormley, Matthew R. and Neubig, Graham},
title = {Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1018},
pages = {184--193},
year = 2019
}
Patra et al. (2019) relax orthogonality to a soft constraint in the training objective.
Hubness Problem
A known problem when finding the most similar word in another language is the hubness problem: some words are close to many other words and are therefore identified as translations too frequently.
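The effect is easy to measure: count how often each target word is retrieved as a nearest neighbor. The sketch below constructs a synthetic hub (a target vector aligned with the centroid of the source cloud); the data and setup are illustrative assumptions, not from the cited papers.

```python
import numpy as np

def hub_counts(src, tgt):
    # For every source vector, find its nearest target vector under cosine
    # similarity, and count how often each target is chosen.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    nearest = (src @ tgt.T).argmax(axis=1)
    return np.bincount(nearest, minlength=len(tgt))

# Synthetic hub: one target vector points at the centroid of the source cloud.
rng = np.random.default_rng(2)
mean_dir = rng.normal(size=10)
src = mean_dir + 0.5 * rng.normal(size=(100, 10))
tgt = rng.normal(size=(50, 10))
tgt[0] = mean_dir

counts = hub_counts(src, tgt)
assert counts.sum() == 100
assert counts[0] == counts.max()  # the centroid-like target is a hub
```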
Alexis Conneau and Guillaume Lample and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou (2018):
Word translation without parallel data, International Conference on Learning Representations

@inproceedings{lample2018word,
author = {Alexis Conneau and Guillaume Lample and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou},
title = {Word translation without parallel data},
booktitle = {International Conference on Learning Representations},
url = {
https://openreview.net/pdf?id=H196sainb},
year = 2018
}
Conneau et al. (2018) consider the average distance to neighboring words in the other language and scale the distance calculation accordingly.
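Their retrieval criterion, cross-domain similarity local scaling (CSLS), can be sketched as follows. This is a simplified numpy illustration of the formula, not the reference implementation:

```python
import numpy as np

def csls_scores(src, tgt, k=10):
    # CSLS: from each cosine similarity, subtract the mean similarity of the
    # source word and of the target word to their k nearest cross-lingual
    # neighbors, penalizing hub words that are close to everything.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T                                    # cosine similarities
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_src - r_tgt

rng = np.random.default_rng(3)
scores = csls_scores(rng.normal(size=(40, 8)), rng.normal(size=(60, 8)), k=5)
assert scores.shape == (40, 60)
```

A translation is then retrieved as the argmax of the adjusted scores rather than of the raw cosine similarities.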
Joulin, Armand and Bojanowski, Piotr and Mikolov, Tomas and Jégou, Hervé and Grave, Edouard (2018):
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1330,
author = {Joulin, Armand and Bojanowski, Piotr and Mikolov, Tomas and J{\'e}gou, Herv{\'e} and Grave, Edouard},
title = {Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1330},
pages = {2979--2984},
year = 2018
}
Joulin et al. (2018) use this adjustment during training.
Samuel L. Smith and David H. P. Turban and Steven Hamblin and Nils Y. Hammerla (2017):
Offline bilingual word vectors, orthogonal transformations and the inverted softmax, Proceedings of the International Conference on Learning Representations (ICLR)

@inproceedings{iclr2017-smith,
author = {Samuel L. Smith and David H. P. Turban and Steven Hamblin and Nils Y. Hammerla},
title = {Offline bilingual word vectors, orthogonal transformations and the inverted softmax},
booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
year = 2017
}
Smith et al. (2017) propose to normalize the distance matrix between input and output words: given the distances of a source word to every target word, the distances are normalized to sum to 1 using the softmax, and vice versa.
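A minimal sketch of the resulting inverted softmax: instead of normalizing a source word's scores over target candidates, normalize over all source words, so a hub target must spread its probability mass. The similarity matrix below is random dummy data.

```python
import numpy as np

def inverted_softmax(sim, beta=10.0):
    # sim: (num_source, num_target) similarity matrix.
    # Normalize translation scores over *source* words (columns sum to 1),
    # so a target word close to many sources cannot win them all cheaply.
    e = np.exp(beta * sim)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(5)
sim = rng.normal(size=(30, 20))
p = inverted_softmax(sim)
assert np.allclose(p.sum(axis=0), 1.0)  # normalized over source words
```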
Huang, Jiaji and Qiu, Qiang and Church, Kenneth (2019):
Hubless Nearest Neighbor Search for Bilingual Lexicon Induction, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{huang-etal-2019-hubless,
author = {Huang, Jiaji and Qiu, Qiang and Church, Kenneth},
title = {Hubless Nearest Neighbor Search for Bilingual Lexicon Induction},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1399},
pages = {4072--4080},
year = 2019
}
Huang et al. (2019) formalize the underlying intuition behind this idea as an optimization problem to enforce both normalizations jointly and propose a gradient descent method to solve it.
Multilingual Sentence Embeddings
Schwenk, Holger and Douze, Matthijs (2017):
Learning Joint Multilingual Sentence Representations with Neural Machine Translation, Proceedings of the 2nd Workshop on Representation Learning for NLP

@inproceedings{schwenk-douze-2017-learning,
author = {Schwenk, Holger and Douze, Matthijs},
title = {Learning Joint Multilingual Sentence Representations with Neural Machine Translation},
booktitle = {Proceedings of the 2nd Workshop on Representation Learning for {NLP}},
month = {aug},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/W17-2619},
doi = {10.18653/v1/W17-2619},
pages = {157--167},
year = 2017
}
Schwenk and Douze (2017) propose to obtain sentence embeddings from an LSTM-based neural machine translation model by taking the final encoder state or max-pooling over all encoder states.
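Both pooling strategies are one-liners over the matrix of encoder states; a schematic sketch with dummy state values:

```python
import numpy as np

def sentence_embedding(encoder_states, method="max"):
    # encoder_states: (seq_len, dim) matrix of hidden states.
    if method == "final":
        return encoder_states[-1]          # last encoder state
    return encoder_states.max(axis=0)      # element-wise max over time steps

states = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])
assert np.allclose(sentence_embedding(states), [1.0, 3.0])
assert np.allclose(sentence_embedding(states, "final"), [-1.0, 0.0])
```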
Schwenk, Holger (2018):
Filtering and Mining Parallel Data in a Joint Multilingual Space, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{P18-2037,
author = {Schwenk, Holger},
title = {Filtering and Mining Parallel Data in a Joint Multilingual Space},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
publisher = {Association for Computational Linguistics},
pages = {228--234},
location = {Melbourne, Australia},
url = {
http://aclweb.org/anthology/P18-2037},
year = 2018
}
Schwenk (2018) obtains better results by training a joint encoder for multiple languages and applies it to filter noisy parallel corpora. Similarly,
C. España-Bonet and Á. C. Varga and A. Barrón-Cedeño and J. van Genabith (2017):
An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification, IEEE Journal of Selected Topics in Signal Processing

@ARTICLE{Espana-Bonet-sentence-embedding-2017,
author = {C. {Espa\~na-Bonet} and Á. C. {Varga} and A. {Barrón-Cedeño} and J. {van Genabith}},
title = {An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification},
journal = {IEEE Journal of Selected Topics in Signal Processing},
volume = {11},
number = {8},
pages = {1340-1350},
doi = {10.1109/JSTSP.2017.2764273},
issn = {1932-4553},
month = {Dec},
url = {
https://arxiv.org/pdf/1704.05415.pdf},
year = 2017
}
España-Bonet et al. (2017) compute the sum of the encoder states to obtain sentence embeddings.
Artetxe, Mikel and Schwenk, Holger (2019):
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{artetxe-schwenk-2019-margin,
author = {Artetxe, Mikel and Schwenk, Holger},
title = {Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1309},
pages = {3197--3203},
year = 2019
}
Artetxe and Schwenk (2019) present an encoder-decoder model built specifically to generate sentence embeddings, trained on parallel sentence pairs but with a single sentence embedding vector as the interface between encoder and decoder.
Mikel Artetxe and Holger Schwenk (2018):
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, CoRR

@article{DBLP:journals/corr/abs-1812-10464,
author = {Mikel Artetxe and Holger Schwenk},
title = {Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond},
journal = {CoRR},
volume = {abs/1812.10464},
url = {
http://arxiv.org/abs/1812.10464},
archiveprefix = {arXiv},
eprint = {1812.10464},
timestamp = {Wed, 02 Jan 2019 14:40:18 +0100},
biburl = {
https://dblp.org/rec/bib/journals/corr/abs-1812-10464},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2018
}
Artetxe and Schwenk (2018) implement this approach in a freely available toolkit called LASER.
Holger Schwenk and Vishrav Chaudhary and Shuo Sun and Hongyu Gong and Francisco Guzmán (2019):
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, CoRR

@article{DBLP:journals/corr/abs-1907-05791,
author = {Holger Schwenk and Vishrav Chaudhary and Shuo Sun and Hongyu Gong and Francisco Guzm{\'{a}}n},
title = {WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia},
journal = {CoRR},
volume = {abs/1907.05791},
url = {
http://arxiv.org/abs/1907.05791},
archiveprefix = {arXiv},
eprint = {1907.05791},
timestamp = {Wed, 17 Jul 2019 10:27:36 +0200},
biburl = {
https://dblp.org/rec/bib/journals/corr/abs-1907-05791},
bibsource = {dblp computer science bibliography,
https://dblp.org},
year = 2019
}
Schwenk et al. (2019) use it to extract large parallel corpora from Wikipedia.
Ruiter, Dana and España-Bonet, Cristina and van Genabith, Josef (2019):
Self-Supervised Neural Machine Translation, Proceedings of the 57th Conference of the Association for Computational Linguistics

@inproceedings{ruiter-etal-2019-self,
author = {Ruiter, Dana and Espa{\~n}a-Bonet, Cristina and van Genabith, Josef},
title = {Self-Supervised Neural Machine Translation},
booktitle = {Proceedings of the 57th Conference of the Association for Computational Linguistics},
month = {jul},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/P19-1178},
pages = {1828--1834},
year = 2019
}
Ruiter et al. (2019) compute sentence embeddings as the sum of word embeddings or of the encoder states of a neural machine translation model. They use these sentence embeddings to find parallel sentence pairs in a comparable corpus, and iterate the process to improve the translation model and in turn find more and better sentence pairs.
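The mining step can be sketched as scoring all cross-lingual sentence pairs by cosine similarity and keeping confident matches. A simplified illustration over precomputed sentence vectors (the data and threshold are hypothetical, not from the paper):

```python
import numpy as np

def mine_pairs(src_vecs, tgt_vecs, threshold=0.9):
    # Keep, for each source sentence, its highest-scoring target sentence
    # if the cosine similarity clears the threshold.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T
    best = sim.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if sim[i, j] >= threshold]

# Toy data: three "source" sentences are noisy copies of target sentences.
rng = np.random.default_rng(4)
tgt = rng.normal(size=(10, 6))
src = tgt[[2, 5, 7]] + 0.01 * rng.normal(size=(3, 6))
assert mine_pairs(src, tgt) == [(0, 2), (1, 5), (2, 7)]
```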
Multilingual Document Embeddings
To address the task of aligning bilingual documents,
Guo, Mandy and Yang, Yinfei and Stevens, Keith and Cer, Daniel and Ge, Heming and Sung, Yun-hsuan and Strope, Brian and Kurzweil, Ray (2019):
Hierarchical Document Encoder for Parallel Corpus Mining, Proceedings of the Fourth Conference on Machine Translation

@InProceedings{guo-EtAl:2019:WMT2,
author = {Guo, Mandy and Yang, Yinfei and Stevens, Keith and Cer, Daniel and Ge, Heming and Sung, Yun-hsuan and Strope, Brian and Kurzweil, Ray},
title = {Hierarchical Document Encoder for Parallel Corpus Mining},
booktitle = {Proceedings of the Fourth Conference on Machine Translation},
month = {August},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
pages = {64--72},
url = {
http://www.aclweb.org/anthology/W19-5207},
year = 2019
}
Guo et al. (2019) present a model to obtain document embeddings, built from word and sentence embeddings.
Benchmarks
Discussion
Related Topics
New Publications
Unsupervised Methods
Aldarmaki, Hanan and Mohan, Mahesh and Diab, Mona (2018):
Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings, Transactions of the Association for Computational Linguistics

@article{Q18-1014,
author = {Aldarmaki, Hanan and Mohan, Mahesh and Diab, Mona},
title = {Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
url = {
https://www.aclweb.org/anthology/Q18-1014},
pages = {185--196},
year = 2018
}
Aldarmaki et al. (2018)
Chi, Ta Chung and Chen, Yun-Nung (2018):
CLUSE: Cross-Lingual Unsupervised Sense Embeddings, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1025,
author = {Chi, Ta Chung and Chen, Yun-Nung},
title = {CLUSE: Cross-Lingual Unsupervised Sense Embeddings},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1025},
pages = {271--281},
year = 2018
}
Chi and Chen (2018)
Duong, Long and Kanayama, Hiroshi and Ma, Tengfei and Bird, Steven and Cohn, Trevor (2016):
Learning Crosslingual Word Embeddings without Bilingual Corpora, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

@InProceedings{duong-EtAl:2016:EMNLP2016,
author = {Duong, Long and Kanayama, Hiroshi and Ma, Tengfei and Bird, Steven and Cohn, Trevor},
title = {Learning Crosslingual Word Embeddings without Bilingual Corpora},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1285--1295},
url = {
https://aclweb.org/anthology/D16-1136},
year = 2016
}
Duong et al. (2016)
Other
Ramesh, Sree Harsha and Sankaranarayanan, Krishna Prasad (2018):
Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

@InProceedings{N18-4016,
author = {Ramesh, Sree Harsha and Sankaranarayanan, Krishna Prasad},
title = {Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop},
publisher = {Association for Computational Linguistics},
pages = {112--119},
location = {New Orleans, Louisiana, USA},
url = {
http://aclweb.org/anthology/N18-4016},
year = 2018
}
Ramesh and Sankaranarayanan (2018)
Hazem, Amir and Morin, Emmanuel (2017):
Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora, Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

@inproceedings{hazem-morin-2017-bilingual,
author = {Hazem, Amir and Morin, Emmanuel},
title = {Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {nov},
address = {Taipei, Taiwan},
publisher = {Asian Federation of Natural Language Processing},
url = {
https://www.aclweb.org/anthology/I17-1069},
pages = {685--693},
year = 2017
}
Hazem and Morin (2017)
Doval, Yerai and Camacho-Collados, Jose and Espinosa Anke, Luis and Schockaert, Steven (2018):
Improving Cross-Lingual Word Embeddings by Meeting in the Middle, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1027,
author = {Doval, Yerai and Camacho-Collados, Jose and Espinosa Anke, Luis and Schockaert, Steven},
title = {Improving Cross-Lingual Word Embeddings by Meeting in the Middle},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1027},
pages = {294--304},
year = 2018
}
Doval et al. (2018)
Ruder, Sebastian and Cotterell, Ryan and Kementchedjhieva, Yova and Søgaard, Anders (2018):
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1042,
author = {Ruder, Sebastian and Cotterell, Ryan and Kementchedjhieva, Yova and S{\o}gaard, Anders},
title = {A Discriminative Latent-Variable Model for Bilingual Lexicon Induction},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1042},
pages = {458--468},
year = 2018
}
Ruder et al. (2018)
Dou, Zi-Yi and Zhou, Zhi-Hao and Huang, Shujian (2018):
Unsupervised Bilingual Lexicon Induction via Latent Variable Models, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

@inproceedings{D18-1062,
author = {Dou, Zi-Yi and Zhou, Zhi-Hao and Huang, Shujian},
title = {Unsupervised Bilingual Lexicon Induction via Latent Variable Models},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
address = {Brussels, Belgium},
publisher = {Association for Computational Linguistics},
url = {
https://www.aclweb.org/anthology/D18-1062},
pages = {621--626},
year = 2018
}
Dou et al. (2018)
Duong, Long and Kanayama, Hiroshi and Ma, Tengfei and Bird, Steven and Cohn, Trevor (2017):
Multilingual Training of Crosslingual Word Embeddings, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

@InProceedings{duong-EtAl:2017:EACLlong,
author = {Duong, Long and Kanayama, Hiroshi and Ma, Tengfei and Bird, Steven and Cohn, Trevor},
title = {Multilingual Training of Crosslingual Word Embeddings},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {894--904},
url = {
http://www.aclweb.org/anthology/E17-1084},
year = 2017
}
Duong et al. (2017)
Cao, Hailong and Zhao, Tiejun and ZHANG, Shu and Meng, Yao (2016):
A Distribution-based Model to Learn Bilingual Word Embeddings, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

@InProceedings{cao-EtAl:2016:COLING2,
author = {Cao, Hailong and Zhao, Tiejun and ZHANG, Shu and Meng, Yao},
title = {A Distribution-based Model to Learn Bilingual Word Embeddings},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {1818--1827},
url = {
http://aclweb.org/anthology/C16-1171},
year = 2016
}
Cao et al. (2016)
Shi, Tianze and Liu, Zhiyuan and Liu, Yang and Sun, Maosong (2015):
Learning Cross-lingual Word Embeddings via Matrix Co-factorization, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

@InProceedings{shi-EtAl:2015:ACL-IJCNLP1,
author = {Shi, Tianze and Liu, Zhiyuan and Liu, Yang and Sun, Maosong},
title = {Learning Cross-lingual Word Embeddings via Matrix Co-factorization},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {567--572},
url = {
http://www.aclweb.org/anthology/P15-2093},
year = 2015
}
Shi et al. (2015)
Hermann, Karl Moritz and Blunsom, Phil (2014):
Multilingual Distributed Representations without Word Alignment, Proceedings of ICLR

@InProceedings{Hermann:2014:ICLR,
author = {Hermann, Karl Moritz and Blunsom, Phil},
title = {{Multilingual Distributed Representations without Word Alignment}},
booktitle = {Proceedings of ICLR},
month = {apr},
url = {
http://arxiv.org/abs/1312.6173},
year = 2014
}
Hermann and Blunsom (2014)
Zou, Will Y. and Socher, Richard and Cer, Daniel and Manning, Christopher D. (2013):
Bilingual Word Embeddings for Phrase-Based Machine Translation, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

@InProceedings{zou-EtAl:2013:EMNLP,
author = {Zou, Will Y. and Socher, Richard and Cer, Daniel and Manning, Christopher D.},
title = {Bilingual Word Embeddings for Phrase-Based Machine Translation},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {1393--1398},
url = {
http://www.aclweb.org/anthology/D13-1141},
year = 2013
}
Zou et al. (2013)
Chandar A P, Sarath and Lauly, Stanislas and Larochelle, Hugo and Khapra, Mitesh and Ravindran, Balaraman and Raykar, Vikas C and Saha, Amrita (2014):
An Autoencoder Approach to Learning Bilingual Word Representations, Advances in Neural Information Processing Systems 27

@incollection{NIPS2014-5270,
author = {Chandar~A~P, Sarath and Lauly, Stanislas and Larochelle, Hugo and Khapra, Mitesh and Ravindran, Balaraman and Raykar, Vikas C and Saha, Amrita},
title = {An Autoencoder Approach to Learning Bilingual Word Representations},
booktitle = {Advances in Neural Information Processing Systems 27},
editor = {Z. Ghahramani and M. Welling and C. Cortes and N.D. Lawrence and K.Q. Weinberger},
pages = {1853--1861},
publisher = {Curran Associates, Inc.},
url = {
http://papers.nips.cc/paper/5270-an-autoencoder-approach-to-learning-bilingual-word-representations.pdf},
year = 2014
}
Chandar A P et al. (2014)
Huang, Kejun and Gardner, Matt and Papalexakis, Evangelos and Faloutsos, Christos and Sidiropoulos, Nikos and Mitchell, Tom and Talukdar, Partha P. and Fu, Xiao (2015):
Translation Invariant Word Embeddings, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

@InProceedings{huang-EtAl:2015:EMNLP,
author = {Huang, Kejun and Gardner, Matt and Papalexakis, Evangelos and Faloutsos, Christos and Sidiropoulos, Nikos and Mitchell, Tom and Talukdar, Partha P. and Fu, Xiao},
title = {Translation Invariant Word Embeddings},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1084--1088},
url = {
http://aclweb.org/anthology/D15-1127},
year = 2015
}
Huang et al. (2015)
Su, Jinsong and Xiong, Deyi and Zhang, Biao and Liu, Yang and Yao, Junfeng and Zhang, Min (2015):
Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

@InProceedings{su-EtAl:2015:EMNLP2,
author = {Su, Jinsong and Xiong, Deyi and Zhang, Biao and Liu, Yang and Yao, Junfeng and Zhang, Min},
title = {Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1248--1258},
url = {
http://aclweb.org/anthology/D15-1146},
year = 2015
}
Su et al. (2015)