Multilingual Word Embeddings

Mapping between the word embedding spaces of different languages, or building a common word embedding space for all languages, creates a shared semantic space that reveals word correspondences across languages.

Multilingual Word Embeddings is the main subject of 60 publications. 44 are discussed here.

Publications

Ruder et al. (2017) give a comprehensive overview of work on cross-lingual word embeddings. The observation that word representations obtained from their distributional properties (i.e., how they are used in text) are similar across languages has long been known, but Mikolov et al. (2013) were among the first to observe this for the word embeddings generated by neural models and to suggest that a simple linear transformation from word embeddings in one language to word embeddings in another language may be used to translate words.
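To make the basic idea concrete, here is a minimal sketch (not Mikolov et al.'s exact setup) of learning such a linear transformation by least squares from a seed dictionary; the matrices X and Z are assumed to hold row-aligned source and target vectors for the dictionary entries:

    import numpy as np

    def learn_linear_map(X, Z):
        # Least-squares linear map W such that X @ W ~ Z.
        # X: (n, d) source-language vectors for the seed dictionary entries
        # Z: (n, d) target-language vectors for the same word pairs
        W, *_ = np.linalg.lstsq(X, Z, rcond=None)
        return W

    def translate(src_vec, W, Z_full, tgt_words):
        # Translate a source word by the nearest cosine neighbor of its projection.
        projected = src_vec @ W
        sims = (Z_full @ projected) / (
            np.linalg.norm(Z_full, axis=1) * np.linalg.norm(projected) + 1e-9)
        return tgt_words[int(np.argmax(sims))]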

Aligning Embedding Spaces

Mikolov et al. (2013) learn a linear mapping between pre-existing embedding spaces by minimizing the distance between a projected source word vector and a target word vector for a given seed lexicon. Xing et al. (2015) improve this method by requiring the mapping matrix to be orthogonal. Artetxe et al. (2016) refine the method further with mean centering. Faruqui and Dyer (2014) map monolingually generated word embeddings into a shared bilingual embedding space using canonical correlation analysis, maximizing the correlation of the two vectors for each word translation pair. Braune et al. (2018) point out that the accuracy of the obtained bilingual lexicons is much lower for rare words, a problem that can be somewhat addressed with additional features such as representations built on letter n-grams and taking orthographic distance into account when mapping words. Heyman et al. (2019) learn a linear transformation between embedding spaces based on an automatically generated seed lexicon and show improvements by incrementally adding languages and matching the spaces of newly added languages to all previously added languages (multi-hub). Alqaisi and O'Keefe (2019) consider the problem of morphologically rich languages, using Arabic as an example, and demonstrate the importance of morphological analysis and word splitting.
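The orthogonality constraint mentioned above has a closed-form solution (the orthogonal Procrustes problem). The following sketch, with mean centering as a preprocessing step, is an illustration under the assumption that X and Z contain length-normalized, row-aligned seed pair vectors; it is not the exact published implementation:

    import numpy as np

    def orthogonal_map(X, Z, center=True):
        # Orthogonal W minimizing ||X W - Z||_F over the seed pairs,
        # obtained from the SVD of X^T Z (orthogonal Procrustes).
        if center:
            X = X - X.mean(axis=0)
            Z = Z - Z.mean(axis=0)
        U, _, Vt = np.linalg.svd(X.T @ Z)
        return U @ Vt   # W @ W.T = I by construction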

Seed Lexicon

Supervised and semi-supervised approaches to mapping embedding spaces require a seed lexicon of word translation pairs. These are most commonly generated with traditional statistical methods from parallel corpora (Faruqui and Dyer, 2014). Lubin et al. (2019) address the problem of noisy word pairs in such automatically generated lexicons, showing that they cause significant harm, and develop a method that learns the noise level and identifies noisy pairs. Søgaard et al. (2018) use identically spelled words in both languages as seeds. Shi et al. (2019) use off-the-shelf bilingual dictionaries and detail how such human-targeted dictionary definitions need to be preprocessed. Artetxe et al. (2017) reduce the need for large seed dictionaries by starting with just 25 entries and iteratively growing the dictionary based on the obtained mappings. Making do with weaker supervision, Gouws et al. (2015) learn directly from sentence pairs by predicting words in a target sentence from words in a source sentence. Coulmance et al. (2015) explore a variant of this idea. Vulić and Moens (2015) use pairs of aligned Wikipedia documents, aiming to predict words in mixed-language documents. Zhou et al. (2019) also use identically spelled words as seeds. Vulić and Korhonen (2016) compare different types and sizes of seed lexicons.
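A rough sketch of the self-learning loop described above (illustrative variable names, not the published algorithm): start from a tiny seed lexicon, for instance identically spelled words, learn a mapping, re-induce a larger dictionary by nearest-neighbor search, and repeat:

    import numpy as np

    def identical_word_seeds(src_vocab, tgt_vocab):
        # seed lexicon from identically spelled words
        return [(w, w) for w in set(src_vocab) & set(tgt_vocab)]

    def self_learning(X, Z, src_vocab, tgt_vocab, seeds, learn_map, n_iters=5):
        # X, Z: monolingual embedding matrices; learn_map: e.g. the Procrustes sketch above
        src_idx = {w: i for i, w in enumerate(src_vocab)}
        tgt_idx = {w: i for i, w in enumerate(tgt_vocab)}
        pairs = [(src_idx[s], tgt_idx[t]) for s, t in seeds
                 if s in src_idx and t in tgt_idx]
        for _ in range(n_iters):
            Xs = np.stack([X[i] for i, _ in pairs])
            Zs = np.stack([Z[j] for _, j in pairs])
            W = learn_map(Xs, Zs)
            # induce a new, larger dictionary: nearest target neighbor of each mapped source word
            sims = (X @ W) @ Z.T
            pairs = [(i, int(sims[i].argmax())) for i in range(len(X))]
        return W, pairs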

Unsupervised Methods

Barone (2016) suggests the idea of using auto-encoders and adversarial training to learn a mapping between monolingual embedding spaces without any parallel data or any other bilingual signal, but does not report any results. Zhang et al. (2017) demonstrate the effectiveness of this idea, exploring both unidirectional and bidirectional mappings. Conneau et al. (2018) add a fine-tuning step based on a synthetic dictionary of high-confidence word pairs, achieving vastly better results. Mohiuddin and Joty (2019) extend this approach into a symmetric setup that learns mappings in both directions, with a discriminator for each language (a CycleGAN-style setup) and a reconstruction loss as a component of the training objective. Xu et al. (2018) propose a similar method using the Sinkhorn distance. Chen and Cardie (2018) extend the adversarial training approach to more than two languages.
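The adversarial setup can be sketched roughly as follows (PyTorch; names and hyperparameters are illustrative, and this is not any particular published system): a linear mapper plays the role of the generator, while a discriminator tries to tell mapped source vectors from real target vectors:

    import torch
    import torch.nn as nn

    d = 300
    mapper = nn.Linear(d, d, bias=False)                       # the mapping W (generator)
    discriminator = nn.Sequential(nn.Linear(d, 512), nn.ReLU(),
                                  nn.Linear(512, 1))           # mapped source vs. real target
    opt_m = torch.optim.SGD(mapper.parameters(), lr=0.1)
    opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.1)
    bce = nn.BCEWithLogitsLoss()

    def train_step(X_src, X_tgt, batch=128):
        src = X_src[torch.randint(len(X_src), (batch,))]
        tgt = X_tgt[torch.randint(len(X_tgt), (batch,))]
        # 1) train the discriminator to separate mapped source from target vectors
        d_loss = (bce(discriminator(mapper(src).detach()), torch.zeros(batch, 1))
                  + bce(discriminator(tgt), torch.ones(batch, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # 2) train the mapper to fool the discriminator
        m_loss = bce(discriminator(mapper(src)), torch.ones(batch, 1))
        opt_m.zero_grad(); m_loss.backward(); opt_m.step()
        return d_loss.item(), m_loss.item()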
Instead of using adversarial training, Zhang et al. (2017) measure the difference between the two embedding spaces with the earth mover's distance, defined as the sum of the distances by which each word vector has to be moved to reach the nearest vector in the other language's embedding space. Hoshen and Wolf (2018) follow the same intuition but first reduce the dimensionality of the word vectors with principal component analysis (PCA) and align the spaces along the resulting axes. Their iterative algorithm moves the projections of word vectors towards the closest target-side vector in the projected space. Alvarez-Melis and Jaakkola (2018) draw parallels between this approach and Optimal Transport. In their method, they minimize the distance between a projected vector and all target-side vectors, measured by the L2 norm. Alaux et al. (2019) extend this to more than two languages by mapping all languages into a common space and matching the word embedding distributions of any two languages at a time. Mukherjee et al. (2018) use squared-loss mutual information (SMI) as the optimization measure to match the monolingual distributions. Zhou et al. (2019) first learn a density distribution over each of the monolingual word embedding spaces using Gaussian mixture models, and then map these spaces so that word vectors in one language are mapped to regions of similar density in the other language's space, measured with KL divergence.
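A simplified sketch of the non-adversarial, nearest-neighbor intuition (not the exact algorithm of any paper cited above): project both spaces with PCA, then alternate between matching each projected source vector to its closest target vector and re-estimating the mapping from these tentative pairs:

    import numpy as np

    def pca_project(M, k):
        # project onto the top k principal axes
        M = M - M.mean(axis=0)
        _, _, Vt = np.linalg.svd(M, full_matrices=False)
        return M @ Vt[:k].T

    def iterative_nn_align(X, Z, learn_map, k=50, n_iters=10):
        # X, Z: monolingual embedding matrices; learn_map: e.g. the Procrustes sketch above;
        # dot product as similarity assumes roughly length-normalized vectors
        Xp, Zp = pca_project(X, k), pca_project(Z, k)
        W = np.eye(k)
        for _ in range(n_iters):
            nn = np.argmax((Xp @ W) @ Zp.T, axis=1)    # tentative word matches
            W = learn_map(Xp, Zp[nn])                  # re-fit the mapping on them
        return W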
Instead of learning a mapping between embedding spaces, Marie and Fujita (2019) learn a joint embedding space for multiple languages using a skip-gram model trained on mixed-language text. Their method is bootstrapped with unsupervised statistical machine translation. Wada et al. (2019) train a multi-lingual bidirectional language model with language-specific embeddings but shared state progression parameters. The resulting word embeddings are in a common space, with words close to their translations.
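The joint-space alternative can be illustrated with a toy skip-gram run over concatenated monolingual and mixed-language text (assuming gensim 4.x; the toy sentences stand in for the bootstrapped mixed-language data of the actual method):

    from gensim.models import Word2Vec  # assumes gensim 4.x

    # toy corpora: two monolingual sets plus mixed-language sentences that tie the vocabularies together
    english = [["the", "house", "is", "big"], ["the", "dog", "sleeps"]]
    german = [["das", "haus", "ist", "gross"], ["der", "hund", "schlaeft"]]
    mixed = [["the", "haus", "is", "gross"], ["der", "dog", "schlaeft"]]

    # one skip-gram model over all text yields a single embedding space for both languages
    model = Word2Vec(sentences=english + german + mixed,
                     vector_size=32, sg=1, window=3, min_count=1, workers=1)
    print(model.wv.most_similar("house", topn=3))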

Properties of the Mapping

Methods that operate on fixed monolingual embedding spaces often learn a linear mapping between them, which implicitly assumes that the spaces are structurally similar (approximately isomorphic). Nakashole and Flauger (2018) show that this assumption is less accurate when distant languages are involved. Søgaard et al. (2018) find the same when the languages are linguistically different, using a metric based on eigenvectors. They also note that the method works less well when the monolingual data is not drawn from the same domain or when different methods for monolingual word embedding training are used. Nakashole (2018) proposes to use linear mappings that are local to neighborhoods of words. Xing et al. (2015) argue that the mapping matrix should be orthogonal and show improvements when constraining it accordingly. Patra et al. (2019) relax orthogonality to a soft constraint in the training objective.
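The soft constraint can be expressed as a penalty term added to the mapping loss, as in this sketch (PyTorch; the weight is an arbitrary placeholder, not a published value):

    import torch

    def soft_orthogonality_penalty(W, weight=0.1):
        # penalize deviation of W^T W from the identity instead of hard-constraining W
        d = W.shape[1]
        return weight * torch.norm(W.T @ W - torch.eye(d, device=W.device)) ** 2

    # usage sketch: total_loss = mapping_loss + soft_orthogonality_penalty(W)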

Hubness Problem

A known problem in finding the most similar word in another language is the hubness problem: some words are close to many other words and hence get identified as translations too frequently. Conneau et al. (2018) consider the average distance to neighboring words in the other language and scale the distance calculation accordingly (cross-domain similarity local scaling, CSLS). Joulin et al. (2018) use this adjustment already during training. Smith et al. (2017) propose to normalize the distance matrix between input and output words: given the distances of a source word to every target word, the distances are normalized to add up to 1 using the softmax, and vice versa. Huang et al. (2019) formalize the intuition underlying this idea as an optimization problem that enforces both normalizations jointly and propose a gradient descent method to solve it.
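The neighborhood-scaled similarity can be sketched as follows (a CSLS-style score in numpy, assuming length-normalized vectors so that the dot product is cosine similarity; not the reference implementation):

    import numpy as np

    def csls_scores(X_mapped, Z, k=10):
        # discount the similarity of 'hub' vectors by the average similarity
        # of each vector to its k nearest neighbors in the other language
        sims = X_mapped @ Z.T                                  # (n_src, n_tgt) cosine similarities
        r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)     # source-side neighborhood averages
        r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)     # target-side neighborhood averages
        return 2 * sims - r_src[:, None] - r_tgt[None, :]

    # the translation of source word i is the target word with the highest score in row i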

Multilingual Sentence Embeddings

Schwenk and Douze (2017) propose to obtain sentence embeddings from an LSTM-based neural machine translation model by taking the final encoder state or max-pooling over all encoder states. Schwenk (2018) obtains better results by training a joint encoder for multiple languages and applies it to filter noisy parallel corpora. Similarly, España-Bonet et al. (2017) compute the sum of the encoder states to obtain sentence embeddings. Artetxe and Schwenk (2019) present an encoder-decoder model built specifically to generate sentence embeddings, trained on parallel sentence pairs but with a single sentence embedding vector as the interface between encoder and decoder. Artetxe and Schwenk (2018) implement this approach in a freely available toolkit called LASER. Schwenk et al. (2019) use it to extract large parallel corpora from Wikipedia. Ruiter et al. (2019) compute sentence embeddings as the sum of word embeddings or encoder states of a neural machine translation model. They use these sentence embeddings to find parallel sentence pairs in a comparable corpus and iterate this process to improve the translation model and then find more and better sentence pairs.
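The pooling idea can be illustrated with a stand-alone bidirectional LSTM encoder (PyTorch; in the papers above the encoder is the trained translation encoder, so this is only a shape-level sketch):

    import torch
    import torch.nn as nn

    class PoolingEncoder(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=256, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

        def forward(self, token_ids):                        # token_ids: (batch, seq_len)
            states, _ = self.lstm(self.embed(token_ids))     # (batch, seq_len, 2 * hidden)
            return states.max(dim=1).values                  # max-pool over time

    encoder = PoolingEncoder()
    sentence_vecs = encoder(torch.randint(0, 10000, (4, 12)))  # 4 toy sentences of length 12
    print(sentence_vecs.shape)                                  # torch.Size([4, 1024])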

Multilingual Document Embeddings

To address the task of aligning bilingual documents, Guo et al. (2019) present a model that obtains document embeddings built from word and sentence embeddings.
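As a minimal illustration of the word/sentence/document hierarchy (simple averaging only; the cited model learns these combinations rather than averaging):

    import numpy as np

    def document_embedding(doc_sentences, word_vectors, dim=300):
        # words -> sentence vectors -> document vector, by mean pooling at each level
        sent_vecs = []
        for sentence in doc_sentences:                       # sentence: list of tokens
            vecs = [word_vectors[w] for w in sentence if w in word_vectors]
            sent_vecs.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
        return np.mean(sent_vecs, axis=0) if sent_vecs else np.zeros(dim)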

Benchmarks

Discussion

Related Topics

New Publications

Unsupervised Methods

  • Aldarmaki et al. (2018)
  • Chi and Chen (2018)
  • Duong et al. (2016)

Other

  • Ramesh and Sankaranarayanan (2018)
  • Hazem and Morin (2017)
  • Doval et al. (2018)
  • Ruder et al. (2018)
  • Dou et al. (2018)
  • Duong et al. (2017)
  • Cao et al. (2016)
  • Shi et al. (2015)
  • Hermann and Blunsom (2014)
  • Zou et al. (2013)
  • Chandar A P et al. (2014)
  • Huang et al. (2015)
  • Su et al. (2015)
