Embeddings of words, phrases, sentences, and entire documents have several uses; one of them is to work towards interlingual representations of meaning.

Embeddings is the main subject of 26 publications. 10 are discussed here.


Word embeddings have become a common feature of current research in natural language processing. Mikolov et al. (2013) propose the skip-gram method to obtain these representations. Mikolov et al. (2013) introduce efficient training methods for the skip-gram and continuous bag-of-words models, which are used in the very popular word2vec implementation and in publicly available word embedding sets for many languages.
Pennington et al. (2014) train word embedding models on the co-occurrence statistics of a word over the entire corpus.
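
To make the two training signals above concrete, here is a minimal sketch of how skip-gram style (center, context) pairs and corpus-wide co-occurrence counts can be collected. The function names, the window size, and the toy corpus are illustrative assumptions and are not taken from the cited papers; actual implementations add steps such as subsampling, negative sampling, or distance weighting that are omitted here.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within a symmetric window (hypothetical helper)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cooccurrence_counts(sentences, window=2):
    """Accumulate raw co-occurrence counts over the whole corpus (hypothetical helper)."""
    counts = Counter()
    for tokens in sentences:
        counts.update(skipgram_pairs(tokens, window))
    return counts


# Toy corpus, assumed purely for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
pairs = skipgram_pairs(corpus[0], window=1)
counts = cooccurrence_counts(corpus, window=1)
```

A skip-gram model would be trained to predict the context word of each pair from its center word, while a count-based model would fit embeddings to the aggregated counts.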

Contextualized Word Embeddings

Peters et al. (2018) demonstrate that various natural language tasks can be improved by contextualizing word embeddings through bi-directional neural language model layers (called ELMo), just as is done in the encoders of machine translation models. Devlin et al. (2019) show superior results with a method called BERT, which pre-trains word embeddings on masked language modeling and next sentence prediction tasks using the transformer architecture. Yang et al. (2019) refine the BERT model by predicting one masked word at a time, with permutation of the order of the masked words. They call their variant XLNet.
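
The masked language model objective mentioned above can be illustrated with a simplified sketch of how training inputs and prediction targets might be prepared. The function name, the `[MASK]` string, and the seeding are assumptions made for this example; the published BERT recipe also sometimes substitutes a random token or keeps the original token, which is left out here.

```python
import random

MASK = "[MASK]"  # placeholder symbol, assumed for illustration


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK and record the targets.

    Returns the masked sequence and a parallel list of labels, where a label
    is the original token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)  # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # no loss is computed at this position
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
```

The model sees `masked` as input and is scored only on the positions where `labels` is not `None`.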

Using Pre-Trained Word Embeddings

Xing et al. (2015) point out inconsistencies in the representation of word embeddings and the objective function for translation transforms between word embeddings, which they address with normalization. Hirasawa et al. (2019) de-bias word embeddings and show gains with pre-trained word embeddings in a low resource setting.
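
As a rough illustration of the normalization step, the sketch below scales embedding vectors to unit length, after which the inner product of two vectors equals their cosine similarity. The helper names and the two-dimensional toy vectors are assumptions for this example; it does not reproduce the full method of Xing et al. (2015).

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings, assumed purely for illustration.
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])
similarity = dot(u, v)  # for unit vectors this is the cosine similarity
```

With all vectors on the unit sphere, an objective based on inner products and a retrieval step based on cosine similarity measure the same quantity.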

Phrase Embeddings

Zhang et al. (2014) learn phrase embeddings using recursive neural networks and auto-encoders, together with a mapping between input and output phrases, to add an additional score to the phrase translations and to filter the phrase table. Hu et al. (2015) use convolutional neural networks to encode the input and output phrase and pass the encodings to a matching step that computes their similarity. They also include the full input sentence context and use a learning strategy called curriculum learning, which first learns from the easy training examples and then from the harder ones.
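
The idea of curriculum learning can be sketched as ordering the training examples by a difficulty score and releasing them in stages. The difficulty measure used here (source phrase length), the cumulative staging, and all names are assumptions chosen for illustration; they are not the criteria used by Hu et al. (2015).

```python
import math


def source_length(example):
    """Hypothetical difficulty score: number of tokens in the source phrase."""
    source, _target = example
    return len(source.split())


def curriculum_stages(examples, difficulty, num_stages=3):
    """Split examples into cumulative stages, easiest first.

    Stage k contains the easiest k/num_stages share of the data, so the
    final stage covers the full training set.
    """
    ordered = sorted(examples, key=difficulty)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages


# Toy phrase pairs, assumed purely for illustration.
examples = [
    ("a b c d", "w x y z"),
    ("a", "w"),
    ("a b c", "w x y"),
    ("a b", "w x"),
]
stages = curriculum_stages(examples, difficulty=source_length, num_stages=2)
```

A training loop would iterate over `stages` in order, running one or more epochs on each stage before moving on to the next.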



New Publications

  • Unanue et al. (2019)
  • McCann et al. (2017)
  • Mrksic et al. (2017)
  • Wieting and Gimpel (2018)
  • Pilehvar and Collier (2017)
  • Passban et al. (2016)
  • Sergienya and Schütze (2015)
  • Köhn (2015)
  • Sachdeva and Sharma (2015)
  • Zhao et al. (2015)
  • Garcia et al. (2014)
  • Ha et al. (2014)
  • Gao et al. (2014)
  • Cho et al. (2014)
  • Levinboim and Chiang (2015)
  • Alkhouli et al. (2014)