Neural machine Translation
Statistical Machine Translation
Embeddings of words, phrases, sentences, and entire documents have several uses, one among them is to work towards interlingual representations of meaning.
Embeddings is the main subject of 26 publications. 10 are discussed here.
Topics in NeuralNetworkModelsNeural Language Models | Attention Model | Training | Inference | Coverage | Vocabulary | Embeddings | Multilingual Word Embeddings | Monolingual Data | Adaptation | Linguistic Annotation | Multilingual Multimodal Multitask | Alternative Architectures | Analysis And Visualization | Neural Components In Statistical Machine Translation
Contextualized Word EmbeddingsPeters et al. (2018) demonstrate that various natural language tasks can be improved by contextualizing word embeddings through bi-directional neural language model layers (called ELMo), just as it is done in encoders in machine translations. Devlin et al. (2019) show superior results with a method called BERT which pre-trains word embeddings on a masked language model and next sentence prediction task using the transformer architecture. Yang et al. (2019) refine the BERT model by predicting one masked word at a time, with permutation of the order of the masked words. They call their variant XLNet.
Using Pre-Training Word EmbeddingXing et al. (2015) point out inconsistencies in the representation of word embeddings and the objective function for translation transforms between word embeddings, which they address with normalization. Hirasawa et al. (2019) de-bias word embeddings and show gains with pre-trained word embeddings in a low resource setting.
Phrase EmbeddingsZhang et al. (2014) learn phrase embeddings using recursive neural networks and auto-encoders and a mapping between input and output phrase to add an additional score to the phrase translations and to filter the phrase table. Hu et al. (2015) use convolutional neural networks to encode the input and output phrase and pass them to matching that computes their similarity. They include the full input sentence context in the and use a learning strategy called curriculum learning that first learns from the easy training examples and then the harder ones.