Monolingual Data

Monolingual data is much more plentiful than parallel data and has proven valuable for informing models of fluency and the representation of words.

Monolingual Data is the main subject of 33 publications. 22 are discussed here.



Sennrich et al. (2016) backtranslate the monolingual data into the input language and use the obtained synthetic parallel corpus as additional training data. Hoang et al. (2018) show that the quality of the machine translation system used for backtranslation matters and can be improved by iterative backtranslation. Burlot and Yvon (2018) also show that backtranslation quality matters and carry out additional analysis. Edunov et al. (2018) show better results with Monte Carlo search to generate the backtranslation data, i.e., randomly selecting word translations based on the predicted probability distribution. Imamura et al. (2018) and Imamura and Sumita (2018) also confirm that better translation quality can be obtained when backtranslating with such sampling and offer some refinements. Caswell et al. (2019) argue that the noise introduced by this type of stochastic search flags to the model that it is backtranslated data, something that can also be accomplished with an explicit special token, to the same effect.
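
The difference between deterministic and sampled backtranslation, and the use of a special token, can be illustrated with a minimal sketch. It assumes a toy reverse model that has already produced one word distribution per output position; a real system conditions each distribution on the words generated so far. The function names and the "<BT>" token are hypothetical and not taken from the cited papers.

```python
import random


def greedy_choice(dist):
    """Pick the most probable word from a {word: probability} distribution."""
    return max(dist, key=dist.get)


def sampled_choice(dist, rng):
    """Draw a word at random in proportion to its predicted probability."""
    words = list(dist)
    weights = [dist[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


def back_translate(step_dists, rng=None):
    """Build a synthetic source sentence from per-position distributions.

    With rng=None the most probable word is chosen at each position;
    otherwise words are sampled from the predicted distribution.
    """
    if rng is None:
        return [greedy_choice(dist) for dist in step_dists]
    return [sampled_choice(dist, rng) for dist in step_dists]


def make_synthetic_pair(step_dists, target_tokens, rng=None, tag=None):
    """Pair a synthetic source with the genuine target sentence.

    If `tag` is given (hypothetical marker such as "<BT>"), it is prepended
    to the source so the model can tell synthetic from genuine data.
    """
    source = back_translate(step_dists, rng)
    if tag is not None:
        source = [tag] + source
    return source, target_tokens


# Toy distributions standing in for the output of a reverse translation model.
step_dists = [
    {"das": 0.7, "dieses": 0.3},
    {"haus": 0.6, "heim": 0.4},
]
target = ["the", "house"]

greedy_pair = make_synthetic_pair(step_dists, target)
sampled_pair = make_synthetic_pair(step_dists, target, rng=random.Random(0), tag="<BT>")
```

In both cases the target side is genuine text; only the source side is synthetic, which is why noise or a tag on the source can mark the pair as backtranslated.
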
Currey et al. (2017) show that in low-resource conditions simple copying of target side data to the source side also generates beneficial training data. Fadaee and Monz (2018) see gains with synthetic data generated by forward-translation (also called self-training). They also report gains when subsampling backtranslation data to favor rare or difficult-to-generate words (words with high loss during training).
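
The copying idea can be sketched in a few lines. The helper below is a hypothetical illustration, not code from the cited work: it assumes sentence pairs are stored as (source, target) tuples and leaves out details such as the mixing ratio between genuine and copied data.

```python
def add_copied_data(parallel_pairs, target_monolingual):
    """Extend a parallel corpus with copied target-side monolingual data.

    Each monolingual target sentence is used as both source and target,
    so the model sees additional fluent target-language output.
    """
    copied = [(sentence, sentence) for sentence in target_monolingual]
    return list(parallel_pairs) + copied


parallel = [("das haus", "the house")]
monolingual = ["a small garden", "the house"]
training_data = add_copied_data(parallel, monolingual)
```
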

Dual Learning

He et al. (2016) use monolingual data in a dual learning setup. Machine translation engines are trained in both directions, and in addition to regular model training from parallel data, monolingual data is translated in a round trip (e to f to e). A language model score for the intermediate translation in f and the reconstruction match back to e serve as the cost function that drives gradient descent updates to the model. Tu et al. (2017) augment the translation model with a reconstruction step. The generated output is translated back into the input language and the training objective is extended to include not only the likelihood of the target sentence but also the likelihood of the reconstructed input sentence. Niu et al. (2018) simultaneously train a model in both translation directions (with the identity of the source language indicated by a marker token). Niu et al. (2019) extend this work to roundtrip translation training on monolingual data, allowing the forward translation and the reconstruction step to operate on the same model. They use Gumbel softmax to make the roundtrip differentiable.
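
The Gumbel softmax replaces the hard choice of an intermediate word with a soft distribution, which is what allows gradients to pass through a roundtrip. The following is a minimal sketch of the standard relaxation for a single output position, written in plain Python to show the forward computation only; an actual implementation would rely on an automatic differentiation framework. The function names are illustrative and do not come from the cited paper.

```python
import math
import random


def sample_gumbel(size, rng):
    """Draw `size` independent Gumbel(0, 1) noise values."""
    noise = []
    for _ in range(size):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noise.append(-math.log(-math.log(u)))
    return noise


def gumbel_softmax(logits, temperature, noise):
    """Relaxed one-hot sample: softmax((logits + noise) / temperature).

    Lower temperatures push the result closer to a one-hot vector,
    higher temperatures make it smoother.
    """
    scaled = [(logit + g) / temperature for logit, g in zip(logits, noise)]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]  # toy scores over a three-word vocabulary
soft_word = gumbel_softmax(logits, temperature=0.5, noise=sample_gumbel(3, rng))
```

The result is a probability vector over the vocabulary, so the second translation step can consume a weighted mix of word embeddings instead of a single discrete word.
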

Unsupervised Machine Translation

The idea of backtranslation is also crucial for the ambitious goal of unsupervised machine translation, i.e., the training of machine translation systems with monolingual data only. These methods typically start with multilingual word embeddings, which may also be induced from monolingual data. Given such a word translation model, Lample et al. (2018) propose to translate sentences in one language with a simple word-by-word translation model into another language, using a shared encoder and decoder for both languages involved. They define three different objectives in their setup: the ability to reconstruct a source sentence from its intermediate representation, even with added noise (randomly dropping words); the ability to reconstruct a source sentence from its translation into the target language; and an adversarial component that attempts to classify the identity of the language from the intermediate representation of a sentence in either language. Artetxe et al. (2018) use a similar setup, with a shared encoder and language-specific decoders, relying on the idea of a denoising auto-encoder (just like the first objective above), and the ability to reconstruct the source sentence from a translation into the target language. Sun et al. (2019) note that during training of the neural machine translation model, the bilingual word embedding deteriorates. They add the training objective for the induction of the bilingual word embeddings into the objective function of neural machine translation training. Yang et al. (2018) use language-specific encoders with some shared weights in a similar setup. Artetxe et al. (2018) show better results when inducing phrase translations from phrase embeddings and using them in a statistical phrase-based machine translation model, which includes an explicit language model. They refine their model with synthetic data generated by iterative backtranslation. Lample et al. (2018) combine unsupervised statistical and neural machine translation models. Their phrase-based model is initialized with word translations obtained from multilingual word embeddings and then iteratively refined into phrase translations. Ren et al. (2019) more closely tie together training of unsupervised statistical and neural machine translation systems by using the statistical machine translation model as a regularizer for the neural model training. Artetxe et al. (2019) improve their unsupervised statistical machine translation model with a feature that favors similarly spelled translations and an unsupervised method to tune the weights for the statistical components. Circling back to bilingual lexicon induction, Artetxe et al. (2019) use such an unsupervised machine translation model to synthesize a parallel corpus by translating monolingual data, process it with word alignment methods, and extract a bilingual dictionary using maximum likelihood estimation.
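
The last step, extracting a bilingual dictionary from a word-aligned corpus by maximum likelihood estimation, amounts to relative frequency counting: p(t|s) = count(s, t) / count(s). The sketch below assumes the synthetic corpus has already been word-aligned and is given as (source tokens, target tokens, alignment links) triples; the data layout and function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_lexicon(aligned_corpus):
    """Estimate p(target word | source word) by relative frequency.

    `aligned_corpus` is a list of (source_tokens, target_tokens, links),
    where `links` holds (source_index, target_index) alignment points.
    """
    pair_counts = defaultdict(Counter)
    for source, target, links in aligned_corpus:
        for i, j in links:
            pair_counts[source[i]][target[j]] += 1

    lexicon = {}
    for source_word, counts in pair_counts.items():
        total = sum(counts.values())
        lexicon[source_word] = {t: c / total for t, c in counts.items()}
    return lexicon


def best_translations(lexicon):
    """Keep only the most probable translation for each source word."""
    return {s: max(dist, key=dist.get) for s, dist in lexicon.items()}


corpus = [
    (["das", "haus"], ["the", "house"], [(0, 0), (1, 1)]),
    (["das", "haus"], ["the", "home"], [(0, 0), (1, 1)]),
    (["ein", "haus"], ["a", "house"], [(0, 0), (1, 1)]),
]
lexicon = extract_lexicon(corpus)
dictionary = best_translations(lexicon)
```

In this toy corpus "haus" is aligned to "house" twice and to "home" once, so the estimate for "house" is 2/3 and it is selected as the dictionary entry.
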



New Publications

Unsupervised machine translation

  • Lample and Conneau (2019)


  • Graça et al. (2019)
  • Domingo and Casacuberta (2018)
  • Prabhumoye et al. (2018)


  • Luo et al. (2019)
  • Pourdamghani et al. (2019)
  • Xia et al. (2019)
  • Marie and Fujita (2019)
  • Wu et al. (2019)
  • Wang et al. (2017)
  • Shen et al. (2017)