IBM Models

The IBM Models are a sequence of word-based translation models of increasing complexity: they start with lexical translation probabilities and then add models for reordering and word duplication (fertility).
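As an illustration of the simplest model in the sequence, the following is a minimal sketch of EM training for lexical translation probabilities t(f|e) in the style of IBM Model 1. It is not the GIZA or GIZA++ implementation; the function name `train_model1`, the `<NULL>` token and the toy sentence pairs are assumptions made for this example.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical name for the empty word that unaligned words attach to


def train_model1(bitext, iterations=10):
    """Estimate t(f|e) with EM, in the style of IBM Model 1.

    bitext: list of (e_sentence, f_sentence) pairs, each a list of tokens.
    Returns a dict mapping (f, e) to the probability t(f|e).
    """
    f_vocab = {f for _, f_sent in bitext for f in f_sent}
    # Uniform initialisation over the vocabulary of the generated language.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f being generated by e
        total = defaultdict(float)  # expected count of e generating anything

        # E-step: distribute each word f over all candidate words e.
        for e_sent, f_sent in bitext:
            candidates = [NULL] + list(e_sent)
            for f in f_sent:
                z = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior

        # M-step: renormalise the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return dict(t)


# Toy data, invented for this example.
toy_bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t_table = train_model1(toy_bitext, iterations=10)
```

Because every sentence position is considered equally likely in this sketch, only co-occurrence statistics drive the estimates; the higher models add reordering and fertility on top of a table like `t_table`.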

IBM Models is the main subject of 45 publications. 28 are discussed here.


The IBM models are described in detail by Brown et al. (1993), who originally presented the statistical machine translation approach in earlier papers (Brown et al., 1988; Brown et al., 1990). See also the introductions by Knight (1997, 1999).
During a 1999 Johns Hopkins University Workshop, the IBM models were implemented in a toolkit called GIZA (Al-Onaizan et al., 1999), later refined into GIZA++ by Och and Ney (2000). GIZA++ is open source and widely used. The estimation of the bilingual word classes is described by Och (1999).
Instead of hill-climbing to the Viterbi alignment, algorithms such as Estimation of Distributions may be employed (Rodríguez et al., 2006). The stochastic modelling approach for translation is described by Ney (2001).
A variation on the IBM models is the HMM model, which uses relative distortion but not fertility (Vogel et al., 1996). This model was extended by treating jumps to other source words differently from repeated translations of the same source word (Toutanova et al., 2002), and by conditioning jumps on the source word (He, 2007).
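To make "relative distortion" concrete: in the HMM model the probability of aligning to position i depends only on the jump width i - i' from the previously aligned position i', normalised over the positions available in the sentence. The sketch below assumes a table of non-negative jump scores; the function name `jump_distribution` and the example scores are hypothetical and not taken from any toolkit.

```python
def jump_distribution(jump_scores, prev_pos, length):
    """Return p(i | prev_pos, length) for i = 1..length.

    jump_scores: dict mapping a jump width (i - prev_pos) to a non-negative score.
    prev_pos:    position aligned to the previous word (1-based).
    length:      number of positions in the sentence being aligned to.
    """
    weights = [jump_scores.get(i - prev_pos, 0.0) for i in range(1, length + 1)]
    z = sum(weights)
    if z == 0.0:
        raise ValueError("no reachable position has a non-zero jump score")
    # Normalise so that the probabilities over positions 1..length sum to one.
    return [w / z for w in weights]


# Example scores, invented for this sketch: a forward jump of one is preferred.
example_scores = {-1: 1.0, 0: 2.0, 1: 4.0, 2: 1.0}
example_probs = jump_distribution(example_scores, prev_pos=2, length=3)
```

Conditioning on the jump width rather than on absolute positions lets the same scores be shared across sentences of different lengths.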
The IBM models have been extended with maximum entropy models to include position (Foster, 2000) and part-of-speech tag information (Kim et al., 2000), and such models have also been integrated into the EM training algorithm (García-Varea et al., 2002; García-Varea et al., 2002b). Improvements have also been obtained by adding bilingual dictionaries (Wu and Wang, 2004) and context vectors estimated from monolingual corpora (Wang and Zhou, 2004), lemmatizing words (Dejean et al., 2003; Popovic and Ney, 2004; Pianta and Bentivogli, 2004), interpolating lemma and word alignment models (Zhang and Sumita, 2007), as well as smoothing (Moore, 2004). Mixture models for word translation probabilities have been explored to automatically learn topic-dependent translation models (Zhao and Xing, 2006; Civera and Juan, 2006). Packing words that typically occur in many-to-one alignments into a single token may improve alignment quality (Ma et al., 2007).



New Publications

  • Eyigöz et al. (2013)
  • Schulz and Aziz (2016)
  • Simion et al. (2015)
  • Dyer et al. (2013)
  • Gal and Blunsom (2013)
  • Simion et al. (2013)
  • Simion et al. (2014)
  • Schoenemann (2013)
  • Gelling and Cohn (2014)
  • Vaswani et al. (2012)
  • Riley and Gildea (2012)
  • Ravi and Knight (2010)
  • Samuelsson (2012)
  • Brunning et al. (2009)
  • Gao et al. (2010)
  • Schoenemann (2010)
  • Toutanova and Galley (2011)
  • Lopez and Resnik (2005)