Generative Syntax-Based Models

Instead of learning syntactic rules from parallel corpora that have been word-aligned by other means, generative models may be used to integrate grammar induction and word alignment.

Generative syntax-based models are the main subject of 15 publications, 8 of which are discussed here.


Inspired by the IBM models, Yamada and Knight (2001) present a generative tree-based model that is trained using the EM algorithm, thus aligning the words in the parallel corpus while extracting syntactic transfer rules. Syntax trees are provided by automatically parsing the English side of the corpus in a pre-processing step. They also present a chart parsing algorithm for their model (Yamada and Knight, 2002). This model allows the integration of a syntactic language model (Charniak et al., 2003). Gildea (2003) introduces a clone operation to the model and extends it to dependency trees (Gildea, 2004).
Relaxing the isomorphism between input and output trees leads to the idea of quasi-synchronous grammars (QG), which have been shown to produce better word alignment quality than IBM models, but not better than symmetrized IBM models (Smith and Eisner, 2006). A similar relaxation is to allow multiple neighboring head nodes in the rules (Zhang et al., 2008; Zhang et al., 2008b).
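
The EM training that the tree-based model inherits from the IBM models can be illustrated with a much simpler case. The sketch below is IBM Model 1 only, not the model of Yamada and Knight (2001): it has no syntax trees and no reordering or insertion operations, and it omits the NULL word for brevity. The function names train_ibm1 and viterbi_align are hypothetical, chosen for this illustration. It shows how expected alignment counts (E-step) and re-estimated translation probabilities (M-step) jointly produce a word alignment from sentence pairs alone.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t(f | e) with EM, in the style of IBM Model 1.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict-like table mapping (f, e) to a probability.
    Simplification (an assumption of this sketch): no NULL word.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform initialisation: every foreign word is equally likely.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e

        # E-step: distribute each foreign word over the English words
        # of its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p

        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def viterbi_align(fs, es, t):
    """Link each foreign word to its most probable English word."""
    return [max(es, key=lambda e: t[(f, e)]) for f in fs]


if __name__ == "__main__":
    toy_corpus = [
        (["das", "Haus"], ["the", "house"]),
        (["das", "Buch"], ["the", "book"]),
        (["ein", "Buch"], ["a", "book"]),
    ]
    table = train_ibm1(toy_corpus, iterations=20)
    print(viterbi_align(["das", "Haus"], ["the", "house"], table))
```

In a tree-based generative model the hidden variable is richer than a word-to-word link (it also covers operations on the parse tree), but the alternation between collecting expected counts and renormalising them follows the same pattern.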



New Publications

  • Graehl et al. (2008)
  • Cohn and Blunsom (2009)
  • May et al. (2010)
  • Wu (1995)
  • Neubig et al. (2011)
  • Gimpel and Smith (2011)
  • Yamada (2002)