Hierarchical Models

While still not including phrase structure labels, hierarchical models extend inversion transduction grammars by allowing a mix of words and non-terminal symbols in grammar rules. These models are typically learned from word-aligned data, just as phrase-based models are.
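To make the rule format concrete: the target side of a hierarchical rule mixes lexical material with nonterminal gaps, and applying it means substituting already-translated subspans for those gaps. A minimal sketch in Python, with an invented rule and sentence (not drawn from any cited system):

```python
def apply_rule(target_pattern, sub_translations):
    """Build the target side of a hierarchical rule by substituting
    already-translated subspans for the nonterminals X1, X2, ..."""
    out = []
    for tok in target_pattern:
        if tok.startswith("X") and tok[1:].isdigit():
            # fill the gap with the translation of the matching subspan
            out.extend(sub_translations[int(tok[1:]) - 1])
        else:
            # keep the lexical material fixed by the rule itself
            out.append(tok)
    return out

# Rule "ne X1 pas -> do not X1", with the gap already translated as "want":
print(apply_rule(["do", "not", "X1"], [["want"]]))
# -> ['do', 'not', 'want']
```

Because nonterminals are co-indexed across the two sides of a rule, the same mechanism also captures reordering, e.g. a rule "X1 de X2 -> X2 of X1" emits the subspans in swapped order.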

Hierarchical models are the main subject of 60 publications; 13 are discussed here.


Chiang (2005, 2007) combines the ideas of phrase-based models and tree structure and proposes an efficient decoding method based on chart parsing. His hierarchical phrase-based model Hiero (Chiang et al., 2005) makes no use of explicit annotation. Hierarchical rules may be extracted from sentence pairs in linear time (Zhang et al., 2008).
Zollmann et al. (2008) discuss some of the properties of such models and their relationship to phrase-based models. Hoang et al. (2009) point out the similarities between these and syntax-based models, and Hopkins et al. (2011) provide a formal framework for rule extraction algorithms covering all of them.
Watanabe et al. (2006) propose a model that lies between traditional phrase-based models and hierarchical models: it allows discontinuities in source phrases but follows a traditional left-to-right search algorithm. They show competitive performance with phrase-based models (Watanabe et al., 2006; Watanabe et al., 2006b). This is confirmed by Galley and Manning (2010), who also use left-to-right decoding with gappy phrases, showing improvements over phrase-based and hierarchical baselines with discontinuous source phrases and, to a lesser degree, with discontinuous target phrases.
Another adaptation of the hierarchical model approach allows only function words (or the most frequent words in the corpus) to occur in rules with nonterminals on the right-hand side, which yields a lexicalized but still compact grammar (Setiawan et al., 2007). As with phrase-based models, instead of setting rule application probabilities by maximum likelihood estimation, we may train classifiers to include additional features (Subotin, 2008).
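At its core, the rule extraction mentioned above subtracts an embedded sub-phrase pair from a larger phrase pair and leaves a co-indexed nonterminal in its place. A simplified sketch of that one step, with an invented sentence pair and with the real algorithm's alignment-consistency constraints and nonterminal limits omitted:

```python
def replace_span(tokens, sub):
    """Replace the first contiguous occurrence of sub in tokens with 'X1'."""
    n, m = len(tokens), len(sub)
    for i in range(n - m + 1):
        if tokens[i:i + m] == sub:
            return tokens[:i] + ["X1"] + tokens[i + m:]
    return tokens  # sub-phrase not found: leave the phrase unchanged

def extract_rule(src, tgt, sub_src, sub_tgt):
    """Turn a phrase pair into a hierarchical rule by gapping out an
    embedded sub-phrase pair on both sides (simplified extraction step)."""
    return replace_span(src, sub_src), replace_span(tgt, sub_tgt)

# From the phrase pair ("je ne veux pas", "I do not want") and its
# embedded sub-phrase pair ("veux", "want"):
src_rule, tgt_rule = extract_rule(
    "je ne veux pas".split(), "I do not want".split(), ["veux"], ["want"])
print(src_rule, tgt_rule)
# -> ['je', 'ne', 'X1', 'pas'] ['I', 'do', 'not', 'X1']
```

Repeating this step for every phrase pair and every embedded sub-phrase pair is what makes the grammar large; the function-word restriction of Setiawan et al. (2007) can be read as limiting which lexical tokens may remain in the gapped rules.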

New Publications

  • Kaeshammer (2015)
  • Wenniger and Sima'an (2015)
  • Zhang et al. (2015)
  • Siahbani and Sarkar (2017)
  • He et al. (2015)
  • Stanojević and Sima'an (2015)
  • Kamigaito et al. (2015)
  • Wang et al. (2015)
  • Karimova et al. (2014)
  • Sankaran et al. (2013)
  • Cao et al. (2014)
  • Xiao et al. (2014)
  • Saluja et al. (2014)
  • Nguyen and Vogel (2013)
  • Khan et al. (2013)
  • Zhang et al. (2014)
  • Sankaran and Sarkar (2014)
  • Siahbani and Sarkar (2014)
  • Peitz et al. (2014)
  • Graham (2013)
  • Wenniger and Sima'an (2013)
  • Shi et al. (2011)
  • Wei and Xu (2011)
  • He et al. (2010)
  • Almaghout et al. (2012)
  • Sankaran and Sarkar (2012)
  • Bod (2007)
  • Auli et al. (2009)
  • Li et al. (2012)
  • Gispert et al. (2010)
  • Sankaran et al. (2012)
  • Huang et al. (2010)
  • Hayashi et al. (2010)
  • Yang and Zheng (2009)
  • Almaghout et al. (2010)
  • Heger et al. (2010)
  • Vilar et al. (2010)
  • Cmejrek and Zhou (2010)
  • He et al. (2010)
  • Setiawan and Resnik (2010)
  • Gao and Vogel (2011)
  • Bansal et al. (2011)
  • Simard et al. (2005)