GHKM Grammar Extraction

Currently, Moses provides a synchronous grammar extraction method that is an extension of hierarchical grammar extraction. For syntax-based models, however, a different trade-off between coverage and model size may be needed. Galley et al. (2004) proposed a grammar extraction method that extracts only minimal rules, which keeps the number of grammar rules small while still covering the training data. It should be fairly straightforward to implement this method and test it with tree-based decoders such as Joshua and Moses. It could be implemented in any fast programming language.
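To make the idea of minimal rules more concrete, here is a rough Python sketch of the core step as described by Galley et al.: compute each tree node's span and complement span from the word alignment, mark the frontier nodes, and cut the tree at those nodes. The class and function names (Node, compute_spans, minimal_fragments and so on) are made up for this illustration and are not part of Moses, Joshua, or Ptolemaios. The toy sentence pair and its alignment are likewise assumptions. The sketch only produces the tree-side fragment of each rule and ignores the string side and the attachment of unaligned words, so it is a starting point rather than a full implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word_index: Optional[int] = None  # position in the parsed sentence (leaves only)
    span: Set[int] = field(default_factory=set)
    complement: Set[int] = field(default_factory=set)
    frontier: bool = False


def compute_spans(node, alignment):
    """Bottom-up: a node's span is every string position aligned to a word below it."""
    if not node.children:
        node.span = {j for (i, j) in alignment if i == node.word_index}
    else:
        node.span = set()
        for child in node.children:
            node.span |= compute_spans(child, alignment)
    return node.span


def compute_complements(node, inherited=frozenset()):
    """Top-down: the complement span is the parent's complement span plus the
    spans of all siblings, i.e. everything outside this node's own subtree."""
    node.complement = set(inherited)
    for child in node.children:
        siblings = set()
        for other in node.children:
            if other is not child:
                siblings |= other.span
        compute_complements(child, node.complement | siblings)


def mark_frontier(node):
    """Frontier node: non-empty span whose closure [min, max] contains no
    position from the complement span. Simplification of this sketch: only
    internal nodes are marked, since rules are rooted at nonterminals."""
    if node.children and node.span:
        lo, hi = min(node.span), max(node.span)
        node.frontier = not any(lo <= j <= hi for j in node.complement)
    for child in node.children:
        mark_frontier(child)


def fragment(node, is_root=True):
    """Bracketed tree fragment rooted at node, cut at the nearest frontier nodes."""
    if not node.children or (node.frontier and not is_root):
        return node.label
    inner = " ".join(fragment(child, False) for child in node.children)
    return f"({node.label} {inner})"


def minimal_fragments(root, alignment):
    """Return one minimal tree fragment per frontier node, in pre-order."""
    compute_spans(root, alignment)
    compute_complements(root)
    mark_frontier(root)
    found = []

    def visit(node):
        if node.frontier:
            found.append(fragment(node))
        for child in node.children:
            visit(child)

    visit(root)
    return found


def leaf(word, index):
    return Node(word, word_index=index)


# Toy example (assumed data): tree over "he does not go", string "il ne va pas".
tree = Node("S", [
    Node("NP", [Node("PRP", [leaf("he", 0)])]),
    Node("VP", [Node("AUX", [leaf("does", 1)]),
                Node("RB", [leaf("not", 2)]),
                Node("VB", [leaf("go", 3)])]),
])
# Pairs of (tree word index, string word index); "does" is left unaligned.
alignment = {(0, 0), (2, 1), (2, 3), (3, 2)}
rules = minimal_fragments(tree, alignment)
print("\n".join(rules))
```

On this toy input the frontier nodes are S, NP, PRP, VP and VB. RB is not a frontier node because "not" aligns to "ne" and "pas", and "va" falls between them, so "not" stays inside the VP fragment. A complete extractor would additionally pair each fragment with the corresponding string-side words and variables.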

Online resources
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a Translation Rule? Proceedings of HLT/NAACL 2004: the basic idea, suggesting the two algorithms for rule extraction.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, et al. Scalable Inference and Training of Context-Rich Syntactic Translation Models: further work, with an explanation of how unaligned words are handled.

Moses syntax tutorial

An Algorithm for Extracting Translation Rules from Scarce Bilingual Corpora: http://www.proceedings2006.imcsit.org/pliks/199.pdf

Ptolemaios code that may be useful: svn co http://bramaputra.ling.uni-potsdam.de/svn/projekte/shared/Ptolemaios/branches/ghkm
See in particular the class ptolemaios.apps.util.ghkm.core.GHKMTreeNode.

svn repository

Page last modified on January 27, 2010, at 05:49 PM