Projects

Features To Model Word Span Of Non Terminals

We're using a Google Doc instead of the wiki:

   https://docs.google.com/document/d/1YzGLQCOx9fy__nACOsq4qV-sf6QjFhUPKcqR8gVIKEE/edit

Project leader: Hieu Hoang

Desirable skills for participants: hierarchical MT, C++

Create features and/or hard constraints that model the number of words that non-terminals in translation rules span.

In the hierarchical phrase-based translation model, a non-terminal in a translation rule can span any number of words. This project aims to improve reordering by modelling the distribution of the number of words spanned by each non-terminal in each translation rule. This distribution can then be used as feature functions or to constrain where particular rules can be applied. The hierarchical extract program in Moses has partially implemented this project.

This project will be of interest to those who want an empirical understanding of the hierarchical translation model and reordering, which could lead to other avenues of research.
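As a rough illustration of the idea, the sketch below collects span-length counts per non-terminal per rule, then exposes them as a smoothed log-probability (a possible feature function) and as a min/max check (a possible hard constraint). It is written in Python for brevity and is not the Moses extract code. The class name `SpanLengthModel`, the rule string, the add-alpha smoothing, and the `max_span` cap are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class SpanLengthModel:
    """Hypothetical span-length model; not part of Moses."""

    def __init__(self, alpha=1.0, max_span=20):
        self.alpha = alpha          # add-alpha smoothing constant (assumed)
        self.max_span = max_span    # longer spans are clipped to this value (assumed)
        # (rule, non-terminal index) -> Counter mapping span length -> count
        self.counts = defaultdict(Counter)

    def _clip(self, span):
        return min(span, self.max_span)

    def observe(self, rule, nt_index, span):
        """Record that non-terminal nt_index of rule covered span words."""
        self.counts[(rule, nt_index)][self._clip(span)] += 1

    def log_prob(self, rule, nt_index, span):
        """Smoothed log P(span | rule, non-terminal), usable as a feature score."""
        c = self.counts.get((rule, nt_index), Counter())
        total = sum(c.values())
        num = c[self._clip(span)] + self.alpha
        den = total + self.alpha * self.max_span
        return math.log(num / den)

    def allowed(self, rule, nt_index, span):
        """Hard constraint: accept only spans within the observed min/max.

        Rules with no observations are left unconstrained.
        """
        c = self.counts.get((rule, nt_index))
        if not c:
            return True
        return min(c) <= self._clip(span) <= max(c)


# Usage with a made-up rule identifier and made-up span observations.
model = SpanLengthModel()
rule = "X -> le X1 de X2 ||| X2 's X1"
for span in (1, 1, 2):
    model.observe(rule, 0, span)
model.observe(rule, 1, 3)

feature_score = model.log_prob(rule, 0, 1)
can_apply = model.allowed(rule, 0, 5)
```

In a decoder, the log-probabilities for all non-terminals in an applied rule would typically be summed into one feature score, while the `allowed` check would prune rule applications before scoring. Whether the soft feature or the hard constraint helps more is exactly the empirical question this project poses.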