Projects

Improved Extraction Heuristics For Hierarchical Phrase-Based Models

Project leader: Hieu Hoang

Desirable skills for participants: hierarchical MT, C++ or Java

Hierarchical phrase-based models have only been shown to consistently outperform standard phrase-based models for Chinese-English systems. This is one of the projects that aims to rectify this. Part of the problem may be the simplicity of the extraction algorithm, which is based on the heuristics described by (Chiang, 2005). While these heuristics are efficient and easy to implement, they may miss rules that are useful to the translation model. This project aims to improve the translation model by expanding the extraction heuristics in a language-independent manner. Examples include adding feature functions to model distortion, penalising non-terminals on the edge of rules, and allowing consecutive non-terminals or more than 2 non-terminals per rule. This is an instructive project for students who want to learn about the details of the hierarchical translation model.
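
To make the kind of change concrete, the following is a minimal sketch (in Python for brevity, not the C++ or Java of an actual decoder toolkit) of a configurable filter over the source side of candidate rules. All names here (ExtractionConfig, keep_rule, edge_nonterminal_count, the "[X]" notation) are hypothetical and chosen for illustration only. The default values are assumed to roughly mirror the restrictions in (Chiang, 2005): at most two non-terminals, no adjacent non-terminals on the source side, and at most five source symbols. Relaxing them corresponds to the extensions proposed above.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical convention: a non-terminal is written "[X]"; anything else is a terminal.
NONTERMINAL = "[X]"


@dataclass(frozen=True)
class ExtractionConfig:
    """Illustrative constraints on the source side of an extracted rule."""
    max_nonterminals: int = 2                  # assumed default: two-non-terminal limit
    allow_adjacent_nonterminals: bool = False  # assumed default: no "[X] [X]" on the source side
    allow_edge_nonterminals: bool = True       # non-terminals may start or end a rule
    max_source_symbols: int = 5                # terminals plus non-terminals


def keep_rule(source: Sequence[str], config: ExtractionConfig) -> bool:
    """Return True if the source side of a rule satisfies the constraints."""
    flags = [symbol == NONTERMINAL for symbol in source]
    if not flags or len(flags) > config.max_source_symbols:
        return False
    if all(flags):
        # Require at least one terminal so the rule is lexically anchored.
        return False
    if sum(flags) > config.max_nonterminals:
        return False
    if not config.allow_adjacent_nonterminals and any(
        left and right for left, right in zip(flags, flags[1:])
    ):
        return False
    if not config.allow_edge_nonterminals and (flags[0] or flags[-1]):
        return False
    return True


def edge_nonterminal_count(source: Sequence[str]) -> int:
    """Feature value: how many of the two rule edges are non-terminals (0, 1 or 2)."""
    if not source:
        return 0
    return int(source[0] == NONTERMINAL) + int(source[-1] == NONTERMINAL)
```

With the defaults, keep_rule(["[X]", "[X]", "de"], ExtractionConfig()) rejects the rule because of the consecutive non-terminals, while ExtractionConfig(max_nonterminals=3, allow_adjacent_nonterminals=True) accepts it. The edge count is kept separate from the filter so that it could be used as a soft penalty in the model rather than as a hard constraint, which is one way to read "penalising non-terminals on the edge of rules".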