Statistical Example-Based MT

Project leader: Chris Dyer

Desirable skills for participants: Python/Cython, C++, machine learning, cdec

When translating the word "bank" into French, when should "banque" (i.e., the financial institution) be used rather than "rive" (i.e., the land next to a river)? Source-language context is a rich source of information that can help answer this question. However, standard translation models do not capture this dependency directly. The purpose of this project is to incorporate source-contextual information into a (hierarchical) phrase-based translation model using the open-source Lopez suffix array grammar/phrase table extractor. We will 1) improve the performance and flexibility of the suffix array extractor, 2) develop new contextual features, and 3) run experiments on state-of-the-art MT systems.
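
To make the idea concrete, here is a minimal Python sketch of how a suffix array over the source side of a corpus can locate every occurrence of a phrase, so that the words around each occurrence are available as contextual evidence. This is an illustration only, not the Lopez extractor or any cdec API: the function names, the toy corpus, and the window-based notion of "context" are all assumptions made for the example, and the construction is deliberately naive.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(tokens):
    # Naive construction: sort suffix start positions by the token
    # sequence that begins there. Fine for a toy corpus only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    # Suffixes sharing a prefix are contiguous in the suffix array, so a
    # binary search over length-limited prefixes finds all matches.
    # (Materializing the prefixes is wasteful; it keeps the sketch short.)
    n = len(phrase)
    prefixes = [tokens[i:i + n] for i in suffix_array]
    lo = bisect_left(prefixes, phrase)
    hi = bisect_right(prefixes, phrase)
    return sorted(suffix_array[lo:hi])


def context_words(tokens, start, length, window=3):
    # Hypothetical contextual feature: the set of words within `window`
    # tokens to the left and right of the matched phrase.
    left = tokens[max(0, start - window):start]
    right = tokens[start + length:start + length + window]
    return set(left + right)


# Toy source-side corpus (an assumption for illustration).
corpus = ("he sat on the bank of the river . "
          "she opened an account at the bank .").split()
suffix_array = build_suffix_array(corpus)

for position in find_occurrences(corpus, suffix_array, ["bank"]):
    print(position, sorted(context_words(corpus, position, 1)))
```

In this toy corpus, one occurrence of "bank" has "river" in its context and the other has "account", which is the kind of signal a contextual feature could use to prefer "rive" or "banque".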