Projects After 6 Months

Port mtplz to Joshua

Joshua now includes a working phrase-based decoder that was included in its 6.0 release on Feb. 2, 2015. It currently uses only a bare-bones unlexicalized distortion model for reordering, but more features are planned. It uses a hypergraph structure shared with Joshua's hierarchical decoder and thus has access to much of the Joshua decoder's functionality, such as its sparse feature implementation. It is about as fast as Moses.
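For readers unfamiliar with the term, an unlexicalized distortion model is commonly realized as a distance-based penalty on jumps between source spans. The following is a minimal sketch of that general idea, not Joshua's actual implementation; the function names and the inclusive (start, end) span convention are assumptions made for illustration.

```python
def distortion_cost(prev_end, next_start):
    """Distance-based distortion for one jump: |next_start - prev_end - 1|.

    prev_end is the last source index covered by the previous phrase,
    next_start is the first source index of the next phrase.
    A monotone step (next_start == prev_end + 1) costs 0.
    """
    return abs(next_start - prev_end - 1)


def total_distortion(spans):
    """Sum the distortion over source spans listed in target order.

    spans: list of (start, end) source index pairs, both inclusive.
    (Hypothetical convention chosen for this sketch.)
    """
    prev_end = -1  # position before the first source word
    total = 0
    for start, end in spans:
        total += distortion_cost(prev_end, start)
        prev_end = end
    return total


monotone = total_distortion([(0, 1), (2, 3)])   # phrases in source order
swapped = total_distortion([(2, 3), (0, 1)])    # second phrase translated first
```

Because the penalty depends only on positions and not on the words involved, it needs no training data beyond a single weight, which is what makes it a natural first reordering feature.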

Twitter Crowd Translation

In October, the project leader launched the project. In November, the project leader presented a paper on it at the TC 36 conference in London.

MT with NLTK: Yet Another MT Decoder

In conjunction with the NLTK developers, the project participants built a monotone decoder prototype during MTM14. Work on it continues, but a working version is already available in the translate fork. The next goal is to incorporate an end-to-end MT solution through the NLTK API. The current development of MT in NLTK will also be presented at PyCon.
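To illustrate what a monotone decoder does, here is a minimal sketch in plain Python. It is not the NLTK prototype and does not use the NLTK API; the function name, the phrase-table layout (source word tuples mapped to lists of target string and log-probability pairs), and the toy entries are all assumptions made for this example. It segments the source left to right without reordering and picks the highest-scoring segmentation by dynamic programming, with no language model.

```python
import math


def monotone_decode(source, phrase_table, max_phrase_len=3):
    """Translate `source` (a list of words) without reordering.

    phrase_table: dict mapping a tuple of source words to a list of
    (target_string, log_probability) pairs (hypothetical layout).
    Returns the best translation string, or None if the sentence
    cannot be fully covered by the phrase table.
    """
    n = len(source)
    # best[i] = (score, back-pointer, target phrase) for the first i words
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue  # prefix up to `start` is not translatable
            phrase = tuple(source[start:end])
            for target, logprob in phrase_table.get(phrase, []):
                score = prev_score + logprob
                if score > best[end][0]:
                    best[end] = (score, start, target)

    if best[n][0] == -math.inf:
        return None

    # Follow back-pointers from the end of the sentence.
    output = []
    pos = n
    while pos > 0:
        _, start, target = best[pos]
        output.append(target)
        pos = start
    return " ".join(reversed(output))


# Toy phrase table, invented for illustration only.
toy_table = {
    ("das",): [("the", -0.1), ("that", -1.2)],
    ("haus",): [("house", -0.2)],
    ("ist", "klein"): [("is small", -0.3)],
}
result = monotone_decode(["das", "haus", "ist", "klein"], toy_table)
```

Restricting the search to source order keeps decoding linear in sentence length (for a fixed maximum phrase length), which is why a monotone decoder is a convenient first prototype before reordering is added.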

OxLM: Extend neural language models by conditioning on source sentences

Extending OxLM with source conditioning was fully accomplished within about one month after the MT Marathon. The code is now available in OxLM master. This work will be the core topic of the project leader's master's thesis.

Features for Syntax-based Moses

This project was broad in scope and comprised several subtasks. The status of the subtasks differs. Overall, the project is under reasonably active development even now. Some subtasks were accomplished within the Marathon week, some after the Marathon week, and some will hopefully be continued in the future. Based on the work and the experience gained at MT Marathon, research on syntax-based SMT with Moses has continued at the participant institutions. The project repository is essentially Moses on GitHub. Some of the project code is already in master; other code may still be added later.

Here's a breakdown of subtasks of our project (and their respective status):

  • span length (accomplished within the Marathon week)
  • phrase orientation (under reasonably active development even now)
  • n-best tree output, MIRA/MERT with head-word chain metric (accomplished within the Marathon week)
  • parse trees with semantic information, bilingual tree alignment (under reasonably active development even now)
  • long distance agreement (dormant)
  • a dependency-based reordering model (under reasonably active development even now)

Decoding with sampling and nonlocal features

The project leader is still working on the project, targeting a publication in the coming months.

Domain adaptation via biased sampling

The topic of this project has become a core component of the EU project: Modern Machine Translation.