Hierarchical re-ordering model

The goal of the project is to implement in Moses the simple and effective hierarchical phrase reordering model of Galley and Manning (2008). Lexicalised reordering models as implemented in Moses distinguish three orientations of successively translated phrases: M(onotone), S(wap) and D(iscontinuous). These models are adequate for modelling single-phrase movements, but not swaps involving several phrases. Moreover, Moses implements a "word-based" orientation model, which is then applied to phrases. While Tillmann (2004) developed a more consistent phrase-based orientation model, Galley and Manning recently proposed a "hierarchical" model. The implementation requires revisiting the training of the lexicalised re-ordering model and integrating the usual search algorithm with a shift-reduce process, in order to keep track of the contiguity of covered source phrases. This does not increase the complexity of decoding. Galley and Manning (2008) reported consistent improvements over the word-based (Moses) and phrase-based (IBM) models on Arabic-English and Chinese-English. Our goal is to fully implement the hierarchical model and to test it on the German-English WMT task. Programming languages: C++ for the training/decoding part, Perl for data processing and the creation of case studies.
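As a concrete illustration, the word-based classification of a phrase's orientation with respect to the previously translated one can be sketched as follows. This is a minimal, hypothetical helper (not Moses code); phrases are represented by inclusive source-word spans:

```cpp
#include <cassert>

// Orientation of the current phrase relative to the previously
// translated one. In the word-based view only directly adjacent
// source spans count as monotone or swap.
enum Orientation { Monotone, Swap, Discontinuous };

// prevStart/prevEnd and curStart/curEnd are inclusive source-word
// indices of the previous and current phrase.
Orientation classify(int prevStart, int prevEnd, int curStart, int curEnd) {
    if (prevEnd + 1 == curStart) return Monotone;  // current follows previous
    if (curEnd + 1 == prevStart) return Swap;      // current precedes previous
    return Discontinuous;                          // a gap on either side
}
```

For instance, classify(0, 1, 2, 3) yields Monotone, classify(2, 3, 0, 1) yields Swap, and anything with a gap, such as classify(0, 1, 4, 5), yields Discontinuous. The hierarchical model keeps the same three classes but checks adjacency against merged blocks of phrases rather than single phrases.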

Some of the foreseen activities:

  1. Training. The HRM has to be trained during the phrase extraction process. To start with, we have a version of the code kindly provided by Arun Abhishek. The re-ordering features to be computed should be exactly the same as in the conventional word-based reordering model; the only difference should be the definition of the "events" that trigger the different kinds of orientations.
  2. Decoding. We need to integrate a shift-reduce process in the decoder to find phrase sequences that form larger blocks. We need to implement the stack object and the shift & reduce methods. As the stack becomes part of the state of each hypothesis, we have to measure its impact on the size of the search space.
    The actual implementation should probably be done as an extension of the existing lexical reordering models in LexicalReordering.cpp. This code could benefit from a general review, however. All information required from the previous hypothesis when expanding a new one, including the stack, should be encapsulated in a FFState feature function state object. The old implementation of lexical reordering violates this principle.
  3. Case studies. In order to evaluate the approach, we need to identify in advance a set of test-set sentences that could potentially benefit from the HRM: in particular, sentences whose Moses translation shows good lexical coverage but poor word reordering.
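The shift-reduce process from step 2 can be sketched as follows. The names (`shiftReduce`, the `Span` alias) are hypothetical; in Moses the equivalent logic would live in hypothesis expansion, with the stack stored in the hypothesis state:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // inclusive source-word range

// Hypothetical sketch of the coverage stack: each newly translated
// phrase is pushed (shift), then merged with the top element as long
// as the two spans are adjacent (reduce), so the stack always holds
// maximal contiguous blocks of covered source words.
void shiftReduce(std::vector<Span>& stack, Span phrase) {
    stack.push_back(phrase);  // shift
    while (stack.size() >= 2) {
        Span top = stack.back();
        Span below = stack[stack.size() - 2];
        bool adjacent = below.second + 1 == top.first ||
                        top.second + 1 == below.first;
        if (!adjacent) break;
        stack.pop_back();     // reduce: replace the two adjacent
        stack.pop_back();     // blocks by their union
        stack.push_back(Span(std::min(below.first, top.first),
                             std::max(below.second, top.second)));
    }
}
```

For example, translating source spans in the order 2-3, 0-1 leaves a single block 0-3 on the stack; adding 5-6 shifts a second block (gap at word 4), and adding 4-4 reduces everything back to one block 0-6. Because the stack becomes part of the hypothesis state, two hypotheses can only be recombined if their stacks match, which is the search-space impact mentioned above.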

Experimental Results

Schedule of meetings


  • Marcello Federico (FBK-irst Italy)
  • Nadi Tomeh (LIMSI/CNRS France) (sf: naditomeh -added),(email: nadi.tomeh@gmail.com)
  • Ankit Srivastava (CNGL Ireland) (sf: ankitks - added),(email: ankitks@gmail.com)
  • Gabriele A. Musillo (FBK-irst Italy, sf:musillo - added, email:musillo@fbk.eu)
  • Sara Stymne (Linkoeping U, Sweden) (sf:sarst - added, email:sarst@ida.liu.se)
  • Christian Hardmeier (FBK-irst Italy)


Corpus resources

  • An 81,141-sentence German-English news corpus (WMT09) with symmetrized alignments is available at http://www.ida.liu.se/~sarst/mtm4/. Two sets of test files are now also available on the same page; we use 2009b to start with.
  • You can find an SMT system trained on the complete data provided by Sara at http://maigret.rax.ch/mtm4/de-en.80k.tar.gz (around 475 MB). Note that the included moses.ini file contains relative paths to the models that you may need to adjust.

Thoughts about implementing the decoding part

Let the LexicalReordering class handle the model setup. Put the actual code to determine orientation into subclasses of FFState, which

  • encapsulate the necessary information about the previous hypothesis (the stack for hierarchical reordering, the range of the previous phrase for the classical variant).
  • provide a method that takes info about the current hypothesis and returns an orientation and a new state. This would replace the current GetOrientationType methods of the subclasses of LexicalReordering (which are not needed any more).
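A minimal sketch of such a state object, assuming a simplified stand-in for Moses' FFState interface; the names HierReoState and Expand are hypothetical, and for brevity Expand only pushes the new span, where the real implementation would also run the reduce step of the shift-reduce process:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // inclusive source-word range
enum Orientation { Monotone, Swap, Discontinuous };

// Simplified stand-in for the Moses FFState base class; in Moses,
// Compare() is what hypothesis recombination relies on.
class FFState {
public:
    virtual ~FFState() {}
    virtual int Compare(const FFState& other) const = 0;
};

// Hypothetical state for the hierarchical model: the coverage stack
// travels inside the state object, so expanding a hypothesis needs
// nothing from the previous hypothesis beyond this FFState.
class HierReoState : public FFState {
public:
    explicit HierReoState(std::vector<Span> stack) : m_stack(stack) {}

    // Classify the new phrase against the block on top of the stack
    // and return the successor state.
    std::unique_ptr<HierReoState> Expand(Span cur, Orientation& orient) const {
        orient = Discontinuous;
        if (!m_stack.empty()) {
            if (m_stack.back().second + 1 == cur.first) orient = Monotone;
            else if (cur.second + 1 == m_stack.back().first) orient = Swap;
        }
        std::vector<Span> next = m_stack;
        next.push_back(cur);  // real code would also reduce adjacent spans here
        return std::unique_ptr<HierReoState>(new HierReoState(next));
    }

    // States with identical stacks behave identically in the future,
    // so only those may be recombined.
    int Compare(const FFState& other) const override {
        const HierReoState& o = static_cast<const HierReoState&>(other);
        if (m_stack < o.m_stack) return -1;
        return m_stack == o.m_stack ? 0 : 1;
    }

private:
    std::vector<Span> m_stack;
};
```

The classical (non-hierarchical) variant would be another FFState subclass storing just the previous phrase's range instead of the stack, so the LexicalReordering feature itself stays agnostic of the model type.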

Thoughts about implementing the training part

Training affects how the forward and backward orientations of the phrases considered by phrase-extract are determined. The simplest approach to determining backward re-ordering with the HRM is to relax the maximum length constraint on phrases from k to the target length n. This increases the time cost from O(nk²) to O(n²k). In practice, to determine the backward reordering we need to go through all the phrases extracted for the sentence. (This is not how it is currently done, is it?)
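Under that approach, the hierarchical backward orientation check at extraction time could look as follows. This is a sketch of our reading of Galley and Manning (2008), not existing phrase-extract code; `Block` and `backwardOrientation` are hypothetical names, and `blocks` must contain all consistent blocks of the sentence pair, not just those up to length k:

```cpp
#include <cassert>
#include <vector>

struct Block { int s1, s2, t1, t2; };  // inclusive source/target spans
enum Orientation { Monotone, Swap, Discontinuous };

// The phrase pair p is monotone (swap) if some consistent block of
// any length ends just below it in the target and sits immediately
// to its left (right) in the source; otherwise it is discontinuous.
Orientation backwardOrientation(const Block& p,
                                const std::vector<Block>& blocks) {
    for (const Block& b : blocks) {
        if (b.t2 + 1 != p.t1) continue;         // must end just below p in target
        if (b.s2 + 1 == p.s1) return Monotone;  // adjacent bottom-left corner
        if (p.s2 + 1 == b.s1) return Swap;      // adjacent bottom-right corner
    }
    return Discontinuous;
}
```

A phrase that a word-based model would call discontinuous (because the single adjacent phrase is too far away) can thus become monotone or swap once a larger merged block fills the corner.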

Model types

The model types, their combinations, and hence also the config string have been updated. The config string has five parts:
(ex: hier-msd-bidirectional-fe-allff)

The parts have the following options:

  • modeltype:
    • hier - hierarchical model
    • phrase - phrase-based model
    • wbe - word-based extraction (but phrase-based at decoding). This is the current model in Moses. DEFAULT
  • orientations
    • mslr - monotone, swap, discontinuous-left, discontinuous-right
    • msd - monotone, swap, discontinuous
    • monotonicity - monotone or non-monotone
    • leftright - left or right
  • directionality
    • backward - determine orientation with respect to previous phrase DEFAULT
    • forward - determine orientation with respect to following phrase
    • bidirectional - use both backward and forward models
  • language
    • fe - conditioned on both languages
    • f - conditioned on the source language
  • collapsing
    • allff - treat the weights as individual feature functions DEFAULT
    • collapseff - collapse all scores in one direction into one feature function
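Splitting such a config string into its five parts is straightforward; `parseConfig` below is a hypothetical helper (the real code would additionally fill in the defaults listed above for omitted parts):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a config string such as "hier-msd-bidirectional-fe-allff"
// on '-' into its parts (modeltype, orientations, directionality,
// language, collapsing).
std::vector<std::string> parseConfig(const std::string& config) {
    std::vector<std::string> parts;
    std::istringstream in(config);
    std::string part;
    while (std::getline(in, part, '-')) parts.push_back(part);
    return parts;
}
```

For the example above this yields {"hier", "msd", "bidirectional", "fe", "allff"}.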


  • Sourceforge accounts (Christian)
  • German-English light-weight baseline (Christian)
  • Case study (Gabriele):

State-of-the-art BLEU performance has been reported in (Galley & Manning, 2008) using a hierarchical reordering model. However, BLEU scores alone are not discriminative enough for comparing reordering models, as BLEU only considers local reordering within n-grams. Our understanding of reordering performance is therefore quite limited. To clarify this issue, metrics and test data sets specifically designed to assess reordering phenomena will be needed, following the recent work in (Birch et al., 2009). This is a challenging research problem that we leave to future work.

  • Training algorithm (Gabriele, Nadi) - seems to be working. Now has mslr, msd and monotonicity orientations for three types of models: wbe, phrase-based and hierarchical
  • Decoding (Sara, Christian, Ankit) - seems to be working
  • Scoring (Sara) - Modified train-factored-phrase-model.perl and re-wrote the scoring in C++. Seems to be working.
  • Better handling of phrase pairs not in reordering table.
Page last modified on February 18, 2010, at 12:23 PM