Training Step 7: Build reordering model

By default, only a distance-based reordering model is included in the final configuration. This model assigns a cost that is linear in the reordering distance. For instance, skipping over two words costs twice as much as skipping over one word.
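
The linear cost can be illustrated with a minimal sketch. It assumes the common formulation in which the distance is the number of source words skipped between the end of the previously translated phrase and the start of the next one; the function name, parameter names and the weight are hypothetical and are not taken from the Moses code.

```python
def distance_cost(prev_end: int, next_start: int, weight: float = 1.0) -> float:
    """Return a cost proportional to the number of source words skipped.

    prev_end   -- source index of the last word of the previous phrase
    next_start -- source index of the first word of the next phrase
    weight     -- hypothetical per-word penalty (tuned in a real system)
    """
    skipped = abs(next_start - prev_end - 1)
    return weight * skipped


# Monotone continuation: no word is skipped.
print(distance_cost(prev_end=2, next_start=3))  # 0.0
# Skipping one word, then two words: the cost doubles.
print(distance_cost(prev_end=2, next_start=4))  # 1.0
print(distance_cost(prev_end=2, next_start=5))  # 2.0
```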

However, additional conditional reordering models, so-called lexicalized reordering models, may be built. There are three types of lexicalized reordering models in Moses, based on Koehn et al. (2005) and Galley and Manning (2008). The Koehn et al. model determines the orientation of two phrases based on word alignments at training time, and based on phrase alignments at decoding time. The other two models are based on Galley and Manning. The phrase-based model uses phrases both at training and decoding time, and the hierarchical model allows combinations of several phrases for determining the orientation.

The lexicalized reordering models are specified by a configuration string with five parts, each accounting for a different aspect of the model:

Modeltype - the type of model used (see above)
- wbe - word-based extraction (but phrase-based at decoding). This is the original model in Moses. DEFAULT
- phrase - phrase-based model
- hier - hierarchical model
Orientation - which orientation classes are used in the model
- mslr - Considers four different orientations: monotone, swap, discontinuous-left, discontinuous-right
- msd - Considers three different orientations: monotone, swap, discontinuous (the two discontinuous classes of the mslr model are merged into one class)
- monotonicity - Considers two different orientations: monotone or non-monotone (swap and discontinuous of the msd model are merged into the non-monotone class)
- leftright - Considers two different orientations: left or right (the four classes in the mslr model are merged into two classes, swap and discontinuous-left into left and monotone and discontinuous-right into right)
Directionality - Determines if the orientation should be modeled based on the previous or next phrase, or both.
- backward - determine orientation with respect to previous phrase DEFAULT
- forward - determine orientation with respect to following phrase
- bidirectional - use both backward and forward models
Language - decides which language to base the model on
- fe - conditioned on both the source and target languages
- f - conditioned on the source language only
Collapsing - determines how to treat the scores
- allff - treat the scores as individual feature functions DEFAULT
- collapseff - collapse all scores in one direction into one feature function
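
The way the coarser orientation sets are derived from the four mslr classes can be written down as a lookup table. This is only a sketch of the merging described above; the table and function names are hypothetical and do not correspond to identifiers in Moses.

```python
# For each orientation set, map every mslr class to the class it merges into.
MERGE = {
    "mslr": {
        "monotone": "monotone",
        "swap": "swap",
        "discontinuous-left": "discontinuous-left",
        "discontinuous-right": "discontinuous-right",
    },
    "msd": {
        "monotone": "monotone",
        "swap": "swap",
        "discontinuous-left": "discontinuous",
        "discontinuous-right": "discontinuous",
    },
    "monotonicity": {
        "monotone": "monotone",
        "swap": "non-monotone",
        "discontinuous-left": "non-monotone",
        "discontinuous-right": "non-monotone",
    },
    "leftright": {
        "swap": "left",
        "discontinuous-left": "left",
        "monotone": "right",
        "discontinuous-right": "right",
    },
}


def merge_orientation(mslr_class: str, orientation_set: str) -> str:
    """Return the class that an mslr class falls into under a coarser set."""
    return MERGE[orientation_set][mslr_class]


def classes(orientation_set: str) -> set:
    """Return the distinct classes of an orientation set."""
    return set(MERGE[orientation_set].values())


print(merge_orientation("discontinuous-left", "msd"))  # discontinuous
print(merge_orientation("swap", "leftright"))          # left
print(len(classes("monotonicity")))                    # 2
```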

Any possible combination of these five factors is allowed. It is always necessary to specify orientation and language. The other three factors use the default values indicated above if they are not specified. Some examples of possible models are:

msd-bidirectional-fe (this model is commonly used; for instance, it is the model used in the WMT baselines)
wbe-msd-bidirectional-fe-allff (same model as above)
mslr-f
wbe-backward-mslr-f-allff (same model as above)
phrase-msd-bidirectional-fe
hier-mslr-bidirectional-fe
hier-leftright-forward-f-collapseff

and of course distance.
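
To make the defaulting rules concrete, the following sketch expands a configuration string into its five factors. It is an illustration written for this section, not the parser used by train-model.perl; all names are hypothetical, and it assumes that the parts may appear in any order, as the examples above suggest.

```python
# Allowed values per factor, as listed in this section.
FACTORS = {
    "modeltype": ("wbe", "phrase", "hier"),
    "orientation": ("mslr", "msd", "monotonicity", "leftright"),
    "directionality": ("backward", "forward", "bidirectional"),
    "language": ("fe", "f"),
    "collapsing": ("allff", "collapseff"),
}
# Factors that may be omitted, with their documented defaults.
DEFAULTS = {"modeltype": "wbe", "directionality": "backward", "collapsing": "allff"}


def parse_reordering_config(config: str) -> dict:
    """Expand a lexicalized reordering config string into its five factors."""
    if config == "distance":
        return {"modeltype": "distance"}
    parsed = {}
    for token in config.split("-"):
        for factor, values in FACTORS.items():
            if token in values:
                if factor in parsed:
                    raise ValueError(f"{factor} given twice in {config!r}")
                parsed[factor] = token
                break
        else:
            raise ValueError(f"unknown part {token!r} in {config!r}")
    for required in ("orientation", "language"):
        if required not in parsed:
            raise ValueError(f"{required} must be specified in {config!r}")
    return {**DEFAULTS, **parsed}


# The short and the fully spelled-out forms describe the same model.
short = parse_reordering_config("msd-bidirectional-fe")
full = parse_reordering_config("wbe-msd-bidirectional-fe-allff")
print(short == full)  # True
```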

The reordering model(s) to use (and to build during the training process, if necessary) can be set with the switch -reordering, e.g.:

 -reordering distance
 -reordering msd-bidirectional-fe
 -reordering msd-bidirectional-fe,hier-mslr-bidirectional-fe
 -reordering distance,msd-bidirectional-fe,hier-mslr-bidirectional-fe

Note that the distance model is always included, so there is no need to specify it.

The number of features that are created with a lexical reordering model depends on the type of the model. If the flag allff is used, an msd model has three features, one each for the probability that the phrase is translated monotone, swapped, or discontinuous; an mslr model has four features, and a monotonicity or leftright model has two features. If a bidirectional model is used, the number of features doubles - one set for each direction. If collapseff is used, there is one feature for each direction, regardless of which orientation types are used.
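
The counting rule in the paragraph above can be restated as a short function. This is a sketch for checking expectations about a configuration, with hypothetical names; it is not how Moses itself computes the count.

```python
# Number of orientation classes per orientation set.
ORIENTATION_CLASSES = {"mslr": 4, "msd": 3, "monotonicity": 2, "leftright": 2}


def num_features(orientation: str,
                 directionality: str = "backward",
                 collapsing: str = "allff") -> int:
    """Return the number of features of a lexicalized reordering model."""
    # A bidirectional model has one set of features per direction.
    directions = 2 if directionality == "bidirectional" else 1
    # collapseff folds all orientation scores of a direction into one feature.
    if collapsing == "collapseff":
        per_direction = 1
    else:
        per_direction = ORIENTATION_CLASSES[orientation]
    return directions * per_direction


print(num_features("msd", "bidirectional"))                # 6
print(num_features("mslr", "backward"))                    # 4
print(num_features("msd", "bidirectional", "collapseff"))  # 2
```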

There are also a number of other flags that can be given to train-model.perl that concern the reordering models:

--reordering-smooth - specifies the smoothing constant to be used for training lexicalized reordering models. If the letter u follows the constant, smoothing is based on actual counts. (default 0.5)
--max-lexical-reordering - if this flag is used, the extract file will contain information for the mslr orientations for all three model types: wbe, phrase and hier. Otherwise, the extract file will contain only the minimum information required by the reordering model configuration strings that are given.