# Training Step 6: Score Phrases

Subsequently, a translation table is created from the stored phrase translation pairs. The two steps are separated, because for larger translation models, the phrase translation table does not fit into memory. Fortunately, we never have to store the phrase translation table into memory --- we can construct it on disk.

To estimate the phrase translation probability φ(e|f) we proceed as follows: First, the extract file is sorted. This ensures that all English phrase translations for an foreign phrase are next to each other in the file. Thus, we can process the file, one foreign phrase at a time, collect counts and compute φ(e|f) for that foreign phrase f. To estimate φ(f|e), the inverted file is sorted, and then φ(f|e) is estimated for an English phrase at a time.

Next to phrase translation probability distributions φ(f|e) and φ(e|f), additional phrase translation scoring functions can be computed, e.g. lexical weighting, word penalty, phrase penalty, etc. Currently, lexical weighting is added for both directions and a fifth score is the phrase penalty.

``` > grep '| in europe |' model/phrase-table | sort -nrk 7 -t\| | head
in europa ||| in europe ||| 0.829007 0.207955 0.801493 0.492402
europas ||| in europe ||| 0.0251019 0.066211 0.0342506 0.0079563
in der europaeischen union ||| in europe ||| 0.018451 0.00100126 0.0319584 0.0196869
in europa , ||| in europe ||| 0.011371 0.207955 0.207843 0.492402
europaeischen ||| in europe ||| 0.00686548 0.0754338 0.000863791 0.046128
im europaeischen ||| in europe ||| 0.00579275 0.00914601 0.0241287 0.0162482
fuer europa ||| in europe ||| 0.00493456 0.0132369 0.0372168 0.0511473
in europa zu ||| in europe ||| 0.00429092 0.207955 0.714286 0.492402
an europa ||| in europe ||| 0.00386183 0.0114416 0.352941 0.118441
der europaeischen ||| in europe ||| 0.00343274 0.00141532 0.00099583 0.000512159
```

Currently, four different phrase translation scores are computed:

1. inverse phrase translation probability φ(f|e)
2. inverse lexical weighting lex(f|e)
3. direct phrase translation probability φ(e|f)
4. direct lexical weighting lex(e|f)

Previously, there was another score:

1. phrase penalty (always exp(1) = 2.718)

This has now been superceded by it's own feature function, `PhrasePenalty`.

### Using a subset of scores

You may not want to use all the scores in your translation table. The following options allow you to remove some of the scores:

• `NoLex` -- do not use lexical scores (removes score 2 and 4)
• `OnlyDirect` -- do not use the inverse scores (removes score 1 and 2)

These settings have to be specified with the setting `-score-options` when calling the script `train-model.perl`, for instance:

``` train-model.perl [... other settings ...] -score-options '--NoLex'
```

NB - the consolidate program (that runs after score) also has a few arguments. For example, it has

• `PhraseCount` -- add the old phrase count feature (score 5)

However, this can't be set with by the train-model.perl script.

### Good Turing discounting

Singleton phrase pairs tend to have overestimated phrase translation probabilities. Consider the extreme case of a source phrase that occurs only once in the corpus and has only one translation. The corresponding phrase translation probability φ(e|f) would be 1.

To obtain better phrase translation probabilities, the observed counts may be reduced by expected counts which takes unobserved events into account. Borrowing a method from language model estimation, Good Turing discounting can be used to reduce the actual counts (such as 1 in the example above) and reduce it to a more realistic number (maybe 0.3). The value of the adjusted count is determined by an analysis of the number of singleton, twice-occuring, thrice-occuring, etc. phrase pairs that were extracted.

To use Good Turing discounting of the phrase translation probabilities, you have to specify `--GoodTuring` as one of the `-score-options`, as in the section above. The adjusted counts are reported to STDERR.

### Word-to-word alignment

An enhanced version of the scoring script outputs the word-to-word alignments between f and e as they are in the files (`extract` and `extract.inv`) generated in the previous training step "Extract Phrases".

The alignments information are reported in the fourth fields. The format is identical to the alignment output obtained when the GIZA++ output has been symmetrized priot to phrase extraction.

``` > grep '| in europe |' model/phrase-table | sort -nrk 7 -t\| | head
in europa ||| in europe ||| 0.829007 0.207955 ||| 0-0 1-1 ||| ...
europas ||| in europe ||| ... ||| 0-0 0-1 ||| ...
in der europaeischen union ||| in europe ||| ... ||| 0-0 2-1 3-1 |||
in europa , ||| in europe ||| ... ||| 0-0 1-1 ||| ...
europaeischen ||| in europe ||| ... ||| 0-1 ||| ...
im europaeischen ||| in europe ||| ... ||| 0-0 1-1 |||
```

For instance:

``` in der europaeischen union ||| in europe ||| 0-0 2-1 3-1 ||| ...
```

means

``` German        -> English
in            -> in
der           ->
europaeischen -> europe
union         -> europe
```

The word-to-word alignments come from one word alignment (see training step "Align words").

The alignment information is also used in SCFG-rules for the chart-decoder to link non-terminals together in the source and target side. In this instance, the alignment information is not an option, but a necessity. For example, the following Moses SCFG rule

```   [X][X] miss [X][X] [X] ||| [X][X] [X][X] manques [X] ||| ... ||| 0-1 2-0 ||| ...
```

is formated as this in the Hiero format:

```    [X] ||| [X,1] miss [X,2] ||| [X,2] [X,1] manques ||| ....
```

ie. this rule reordes the 1st and 3rd non-terminals in the source.

Therefore, the same alignment field can be used for word-alignment and non-terminal co-indexes. However, I'm (Hieu) sure if anyone has implemented this in the chart decoder yet

### Columns in the phrase-table

There is a maximum of 7 columns in the phrase table:

```   1. Source phrase
2. Target phrase
3. Scores
4. Alignment
5. Counts
6. Sparse feature scores
7. Key-value properties
```