In the phrase extraction step, all phrases are dumped into one big file. Here is the top of that file:
    > head model/extract
    wiederaufnahme ||| resumption ||| 0-0
    wiederaufnahme der ||| resumption of the ||| 0-0 1-1 1-2
    wiederaufnahme der sitzungsperiode ||| resumption of the session ||| 0-0 1-1 1-2 2-3
    der ||| of the ||| 0-0 0-1
    der sitzungsperiode ||| of the session ||| 0-0 0-1 1-2
    sitzungsperiode ||| session ||| 0-0
    ich ||| i ||| 0-0
    ich erklaere ||| i declare ||| 0-0 1-1
    erklaere ||| declare ||| 0-0
    sitzungsperiode ||| session ||| 0-0
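Each line of this file contains three fields separated by |||: the foreign phrase, the English phrase,
and the alignment points. Alignment points are pairs (foreign-english) of zero-based word positions within the two phrases; for example, 1-2 links the second foreign word to the third English word.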
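As an illustration of this line format, the following minimal sketch splits one line of the extract file into its three fields and resolves the alignment points to word pairs. The names PhrasePair, parse_extract_line and aligned_words are hypothetical helpers written for this example; they are not part of Moses.

```python
from typing import List, NamedTuple, Tuple


class PhrasePair(NamedTuple):
    """One line of the extract file (hypothetical container, not a Moses type)."""
    source: List[str]                  # foreign phrase, tokenized
    target: List[str]                  # English phrase, tokenized
    alignment: List[Tuple[int, int]]   # (foreign index, English index) pairs


def parse_extract_line(line: str) -> PhrasePair:
    """Split a line on '|||' and parse the first three fields."""
    fields = [field.strip() for field in line.split("|||")]
    source = fields[0].split()
    target = fields[1].split()
    # Each alignment point looks like "1-2": foreign position, English position.
    alignment = [
        (int(src), int(tgt))
        for src, tgt in (point.split("-") for point in fields[2].split())
    ]
    return PhrasePair(source, target, alignment)


def aligned_words(pair: PhrasePair) -> List[Tuple[str, str]]:
    """Map each alignment point to the words it connects."""
    return [(pair.source[s], pair.target[t]) for s, t in pair.alignment]


if __name__ == "__main__":
    example = "wiederaufnahme der ||| resumption of the ||| 0-0 1-1 1-2"
    pair = parse_extract_line(example)
    for foreign, english in aligned_words(pair):
        print(foreign, "->", english)
```

Run on the second line of the listing above, the sketch links wiederaufnahme to resumption, and der to both of and the, which shows that one foreign word may carry several alignment points.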
In addition, an inverted alignment file, extract.inv, is generated and, if the lexicalized reordering model is trained (the default), a reordering file, extract.o.