EM Training of Phrase Based Models
While the dominant method to obtain a phrase table is based on a given word alignment, we may also define a probabilistic model that explains the parallel corpus and then train this model directly with the expectation maximization (EM) algorithm.
Phrase Based Model EM is the main subject of 10 publications. 9 are discussed here.
The popular joint phrase model was proposed by Marcu and Wong (2002). The joint model may be improved by constraining it with alignment points from the intersection of IBM Model alignments (Birch et al., 2006; Birch et al., 2006b) or by not strictly requiring a unique phrase alignment (Moore and Quirk, 2007). DeNero et al. (2006) point to some problems when using EM training with conditional probabilities. Cherry and Lin (2007) show that the ITG constraint helps the joint phrase model approach, partly by enabling a faster algorithm with fewer search errors. The phrase alignment problem is NP-complete (DeNero and Klein, 2008). Wuebker et al. (2010) use leave-one-out to overcome the problem of over-fitting when re-aligning the training data with a model that was obtained from it. They use the obtained alignment to re-estimate translation probabilities (similar to one iteration of EM). Using only the best derivation from the forced alignment drastically reduces the size of the phrase table but hurts performance (Sanchis-Trilles et al., 2011).
- Mylonakis and Sima'an (2008)
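
The leave-one-out idea mentioned above can be illustrated with a minimal sketch. This is not the implementation of Wuebker et al. (2010). It only shows the general principle, assuming phrase translation probabilities are relative frequencies over phrase-pair counts. The function name, the toy counts, and the constant `floor` used for phrase pairs that occur only in the current sentence are all assumptions made for illustration.

```python
from collections import Counter


def leave_one_out_prob(src, tgt, pair_counts, src_counts,
                       local_pairs, local_srcs, floor=1e-6):
    """Relative-frequency estimate of p(tgt | src) in which the counts
    contributed by the sentence pair being re-aligned are subtracted first."""
    joint = pair_counts[(src, tgt)] - local_pairs[(src, tgt)]
    marginal = src_counts[src] - local_srcs[src]
    if joint <= 0 or marginal <= 0:
        # The phrase pair is supported only by this sentence pair.
        # Falling back to a small constant is an assumption of this sketch.
        return floor
    return joint / marginal


# Toy corpus-level counts (hypothetical data).
pair_counts = Counter({("das haus", "the house"): 3,
                       ("das haus", "the home"): 1})
src_counts = Counter({"das haus": 4})

# Counts extracted from the single sentence pair currently being re-aligned.
local_pairs = Counter({("das haus", "the home"): 1})
local_srcs = Counter({"das haus": 1})

p_home = leave_one_out_prob("das haus", "the home",
                            pair_counts, src_counts, local_pairs, local_srcs)
p_house = leave_one_out_prob("das haus", "the house",
                             pair_counts, src_counts, local_pairs, local_srcs)
```

In this toy setting the pair seen only in the held-out sentence receives the floor value, so the model cannot explain a training sentence simply by memorizing phrase pairs that were extracted from that same sentence.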