Instead of relying solely on an integrated search for the best translation, we may introduce a second decoding pass in which the best translation is chosen from a set of the most likely candidates generated by a traditional decoder. This allows the use of additional features or alternative decision rules.
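A minimal sketch of such a second pass, re-ranking an n-best list with a log-linear model. The feature names, weights, and candidate strings below are purely illustrative; in practice the feature scores come from the decoder's n-best output and the weights from tuning (e.g. minimum error rate training):

```python
def rerank(nbest, weights):
    """Pick the candidate with the highest log-linear model score.

    nbest: list of (translation, feature_dict) pairs, as might be read
    from a decoder's n-best output (names here are illustrative).
    weights: feature name -> weight, e.g. as tuned by MERT.
    """
    def score(features):
        # Log-linear model: the score is the weighted sum of feature values.
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(nbest, key=lambda cand: score(cand[1]))

# Toy n-best list with two hypothetical features per candidate.
nbest = [
    ("the house is small", {"lm": -4.2, "tm": -1.1}),
    ("the home is small",  {"lm": -5.0, "tm": -0.9}),
]
weights = {"lm": 1.0, "tm": 0.5}
best, _ = rerank(nbest, weights)
```

Because the re-ranker only needs feature values per candidate, features that are too expensive or non-local for the first decoding pass can be added here.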
Reranking is the main subject of 12 publications, 8 of which are discussed here.
Minimum error rate training has been used for re-ranking (Och et al., 2004); other proposed methods are based on ordinal regression to separate good translations from bad ones (Shen et al., 2004) and on SPSA (Lambert and Banchs, 2006). Duh and Kirchhoff (2008) use boosting to improve over the log-linear model without any additional features. Hasan et al. (2007) examine the required size of n-best lists, considering both oracle BLEU and actual re-ranking performance, and observe gains with n-best lists of up to 10,000 entries. The use of a log-linear model imposes certain restrictions on features that may be relaxed using other machine learning approaches, such as kernel methods or Gaussian mixture models (Nguyen et al., 2007).
See work by Chen et al. (2007) and Patry et al. (2007) for features used in re-ranking.
Minimum Bayes Risk decoding is typically done in the form of re-ranking an n-best list or graph. Re-ranking also allows the use of very large language models.
- Carter and Monz (2010)
- Sokolov et al. (2012)
- Velldal and Oepen (2005)
- Duh et al. (2010)
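As a rough sketch of Minimum Bayes Risk selection over an n-best list: the hypothesis chosen is the one with the highest expected gain under the model's posterior over candidates. The softmax posterior and the unigram-overlap gain below are illustrative simplifications (sentence-level BLEU would normally serve as the gain), not the exact method of any cited paper:

```python
import math

def mbr_rerank(nbest):
    """Minimum Bayes Risk selection over an n-best list.

    nbest: list of (translation, model_log_score) pairs.  A unigram-overlap
    gain stands in for sentence-level BLEU to keep the sketch short.
    """
    def gain(hyp, ref):
        # Clipped unigram matches, normalized by hypothesis length.
        h, r = hyp.split(), ref.split()
        matches = sum(min(h.count(w), r.count(w)) for w in set(h))
        return matches / max(len(h), 1)

    # Posterior over candidates via a softmax of the model scores.
    m = max(s for _, s in nbest)
    exps = [math.exp(s - m) for _, s in nbest]
    z = sum(exps)
    posts = [e / z for e in exps]

    # Expected gain of a hypothesis against all candidates as pseudo-references.
    def expected_gain(hyp):
        return sum(p * gain(hyp, ref) for (ref, _), p in zip(nbest, posts))

    return max(nbest, key=lambda cand: expected_gain(cand[0]))[0]

# Toy list: the top-scoring candidate disagrees with the consensus,
# so MBR prefers a hypothesis supported by similar candidates.
best = mbr_rerank([("a b c", -1.0), ("a b d", -1.1), ("x y z", -0.9)])
```

The same computation extends from n-best lists to lattices or graphs by summing over paths instead of list entries.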