Minimum Bayes Risk

Training and decoding in statistical machine translation operate with imperfect models, so we may not want to aim for the most likely solution, but rather for the one that carries the least Bayesian risk.
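The idea can be written as a decision rule. The notation below is an assumption made for illustration and does not come from this page: f is the source sentence, \mathcal{E} is the set of candidate translations (for example an n-best list), p(e | f) is the model's posterior, and L(e, e') is a loss such as one minus a sentence-level similarity score.

```latex
\hat{e}_{\mathrm{MAP}} = \operatorname*{argmax}_{e \in \mathcal{E}} \; p(e \mid f)
\qquad\text{versus}\qquad
\hat{e}_{\mathrm{MBR}} = \operatorname*{argmin}_{e' \in \mathcal{E}} \; \sum_{e \in \mathcal{E}} L(e, e') \, p(e \mid f)
```

The first rule trusts the model's single best guess; the second picks the candidate whose expected loss is lowest when every other candidate is treated as a possible reference, weighted by its probability.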

Minimum Bayes Risk is the main subject of 17 publications. 8 are discussed here.


Publications

Minimum Bayes risk decoding was introduced initially for n-best-list re-ranking (Kumar and Byrne, 2004), and has been shown to be beneficial for many translation tasks (Ehling et al., 2007).

Shankar Kumar and William Byrne (2004): Minimum Bayes-Risk Decoding for Statistical Machine Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL)

Ehling, Nicola and Zens, Richard and Ney, Hermann (2007): Minimum Bayes Risk Decoding for BLEU, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
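A minimal sketch of n-best-list re-ranking with minimum Bayes risk may help make this concrete. It is not the implementation from the cited papers: the n-gram F1 similarity stands in for a sentence-level metric such as BLEU, and the function names and the `scale` parameter are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n=2):
    """Count all n-grams of order 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def similarity(hyp, ref, max_n=2):
    """Illustrative gain: F1 over clipped n-gram matches (an assumption, not BLEU)."""
    h = ngram_counts(hyp.split(), max_n)
    r = ngram_counts(ref.split(), max_n)
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_rerank(nbest, scale=1.0):
    """Return the hypothesis with the highest expected gain (lowest expected loss).

    nbest: list of (translation, model log-score) pairs.
    scale: hypothetical factor that sharpens or flattens the posterior.
    """
    # Convert log-scores to a posterior over the list (shifted for stability).
    top = max(score for _, score in nbest)
    weights = [math.exp(scale * (score - top)) for _, score in nbest]
    total = sum(weights)
    posterior = [w / total for w in weights]

    best, best_gain = None, -1.0
    for candidate, _ in nbest:
        # Every hypothesis acts as a pseudo-reference, weighted by its posterior.
        gain = sum(p * similarity(candidate, other)
                   for (other, _), p in zip(nbest, posterior))
        if gain > best_gain:
            best, best_gain = candidate, gain
    return best


nbest = [
    ("it is a small house", -1.0),
    ("the house is small", -1.1),
    ("the house is little", -1.1),
]
print(mbr_rerank(nbest))
```

On this toy list the highest-scoring hypothesis is the first one, but the re-ranker selects "the house is small", because it agrees more closely with the remaining probability mass. With a very large `scale` the posterior collapses onto the top hypothesis and the result reverts to the model's best guess.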

Computing minimum Bayes risk over lattices or hypergraphs (i.e., the full search graph of the decoder) takes advantage of a larger pool of evidence (Tromble et al., 2008). Allauzen et al. (2010) present more efficient algorithms for computing the minimum Bayes risk translation from a lattice.

Tromble, Roy and Kumar, Shankar and Och, Franz and Macherey, Wolfgang (2008): Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Allauzen, Cyril and Kumar, Shankar and Macherey, Wolfgang and Mohri, Mehryar and Riley, Michael (2010): Expected Sequence Similarity Maximization, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Optimizing for the expected error, not the actual error, may also be done in parameter tuning (Smith and Eisner, 2006).

Smith, David A. and Eisner, Jason (2006): Minimum Risk Annealing for Training Log-Linear Models, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
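The expected error used as a tuning objective can be sketched as follows. This is an illustrative computation over a single n-best list under a log-linear model, not the training procedure of the cited paper; the function name, the data layout, and the `gamma` sharpening factor are assumptions.

```python
import math


def expected_loss(candidates, weights, gamma=1.0):
    """Expected loss of an n-best list under a log-linear model.

    candidates: list of (feature_vector, loss) pairs for one source sentence.
    weights: model parameters, same length as each feature vector.
    gamma: hypothetical sharpening factor; large values approach the 1-best loss.
    """
    scores = [gamma * sum(w * f for w, f in zip(weights, feats))
              for feats, _ in candidates]
    # Softmax over the scaled scores, shifted by the maximum for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return sum(e / z * loss for e, (_, loss) in zip(exps, candidates))


candidates = [([1.0, 0.0], 0.2), ([0.0, 1.0], 0.6)]
# Equal model scores: the expected loss is the plain average of the two losses.
print(expected_loss(candidates, [0.5, 0.5]))
# Strong preference for the first candidate: the expected loss approaches its loss.
print(expected_loss(candidates, [2.0, 0.0], gamma=50.0))
```

Unlike the error of the single best translation, this quantity changes smoothly with the weights, which is what makes it usable with gradient-based optimizers.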

Related to minimum Bayes risk is the use of n-gram posterior probabilities in re-ranking (Zens and Ney, 2006; Alabau et al., 2007; Alabau et al., 2007b).

Zens, Richard and Ney, Hermann (2006): N-Gram Posterior Probabilities for Statistical Machine Translation, Proceedings on the Workshop on Statistical Machine Translation

Vicente Alabau and Alberto Sanchis and Francisco Casacuberta (2007): Using Word Posterior Probabilities in Lattice Translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)

Vicente Alabau and Alberto Sanchis and Francisco Casacuberta (2007): Improving Speech-to-Speech Translation Using Word Posterior Probabilities, Proceedings of the MT Summit XI
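One simple way to estimate n-gram posterior probabilities from an n-best list is sketched below. It is an assumption made for illustration and not necessarily the definition used in the cited papers: each n-gram receives the total posterior mass of the hypotheses that contain it. The function name is hypothetical.

```python
import math
from collections import defaultdict


def ngram_posteriors(nbest, n=2):
    """Posterior probability that an n-gram occurs in the translation.

    nbest: list of (translation, model log-score) pairs.
    """
    # Convert log-scores to unnormalized weights (shifted for stability).
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (sentence, _), weight in zip(nbest, weights):
        tokens = sentence.split()
        # A set, so an n-gram repeated inside one hypothesis is counted once.
        seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in seen:
            posteriors[gram] += weight / total
    return dict(posteriors)


example = [("the house is small", 0.0), ("the house is little", 0.0)]
for gram, prob in sorted(ngram_posteriors(example).items()):
    print(gram, round(prob, 3))
```

Such posteriors can then serve as additional features when re-ranking: a hypothesis built from n-grams that many other hypotheses share scores higher than one built from rare n-grams.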

Benchmarks

Discussion

New Publications

Nan Duan and Mu Li and Ming Zhou and Lei Cui (2011): Improving Phrase Extraction via MBR Phrase Scoring and Pruning, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
@inproceedings{MTS-2011-Duan,
author = {Nan Duan and Mu Li and Ming Zhou and Lei Cui},
title = {Improving Phrase Extraction via {MBR} Phrase Scoring and Pruning},
url = {http://www.mt-archive.info/MTS-2011-Duan.pdf},
pages = {189--197},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
}
Duan et al. (2011)
Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki (2012): Learning to Translate with Multiple Objectives, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{duh-EtAl:2012:ACL2012,
author = {Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki},
title = {Learning to Translate with Multiple Objectives},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {1--10},
url = {http://www.aclweb.org/anthology/P12-1001},
year = 2012
}
Duh et al. (2012)
He, Xiaodong and Deng, Li (2012): Maximum Expected BLEU Training of Phrase and Lexicon Translation Models, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{he-deng:2012:ACL2012,
author = {He, Xiaodong and Deng, Li},
title = {Maximum Expected BLEU Training of Phrase and Lexicon Translation Models},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {292--301},
url = {http://www.aclweb.org/anthology/P12-1031},
year = 2012
}
He and Deng (2012)
Hiroaki Shimizu and Masao Utiyama and Eiichiro Sumita and Satoshi Nakamura (2012): Minimum Bayes-risk decoding extended with similar examples: NAIST-NCT at IWSLT 2012, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt12:Shimizu,
author = {Hiroaki Shimizu and Masao Utiyama and Eiichiro Sumita and Satoshi Nakamura},
title = {Minimum {Bayes}-risk decoding extended with similar examples: {NAIST-NCT} at {IWSLT} 2012},
url = {http://www.mt-archive.info/IWSLT-2012-Shimizu.pdf},
pages = {117--120},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
location = {Hong Kong},
year = 2012
}
Shimizu et al. (2012)
consensus translation
Pauls, Adam and DeNero, John and Klein, Dan (2009): Consensus Training for Consensus Decoding in Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{pauls-denero-klein:2009:EMNLP,
author = {Pauls, Adam and DeNero, John and Klein, Dan},
title = {Consensus Training for Consensus Decoding in Machine Translation},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {1418--1427},
url = {http://www.aclweb.org/anthology/D/D09/D09-1147},
year = 2009
}
Pauls et al. (2009)
Blackwood, Graeme and de Gispert, Adrià and Byrne, William (2010): Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
@InProceedings{blackwood-degispert-byrne:2010:PAPERS,
author = {Blackwood, Graeme and de Gispert, Adri\`{a} and Byrne, William},
title = {Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {71--79},
url = {http://www.aclweb.org/anthology/C10-1009},
year = 2010
}
Blackwood et al. (2010)
Li, Zhifei and Eisner, Jason (2009): First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{li-eisner:2009:EMNLP,
author = {Li, Zhifei and Eisner, Jason},
title = {First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {40--51},
url = {http://www.aclweb.org/anthology/D/D09/D09-1005},
year = 2009
}
Li and Eisner (2009)
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz (2009): Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
@InProceedings{kumar-EtAl:2009:ACLIJCNLP,
author = {Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz},
title = {Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices},
booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {163--171},
url = {http://www.aclweb.org/anthology/P/P09/P09-1019},
year = 2009
}
Kumar et al. (2009)
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev (2009): Variational Decoding for Statistical Machine Translation, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
@InProceedings{li-eisner-khudanpur:2009:ACLIJCNLP,
author = {Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev},
title = {Variational Decoding for Statistical Machine Translation},
booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {593--601},
url = {http://www.aclweb.org/anthology/P/P09/P09-1067},
year = 2009
}
Li et al. (2009)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
