Minimum Bayes Risk
Training and decoding in statistical machine translation operate with imperfect models, so we may not want to aim for the most likely solution but for the one that carries the least Bayes risk.
Minimum Bayes Risk is the main subject of 17 publications. 8 are discussed here.
Publications
Minimum Bayes risk decoding was first introduced for n-best-list re-ranking
Shankar Kumar and William Byrne (2004):
Minimum Bayes-Risk Decoding for Statistical Machine Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)
@Inproceedings{Kumar:2004,
author = {Shankar Kumar and William Byrne},
title = {Minimum {B}ayes-Risk Decoding for Statistical Machine Translation},
url = {
http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/60\_Paper.pdf},
googlescholar = {8960637937936508347},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
year = 2004
}
(Kumar and Byrne, 2004), and has been shown to be beneficial for many translation tasks
Ehling, Nicola and Zens, Richard and Ney, Hermann (2007):
Minimum Bayes Risk Decoding for BLEU, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
@InProceedings{ehling-zens-ney:2007:PosterDemo,
author = {Ehling, Nicola and Zens, Richard and Ney, Hermann},
title = {Minimum {B}ayes Risk Decoding for {BLEU}},
booktitle = {Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {101--104},
url = {
http://www.aclweb.org/anthology/P/P07/P07-2026},
year = 2007
}
(Ehling et al., 2007).
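To make the idea of n-best-list re-ranking concrete, here is a minimal sketch, not the implementation from the cited papers. It assumes an n-best list of (translation, model log-score) pairs, turns the scores into a posterior with a softmax, and uses a toy unigram F1 as the gain function in place of a real metric such as sentence-level BLEU. The names `unigram_f1`, `mbr_rerank`, and `scale` are hypothetical.

```python
import math
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy similarity: unigram F1 between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_rerank(nbest, scale=1.0, gain=unigram_f1):
    """Return the hypothesis with the highest expected gain (lowest risk).

    nbest: list of (translation, model log-score) pairs.
    scale: assumed scaling factor applied to the log-scores.
    """
    # Turn scaled log-scores into a posterior over the list (softmax).
    top = max(score for _, score in nbest)
    weights = [math.exp(scale * (score - top)) for _, score in nbest]
    total = sum(weights)
    posterior = [w / total for w in weights]

    def expected_gain(candidate):
        # Every hypothesis in the list acts as a pseudo-reference.
        return sum(p * gain(candidate, other)
                   for (other, _), p in zip(nbest, posterior))

    return max((hyp for hyp, _ in nbest), key=expected_gain)


# Made-up n-best list: the top-scoring entry is an outlier.
nbest = [
    ("the house is small", -1.0),
    ("the home is small", -1.2),
    ("the home is little", -1.3),
    ("a tiny building", -0.9),
]
print(mbr_rerank(nbest))  # the home is small
```

In this made-up list the most probable hypothesis ("a tiny building") shares no words with the others, so the sketch returns "the home is small", the hypothesis that agrees most with the rest of the probability mass.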
Computing minimum Bayes risk over lattices or hypergraphs (i.e., the full search graph of the decoder) takes advantage of a larger pool of evidence
Tromble, Roy and Kumar, Shankar and Och, Franz and Macherey, Wolfgang (2008):
Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
@InProceedings{tromble-EtAl:2008:EMNLP,
author = {Tromble, Roy and Kumar, Shankar and Och, Franz and Macherey, Wolfgang},
title = {Lattice {Minimum Bayes-Risk} Decoding for Statistical Machine Translation},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {620--629},
url = {
http://www.aclweb.org/anthology/D08-1065},
year = 2008
}
(Tromble et al., 2008).
Allauzen, Cyril and Kumar, Shankar and Macherey, Wolfgang and Mohri, Mehryar and Riley, Michael (2010):
Expected Sequence Similarity Maximization, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
@InProceedings{allauzen-EtAl:2010:NAACLHLT,
author = {Allauzen, Cyril and Kumar, Shankar and Macherey, Wolfgang and Mohri, Mehryar and Riley, Michael},
title = {Expected Sequence Similarity Maximization},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {957--965},
url = {
http://www.aclweb.org/anthology/N10-1139},
year = 2010
}
Allauzen et al. (2010) present more efficient algorithms for computing the minimum Bayes risk translation from a lattice.
Optimizing for the expected error, not the actual error, may also be done in parameter tuning
Smith, David A. and Eisner, Jason (2006):
Minimum Risk Annealing for Training Log-Linear Models, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
@InProceedings{smith-eisner:2006:POS,
author = {Smith, David A. and Eisner, Jason},
title = {Minimum Risk Annealing for Training Log-Linear Models},
booktitle = {Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {787--794},
url = {
http://www.aclweb.org/anthology/P/P06/P06-2101},
year = 2006
}
(Smith and Eisner, 2006).
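As a rough illustration of what optimizing for the expected error means, the sketch below computes the expected loss of an n-best list under a scaled log-linear distribution. It is a simplified sketch under stated assumptions, not the training procedure of the cited paper: the feature vectors, losses, weights, and the function name `expected_loss` are made up for the example, and `gamma` is an assumed scaling factor that sharpens the distribution.

```python
import math


def expected_loss(hyps, weights, gamma=1.0):
    """Expected loss of an n-best list under a scaled log-linear model.

    hyps: list of (feature_vector, loss) pairs; weights: model parameters.
    p(hyp) is proportional to exp(gamma * dot(weights, features)).
    """
    scores = [gamma * sum(w * f for w, f in zip(weights, feats))
              for feats, _ in hyps]
    top = max(scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)
    return sum(u / z * loss for u, (_, loss) in zip(unnorm, hyps))


# Hypothetical 3-best list: two features per hypothesis and a per-sentence loss.
hyps = [([1.0, 0.2], 0.6), ([0.8, 0.9], 0.3), ([0.1, 1.0], 0.5)]
weights = [1.0, 0.5]

for gamma in (0.0, 1.0, 10.0, 100.0):
    print(gamma, round(expected_loss(hyps, weights, gamma), 4))
```

With `gamma = 0` the distribution is uniform and the result is the average loss; as `gamma` grows, the expected loss approaches the loss of the single best-scoring hypothesis. Unlike the 1-best error, the expected loss changes smoothly with the weights, which is what makes it usable with gradient-based tuning.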
Related to minimum Bayes risk is the use of n-gram posterior probabilities in re-ranking
Zens, Richard and Ney, Hermann (2006):
N-Gram Posterior Probabilities for Statistical Machine Translation, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{zens-ney:2006:WMT2,
author = {Zens, Richard and Ney, Hermann},
title = {N-Gram Posterior Probabilities for Statistical Machine Translation},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {72--77},
url = {
http://www.aclweb.org/anthology/W/W06/W06-3110},
year = 2006
}
(Zens and Ney, 2006;
Vicente Alabau and Alberto Sanchis and Francisco Casacuberta (2007):
Using Word Posterior Probabilities in Lattice Translation, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{Alabau:2007:IWSLT,
author = {Vicente Alabau and Alberto Sanchis and Francisco Casacuberta},
title = {Using Word Posterior Probabilities in Lattice Translation},
url = {
http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_07/papers/slt7\_131.pdf},
googlescholar = {12689526118201797518},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2007
}
Alabau et al., 2007a;
Vicente Alabau and Alberto Sanchis and Francisco Casacuberta (2007):
Improving Speech-to-Speech Translation Using Word Posterior Probabilities, Proceedings of the MT Summit XI
@inproceedings{Alabau:2007:MTSummit,
author = {Vicente Alabau and Alberto Sanchis and Francisco Casacuberta},
title = {Improving Speech-to-Speech Translation Using Word Posterior Probabilities},
url = {
http://www.mt-archive.info/MTS-2007-Alabau.pdf},
googlescholar = {1412853080146898409},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
Alabau et al., 2007b).
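One simple way to estimate n-gram posterior probabilities from an n-best list is sketched below. It uses an occurrence-based variant chosen for illustration, which is not necessarily the definition used in the cited papers: the posterior of an n-gram is the total posterior mass of the hypotheses that contain it. The function name `ngram_posteriors` and the `scale` parameter are hypothetical.

```python
import math
from collections import defaultdict


def ngram_posteriors(nbest, n=2, scale=1.0):
    """Posterior probability that each n-gram occurs in the translation.

    nbest: list of (translation, model log-score) pairs. The posterior of an
    n-gram is the total probability mass of the hypotheses that contain it.
    """
    # Softmax over scaled log-scores gives the hypothesis posterior.
    top = max(score for _, score in nbest)
    weights = [math.exp(scale * (score - top)) for _, score in nbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (hyp, _), w in zip(nbest, weights):
        tokens = hyp.split()
        # Use a set so an n-gram counts once per hypothesis.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for g in grams:
            posteriors[g] += w / total
    return dict(posteriors)


# Made-up 2-best list with equal scores.
nbest = [("the house is small", 0.0), ("the home is small", 0.0)]
post = ngram_posteriors(nbest, n=2)
print(post[("is", "small")], post[("the", "house")])
```

Such posteriors can then serve as an extra feature when re-ranking: a hypothesis made of n-grams that many high-probability hypotheses share gets a higher score.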
Benchmarks
Discussion
Related Topics
New Publications
Nan Duan and Mu Li and Ming Zhou and Lei Cui (2011):
Improving Phrase Extraction via MBR Phrase Scoring and Pruning, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
@inproceedings{MTS-2011-Duan,
author = {Nan Duan and Mu Li and Ming Zhou and Lei Cui},
title = {Improving Phrase Extraction via {MBR} Phrase Scoring and Pruning},
url = {
http://www.mt-archive.info/MTS-2011-Duan.pdf},
pages = {189--197},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
}
Duan et al. (2011)
Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki (2012):
Learning to Translate with Multiple Objectives, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{duh-EtAl:2012:ACL2012,
author = {Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki},
title = {Learning to Translate with Multiple Objectives},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {1--10},
url = {
http://www.aclweb.org/anthology/P12-1001},
year = 2012
}
Duh et al. (2012)
He, Xiaodong and Deng, Li (2012):
Maximum Expected BLEU Training of Phrase and Lexicon Translation Models, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{he-deng:2012:ACL2012,
author = {He, Xiaodong and Deng, Li},
title = {Maximum Expected BLEU Training of Phrase and Lexicon Translation Models},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {292--301},
url = {
http://www.aclweb.org/anthology/P12-1031},
year = 2012
}
He and Deng (2012)
Hiroaki Shimizu and Masao Utiyama and Eiichiro Sumita and Satoshi Nakamura (2012):
Minimum Bayes-risk decoding extended with similar examples: NAIST-NCT at IWSLT 2012, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt12:Shimizu,
author = {Hiroaki Shimizu and Masao Utiyama and Eiichiro Sumita and Satoshi Nakamura},
title = {Minimum {Bayes}-risk decoding extended with similar examples: {NAIST-NCT} at {IWSLT} 2012},
url = {
http://www.mt-archive.info/IWSLT-2012-Shimizu.pdf},
pages = {117--120},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
location = {Hong Kong},
year = 2012
}
Shimizu et al. (2012)
- consensus translation
Pauls, Adam and DeNero, John and Klein, Dan (2009):
Consensus Training for Consensus Decoding in Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{pauls-denero-klein:2009:EMNLP,
author = {Pauls, Adam and DeNero, John and Klein, Dan},
title = {Consensus Training for Consensus Decoding in Machine Translation},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {1418--1427},
url = {
http://www.aclweb.org/anthology/D/D09/D09-1147},
year = 2009
}
Pauls et al. (2009)
Blackwood, Graeme and de Gispert, Adrià and Byrne, William (2010):
Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
@InProceedings{blackwood-degispert-byrne:2010:PAPERS,
author = {Blackwood, Graeme and de Gispert, Adri\`{a} and Byrne, William},
title = {Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {71--79},
url = {
http://www.aclweb.org/anthology/C10-1009},
year = 2010
}
Blackwood et al. (2010)
Li, Zhifei and Eisner, Jason (2009):
First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{li-eisner:2009:EMNLP,
author = {Li, Zhifei and Eisner, Jason},
title = {First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {40--51},
url = {
http://www.aclweb.org/anthology/D/D09/D09-1005},
year = 2009
}
Li and Eisner (2009)
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz (2009):
Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
@InProceedings{kumar-EtAl:2009:ACLIJCNLP,
author = {Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz},
title = {Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices},
booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {163--171},
url = {
http://www.aclweb.org/anthology/P/P09/P09-1019},
year = 2009
}
Kumar et al. (2009)
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev (2009):
Variational Decoding for Statistical Machine Translation, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
@InProceedings{li-eisner-khudanpur:2009:ACLIJCNLP,
author = {Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev},
title = {Variational Decoding for Statistical Machine Translation},
booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {593--601},
url = {
http://www.aclweb.org/anthology/P/P09/P09-1067},
year = 2009
}
Li et al. (2009)