Re-Ranking
Instead of relying solely on the integrated search for the best translation, we may introduce a second decoding pass in which the best translation is chosen from a set of the most likely candidates generated by a traditional decoder. This allows the use of additional features or alternative decision rules.
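The second pass can be sketched as re-scoring an n-best list under a log-linear model. This is a minimal illustration, not any particular system's implementation; the feature names (`lm`, `tm`) and weights are invented for the example.

```python
def rerank(nbest, weights):
    """Select the highest-scoring hypothesis from an n-best list
    under a log-linear model: score(e) = sum_i w_i * f_i(e)."""
    def score(hyp):
        return sum(weights[name] * value
                   for name, value in hyp["features"].items())
    return max(nbest, key=score)

# Toy n-best list; feature names and weights are illustrative only.
nbest = [
    {"text": "the house is small", "features": {"lm": -4.2, "tm": -2.1}},
    {"text": "the house is little", "features": {"lm": -4.8, "tm": -1.7}},
]
weights = {"lm": 1.0, "tm": 0.5}
best = rerank(nbest, weights)
```

In a real system the features would be computed by the decoder or by additional second-pass models, and the weights tuned with a method such as minimum error rate training.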
Re-ranking is the main subject of 12 publications, 8 of which are discussed here.
Publications
Minimum error rate training has been used for re-ranking
Franz Josef Och and Daniel Gildea and Sanjeev Khudanpur and Anoop Sarkar and Kenji Yamada and Alexander Fraser and Shankar Kumar and Libin Shen and David A. Smith and Katherine Eng and Viren Jain and Zhen Jin and Dragomir Radev (2004):
A Smorgasbord of Features for Statistical Machine Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL)
mentioned in Syntactic Reranking and Reranking
@Inproceedings{Och:2004,
author = {Franz Josef Och and Daniel Gildea and Sanjeev Khudanpur and Anoop Sarkar and Kenji Yamada and Alexander Fraser and Shankar Kumar and Libin Shen and David A. Smith and Katherine Eng and Viren Jain and Zhen Jin and Dragomir Radev},
title = {A Smorgasbord of Features for Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/54\_Paper.pdf},
googlescholar = {2846180344359494752},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL)},
year = 2004
}
(Och et al., 2004); other proposed methods are based on ordinal regression to separate good translations from bad ones
Libin Shen and Anoop Sarkar and Franz Josef Och (2004):
Discriminative Reranking for Machine Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL)
@Inproceedings{Shen:2004,
author = {Libin Shen and Anoop Sarkar and Franz Josef Och},
title = {Discriminative Reranking for Machine Translation},
url = {http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/121\_Paper.pdf},
googlescholar = {15918420457011067539},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL)},
year = 2004
}
(Shen et al., 2004) and SPSA
Patrik Lambert and Rafael E. Banchs (2006):
Tuning machine translation parameters with SPSA, Proc. of the International Workshop on Spoken Language Translation
@inproceedings{Lambert:2006:IWSLT,
author = {Patrik Lambert and Rafael E. Banchs},
title = {Tuning machine translation parameters with {SPSA}},
url = {http://hal.archives-ouvertes.fr/docs/00/92/74/78/PDF/spsa.CR.pdf},
googlescholar = {7794354378363639524},
month = {November},
booktitle = {Proc. of the International Workshop on Spoken Language Translation},
address = {Kyoto, Japan},
year = 2006
}
(Lambert and Banchs, 2006).
Duh, Kevin and Kirchhoff, Katrin (2008):
Beyond Log-Linear Models: Boosted Minimum Error Rate Training for N-best Re-ranking, Proceedings of ACL-08: HLT, Short Papers
@InProceedings{duh-kirchhoff:2008:ACLShort,
author = {Duh, Kevin and Kirchhoff, Katrin},
title = {Beyond Log-Linear Models: Boosted Minimum Error Rate Training for N-best Re-ranking},
booktitle = {Proceedings of ACL-08: HLT, Short Papers},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {37--40},
url = {http://www.aclweb.org/anthology/P/P08/P08-2010},
year = 2008
}
Duh and Kirchhoff (2008) use boosting to improve over the log-linear model without any additional features.
Hasan, Saša and Zens, Richard and Ney, Hermann (2007):
Are Very Large N-Best Lists Useful for SMT?, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
@InProceedings{hasan-zens-ney:2007:ShortPapers,
author = {Hasan, Sa\v{s}a and Zens, Richard and Ney, Hermann},
title = {Are Very Large N-Best Lists Useful for {SMT}?},
booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {57--60},
url = {http://www.aclweb.org/anthology/N/N07/N07-2015},
year = 2007
}
Hasan et al. (2007) examine the required size of n-best lists, considering both oracle BLEU and actual re-ranking performance, and see gains with n-best lists of up to 10,000 entries. The use of a log-linear model imposes certain restrictions on features, which may be relaxed using other machine learning approaches such as kernel methods or Gaussian mixture models
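Oracle BLEU measures the quality of the best candidate present in the n-best list, i.e. an upper bound on what any re-ranker could achieve. A hypothetical sketch of per-sentence oracle selection, using unigram F1 as a crude stand-in for a sentence-level BLEU metric:

```python
def unigram_f1(hyp, ref):
    """Crude sentence-level stand-in for BLEU: unigram F1 vs. reference."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    overlap = len(set(hyp_toks) & set(ref_toks))
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

def oracle(nbest, ref, metric=unigram_f1):
    """Pick the candidate closest to the reference: an upper bound
    on what re-ranking could achieve for this sentence."""
    return max(nbest, key=lambda hyp: metric(hyp, ref))

nbest = ["the house is tiny", "the house is small .", "house small"]
chosen = oracle(nbest, "the house is small")
```

Tracking how the oracle score grows as the list is extended (100, 1,000, 10,000 entries) shows how much headroom larger lists offer a re-ranker.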
Nguyen, Patrick and Mahajan, Milind and He, Xiaodong (2007):
Training Non-Parametric Features for Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{nguyen-mahajan-he:2007:WMT,
author = {Nguyen, Patrick and Mahajan, Milind and He, Xiaodong},
title = {Training Non-Parametric Features for Statistical Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {72--79},
url = {http://www.aclweb.org/anthology/W/W07/W07-0210},
year = 2007
}
(Nguyen et al., 2007).
See work by
Boxing Chen and Jun Sun and Hongfei Jiang and Min Zhang and Aiti Aw (2007):
I^2R Chinese-English Translation System for IWSLT 2007, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
mentioned in Research Groups and Reranking
@inproceedings{BoxingChen:2007:IWSLT,
author = {Boxing Chen and Jun Sun and Hongfei Jiang and Min Zhang and Aiti Aw},
title = {{$I^2R$} {C}hinese-{E}nglish Translation System for {IWSLT} 2007},
url = {http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_07/papers/slt7\_055.pdf},
googlescholar = {12484695701786327964},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2007
}
Chen et al. (2007) and
Alexandre Patry and Philippe Langlais and Frédéric Béchet (2007):
MISTRAL: A Lattice Translation System for IWSLT 2007, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
mentioned in Research Groups and Reranking
@inproceedings{Patry:2007:IWSLT,
author = {Alexandre Patry and Philippe Langlais and Fr{\'e}d{\'e}ric B{\'e}chet},
title = {{MISTRAL}: A Lattice Translation System for {IWSLT} 2007},
url = {http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_07/papers/slt7\_146.pdf},
googlescholar = {14872243486498596741},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2007
}
Patry et al. (2007) for features used in re-ranking.
Benchmarks
Discussion
Related Topics
Minimum Bayes Risk decoding is typically implemented as re-ranking of an n-best list or graph. Re-ranking also allows the use of very large language models.
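Over an n-best list, Minimum Bayes Risk selection picks the hypothesis with the lowest expected loss (equivalently, highest expected similarity) under the model's posterior, rather than the single highest-scoring one. A minimal sketch, assuming unnormalized log model scores and using unigram overlap as a cheap stand-in for the sentence BLEU similarity a real system would use:

```python
import math

def mbr_select(nbest):
    """Minimum Bayes Risk selection over an n-best list (sketch).
    nbest: list of (hypothesis, log_model_score) pairs."""
    # Posterior over the list via a softmax of the model scores.
    z = sum(math.exp(s) for _, s in nbest)
    posterior = [math.exp(s) / z for _, s in nbest]

    def sim(a, b):
        # Unigram overlap; a real system would use sentence BLEU here.
        return len(set(a.split()) & set(b.split()))

    def expected_gain(i):
        # Expected similarity to the other candidates = negative risk.
        return sum(p * sim(nbest[i][0], hyp)
                   for (hyp, _), p in zip(nbest, posterior))

    return max(range(len(nbest)), key=expected_gain)

nbest = [("the house is small", -1.0),
         ("the house is little", -1.1),
         ("a home is small", -1.2)]
best = nbest[mbr_select(nbest)][0]
```

Note the quadratic cost in the list size: every candidate is compared against every other, which is why MBR is usually restricted to moderately sized n-best lists or approximated on lattices.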
New Publications
Simon Carter and Christof Monz (2010):
Discriminative Syntactic Reranking for Statistical Machine Translation, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Carter,
author = {Simon Carter and Christof Monz},
title = {Discriminative Syntactic Reranking for Statistical Machine Translation},
url = {http://www.mt-archive.info/AMTA-2010-Carter.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Carter and Monz (2010)
Artem Sokolov and Guillaume Wisniewski and François Yvon (2012):
Non-linear n-best List Reranking with Few Features, Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2012-Sokolov,
author = {Artem Sokolov and Guillaume Wisniewski and Fran{\c{c}}ois Yvon},
title = {Non-linear n-best List Reranking with Few Features},
url = {http://www.mt-archive.info/AMTA-2012-Sokolov.pdf},
booktitle = {Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {San Diego, California},
year = 2012
}
Sokolov et al. (2012)
Erik Velldal and Stephan Oepen (2005):
Maximum Entropy Models for Realization Ranking, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
@InProceedings{Velldal:2005:MTS,
author = {Erik Velldal and Stephan Oepen},
title = {Maximum Entropy Models for Realization Ranking},
url = {http://heim.ifi.uio.no/~erikve/pubs/VelOep05.pdf},
googlescholar = {17230566164713489421},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
}
Velldal and Oepen (2005)
Duh, Kevin and Sudoh, Katsuhito and Tsukada, Hajime and Isozaki, Hideki and Nagata, Masaaki (2010):
N-Best Reranking by Multitask Learning, Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
@InProceedings{duh-EtAl:2010:WMT,
author = {Duh, Kevin and Sudoh, Katsuhito and Tsukada, Hajime and Isozaki, Hideki and Nagata, Masaaki},
title = {N-Best Reranking by Multitask Learning},
booktitle = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {375--383},
url = {http://www.aclweb.org/anthology/W10-1757},
year = 2010
}
Duh et al. (2010)