Large-Scale Discriminative Training

The current mix of generative models, ad hoc scoring functions, and discriminative parameter tuning of a handful of weights is theoretically unappealing, so there has been a long-standing effort to train all the millions of parameters of a statistical machine translation model discriminatively.

Large-Scale Discriminative Training is the main subject of 44 publications, 18 of which are discussed here.

Publications

Large-scale discriminative training methods that optimize millions of features over the entire training corpus have emerged recently. Tillmann and Zhang (2005) add a binary feature for each phrase translation table entry and train feature weights using a stochastic gradient descent method. Kernel regression methods may be applied to the same task (Wang et al., 2007; Wang and Shawe-Taylor, 2008). Wellington et al. (2006) apply discriminative training to a tree translation model. Large-scale discriminative training may also use the perceptron algorithm (Liang et al., 2006) or variations thereof (Tillmann and Zhang, 2006) to directly optimize on error metrics such as BLEU.
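
As a rough illustration of the perceptron-style training mentioned above, the following Python sketch performs one update on an n-best list with sparse binary features. It is not the exact procedure of any of the cited papers: the function names, the `pp:` feature naming scheme, and the sentence-level quality score are assumptions made for the example.

```python
from collections import defaultdict


def score(weights, feats):
    """Dot product of a sparse feature dict with the weight vector."""
    return sum(weights[name] * value for name, value in feats.items())


def perceptron_update(weights, nbest, lr=1.0):
    """One perceptron step for a single source sentence.

    nbest is a list of (features, quality) pairs. `features` is a sparse
    dict of feature values; `quality` is a sentence-level score where
    higher is better (assumed here; e.g. a smoothed BLEU approximation).
    Returns True if the weights were changed.
    """
    predicted = max(nbest, key=lambda h: score(weights, h[0]))
    oracle = max(nbest, key=lambda h: h[1])
    if predicted is oracle:
        return False  # the model already ranks the best candidate first
    # Move weights toward the oracle and away from the model's choice.
    for name, value in oracle[0].items():
        weights[name] += lr * value
    for name, value in predicted[0].items():
        weights[name] -= lr * value
    return True


# Hypothetical n-best list: one binary feature per phrase pair used.
weights = defaultdict(float)
nbest = [
    ({"pp:das haus|||that house": 1.0}, 0.4),
    ({"pp:das haus|||the house": 1.0}, 0.9),
]
perceptron_update(weights, nbest)
```

With one indicator feature per phrase table entry, the weight vector quickly grows to millions of dimensions, which is why sparse dictionaries rather than dense arrays are used in this sketch.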

Arun and Koehn (2007) compare MIRA and the Perceptron algorithm and point out some of the problems on the road to large-scale discriminative training. This approach has also been applied to a variant of the hierarchical phrase model (Watanabe et al., 2007; Watanabe et al., 2007b). The MIRA algorithm may also be used for an extended form of parameter tuning (Chiang et al., 2008), allowing for the use of thousands of features (Chiang et al., 2009), covering properties such as source and target syntax (Chiang, 2010), on a larger tuning set.
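
To make the difference from the perceptron concrete, here is a minimal Python sketch of a single-constraint, MIRA-style update with a clipped step size. It is a simplified illustration, not the implementation used in the cited work: the names `better_feats`, `worse_feats`, and the cap `C` are assumptions, and real systems typically use several constraints per sentence and a BLEU-based loss.

```python
def mira_update(weights, better_feats, worse_feats, loss, C=0.01):
    """Single-constraint margin update.

    Moves the weights just far enough that the better translation
    outscores the worse one by a margin of `loss`, but never takes a
    step larger than C. Returns the step size that was applied.
    """
    names = set(better_feats) | set(worse_feats)
    delta = {n: better_feats.get(n, 0.0) - worse_feats.get(n, 0.0) for n in names}
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return 0.0  # identical feature vectors, nothing to learn
    margin = sum(weights.get(n, 0.0) * v for n, v in delta.items())
    # Closed-form step size, clipped to the interval [0, C].
    tau = max(0.0, min(C, (loss - margin) / norm_sq))
    for n, v in delta.items():
        weights[n] = weights.get(n, 0.0) + tau * v
    return tau


# Hypothetical feature vectors for two candidate translations.
weights = {}
step = mira_update(weights, {"lm": 1.0}, {"tm": 1.0}, loss=1.0, C=1.0)
```

Unlike the perceptron's fixed step, the step size here scales with how badly the margin constraint is violated, and it is zero once the constraint is satisfied.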

Blunsom et al. (2008) argue that it is important to perform feature updates on all derivations of a translation, not just the most likely one, to address spurious ambiguity. A representative subset of translations may be acquired by sampling (Arun et al., 2009). This allows for a unified approach to Minimum Risk training and decoding (Arun et al., 2010). While Arun et al. (2009) use Gibbs sampling, simpler methods such as SampleRank (Haddow et al., 2011) may be used as well.
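
The following Python sketch illustrates, under simplifying assumptions, how a set of sampled translations can serve both purposes: the average gain over samples estimates the expected gain used as a minimum risk training objective, and the same samples support minimum Bayes risk decoding. The sampler itself is not shown, samples are assumed to be non-empty strings drawn from the model distribution, and `word_overlap` is a toy stand-in for a real metric such as BLEU.

```python
from collections import Counter


def expected_gain(samples, gain):
    """Monte Carlo estimate of the expected gain under the model."""
    return sum(gain(s) for s in samples) / len(samples)


def mbr_decode(samples, similarity):
    """Return the sampled translation with the highest expected
    similarity to the other samples (minimum Bayes risk choice)."""
    counts = Counter(samples)

    def expected_similarity(candidate):
        total = sum(n * similarity(candidate, other) for other, n in counts.items())
        return total / len(samples)

    # Sorting makes tie-breaking deterministic.
    return max(sorted(counts), key=expected_similarity)


def word_overlap(a, b):
    """Toy similarity: Jaccard overlap of the two word sets."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


# Hypothetical samples for one source sentence and one reference.
samples = ["the house", "the house", "a house", "the home"]
reference = "the house"
objective = expected_gain(samples, lambda s: word_overlap(s, reference))
best = mbr_decode(samples, word_overlap)
```

Because samples are drawn in proportion to their model probability, no explicit probabilities are needed in either estimate; repeated samples simply count more often.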

Machine translation may be framed as a structured prediction problem, which is an active strand of machine learning research. Zhang et al. (2008) frame ITG decoding in such a way and propose a discriminative training method following the SEARN algorithm (Daumé III et al., 2006).

Benchmarks

Discussion

New Publications

Tamchyna, Aleš and Fraser, Alexander and Bojar, Ondřej and Junczys-Dowmunt, Marcin (2016): Target-Side Context for Discriminative Models in Statistical Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{tamchyna-EtAl:2016:P16-1,
author = {Tamchyna, Ale\v{s} and Fraser, Alexander and Bojar, Ond\v{r}ej and Junczys-Dowmunt, Marcin},
title = {Target-Side Context for Discriminative Models in Statistical Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1704--1714},
url = {http://www.aclweb.org/anthology/P16-1161},
year = 2016
}
Tamchyna et al. (2016)
Braune, Fabienne and Fraser, Alexander and Daumé III, Hal and Tamchyna, Aleš (2016): A Framework for Discriminative Rule Selection in Hierarchical Moses, Proceedings of the First Conference on Machine Translation
@InProceedings{braune-EtAl:2016:WMT,
author = {Braune, Fabienne and Fraser, Alexander and Daum\'{e} III, Hal and Tamchyna, Ale\v{s}},
title = {A Framework for Discriminative Rule Selection in Hierarchical Moses},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {92--101},
url = {http://www.aclweb.org/anthology/W/W16/W16-2210},
year = 2016
}
Braune et al. (2016)
Wuebker, Joern and Muehr, Sebastian and Lehnen, Patrick and Peitz, Stephan and Ney, Hermann (2015): A Comparison of Update Strategies for Large-Scale Maximum Expected BLEU Training, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{wuebker-EtAl:2015:NAACL-HLT,
author = {Wuebker, Joern and Muehr, Sebastian and Lehnen, Patrick and Peitz, Stephan and Ney, Hermann},
title = {A Comparison of Update Strategies for Large-Scale Maximum Expected {BLEU} Training},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {1516--1526},
url = {http://www.aclweb.org/anthology/N15-1175},
year = 2015
}
Wuebker et al. (2015)
Sokolov, Artem and Riezler, Stefan and Cohen, Shay B. (2015): A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation, Proceedings of the Nineteenth Conference on Computational Natural Language Learning
@InProceedings{sokolov-riezler-cohen:2015:CoNLL,
author = {Sokolov, Artem and Riezler, Stefan and Cohen, Shay B.},
title = {A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation},
booktitle = {Proceedings of the Nineteenth Conference on Computational Natural Language Learning},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {1--11},
url = {http://www.aclweb.org/anthology/K15-1001},
year = 2015
}
Sokolov et al. (2015)
Eidelman, Vladimir and Wu, Ke and Ture, Ferhan and Resnik, Philip and Lin, Jimmy (2013): Towards Efficient Large-Scale Feature-Rich Statistical Machine Translation, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{eidelman-EtAl:2013:WMT,
author = {Eidelman, Vladimir and Wu, Ke and Ture, Ferhan and Resnik, Philip and Lin, Jimmy},
title = {Towards Efficient Large-Scale Feature-Rich Statistical Machine Translation},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {128--133},
url = {http://www.aclweb.org/anthology/W13-2214},
year = 2013
}
Eidelman et al. (2013)
Xingyi Song and Lucia Specia and Trevor Cohn (2014): Data selection for discriminative training in statistical machine translation, Proceedings of 17th Annual conference of the European Association for Machine Translation
@inproceedings{eamt-2014-Song,
author = {Xingyi Song and Lucia Specia and Trevor Cohn},
title = {Data selection for discriminative training in statistical machine translation},
booktitle = {Proceedings of 17th Annual conference of the European Association for Machine Translation},
pages = {45--52},
url = {http://www.mt-archive.info/10/EAMT-2014-Song.pdf},
location = {Dubrovnik, Croatia},
year = 2014
}
Song et al. (2014)
Avneesh Saluja and Ying Zhang (2014): Online discriminative learning for machine translation with binary-valued feedback, Machine Translation
@article{MTJ:2014:Saluja,
author = {Avneesh Saluja and Ying Zhang},
title = {Online discriminative learning for machine translation with binary-valued feedback},
pages = {69--90},
journal = {Machine Translation},
volume = {28},
number = {2},
month = {October},
year = 2014
}
Saluja and Zhang (2014)
Green, Spence and Cer, Daniel and Manning, Christopher (2014): An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation, Proceedings of the Ninth Workshop on Statistical Machine Translation (mentioned in Parameter Tuning and Large Scale Discriminative Training)
@InProceedings{green-cer-manning:2014:W14-332,
author = {Green, Spence and Cer, Daniel and Manning, Christopher},
title = {An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {466--476},
url = {http://www.aclweb.org/anthology/W14-3360},
year = 2014
}
Green et al. (2014)
Tan, Ming and Xia, Tian and Wang, Shaojun and Zhou, Bowen (2013): A Corpus Level MIRA Tuning Strategy for Machine Translation, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
@InProceedings{tan-EtAl:2013:EMNLP,
author = {Tan, Ming and Xia, Tian and Wang, Shaojun and Zhou, Bowen},
title = {A Corpus Level {MIRA} Tuning Strategy for Machine Translation},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {851--856},
url = {http://www.aclweb.org/anthology/D13-1083},
year = 2013
}
Tan et al. (2013)
Zhao, Kai and Huang, Liang and Mi, Haitao and Ittycheriah, Abe (2014): Hierarchical MT Training using Max-Violation Perceptron, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{zhao-EtAl:2014:P14-2,
author = {Zhao, Kai and Huang, Liang and Mi, Haitao and Ittycheriah, Abe},
title = {Hierarchical {MT} Training using Max-Violation Perceptron},
booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {June},
address = {Baltimore, Maryland},
publisher = {Association for Computational Linguistics},
pages = {785--790},
url = {http://www.aclweb.org/anthology/P14-2127},
year = 2014
}
Zhao et al. (2014)
Auli, Michael and Galley, Michel and Gao, Jianfeng (2014): Large-scale Expected BLEU Training of Phrase-based Reordering Models, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{auli-galley-gao:2014:EMNLP2014,
author = {Auli, Michael and Galley, Michel and Gao, Jianfeng},
title = {Large-scale Expected {BLEU} Training of Phrase-based Reordering Models},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {1250--1260},
url = {http://www.aclweb.org/anthology/D14-1132},
year = 2014
}
Auli et al. (2014)
Simianer, Patrick and Riezler, Stefan (2013): Multi-Task Learning for Improved Discriminative Training in SMT, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{simianer-riezler:2013:WMT,
author = {Simianer, Patrick and Riezler, Stefan},
title = {Multi-Task Learning for Improved Discriminative Training in {SMT}},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {292--300},
url = {http://www.aclweb.org/anthology/W13-2236},
year = 2013
}
Simianer and Riezler (2013)
Cherry, Colin and Foster, George (2012): Batch tuning strategies for statistical machine translation, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@inproceedings{cherry2012batch,
author = {Cherry, Colin and Foster, George},
title = {Batch tuning strategies for statistical machine translation},
url = {http://www.aclweb.org/anthology/N/N12/N12-1047.pdf},
googlescholar = {13457139291854575466},
booktitle = {Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {427--436},
organization = {Association for Computational Linguistics},
year = 2012
}
Cherry and Foster (2012)
Gimpel, Kevin and Smith, Noah A. (2012): Structured ramp loss minimization for machine translation, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@inproceedings{gimpel2012structured,
author = {Gimpel, Kevin and Smith, Noah A.},
title = {Structured ramp loss minimization for machine translation},
url = {http://www.cs.cmu.edu/~nasmith/papers/gimpel+smith.naacl12.pdf},
googlescholar = {14584730824265315099},
booktitle = {Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages = {221--231},
organization = {Association for Computational Linguistics},
year = 2012
}
Gimpel and Smith (2012)
Chiang, David (2012): Hope and fear for discriminative training of statistical translation models, The Journal of Machine Learning Research
@article{chiang2012hope,
author = {Chiang, David},
title = {Hope and fear for discriminative training of statistical translation models},
url = {http://www.isi.edu/~chiang/papers/chiang-jmlr12.pdf},
googlescholar = {12857447296546216175},
journal = {The Journal of Machine Learning Research},
volume = {13},
pages = {1159--1187},
publisher = {JMLR.org},
year = 2012
}
Chiang (2012)
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D. (2013): Fast and Adaptive Online Training of Feature-Rich Translation Models
@inproceedings{green2013fast,
author = {Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.},
title = {Fast and Adaptive Online Training of Feature-Rich Translation Models},
url = {http://www-nlp.stanford.edu/~sidaw/home/\_media/papers:onlinemt.pdf},
googlescholar = {4291958085712942190},
organization = {ACL},
year = 2013
}
Green et al. (2013)
Flanigan, Jeffrey and Dyer, Chris and Carbonell, Jaime (2013): Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{flanigan-dyer-carbonell:2013:NAACL-HLT,
author = {Flanigan, Jeffrey and Dyer, Chris and Carbonell, Jaime},
title = {Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search},
booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Atlanta, Georgia},
publisher = {Association for Computational Linguistics},
pages = {248--258},
url = {http://www.aclweb.org/anthology/N13-1025},
year = 2013
}
Flanigan et al. (2013)
Abhishek Arun and Barry Haddow and Philipp Koehn and Adam Lopez and Chris Dyer (2010): Monte Carlo techniques for phrase-based translation, Machine Translation
@article{MTJ:2010:Arun,
author = {Abhishek Arun and Barry Haddow and Philipp Koehn and Adam Lopez and Chris Dyer},
title = {Monte {C}arlo techniques for phrase-based translation},
url = {http://homepages.inf.ed.ac.uk/bhaddow/arun-mtsi-eps.pdf},
googlescholar = {4875145697102106083},
pages = {103--121},
journal = {Machine Translation},
volume = {24},
number = {2},
month = {June},
year = 2010
}
Arun et al. (2010)
Duan, Nan and Li, Mu and Zhou, Ming (2012): Forced Derivation Tree based Model Training to Statistical Machine Translation, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
@InProceedings{duan-li-zhou:2012:EMNLP-CoNLL,
author = {Duan, Nan and Li, Mu and Zhou, Ming},
title = {Forced Derivation Tree based Model Training to Statistical Machine Translation},
booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {445--454},
url = {http://www.aclweb.org/anthology/D12-1041},
year = 2012
}
Duan et al. (2012)
Simianer, Patrick and Riezler, Stefan and Dyer, Chris (2012): Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{simianer-riezler-dyer:2012:ACL2012,
author = {Simianer, Patrick and Riezler, Stefan and Dyer, Chris},
title = {Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in {SMT}},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {11--21},
url = {http://www.aclweb.org/anthology/P12-1002},
year = 2012
}
Simianer et al. (2012)
Wuebker, Joern and Hwang, Mei-Yuh and Quirk, Chris (2012): Leave-One-Out Phrase Model Training for Large-Scale Deployment, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{wuebker-hwang-quirk:2012:WMT,
author = {Wuebker, Joern and Hwang, Mei-Yuh and Quirk, Chris},
title = {Leave-One-Out Phrase Model Training for Large-Scale Deployment},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {457--464},
url = {http://www.aclweb.org/anthology/W12-3158},
year = 2012
}
Wuebker et al. (2012)
Yuan Cao and Sanjeev Khudanpur (2012): Sample Selection for Large-scale MT Discriminative Training, Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2012-Cao,
author = {Yuan Cao and Sanjeev Khudanpur},
title = {Sample Selection for Large-scale {MT} Discriminative Training},
url = {http://www.mt-archive.info/AMTA-2012-Cao.pdf},
booktitle = {Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {San Diego, California},
year = 2012
}
Cao and Khudanpur (2012)
Eva Hasler and Barry Haddow and Philipp Koehn (2012): Sparse lexicalised features and topic adaptation for SMT, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt12:Hasler-2,
author = {Eva Hasler and Barry Haddow and Philipp Koehn},
title = {Sparse lexicalised features and topic adaptation for {SMT}},
url = {http://www.mt-archive.info/IWSLT-2012-Hasler-2.pdf},
pages = {268--275},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
location = {Hong Kong},
year = 2012
}
Hasler et al. (2012)
Li, Zhifei and Wang, Ziyuan and Eisner, Jason and Khudanpur, Sanjeev and Roark, Brian (2011): Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
@InProceedings{li-EtAl:2011:EMNLP1,
author = {Li, Zhifei and Wang, Ziyuan and Eisner, Jason and Khudanpur, Sanjeev and Roark, Brian},
title = {Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Edinburgh, Scotland, UK},
publisher = {Association for Computational Linguistics},
pages = {920--929},
url = {http://www.aclweb.org/anthology/D11-1085},
year = 2011
}
Li et al. (2011)
Xiao, Xinyan and Liu, Yang and Liu, Qun and Lin, Shouxun (2011): Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
@InProceedings{xiao-EtAl:2011:EMNLP,
author = {Xiao, Xinyan and Liu, Yang and Liu, Qun and Lin, Shouxun},
title = {Fast Generation of Translation Forest for Large-Scale {SMT} Discriminative Training},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Edinburgh, Scotland, UK},
publisher = {Association for Computational Linguistics},
pages = {880--888},
url = {http://www.aclweb.org/anthology/D11-1081},
year = 2011
}
Xiao et al. (2011)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
