Trained Metrics
If we define the quality of a metric by correlation to human judgment, then it is possible to train metrics to optimize this correlation.
Trained Metrics is the main subject of 25 publications, 9 of which are discussed here.
Publications
Albrecht, Joshua and Hwa, Rebecca (2007):
A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{albrecht-hwa:2007:ACLMain2,
author = {Albrecht, Joshua and Hwa, Rebecca},
title = {A Re-examination of Machine Learning Approaches for Sentence-Level {MT} Evaluation},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {880--887},
url = {
http://www.aclweb.org/anthology/P/P07/P07-1111},
year = 2007
}
Albrecht and Hwa (2007) argue for the general advantages of learning evaluation metrics from a large number of features.
Sun, Shuqi and Chen, Yin and Li, Jufeng (2008):
A Re-examination on Features in Regression Based Approach to Automatic MT Evaluation, Proceedings of the ACL-08: HLT Student Research Workshop
@InProceedings{sun-chen-li:2008:SRW,
author = {Sun, Shuqi and Chen, Yin and Li, Jufeng},
title = {A Re-examination on Features in Regression Based Approach to Automatic {MT} Evaluation},
booktitle = {Proceedings of the ACL-08: HLT Student Research Workshop},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {25--30},
url = {
http://www.aclweb.org/anthology/P/P08/P08-3005},
year = 2008
}
Sun et al. (2008) counter that carefully designed features may be more important than a large number of them.
Jones and Rusk (2000) propose a method that automatically learns to distinguish human translations from machine translations. Since in practice the purpose of evaluation is to distinguish good translations from bad ones, it may be beneficial to view evaluation as a ranking task
Ye, Yang and Zhou, Ming and Lin, Chin-Yew (2007):
Sentence Level Machine Translation Evaluation as a Ranking, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{ye-zhou-lin:2007:WMT,
author = {Ye, Yang and Zhou, Ming and Lin, Chin-Yew},
title = {Sentence Level Machine Translation Evaluation as a Ranking},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {240--247},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0236},
year = 2007
}
(Ye et al., 2007;
Duh, Kevin (2008):
Ranking vs. Regression in Machine Translation Evaluation, Proceedings of the Third Workshop on Statistical Machine Translation
@InProceedings{duh:2008:WMT,
author = {Duh, Kevin},
title = {Ranking vs. Regression in Machine Translation Evaluation},
booktitle = {Proceedings of the Third Workshop on Statistical Machine Translation},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {191--194},
url = {
http://www.aclweb.org/anthology/W/W08/W08-0331},
year = 2008
}
Duh, 2008).
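To make the ranking view concrete, the following is a minimal sketch of learning a linear metric from pairwise preferences. It assumes each translation is already represented by a feature vector and that human judgments are available as (better, worse) pairs. The perceptron update, the function names and the feature layout are illustrative assumptions, not necessarily the learners or features used in the cited papers.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def score(weights: Sequence[float], features: Features) -> float:
    """Linear metric: weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise_ranker(
    pairs: Sequence[Tuple[Features, Features]],
    n_features: int,
    epochs: int = 20,
) -> List[float]:
    """Perceptron trained on (better, worse) feature-vector pairs.

    Hypothetical helper: whenever the current weights fail to score the
    better translation strictly higher, the feature difference is added
    to the weights.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            if score(weights, diff) <= 0.0:
                weights = [w + d for w, d in zip(weights, diff)]
                mistakes += 1
        if mistakes == 0:
            # Every training pair is ranked correctly; stop early.
            break
    return weights
```

Only the relative order of two translations enters the update, so the learned scores are meaningful for comparison rather than as absolute quality estimates.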
Lin, Chin-Yew and Och, Franz Josef (2004):
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation, Proceedings of Coling 2004
@inproceedings{Lin:2004b,
author = {Lin, Chin-Yew and Och, Franz Josef},
title = {ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation},
url = {
http://acl.ldc.upenn.edu/C/C04/C04-1072.pdf},
booktitle = {Proceedings of Coling 2004},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {501--507},
year = 2004
}
Lin and Och (2004) propose a metric for the evaluation of evaluation metrics, which does not require human judgment data for correlation. The metric is based on the rank given to a reference translation among machine translations.
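The rank-based idea can be sketched as follows. This is a simplified illustration, not the exact procedure of the paper: it assumes several references per sentence, holds out each reference in turn, scores it and the machine translations against the remaining references, and reports the average rank of the held-out reference divided by the list size (lower is better). The toy metric `unigram_precision` and the function `average_reference_rank` are hypothetical names introduced here.

```python
from typing import Callable, Sequence

Metric = Callable[[str, Sequence[str]], float]


def unigram_precision(candidate: str, references: Sequence[str]) -> float:
    """Toy metric (hypothetical): share of candidate tokens found in any reference."""
    tokens = candidate.split()
    if not tokens:
        return 0.0
    ref_vocab = {tok for ref in references for tok in ref.split()}
    return sum(tok in ref_vocab for tok in tokens) / len(tokens)


def average_reference_rank(
    metric: Metric,
    references: Sequence[str],
    machine_translations: Sequence[str],
) -> float:
    """Average normalized rank of held-out references among machine translations."""
    ratios = []
    for i, held_out in enumerate(references):
        others = [ref for j, ref in enumerate(references) if j != i]
        ref_score = metric(held_out, others)
        mt_scores = [metric(mt, others) for mt in machine_translations]
        # Rank 1 is best; ties are counted against the reference.
        rank = 1 + sum(s >= ref_score for s in mt_scores)
        ratios.append(rank / (len(machine_translations) + 1))
    return sum(ratios) / len(ratios)
```

A metric that tends to place human references near the top of the list obtains a value close to 1 divided by the list size, and no human judgment scores are needed to compute it.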
Multiple metrics may be combined uniformly
Giménez, Jesús and Màrquez, Lluís (2008):
A Smorgasbord of Features for Automatic MT Evaluation, Proceedings of the Third Workshop on Statistical Machine Translation
@InProceedings{gimenez-marquez:2008:WMT,
author = {Gim{\'e}nez, Jes{\'u}s and M{\`a}rquez, Llu{\'i}s},
title = {A Smorgasbord of Features for Automatic {MT} Evaluation},
booktitle = {Proceedings of the Third Workshop on Statistical Machine Translation},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {195--198},
url = {
http://www.aclweb.org/anthology/W/W08/W08-0332},
year = 2008
}
(Giménez and Màrquez, 2008), or by adding metrics greedily until no improvement is seen
Jesús Giménez and Lluís Màrquez (2008):
Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Gimenez:2008:IJCNLP,
author = {Jes{\'u}s Gim{\'e}nez and Llu{\'i}s M{\`a}rquez},
title = {Heterogeneous Automatic {MT} Evaluation Through Non-Parametric Metric Combinations},
booktitle = {Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2008
}
(Giménez and Màrquez, 2008).
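A greedy selection of this kind can be sketched as below. The sketch assumes that each metric has already produced one score per translation, that a combination is the plain average of the selected metrics' scores, and that quality is measured by Pearson correlation with human scores. These choices and all function names are assumptions for illustration; the cited work may normalize scores or measure agreement differently.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def uniform_combination(
    metric_scores: Dict[str, Sequence[float]], names: Sequence[str]
) -> List[float]:
    """Average the scores of the named metrics for each translation."""
    n_items = len(metric_scores[names[0]])
    return [
        sum(metric_scores[name][i] for name in names) / len(names)
        for i in range(n_items)
    ]


def greedy_metric_selection(
    metric_scores: Dict[str, Sequence[float]], human_scores: Sequence[float]
) -> List[str]:
    """Add one metric at a time while correlation with human scores improves."""
    selected: List[str] = []
    best_corr = float("-inf")
    remaining = set(metric_scores)
    while remaining:
        # Try each unused metric on top of the current selection.
        trial = {
            name: pearson(
                uniform_combination(metric_scores, selected + [name]),
                human_scores,
            )
            for name in sorted(remaining)
        }
        candidate = max(trial, key=trial.get)
        if trial[candidate] <= best_corr:
            break  # no metric improves the combination any further
        selected.append(candidate)
        best_corr = trial[candidate]
        remaining.remove(candidate)
    return selected
```

Because the search is greedy, the result is not guaranteed to be the best possible subset, and the selection should be validated on held-out judgments to avoid overfitting.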
Denkowski, Michael and Lavie, Alon (2010):
Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
@InProceedings{denkowski-lavie:2010:NAACLHLT,
author = {Denkowski, Michael and Lavie, Alon},
title = {Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {250--253},
url = {
http://www.aclweb.org/anthology/N10-1031},
year = 2010
}
Denkowski and Lavie (2010) add a number of parameters to the n-gram-based METEOR metric, extending it into a trainable metric.
New Publications
Stanojević, Miloš and Sima'an, Khalil (2017):
Alternative Objective Functions for Training MT Evaluation Metrics, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{stanojevic-simaan:2017:Short,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {Alternative Objective Functions for Training {MT} Evaluation Metrics},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {20--25},
url = {
http://aclweb.org/anthology/P17-2004},
year = 2017
}
Stanojević and Sima'an (2017)
Gupta, Rohit and Orasan, Constantin and van Genabith, Josef (2015):
ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
mentioned in Trained Metrics and Neural Components In Statistical Machine Translation
@InProceedings{gupta-orasan-vangenabith:2015:EMNLP,
author = {Gupta, Rohit and Orasan, Constantin and van Genabith, Josef},
title = {ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1066--1072},
url = {
http://aclweb.org/anthology/D15-1124},
year = 2015
}
Gupta et al. (2015)
Guzmán, Francisco and Joty, Shafiq and Màrquez, Lluís and Moschitti, Alessandro and Nakov, Preslav and Nicosia, Massimo (2014):
Learning to Differentiate Better from Worse Translations, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{guzman-EtAl:2014:EMNLP2014,
author = {Guzm\'{a}n, Francisco and Joty, Shafiq and M\`{a}rquez, Llu\'{i}s and Moschitti, Alessandro and Nakov, Preslav and Nicosia, Massimo},
title = {Learning to Differentiate Better from Worse Translations},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {214--220},
url = {
http://www.aclweb.org/anthology/D14-1027},
year = 2014
}
Guzmán et al. (2014)
Gonzàlez, Meritxell and Barrón-Cedeño, Alberto and Màrquez, Lluís (2014):
IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation, Proceedings of the Ninth Workshop on Statistical Machine Translation
@InProceedings{gonzalez-barroncedeno-marquez:2014:W14-33,
author = {Gonz\`{a}lez, Meritxell and Barr\'{o}n-Cede\~{n}o, Alberto and M\`{a}rquez, Llu\'{i}s},
title = {IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {394--401},
url = {
http://www.aclweb.org/anthology/W14-3351},
year = 2014
}
Gonzàlez et al. (2014)
Stanojević, Miloš and Sima'an, Khalil (2014):
BEER: BEtter Evaluation as Ranking, Proceedings of the Ninth Workshop on Statistical Machine Translation
mentioned in Evaluation and Trained Metrics
@InProceedings{stanojevic-simaan:2014:W14-33,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {BEER: BEtter Evaluation as Ranking},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {414--419},
url = {
http://www.aclweb.org/anthology/W14-3354},
year = 2014
}
Stanojević and Sima'an (2014)
Stanojević, Miloš and Sima'an, Khalil (2014):
Fitting Sentence Level Translation Evaluation with Many Dense Features, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
mentioned in Evaluation and Trained Metrics
@InProceedings{stanojevic-simaan:2014:EMNLP2014,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {Fitting Sentence Level Translation Evaluation with Many Dense Features},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {202--206},
url = {
http://www.aclweb.org/anthology/D14-1025},
year = 2014
}
Stanojević and Sima'an (2014)
Lucia Specia and Kashif Shah (2014):
Predicting human translation quality, Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2014-Specia,
author = {Lucia Specia and Kashif Shah},
title = {Predicting human translation quality},
pages = {288--300},
url = {
http://www.mt-archive.info/10/AMTA-2014-Specia.pdf},
volume = {1},
booktitle = {Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Vancouver, BC, Canada},
year = 2014
}
Specia and Shah (2014)
Han, Aaron Li-Feng and Wong, Derek F. and Chao, Lidia S. and Lu, Yi and He, Liangye and Wang, Yiming and Zhou, Jiaji (2013):
A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{han-EtAl:2013:WMT2,
author = {Han, Aaron Li-Feng and Wong, Derek F. and Chao, Lidia S. and Lu, Yi and He, Liangye and Wang, Yiming and Zhou, Jiaji},
title = {A Description of Tunable Machine Translation Evaluation Systems in {WMT}13 Metrics Task},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {414--421},
url = {
http://www.aclweb.org/anthology/W13-2253},
year = 2013
}
Han et al. (2013)
Wang, Mengqiu and Manning, Christopher D. (2012):
Probabilistic Finite State Machines for Regression-based MT Evaluation, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
@InProceedings{wang-manning:2012:EMNLP-CoNLL,
author = {Wang, Mengqiu and Manning, Christopher D.},
title = {Probabilistic Finite State Machines for Regression-based {MT} Evaluation},
booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {984--994},
url = {
http://www.aclweb.org/anthology/D12-1090},
year = 2012
}
Wang and Manning (2012)
Fishel, Mark and Sennrich, Rico and Popović, Maja and Bojar, Ondřej (2012):
TerrorCat: a Translation Error Categorization-based MT Quality Metric, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{fishel-EtAl:2012:WMT,
author = {Fishel, Mark and Sennrich, Rico and Popovi\'{c}, Maja and Bojar, Ond\v{r}ej},
title = {TerrorCat: a Translation Error Categorization-based {MT} Quality Metric},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {61--67},
url = {
http://www.aclweb.org/anthology/W12-3105},
year = 2012
}
Fishel et al. (2012)
Wong, Billy and Kit, Chunyu (2010):
The Parameter-Optimized ATEC Metric for MT Evaluation, Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
@InProceedings{wong-kit:2010:WMT,
author = {Wong, Billy and Kit, Chunyu},
title = {The Parameter-Optimized ATEC Metric for {MT} Evaluation},
booktitle = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {360--364},
url = {
http://www.aclweb.org/anthology/W10-1755},
year = 2010
}
Wong and Kit (2010)
Dahlmeier, Daniel and Liu, Chang and Ng, Hwee Tou (2011):
TESLA at WMT 2011: Translation Evaluation and Tunable Metric, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{dahlmeier-liu-ng:2011:WMT,
author = {Dahlmeier, Daniel and Liu, Chang and Ng, Hwee Tou},
title = {TESLA at WMT 2011: Translation Evaluation and Tunable Metric},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {78--84},
url = {
http://www.aclweb.org/anthology/W11-2106},
year = 2011
}
Dahlmeier et al. (2011)
Song, Xingyi and Cohn, Trevor (2011):
Regression and Ranking based Optimisation for Sentence Level MT Evaluation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{song-cohn:2011:WMT,
author = {Song, Xingyi and Cohn, Trevor},
title = {Regression and Ranking based Optimisation for Sentence Level {MT} Evaluation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {123--129},
url = {
http://www.aclweb.org/anthology/W11-2113},
year = 2011
}
Song and Cohn (2011)
Joshua S. Albrecht and Rebecca Hwa (2008):
Regression for machine translation evaluation at the sentence level, Machine Translation
@article{MTJ:2008:Albrecht,
author = {Joshua S. Albrecht and Rebecca Hwa},
title = {Regression for machine translation evaluation at the sentence level},
url = {
http://ccc.inaoep.mx/~villasen/bib/Regression%20for%20machine%20translation%20evaluation.pdf},
googlescholar = {7541999752936339681},
pages = {1--27},
journal = {Machine Translation},
volume = {22},
number = {1--2},
month = {March},
year = 2008
}
Albrecht and Hwa (2008)
Lita, Lucian and Rogati, Monica and Lavie, Alon (2005):
BLANC: Learning Evaluation Metrics for MT, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
@InProceedings{lita-rogati-lavie:2005:HLTEMNLP,
author = {Lita, Lucian and Rogati, Monica and Lavie, Alon},
title = {{BLANC}: Learning Evaluation Metrics for {MT}},
booktitle = {Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Vancouver, British Columbia, Canada},
publisher = {Association for Computational Linguistics},
pages = {740--747},
url = {
http://www.aclweb.org/anthology/H/H05/H05-1093},
year = 2005
}
Lita et al. (2005)
Albrecht, Joshua and Hwa, Rebecca (2007):
Regression for Sentence-Level MT Evaluation with Pseudo References, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{albrecht-hwa:2007:ACLMain1,
author = {Albrecht, Joshua and Hwa, Rebecca},
title = {Regression for Sentence-Level {MT} Evaluation with Pseudo References},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {296--303},
url = {
http://www.aclweb.org/anthology/P/P07/P07-1038},
year = 2007
}
Albrecht and Hwa (2007)