Trained Metrics

If we define the quality of a metric by its correlation with human judgment, then metrics can be trained to optimize this correlation directly.
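As a minimal sketch of this idea, the following Python code tunes a single interpolation weight between two features so that the resulting metric score correlates as well as possible with human scores. The feature names (precision, recall), the toy data, and the grid search are all assumptions made for illustration; none of the publications below is claimed to use exactly this setup.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / sqrt(vx * vy)


def metric_score(features, weight):
    """Interpolate two feature values with one trainable weight in [0, 1]."""
    precision, recall = features
    return weight * precision + (1.0 - weight) * recall


def train_weight(feature_rows, human_scores, steps=100):
    """Grid-search the weight that maximizes correlation with human scores."""
    best_weight, best_corr = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        scores = [metric_score(f, w) for f in feature_rows]
        corr = pearson(scores, human_scores)
        if corr > best_corr:
            best_weight, best_corr = w, corr
    return best_weight, best_corr


# Hypothetical toy data: (precision, recall) per translation, plus human scores.
feature_rows = [(0.9, 0.2), (0.5, 0.6), (0.3, 0.9), (0.7, 0.4), (0.2, 0.3)]
human_scores = [0.3, 0.6, 0.8, 0.5, 0.3]

best_weight, best_corr = train_weight(feature_rows, human_scores)
print("weight:", best_weight, "correlation:", best_corr)
```

With more features, the grid search would typically be replaced by regression or another learner, which is the direction taken by the work discussed below.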

Trained Metrics is the main subject of 25 publications. 9 are discussed here.

Topics in Evaluation

Publications

Albrecht, Joshua and Hwa, Rebecca (2007): A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Albrecht and Hwa (2007) argue for the general advantages of learning evaluation metrics from a large number of features, although

Sun, Shuqi and Chen, Yin and Li, Jufeng (2008): A Re-examination on Features in Regression Based Approach to Automatic MT Evaluation, Proceedings of the ACL-08: HLT Student Research Workshop

Sun et al. (2008) point out that carefully designed features may be more important.

Douglas A. Jones and Gregory M. Rusk (2000): Toward a Scoring Function for Quality-Driven Machine Translation, Proceedings of the International Conference on Computational Linguistics (COLING)

Jones and Rusk (2000) propose a method that learns automatically to distinguish human translations from machine translations. Since in practice the purpose of evaluation is to distinguish good translations from bad translations, it may be beneficial to view evaluation as a ranking task

Ye, Yang and Zhou, Ming and Lin, Chin-Yew (2007): Sentence Level Machine Translation Evaluation as a Ranking, Proceedings of the Second Workshop on Statistical Machine Translation

(Ye et al., 2007;

Duh, Kevin (2008): Ranking vs. Regression in Machine Translation Evaluation, Proceedings of the Third Workshop on Statistical Machine Translation

Duh, 2008).
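The ranking view can be sketched as learning from pairwise preferences: given pairs of translations where one was judged better than the other, learn weights so that the better one scores higher. The sketch below uses a simple pairwise perceptron, swapped in here for brevity; it is not claimed to be the learner used in the cited papers, and the two-dimensional feature vectors are hypothetical toy values.

```python
def dot(w, x):
    """Dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs, n_features, epochs=20):
    """Learn weights so that each 'better' item outscores its 'worse' item.

    pairs: list of (better_features, worse_features) tuples.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # A non-positive margin means the pair is ranked incorrectly.
            if dot(w, diff) <= 0.0:
                w = [wi + di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:
            break  # every training pair is ranked correctly
    return w


# Hypothetical preference pairs: (features of better, features of worse).
pairs = [
    ((0.8, 0.2), (0.4, 0.5)),
    ((0.7, 0.9), (0.6, 0.3)),
    ((0.5, 0.4), (0.2, 0.1)),
]

weights = train_pairwise_perceptron(pairs, n_features=2)
print("weights:", weights)
```

Unlike regression, this formulation never needs an absolute quality score per translation, only relative judgments.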

Lin, Chin-Yew and Och, Franz Josef (2004): ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation, Proceedings of Coling 2004

Lin and Och (2004) propose a metric for the evaluation of evaluation metrics, which does not require human judgment data for correlation. The metric is based on the rank given to a reference translation among machine translations.
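A rough sketch of a rank-based measure in this spirit is given below. It assumes the metric under test has already scored the reference and each machine translation for every segment, ranks the reference among them, normalizes by list size, and averages over segments. The normalization, the tie handling, and the function names are assumptions for illustration and may differ from the definition in the paper.

```python
def reference_rank(reference_score, candidate_scores):
    """1-based rank of the reference; ties are resolved in its favor."""
    return 1 + sum(1 for s in candidate_scores if s > reference_score)


def average_normalized_rank(segments):
    """Average rank of the reference divided by list size; lower is better.

    segments: list of (reference_score, [candidate_scores]) tuples, where
    all scores come from the metric being evaluated.
    """
    total = 0.0
    for reference_score, candidate_scores in segments:
        list_size = len(candidate_scores) + 1  # candidates plus the reference
        total += reference_rank(reference_score, candidate_scores) / list_size
    return total / len(segments)


# Hypothetical metric scores for two segments with three MT outputs each.
segments = [
    (0.9, [0.5, 0.7, 0.3]),  # reference ranked 1st of 4
    (0.6, [0.8, 0.4, 0.2]),  # reference ranked 2nd of 4
]

result = average_normalized_rank(segments)
print(result)  # 0.375
```

A metric that consistently places the reference near the top of the list obtains a low value, without any human judgment data being involved.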

Multiple metrics may be combined uniformly

Giménez, Jesús and Màrquez, Lluís (2008): A Smorgasbord of Features for Automatic MT Evaluation, Proceedings of the Third Workshop on Statistical Machine Translation

(Giménez and Màrquez, 2008), or by adding metrics greedily until no improvement is seen

Jesús Giménez and Lluís Màrquez (2008): Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)

(Giménez and Màrquez, 2008).
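The two combination strategies can be sketched as follows: a uniform combination averages the scores of the selected metrics, and a greedy procedure adds one metric at a time as long as the correlation with human scores improves. The use of Pearson correlation as the selection criterion, the metric names A, B, C, and the toy scores are assumptions for illustration, not details taken from the cited papers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / sqrt(vx * vy)


def uniform_combination(metric_scores, selected):
    """Average the per-item scores of the selected metrics."""
    n_items = len(metric_scores[selected[0]])
    return [
        sum(metric_scores[m][i] for m in selected) / len(selected)
        for i in range(n_items)
    ]


def greedy_select(metric_scores, human_scores):
    """Add metrics one at a time until correlation stops improving."""
    selected, best_corr = [], float("-inf")
    remaining = set(metric_scores)
    while remaining:
        best_candidate, candidate_corr = None, best_corr
        for m in sorted(remaining):
            combined = uniform_combination(metric_scores, selected + [m])
            corr = pearson(combined, human_scores)
            if corr > candidate_corr:
                best_candidate, candidate_corr = m, corr
        if best_candidate is None:
            break  # no remaining metric improves the combination
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        best_corr = candidate_corr
    return selected, best_corr


# Hypothetical per-item scores from three metrics, plus human scores.
metric_scores = {
    "A": [0.1, 0.4, 0.3, 0.6, 0.7],
    "B": [0.2, 0.1, 0.5, 0.4, 0.8],
    "C": [0.9, 0.1, 0.8, 0.2, 0.5],
}
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]

selected, best_corr = greedy_select(metric_scores, human_scores)
print("selected:", selected, "correlation:", best_corr)
```

Because the first step always picks the best single metric, the greedy combination is never worse on the training data than any individual metric.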

Denkowski, Michael and Lavie, Alon (2010): Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Denkowski and Lavie (2010) add a number of parameters to the n-gram based METEOR metric and extend it into a trainable metric.

Benchmarks

Discussion

New Publications

Stanojević, Miloš and Sima'an, Khalil (2017): Alternative Objective Functions for Training MT Evaluation Metrics, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{stanojevic-simaan:2017:Short,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {Alternative Objective Functions for Training {MT} Evaluation Metrics},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {20--25},
url = {http://aclweb.org/anthology/P17-2004},
year = 2017
}
Stanojević and Sima'an (2017)
Gupta, Rohit and Orasan, Constantin and van Genabith, Josef (2015): ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (mentioned in Trained Metrics and Neural Components In Statistical Machine Translation)
@InProceedings{gupta-orasan-vangenabith:2015:EMNLP,
author = {Gupta, Rohit and Orasan, Constantin and van Genabith, Josef},
title = {{ReVal}: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks},
booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {1066--1072},
url = {http://aclweb.org/anthology/D15-1124},
year = 2015
}
Gupta et al. (2015)
Guzmán, Francisco and Joty, Shafiq and Màrquez, Lluís and Moschitti, Alessandro and Nakov, Preslav and Nicosia, Massimo (2014): Learning to Differentiate Better from Worse Translations, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{guzman-EtAl:2014:EMNLP2014,
author = {Guzm\'{a}n, Francisco and Joty, Shafiq and M\`{a}rquez, Llu\'{i}s and Moschitti, Alessandro and Nakov, Preslav and Nicosia, Massimo},
title = {Learning to Differentiate Better from Worse Translations},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {214--220},
url = {http://www.aclweb.org/anthology/D14-1027},
year = 2014
}
Guzmán et al. (2014)
Gonzàlez, Meritxell and Barrón-Cedeño, Alberto and Màrquez, Lluís (2014): IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation, Proceedings of the Ninth Workshop on Statistical Machine Translation
@InProceedings{gonzalez-barroncedeno-marquez:2014:W14-33,
author = {Gonz\`{a}lez, Meritxell and Barr\'{o}n-Cede\~{n}o, Alberto and M\`{a}rquez, Llu\'{i}s},
title = {{IPA} and {STOUT}: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {394--401},
url = {http://www.aclweb.org/anthology/W14-3351},
year = 2014
}
Gonzàlez et al. (2014)
Stanojević, Miloš and Sima'an, Khalil (2014): BEER: BEtter Evaluation as Ranking, Proceedings of the Ninth Workshop on Statistical Machine Translation (mentioned in Evaluation and Trained Metrics)
@InProceedings{stanojevic-simaan:2014:W14-33,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {{BEER}: {BEtter} Evaluation as Ranking},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {414--419},
url = {http://www.aclweb.org/anthology/W14-3354},
year = 2014
}
Stanojević and Sima'an (2014)
Stanojević, Miloš and Sima'an, Khalil (2014): Fitting Sentence Level Translation Evaluation with Many Dense Features, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (mentioned in Evaluation and Trained Metrics)
@InProceedings{stanojevic-simaan:2014:EMNLP2014,
author = {Stanojevi\'{c}, Milo\v{s} and Sima'an, Khalil},
title = {Fitting Sentence Level Translation Evaluation with Many Dense Features},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {202--206},
url = {http://www.aclweb.org/anthology/D14-1025},
year = 2014
}
Stanojević and Sima'an (2014)
Lucia Specia and Kashif Shah (2014): Predicting human translation quality, Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2014-Specia,
author = {Lucia Specia and Kashif Shah},
title = {Predicting human translation quality},
pages = {288--300},
url = {http://www.mt-archive.info/10/AMTA-2014-Specia.pdf},
volume = {1},
booktitle = {Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Vancouver, BC, Canada},
year = 2014
}
Specia and Shah (2014)
Han, Aaron Li-Feng and Wong, Derek F. and Chao, Lidia S. and Lu, Yi and He, Liangye and Wang, Yiming and Zhou, Jiaji (2013): A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{han-EtAl:2013:WMT2,
author = {Han, Aaron Li-Feng and Wong, Derek F. and Chao, Lidia S. and Lu, Yi and He, Liangye and Wang, Yiming and Zhou, Jiaji},
title = {A Description of Tunable Machine Translation Evaluation Systems in {WMT}13 Metrics Task},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {414--421},
url = {http://www.aclweb.org/anthology/W13-2253},
year = 2013
}
Han et al. (2013)
Wang, Mengqiu and Manning, Christopher D. (2012): Probabilistic Finite State Machines for Regression-based MT Evaluation, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
@InProceedings{wang-manning:2012:EMNLP-CoNLL,
author = {Wang, Mengqiu and Manning, Christopher D.},
title = {Probabilistic Finite State Machines for Regression-based {MT} Evaluation},
booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {984--994},
url = {http://www.aclweb.org/anthology/D12-1090},
year = 2012
}
Wang and Manning (2012)
Fishel, Mark and Sennrich, Rico and Popović, Maja and Bojar, Ondřej (2012): TerrorCat: a Translation Error Categorization-based MT Quality Metric, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{fishel-EtAl:2012:WMT,
author = {Fishel, Mark and Sennrich, Rico and Popovi\'{c}, Maja and Bojar, Ond\v{r}ej},
title = {{TerrorCat}: a Translation Error Categorization-based {MT} Quality Metric},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {61--67},
url = {http://www.aclweb.org/anthology/W12-3105},
year = 2012
}
Fishel et al. (2012)
Wong, Billy and Kit, Chunyu (2010): The Parameter-Optimized ATEC Metric for MT Evaluation, Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
@InProceedings{wong-kit:2010:WMT,
author = {Wong, Billy and Kit, Chunyu},
title = {The Parameter-Optimized {ATEC} Metric for {MT} Evaluation},
booktitle = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {360--364},
url = {http://www.aclweb.org/anthology/W10-1755},
year = 2010
}
Wong and Kit (2010)
Dahlmeier, Daniel and Liu, Chang and Ng, Hwee Tou (2011): TESLA at WMT 2011: Translation Evaluation and Tunable Metric, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{dahlmeier-liu-ng:2011:WMT,
author = {Dahlmeier, Daniel and Liu, Chang and Ng, Hwee Tou},
title = {{TESLA} at {WMT} 2011: Translation Evaluation and Tunable Metric},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {78--84},
url = {http://www.aclweb.org/anthology/W11-2106},
year = 2011
}
Dahlmeier et al. (2011)
Song, Xingyi and Cohn, Trevor (2011): Regression and Ranking based Optimisation for Sentence Level MT Evaluation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{song-cohn:2011:WMT,
author = {Song, Xingyi and Cohn, Trevor},
title = {Regression and Ranking based Optimisation for Sentence Level {MT} Evaluation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {123--129},
url = {http://www.aclweb.org/anthology/W11-2113},
year = 2011
}
Song and Cohn (2011)
Joshua S. Albrecht and Rebecca Hwa (2008): Regression for machine translation evaluation at the sentence level, Machine Translation
@article{MTJ:2008:Albrecht,
author = {Joshua S. Albrecht and Rebecca Hwa},
title = {Regression for machine translation evaluation at the sentence level},
url = {http://ccc.inaoep.mx/~villasen/bib/Regression%20for%20machine%20translation%20evaluation.pdf},
googlescholar = {7541999752936339681},
pages = {1--27},
journal = {Machine Translation},
volume = {22},
number = {1--2},
month = {March},
year = 2008
}
Albrecht and Hwa (2008)
Lita, Lucian and Rogati, Monica and Lavie, Alon (2005): BLANC: Learning Evaluation Metrics for MT, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
@InProceedings{lita-rogati-lavie:2005:HLTEMNLP,
author = {Lita, Lucian and Rogati, Monica and Lavie, Alon},
title = {{BLANC}: Learning Evaluation Metrics for {MT}},
booktitle = {Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Vancouver, British Columbia, Canada},
publisher = {Association for Computational Linguistics},
pages = {740--747},
url = {http://www.aclweb.org/anthology/H/H05/H05-1093},
year = 2005
}
Lita et al. (2005)
Albrecht, Joshua and Hwa, Rebecca (2007): Regression for Sentence-Level MT Evaluation with Pseudo References, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{albrecht-hwa:2007:ACLMain1,
author = {Albrecht, Joshua and Hwa, Rebecca},
title = {Regression for Sentence-Level {MT} Evaluation with Pseudo References},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {296--303},
url = {http://www.aclweb.org/anthology/P/P07/P07-1038},
year = 2007
}
Albrecht and Hwa (2007)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
