Manual Metrics
The most intuitively trustworthy evaluation of machine translation systems is to ask human judges. However, what to ask them is an open research question.
Manual Metrics is the main subject of 45 publications. 15 are discussed here.
Publications
M. King and Andrei Popescu-Belis and Eduard Hovy (2003):
FEMTI: Creating and Using a Framework for MT Evaluation, Proceedings of the MT Summit IX
@inproceedings{King:2003,
author = {M. King and Andrei Popescu-Belis and Eduard Hovy},
title = {{FEMTI}: Creating and Using a Framework for {MT} Evaluation},
url = {
http://www-rohan.sdsu.edu/~gawron/mt\_plus/readings/evaluation/FEMTI-King-final.pdf},
booktitle = {Proceedings of the {MT} Summit IX},
year = 2003
}
King et al. (2003) present a large range of evaluation metrics for machine translation systems that go well beyond the translation quality measures to which we devoted the bulk of this chapter.
Keith J. Miller and Michelle Vanni (2005):
Inter-rater Agreement Measures, and the Refinement of Metrics in the PLATO MT Evaluation Paradigm, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
@InProceedings{Miller:2005:MTS,
author = {Keith J. Miller and Michelle Vanni},
title = {Inter-rater Agreement Measures, and the Refinement of Metrics in the {PLATO} {MT} Evaluation Paradigm},
url = {
http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA456393},
googlescholar = {4922588981188199265},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
}
Miller and Vanni (2005) propose clarity and coherence as manual metrics.
Florence Reeder (2004):
Investigation of intelligibility judgments, Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA 2004)
@inproceedings{reeder:2004:AMTA,
author = {Florence Reeder},
title = {Investigation of intelligibility judgments},
booktitle = {Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA 2004)},
pages = {227--235},
year = 2004
}
Reeder (2004) shows the correlation between fluency and the number of words it takes to distinguish between human and machine translations.
Grading standards for essays from foreign language learners may be used for machine translation evaluation. Using these standards reveals that machine translation has trouble with basic levels, but scores relatively high on advanced categories
Florence Reeder (2006):
Direct Application of a Language Learner Test to MT Evaluation, 5th Conference of the Association for Machine Translation in the Americas (AMTA)
@InProceedings{Reeder:2006:AMTA,
author = {Florence Reeder},
title = {Direct Application of a Language Learner Test to {MT} Evaluation},
url = {
http://people.csail.mit.edu/imcgraw/links/research/pubs/ama2006/papers/019.pdf},
googlescholar = {3458151710551718284},
booktitle = {5th Conference of the Association for Machine Translation in the Americas (AMTA)},
month = {August},
address = {Boston, Massachusetts},
year = 2006
}
(Reeder, 2006). A manual metric that can be automated is to ask about specific translation errors; the questions may be based on past errors
Uchimoto, Kiyotaka and Kotani, Katsunori and Zhang, Yujie and Isahara, Hitoshi (2007):
Automatic Evaluation of Machine Translation Based on Rate of Accomplishment of Sub-Goals, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference
@InProceedings{uchimoto-EtAl:2007:main,
author = {Uchimoto, Kiyotaka and Kotani, Katsunori and Zhang, Yujie and Isahara, Hitoshi},
title = {Automatic Evaluation of Machine Translation Based on Rate of Accomplishment of Sub-Goals},
booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {33--40},
url = {
http://www.aclweb.org/anthology/N/N07/N07-1005},
year = 2007
}
(Uchimoto et al., 2007).
Vilar, David and Leusch, Gregor and Ney, Hermann and Banchs, Rafael E. (2007):
Human Evaluation of Machine Translation Through Binary System Comparisons, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{vilar-EtAl:2007:WMT,
author = {Vilar, David and Leusch, Gregor and Ney, Hermann and Banchs, Rafael E.},
title = {Human Evaluation of Machine Translation Through Binary System Comparisons},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {96--103},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0213},
year = 2007
}
Vilar et al. (2007) argue for pairwise system comparisons as a metric, which leads to higher inter- and intra-annotator agreement
Callison-Burch, Chris and Fordyce, Cameron Shaw and Koehn, Philipp and Monz, Christof and Schroeder, Josh (2007):
(Meta-) Evaluation of Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
mentioned in Evaluation Campaigns and Manual Metrics
@InProceedings{callisonburch-EtAl:2007:WMT,
author = {Callison-Burch, Chris and Fordyce, Cameron Shaw and Koehn, Philipp and Monz, Christof and Schroeder, Josh},
title = {({M}eta-) Evaluation of Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {136--158},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0218},
year = 2007
}
(Callison-Burch et al., 2007).
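Inter-annotator agreement of the kind mentioned above is often reported as a chance-corrected score. The text here does not say which statistic is used, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators; the function name `cohen_kappa` and the toy labels ("A", "B", "tie" for pairwise preferences) are hypothetical and not taken from the cited papers.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy pairwise judgments (assumed data): which system is better, or a tie.
annotator_1 = ["A", "A", "B", "tie", "B", "A"]
annotator_2 = ["A", "B", "B", "tie", "B", "A"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Intra-annotator agreement can be computed the same way by passing two passes of the same annotator over repeated items.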
Bojar, Ondřej and Ercegovčević, Miloš and Popel, Martin and Zaidan, Omar (2011):
A Grain of Salt for the WMT Manual Evaluation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{bojar-EtAl:2011:WMT,
author = {Bojar, Ondřej and Ercegov\v{c}evi\'{c}, Milo\v{s} and Popel, Martin and Zaidan, Omar},
title = {A Grain of Salt for the WMT Manual Evaluation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {1--11},
url = {
http://www.aclweb.org/anthology/W11-2101},
year = 2011
}
Bojar et al. (2011) present a critique of current manual evaluation practice in the WMT campaign, such as the handling of ties and annotator bias.
Lopez, Adam (2012):
Putting Human Assessments of Machine Translation Systems in Order, Proceedings of the Seventh Workshop on Statistical Machine Translation
@InProceedings{lopez:2012:WMT,
author = {Lopez, Adam},
title = {Putting Human Assessments of Machine Translation Systems in Order},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {1--9},
url = {
http://www.aclweb.org/anthology/W12-3101},
year = 2012
}
Lopez (2012) points out inconsistencies in rankings produced by these evaluation campaigns.
Philipp Koehn (2012):
Simulating human judgment in machine translation evaluation campaigns, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt12:Koehn,
author = {Philipp Koehn},
title = {Simulating human judgment in machine translation evaluation campaigns},
url = {
http://www.mt-archive.info/IWSLT-2012-Koehn.pdf},
pages = {179--184},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
location = {Hong Kong},
year = 2012
}
Koehn (2012) proposes a model that allows the simulation of these ranking evaluations and gives recommendations about the number of manual judgments needed to detect statistically significant differences.
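To illustrate why the number of judgments matters for significance, here is a minimal sketch using a two-sided sign test on pairwise win counts. This is not the model proposed in the cited paper; the sign test, the function name `sign_test_p_value`, and the win counts are assumptions chosen only for illustration, and ties are assumed to have been discarded beforehand.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided sign test p-value for pairwise wins (ties dropped)."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of a split at least this uneven under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Same 60% win rate for system A, with few and with many judgments.
p_small = sign_test_p_value(12, 8)
p_large = sign_test_p_value(120, 80)
```

With these assumed counts, the 60% win rate is not significant at the 0.05 level with 20 judgments but is with 200, which is the kind of effect a recommendation on judgment counts has to account for.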
One goal of manual assessment is to get better insight into the types of errors systems make.
David Vilar and Jia Xu and Luis Fernando D'Haro and Hermann Ney (2006):
Error Analysis of Statistical Machine Translation Output, Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 06)
@inproceedings{vilar-lrec06,
author = {David Vilar and Jia Xu and Luis Fernando D'Haro and Hermann Ney},
title = {Error Analysis of Statistical Machine Translation Output},
booktitle = {Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 06)},
pages = {697--702},
url = {
http://mt-archive.info/LREC-2006-Vilar.pdf},
year = 2006
}
Vilar et al. (2006) propose a taxonomy of error types, such as unknown words, incorrect word forms, and long-range word order.
Popovic, Maja and de Gispert, Adrià and Gupta, Deepa and Lambert, Patrik and Ney, Hermann and Mariño, José B. and Federico, Marcello and Banchs, Rafael E. (2006):
Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{popovic-EtAl:2006:WMT,
author = {Popovic, Maja and de Gispert, Adri\`{a} and Gupta, Deepa and Lambert, Patrik and Ney, Hermann and Mari{\~n}o, Jos\'{e} B. and Federico, Marcello and Banchs, Rafael E.},
title = {Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {1--6},
url = {
http://www.aclweb.org/anthology/W/W06/W06-3101},
year = 2006
}
Popovic et al. (2006);
Popovic, Maja and Ney, Hermann (2007):
Word Error Rates: Decomposition over POS classes and Applications for Error Analysis, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{popovic-ney:2007:WMT,
author = {Popovic, Maja and Ney, Hermann},
title = {Word Error Rates: Decomposition over {POS} classes and Applications for Error Analysis},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {48--55},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0207},
year = 2007
}
Popovic and Ney (2007) introduce automatic metrics that correspond to some of these error categories.
Maja Popović and Aljoscha Burchardt (2011):
From Human to Automatic Error Classification for Machine Translation Output, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{eamt11:Popovic,
author = {Maja Popovi\'{c} and Aljoscha Burchardt},
title = {From Human to Automatic Error Classification for Machine Translation Output},
pages = {265--272},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
location = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
url = {
http://mt-archive.info/EAMT-2011-Popovic.pdf},
year = 2011
}
Popović and Burchardt (2011) refine their automatic analytical metrics to assess word order, morphology, deletion and insertion errors, and compare them against human judgments on these error categories.
New Publications
Ma, Qingsong and Graham, Yvette and Baldwin, Timothy and Liu, Qun (2017):
Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
@InProceedings{D17-1261,
author = {Ma, Qingsong and Graham, Yvette and Baldwin, Timothy and Liu, Qun},
title = {Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {2466--2475},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1261},
year = 2017
}
Ma et al. (2017)
Isabelle, Pierre and Cherry, Colin and Foster, George (2017):
A Challenge Set Approach to Evaluating Machine Translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
mentioned in Manual Metrics and Analysis And Visualization
@InProceedings{D17-1262,
author = {Isabelle, Pierre and Cherry, Colin and Foster, George},
title = {A Challenge Set Approach to Evaluating Machine Translation},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
publisher = {Association for Computational Linguistics},
pages = {2476--2486},
location = {Copenhagen, Denmark},
url = {
http://aclweb.org/anthology/D17-1262},
year = 2017
}
Isabelle et al. (2017)
Arle Lommel and Aljoscha Burchardt and Maja Popović and Kim Harris and Eleftherios Avramidis and Hans Uszkoreit (2014):
Using a new analytic measure for the annotation and analysis of MT errors on real data, Proceedings of 17th Annual conference of the European Association for Machine Translation
@inproceedings{eamt-2014-Lommel,
author = {Arle Lommel and Aljoscha Burchardt and Maja Popovi{\'c} and Kim Harris and Eleftherios Avramidis and Hans Uszkoreit},
title = {Using a new analytic measure for the annotation and analysis of {MT} errors on real data},
booktitle = {Proceedings of 17th Annual conference of the European Association for Machine Translation},
pages = {165--172},
url = {
http://www.mt-archive.info/10/EAMT-2014-Lommel.pdf},
location = {Dubrovnik, Croatia},
year = 2014
}
Lommel et al. (2014)
Graham, Yvette and Baldwin, Timothy and Moffat, Alistair and Zobel, Justin (2013):
Continuous Measurement Scales in Human Evaluation of Machine Translation, Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
@InProceedings{graham-EtAl:2013:LAW7-ID,
author = {Graham, Yvette and Baldwin, Timothy and Moffat, Alistair and Zobel, Justin},
title = {Continuous Measurement Scales in Human Evaluation of Machine Translation},
booktitle = {Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {33--41},
url = {
http://www.aclweb.org/anthology/W13-2305},
year = 2013
}
Graham et al. (2013)
Guzmán, Francisco and Abdelali, Ahmed and Temnikova, Irina and Sajjad, Hassan and Vogel, Stephan (2015):
How do Humans Evaluate Machine Translation, Proceedings of the Tenth Workshop on Statistical Machine Translation
@InProceedings{guzman-EtAl:2015:WMT,
author = {Guzm\'{a}n, Francisco and Abdelali, Ahmed and Temnikova, Irina and Sajjad, Hassan and Vogel, Stephan},
title = {How do Humans Evaluate Machine Translation},
booktitle = {Proceedings of the Tenth Workshop on Statistical Machine Translation},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {457--466},
url = {
http://aclweb.org/anthology/W15-3059},
year = 2015
}
Guzmán et al. (2015)
Matouš Macháček and Ondřej Bojar (2015):
Evaluating Machine Translation Quality Using Short Segments Annotations, The Prague Bulletin of Mathematical Linguistics
@article{pbml-103-Machacek,
author = {Matouš Macháček and Ondřej Bojar},
title = {Evaluating Machine Translation Quality Using Short Segments Annotations},
pages = {85--110},
journal = {The Prague Bulletin of Mathematical Linguistics},
url = {
http://ufal.mff.cuni.cz/pbml/103/art-machacek-bojar.pdf},
volume = {103},
month = {April},
year = 2015
}
Macháček and Bojar (2015)
Ângela Costa and Wang Ling and Tiago Luís and Rui Correia and Luísa Coheur (2015):
A linguistically motivated taxonomy for Machine Translation error analysis, Machine Translation
@article{MTJ:2015:Costa,
author = {Ângela Costa and Wang Ling and Tiago Luís and Rui Correia and Luísa Coheur},
title = {A linguistically motivated taxonomy for Machine Translation error analysis},
pages = {127--161},
journal = {Machine Translation},
volume = {29},
number = {2},
month = {June},
year = 2015
}
Costa et al. (2015)
Ondřej Klejch and Eleftherios Avramidis and Aljoscha Burchardt and Martin Popel (2015):
MT-ComparEval: Graphical evaluation interface for Machine Translation development, The Prague Bulletin of Mathematical Linguistics
@article{pbml-104-Klejch,
author = {Ondřej Klejch and Eleftherios Avramidis and Aljoscha Burchardt and Martin Popel},
title = {MT-ComparEval: Graphical evaluation interface for Machine Translation development},
pages = {63--74},
journal = {The Prague Bulletin of Mathematical Linguistics},
url = {
http://ufal.mff.cuni.cz/pbml/104/art-klejch-et-al.pdf},
volume = {104},
month = {October},
year = 2015
}
Klejch et al. (2015)
Birch, Alexandra and Abend, Omri and Bojar, Ondřej and Haddow, Barry (2016):
HUME: Human UCCA-Based Evaluation of Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{birch-EtAl:2016:EMNLP2016,
author = {Birch, Alexandra and Abend, Omri and Bojar, Ond\v{r}ej and Haddow, Barry},
title = {HUME: Human UCCA-Based Evaluation of Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {1264--1274},
url = {
https://aclweb.org/anthology/D16-1134},
year = 2016
}
Birch et al. (2016)
Abdelali, Ahmed and Durrani, Nadir and Guzmán, Francisco (2016):
iAppraise: A Manual Machine Translation Evaluation Environment Supporting Eye-tracking, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
@InProceedings{abdelali-durrani-guzman:2016:N16-3,
author = {Abdelali, Ahmed and Durrani, Nadir and Guzm\'{a}n, Francisco},
title = {iAppraise: A Manual Machine Translation Evaluation Environment Supporting Eye-tracking},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {17--21},
url = {
http://www.aclweb.org/anthology/N16-3004},
year = 2016
}
Abdelali et al. (2016)
Otani, Naoki and Nakazawa, Toshiaki and Kawahara, Daisuke and Kurohashi, Sadao (2016):
IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
@InProceedings{otani-EtAl:2016:EMNLP2016,
author = {Otani, Naoki and Nakazawa, Toshiaki and Kawahara, Daisuke and Kurohashi, Sadao},
title = {IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
month = {November},
address = {Austin, Texas},
publisher = {Association for Computational Linguistics},
pages = {511--520},
url = {
https://aclweb.org/anthology/D16-1049},
year = 2016
}
Otani et al. (2016)
Graham, Yvette and Ma, Qingsong and Baldwin, Timothy and Liu, Qun and Parra, Carla and Scarton, Carolina (2017):
Improving Evaluation of Document-level Machine Translation Quality Estimation, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
@InProceedings{graham-EtAl:2017:EACLshort,
author = {Graham, Yvette and Ma, Qingsong and Baldwin, Timothy and Liu, Qun and Parra, Carla and Scarton, Carolina},
title = {Improving Evaluation of Document-level Machine Translation Quality Estimation},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
month = {April},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {356--361},
url = {
http://www.aclweb.org/anthology/E17-2057},
year = 2017
}
Graham et al. (2017)
Fomicheva, Marina and Specia, Lucia (2016):
Reference Bias in Monolingual Machine Translation Evaluation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{fomicheva-specia:2016:P16-2,
author = {Fomicheva, Marina and Specia, Lucia},
title = {Reference Bias in Monolingual Machine Translation Evaluation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {77--82},
url = {
http://anthology.aclweb.org/P16-2013},
year = 2016
}
Fomicheva and Specia (2016)
Herrmann, Teresa and Niehues, Jan and Waibel, Alex (2014):
Manual Analysis of Structurally Informed Reordering in German-English Machine Translation, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)
@InProceedings{L14-1462,
author = {Herrmann, Teresa and Niehues, Jan and Waibel, Alex},
title = {Manual Analysis of Structurally Informed Reordering in German-English Machine Translation},
booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)},
publisher = {European Language Resources Association (ELRA)},
location = {Reykjavik, Iceland},
url = {
http://www.lrec-conf.org/proceedings/lrec2014/pdf/569\_Paper.pdf},
year = 2014
}
Herrmann et al. (2014)
Aranberri, Nora (2015):
SMT error analysis and mapping to syntactic, semantic and structural fixes, Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{aranberri:2015:SSST-9,
author = {Aranberri, Nora},
title = {SMT error analysis and mapping to syntactic, semantic and structural fixes},
booktitle = {Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {June},
address = {Denver, Colorado, USA},
publisher = {Association for Computational Linguistics},
pages = {30--38},
url = {
http://www.aclweb.org/anthology/W15-1004},
year = 2015
}
Aranberri (2015)
Chi-kiu Lo and Dekai Wu (2013):
Human semantic MT evaluation with HMEANT for IWSLT 2013, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{Lo-1:iwslt:2013,
author = {Chi-kiu Lo and Dekai Wu},
title = {Human semantic {MT} evaluation with {HMEANT} for {IWSLT} 2013},
url = {
http://www.mt-archive.info/10/IWSLT-2013-Lo-1.pdf},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2013
}
Lo and Wu (2013)
Birch, Alexandra and Haddow, Barry and Germann, Ulrich and Nadejde, Maria and Buck, Christian and Koehn, Philipp (2013):
The Feasibility of HMEANT as a Human MT Evaluation Metric, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{birch-EtAl:2013:WMT,
author = {Birch, Alexandra and Haddow, Barry and Germann, Ulrich and Nadejde, Maria and Buck, Christian and Koehn, Philipp},
title = {The Feasibility of {HMEANT} as a Human {MT} Evaluation Metric},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {52--61},
url = {
http://www.aclweb.org/anthology/W13-2203},
year = 2013
}
Birch et al. (2013)
Ondřej Bojar (2011):
Analyzing Error Types in English-Czech Machine Translation, The Prague Bulletin of Mathematical Linguistics
@article{pbml-95-bojar,
author = {Ond\v{r}ej Bojar},
title = {Analyzing Error Types in {English-Czech} Machine Translation},
url = {
http://ufal.mff.cuni.cz/pbml/95/art-bojar.pdf},
pages = {63--76},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {95},
year = 2011
}
Bojar (2011)
Sakaguchi, Keisuke and Post, Matt and Van Durme, Benjamin (2014):
Efficient Elicitation of Annotations for Human Evaluation of Machine Translation, Proceedings of the Ninth Workshop on Statistical Machine Translation
@InProceedings{sakaguchi-post-vandurme:2014:W14-33,
author = {Sakaguchi, Keisuke and Post, Matt and Van Durme, Benjamin},
title = {Efficient Elicitation of Annotations for Human Evaluation of Machine Translation},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {1--11},
url = {
http://www.aclweb.org/anthology/W14-3301},
year = 2014
}
Sakaguchi et al. (2014)
Bouamor, Houda and Alshikhabobakr, Hanan and Mohit, Behrang and Oflazer, Kemal (2014):
A Human Judgement Corpus and a Metric for Arabic MT Evaluation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
@InProceedings{bouamor-EtAl:2014:EMNLP2014,
author = {Bouamor, Houda and Alshikhabobakr, Hanan and Mohit, Behrang and Oflazer, Kemal},
title = {A Human Judgement Corpus and a Metric for Arabic {MT} Evaluation},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {207--213},
url = {
http://www.aclweb.org/anthology/D14-1026},
year = 2014
}
Bouamor et al. (2014)
Toral, Antonio and Kumar Naskar, Sudip and Vreeke, Joris and Gaspari, Federico and Groves, Declan (2013):
A Web Application for the Diagnostic Evaluation of Machine Translation over Specific Linguistic Phenomena, Proceedings of the 2013 NAACL HLT Demonstration Session
@InProceedings{toral-EtAl:2013:Demos,
author = {Toral, Antonio and Kumar Naskar, Sudip and Vreeke, Joris and Gaspari, Federico and Groves, Declan},
title = {A Web Application for the Diagnostic Evaluation of Machine Translation over Specific Linguistic Phenomena},
booktitle = {Proceedings of the 2013 NAACL HLT Demonstration Session},
month = {June},
address = {Atlanta, Georgia},
publisher = {Association for Computational Linguistics},
pages = {20--23},
url = {
http://www.aclweb.org/anthology/N13-3005},
year = 2013
}
Toral et al. (2013)
Hopkins, Mark and May, Jonathan (2013):
Models of Translation Competitions, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{hopkins-may:2013:ACL2013,
author = {Hopkins, Mark and May, Jonathan},
title = {Models of Translation Competitions},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {1416--1424},
url = {
http://www.aclweb.org/anthology/P13-1139},
year = 2013
}
Hopkins and May (2013)
Gonzàlez, Meritxell and Mascarell, Laura and Màrquez, Lluís (2013):
tSEARCH: Flexible and Fast Search over Automatic Translations for Improved Quality/Error Analysis, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
@InProceedings{gonzalez-mascarell-marquez:2013:SystemDemo,
author = {Gonz\`{a}lez, Meritxell and Mascarell, Laura and M\`{a}rquez, Llu\'{i}s},
title = {tSEARCH: Flexible and Fast Search over Automatic Translations for Improved Quality/Error Analysis},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {181--186},
url = {
http://www.aclweb.org/anthology/P13-4031},
year = 2013
}
Gonzàlez et al. (2013)
Stephen Doherty and Sharon O'Brien and Michael Carl (2010):
Eye tracking as an MT evaluation technique, Machine Translation
@article{MTJ:2010:Doherty,
author = {Stephen Doherty and Sharon O'Brien and Michael Carl},
title = {Eye tracking as an {MT} evaluation technique},
url = {
http://www.cngl.ie/drupal/sites/default/files/papers3/Doherty,%20Brien,%20Carl.%202010.%20Eye%20tracking%20as%20an%20MT%20evaluation%20technique.%20Media.pdf},
googlescholar = {4920570112510821061},
pages = {1--13},
journal = {Machine Translation},
volume = {24},
number = {1},
month = {March},
year = 2010
}
Doherty et al. (2010)
Luisa Bentivogli and Marcello Federico and Giovanni Moretti and Michael Paul (2011):
Getting Expert Quality from the Crowd for Machine Translation Evaluation, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
@inproceedings{MTS-2011-Bentivogli,
author = {Luisa Bentivogli and Marcello Federico and Giovanni Moretti and Michael Paul},
title = {Getting Expert Quality from the Crowd for Machine Translation Evaluation},
url = {
http://www.mt-archive.info/MTS-2011-Bentivogli.pdf},
pages = {521--528},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
}
Bentivogli et al. (2011)
M. Paul and E. Sumita and L. Bentivogli and M. Federico (2012):
Crowd-based MT Evaluation for non-English Target Languages, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{EAMT-2012-Paul,
author = {M. Paul and E. Sumita and L. Bentivogli and M. Federico},
title = {Crowd-based {MT} Evaluation for non-{E}nglish Target Languages},
url = {
http://www.mt-archive.info/EAMT-2012-Paul},
pages = {229--236},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
}
Paul et al. (2012)
Zaidan, Omar F. and Callison-Burch, Chris (2010):
Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
@InProceedings{zaidan-callisonburch:2010:NAACLHLT,
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators},
booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {369--372},
url = {
http://www.aclweb.org/anthology/N10-1057},
year = 2010
}
Zaidan and Callison-Burch (2010)
Eduard Hovy and Margaret King and Andrei Popescu-Belis (2002):
Principles of Context-Based Machine Translation Evaluation, Machine Translation
@article{MTJ:2002:Hovy,
author = {Eduard Hovy and Margaret King and Andrei Popescu-Belis},
title = {Principles of Context-Based Machine Translation Evaluation},
url = {
http://www.researchgate.net/publication/225255630\_Principles\_of\_Context-Based\_Machine\_Translation\_Evaluation/file/504635278e007d267d.pdf},
googlescholar = {10898239395361187997},
pages = {43--75},
journal = {Machine Translation},
volume = {17},
number = {1},
month = {March},
year = 2002
}
Hovy et al. (2002)
Zaidan, Omar (2011):
MAISE: A Flexible, Configurable, Extensible Open Source Package for Mass AI System Evaluation, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{zaidan:2011:WMT,
author = {Zaidan, Omar},
title = {MAISE: A Flexible, Configurable, Extensible Open Source Package for Mass AI System Evaluation},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {130--134},
url = {
http://www.aclweb.org/anthology/W11-2114},
year = 2011
}
Zaidan (2011)
Henderson, John and Morgan, William (2005):
Gaming Fluency: Evaluating the Bounds and Expectations of Segment-based Translation Memory, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{henderson-morgan:2005:WPT,
author = {Henderson, John and Morgan, William},
title = {Gaming Fluency: Evaluating the Bounds and Expectations of Segment-based Translation Memory},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {175--182},
url = {
http://www.aclweb.org/anthology/W/W05/W05-0832},
year = 2005
}
Henderson and Morgan (2005)
Christian Boitet and Youcef Bey and Mutsuko Tomokio and Wenjie Cao and Hervé Blanchon (2006):
IWSLT-06: experiments with commercial MT systems and lessons from subjective evaluations, Proc. of the International Workshop on Spoken Language Translation
@inproceedings{Boitet:2006:IWSLT,
author = {Christian Boitet and Youcef Bey and Mutsuko Tomokio and Wenjie Cao and Herv\'{e} Blanchon},
title = {{IWSLT}-06: experiments with commercial {MT} systems and lessons from subjective evaluations},
booktitle = {Proc. of the International Workshop on Spoken Language Translation},
address = {Kyoto, Japan},
year = 2006
}
Boitet et al. (2006)