N-Gram Matching Metrics

Good machine translation output matches not only single words of a reference translation but also larger chunks of text, which motivates the use of n-gram based metrics.

N-gram metrics are the main subject of 26 publications, 12 of which are discussed here.

Publications

The BLEU evaluation metric is based on n-gram matches, typically up to order four (Papineni et al., 2001). Several variants of n-gram matching have been proposed, such as weighting n-grams by their frequency (Babych and Hartley, 2004) or by other complexity metrics (Babych et al., 2004). GTM is based on precision and recall (Melamed et al., 2003; Turian et al., 2003). Echizen-ya and Araki (2007) propose IMPACT, which is more sensitive to the longest matching n-grams.
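The n-gram matching that underlies BLEU-style metrics can be illustrated with a minimal sketch. It computes only clipped n-gram precision for orders one to four against a single reference. It is not a full BLEU implementation: the brevity penalty, the combination of the four precisions, and multiple references are left out, and the function names and example sentences are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it occurs in the reference,
    so repeating a correct word does not inflate the score.
    """
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0  # hypothesis is shorter than n tokens
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


# Illustrative sentences (not taken from any cited paper).
hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
precisions = [clipped_precision(hyp, ref, n) for n in range(1, 5)]
```

In this example the precision drops as the order grows, because a single wrong word breaks every longer n-gram that spans it. This is the sense in which higher-order n-grams reward larger matching chunks rather than isolated words.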

A metric may benefit from using an explicit alignment of system output and reference while maintaining the advantages of n-gram based methods such as BLEU (Liu and Gildea, 2006), and from being trained to correlate with human judgment (Liu and Gildea, 2007).

Lavie et al. (2004) emphasize the importance of recall and stemmed matches in evaluation, which led to the development of the METEOR metric (Banerjee and Lavie, 2005; Lavie and Agarwal, 2007). Partial credit for stemmed matches may also be applied to BLEU and TER (Agarwal and Lavie, 2008).
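The effect of recall weighting and stemmed matches can be sketched as follows. This is a simplified illustration under stated assumptions, not the METEOR implementation: `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer, all tokens are stemmed at once rather than matched in stages, and METEOR's alignment and fragmentation penalty are omitted. The parameter `alpha` controls how strongly recall is favored in the harmonic mean; the default of 0.9 corresponds to the original recall-heavy weighting, and 0.5 gives the ordinary balanced F-score.

```python
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def count_matches(hyp, ref):
    """Count unigram matches, clipped by how often each word occurs in the reference."""
    ref_counts = Counter(ref)
    return sum(min(count, ref_counts[word]) for word, count in Counter(hyp).items())


def recall_weighted_f(hyp, ref, alpha=0.9, stem=False):
    """Harmonic mean of unigram precision and recall, weighted towards recall.

    With stem=True, words are compared after toy stemming, so inflected
    variants of the same word receive credit.
    """
    if stem:
        hyp = [toy_stem(word) for word in hyp]
        ref = [toy_stem(word) for word in ref]
    matches = count_matches(hyp, ref)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


# Illustrative sentences (not taken from any cited paper).
hyp = "the cats sat".split()
ref = "the cat was sitting there".split()
exact_score = recall_weighted_f(hyp, ref)
stemmed_score = recall_weighted_f(hyp, ref, stem=True)
```

Here the stemmed score is higher than the exact score because "cats" and "cat" are treated as a match once both are reduced to the same stem, while the low recall against the longer reference keeps both scores well below the precision.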

Benchmarks

Discussion

New Publications

Zied Elloumi and Hervé Blanchon and Gilles Serasset and Laurent Besacier (2015): METEOR for multiple target languages using DBnary, Machine Translation Summit XV
@inproceedings{MTS2015-Elloumi,
author = {Zied Elloumi and Hervé Blanchon and Gilles Serasset and Laurent Besacier},
title = {METEOR for multiple target languages using DBnary},
url = {http://www.mt-archive.info/15/MTS-2015-Elloumi.pdf},
pages = {80--89},
booktitle = {Machine Translation Summit XV},
year = 2015
}
Elloumi et al. (2015)
Popović, Maja (2015): chrF: character n-gram F-score for automatic MT evaluation, Proceedings of the Tenth Workshop on Statistical Machine Translation
@InProceedings{popovic:2015:WMT,
author = {Popovi\'{c}, Maja},
title = {chrF: character n-gram F-score for automatic {MT} evaluation},
booktitle = {Proceedings of the Tenth Workshop on Statistical Machine Translation},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {392--395},
url = {http://aclweb.org/anthology/W15-3049},
year = 2015
}
Popović (2015)
Virpioja, Sami and Grönroos, Stig-Arne (2015): LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages, Proceedings of the Tenth Workshop on Statistical Machine Translation
@InProceedings{virpioja-gronroos:2015:WMT,
author = {Virpioja, Sami and Gr\"{o}nroos, Stig-Arne},
title = {LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages},
booktitle = {Proceedings of the Tenth Workshop on Statistical Machine Translation},
month = {September},
address = {Lisbon, Portugal},
publisher = {Association for Computational Linguistics},
pages = {411--416},
url = {http://aclweb.org/anthology/W15-3052},
year = 2015
}
Virpioja and Grönroos (2015)
Apidianaki, Marianna and Marie, Benjamin (2015): METEOR-WSD: Improved Sense Matching in MT Evaluation, Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
@InProceedings{apidianaki-marie:2015:SSST-9,
author = {Apidianaki, Marianna and Marie, Benjamin},
title = {METEOR-WSD: Improved Sense Matching in {MT} Evaluation},
booktitle = {Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation},
month = {June},
address = {Denver, Colorado, USA},
publisher = {Association for Computational Linguistics},
pages = {49--51},
url = {http://www.aclweb.org/anthology/W15-1006},
year = 2015
}
Apidianaki and Marie (2015)
Libovický, Jindřich and Pecina, Pavel (2014): Tolerant BLEU: a Submission to the WMT14 Metrics Task, Proceedings of the Ninth Workshop on Statistical Machine Translation
@InProceedings{libovicky-pecina:2014:W14-33,
author = {Libovick\'{y}, Jind\v{r}ich and Pecina, Pavel},
title = {Tolerant BLEU: a Submission to the WMT14 Metrics Task},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {409--413},
url = {http://www.aclweb.org/anthology/W14-3353},
year = 2014
}
Libovický and Pecina (2014)
Chen, Boxing and Cherry, Colin (2014): A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU, Proceedings of the Ninth Workshop on Statistical Machine Translation
@InProceedings{chen-cherry:2014:W14-33,
author = {Chen, Boxing and Cherry, Colin},
title = {A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {362--367},
url = {http://www.aclweb.org/anthology/W14-3346},
year = 2014
}
Chen and Cherry (2014)
Chiang, David and DeNeefe, Steve and Chan, Yee Seng and Ng, Hwee Tou (2008): Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
@InProceedings{chiang-EtAl:2008:EMNLP,
author = {Chiang, David and DeNeefe, Steve and Chan, Yee Seng and Ng, Hwee Tou},
title = {Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {610--619},
url = {http://www.aclweb.org/anthology/D08-1064},
year = 2008
}
Chiang et al. (2008)
Alon Lavie and Michael J. Denkowski (2009): The Meteor metric for automatic evaluation of machine translation, Machine Translation
@article{MTJ:2009:Lavie2,
author = {Alon Lavie and Michael J. Denkowski},
title = {The {M}eteor metric for automatic evaluation of machine translation},
url = {http://www.cs.cmu.edu/afs/cs.cmu.edu/project/mteval-1/Papers/MT-Journal-2009/meteor-mtj-2009.pdf},
googlescholar = {15468685715273817238},
pages = {105--115},
journal = {Machine Translation},
volume = {23},
number = {2--3},
month = {September},
year = 2009
}
Lavie and Denkowski (2009)
Billy Wong and Chunyu Kit (2009): ATEC: automatic evaluation of machine translation via word choice and word order, Machine Translation
@article{MTJ:2009:Wong,
author = {Billy Wong and Chunyu Kit},
title = {ATEC: automatic evaluation of machine translation via word choice and word order},
pages = {141--155},
journal = {Machine Translation},
volume = {23},
number = {2--3},
month = {September},
year = 2009
}
Wong and Kit (2009)
Li, Maoxi and Zong, Chengqing and Ng, Hwee Tou (2011): Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{li-zong-ng:2011:ACL-HLT2011,
author = {Li, Maoxi and Zong, Chengqing and Ng, Hwee Tou},
title = {Automatic Evaluation of {Chinese} Translation Output: Word-Level or Character-Level?},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {159--164},
url = {http://www.aclweb.org/anthology/P11-2028},
year = 2011
}
Li et al. (2011)
Chen, Boxing and Kuhn, Roland (2011): AMBER: A Modified BLEU, Enhanced Ranking Metric, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{chen-kuhn:2011:WMT,
author = {Chen, Boxing and Kuhn, Roland},
title = {AMBER: A Modified BLEU, Enhanced Ranking Metric},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {71--77},
url = {http://www.aclweb.org/anthology/W11-2105},
year = 2011
}
Chen and Kuhn (2011)
Denkowski, Michael and Lavie, Alon (2011): Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{denkowski-lavie:2011:WMT,
author = {Denkowski, Michael and Lavie, Alon},
title = {Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {85--91},
url = {http://www.aclweb.org/anthology/W11-2107},
year = 2011
}
Denkowski and Lavie (2011)
Popović, Maja (2011): Morphemes and POS tags for n-gram based evaluation metrics, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{popovic:2011:WMT,
author = {Popovi\'{c}, Maja},
title = {Morphemes and POS tags for n-gram based evaluation metrics},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {104--107},
url = {http://www.aclweb.org/anthology/W11-2110},
year = 2011
}
Popović (2011)
Albrecht, Joshua and Hwa, Rebecca (2008): The Role of Pseudo References in MT Evaluation, Proceedings of the Third Workshop on Statistical Machine Translation
@InProceedings{albrecht-hwa:2008:WMT,
author = {Albrecht, Joshua and Hwa, Rebecca},
title = {The Role of Pseudo References in {MT} Evaluation},
booktitle = {Proceedings of the Third Workshop on Statistical Machine Translation},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {187--190},
url = {http://www.aclweb.org/anthology/W/W08/W08-0330},
year = 2008
}
Albrecht and Hwa (2008)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
