Word Alignment Evaluation
Evaluating word alignment quality is difficult because, for many words, the correspondence to words in the other language is not straightforward. This is especially true of function words and of words that are part of idiomatic expressions or other phrasal constructions.
Word alignment evaluation is the main subject of 14 publications, 10 of which are discussed here.
Publications
To better understand the word alignment problem, parallel corpora have been annotated with word alignments for language pairs such as German–English, French–English, and Romanian–English. These have been the basis for competitions on word alignment
Rada Mihalcea and Ted Pedersen (2003):
An Evaluation Exercise for Word Alignment, Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@Inproceedings{SharedTaskWordAlignment,
author = {Rada Mihalcea and Ted Pedersen},
title = {An Evaluation Exercise for Word Alignment},
url = {http://acl.ldc.upenn.edu/W/W03/W03-0301.pdf},
googlescholar = {9348742121509149927},
booktitle = {Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
year = 2003
}
(Mihalcea and Pedersen, 2003;
Martin, Joel and Mihalcea, Rada and Pedersen, Ted (2005):
Word Alignment for Languages with Scarce Resources, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{martin-mihalcea-pedersen:2005:WPT,
author = {Martin, Joel and Mihalcea, Rada and Pedersen, Ted},
title = {Word Alignment for Languages with Scarce Resources},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {65--74},
url = {http://www.aclweb.org/anthology/W/W05/W05-0809},
year = 2005
}
Martin et al., 2005).
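Gold-standard annotations of this kind commonly distinguish sure (S) links from possible (P) links, so that uncertain correspondences, such as those involving function words, can be marked as optional. As a minimal sketch, the hypothetical `read_gold_alignments` function below collects per-sentence sure and possible link sets, assuming a whitespace-separated line format `sent_id src_pos tgt_pos [S|P]` in which a missing label means S. This format is an assumption for illustration; the exact layout of any particular data release should be checked against its own documentation.

```python
from collections import defaultdict


def read_gold_alignments(lines):
    """Parse gold-standard alignment lines into sure/possible link sets.

    Assumed (hypothetical) line format: "sent_id src_pos tgt_pos [S|P]",
    with the label defaulting to S when it is missing.
    Returns {sent_id: (sure, possible)}; possible always contains sure.
    """
    gold = defaultdict(lambda: (set(), set()))
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sent_id, src, tgt = fields[0], int(fields[1]), int(fields[2])
        label = fields[3] if len(fields) > 3 else "S"
        if label not in ("S", "P"):
            raise ValueError(f"unknown link label: {label!r}")
        sure, possible = gold[sent_id]
        if label == "S":
            sure.add((src, tgt))
        possible.add((src, tgt))  # every sure link is also possible (S is a subset of P)
    return dict(gold)
```

Keeping sure links inside the possible set mirrors the usual convention that S is a subset of P, which the error-rate formulas used with such data rely on.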
The relationship between alignment quality and machine translation performance is the subject of ongoing debate
Philippe Langlais and Michel Simard and Jean Veronis (1998):
Methods and Practical Issues in Evaluating Alignment Techniques, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Langlais:1998,
author = {Philippe Langlais and Michel Simard and Jean Veronis},
title = {Methods and Practical Issues in Evaluating Alignment Techniques},
booktitle = {Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1998
}
(Langlais et al., 1998;
Fraser and Marcu, 2007).
David Vilar and Maja Popovic and Hermann Ney (2006):
AER: do we need to "improve" our alignments?, Proc. of the International Workshop on Spoken Language Translation
@inproceedings{Vilar:2006:IWSLT,
author = {David Vilar and Maja Popovic and Hermann Ney},
title = {{AER}: do we need to "improve" our alignments?},
url = {http://20.210-193-52.unknown.qala.com.sg/archive/iwslt\_06/papers/slt6\_205.pdf},
googlescholar = {8322719774620980247},
month = {November},
booktitle = {Proc. of the International Workshop on Spoken Language Translation},
address = {Kyoto, Japan},
year = 2006
}
Vilar et al. (2006) point to mismatches between alignment error rate (AER) and machine translation performance. New measures have been proposed to overcome the weaknesses of AER
Michael Carl and Sisay Fissaha (2003):
Phrase-based Evaluation of Word-to-Word Alignments, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Carl:2003,
author = {Michael Carl and Sisay Fissaha},
title = {Phrase-based Evaluation of Word-to-Word Alignments},
url = {http://acl.ldc.upenn.edu/W/W03/W03-0307.pdf},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
}
(Carl and Fissaha, 2003). Giving less weight to alignment points that connect multiple aligned words improves correlation with machine translation performance
Paul Davis and Zhuli Xie and Kevin Small (2007):
All Links are not the Same: Evaluating Word Alignments for Statistical Machine Translation, Proceedings of the MT Summit XI
@inproceedings{Davis:2007:MTSummit,
author = {Paul Davis and Zhuli Xie and Kevin Small},
title = {All Links are not the Same: Evaluating Word Alignments for Statistical Machine Translation},
url = {http://www.kevinsmall.org/pdf/DavisSmXi07.pdf},
googlescholar = {7427372533434352724},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
(Davis et al., 2007).
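Alignment error rate is computed against sure links S and possible links P (with S a subset of P) for a hypothesized alignment A. Under the standard definitions, precision is |A∩P| / |A|, recall is |A∩S| / |S|, and AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The sketch below implements these and adds a hypothetical fan-out weighting: each link counts 1/k, where k is the larger number of links attached to its source word or its target word. This is one simple way to give less weight to alignment points that connect multiple aligned words. It is an illustrative assumption, not the measure proposed by Davis et al. (2007) or Carl and Fissaha (2003), and the function names are made up for this example.

```python
from collections import Counter


def aer(hyp, sure, possible):
    """Return (precision, recall, AER) for a hypothesized alignment.

    hyp, sure and possible are sets of hashable links such as (src_pos, tgt_pos),
    with sure a subset of possible; hyp and sure are assumed to be non-empty.
    """
    a_s = len(hyp & sure)      # |A ∩ S|
    a_p = len(hyp & possible)  # |A ∩ P|
    precision = a_p / len(hyp)
    recall = a_s / len(sure)
    error_rate = 1.0 - (a_s + a_p) / (len(hyp) + len(sure))
    return precision, recall, error_rate


def link_weights(links):
    """Hypothetical weighting: each (i, j) link counts 1/k, where k is the
    larger fan-out (number of links) of source word i and target word j."""
    src_fan = Counter(i for i, _ in links)
    tgt_fan = Counter(j for _, j in links)
    return {(i, j): 1.0 / max(src_fan[i], tgt_fan[j]) for i, j in links}


def weighted_precision_recall(hyp, sure, possible):
    """Precision and recall in which links inside one-to-many groups count less.

    Illustrative only; assumes hyp and sure are non-empty sets of (i, j) pairs.
    """
    w_hyp = link_weights(hyp)
    w_sure = link_weights(sure)
    precision = sum(w for link, w in w_hyp.items() if link in possible) / sum(w_hyp.values())
    recall = sum(w for link, w in w_sure.items() if link in hyp) / sum(w_sure.values())
    return precision, recall
```

For example, with `sure = {(1, 1), (2, 2)}`, `possible` additionally containing `(2, 3)`, and `hyp = {(1, 1), (2, 3), (3, 3)}`, `aer` returns precision 2/3, recall 0.5, and AER 1 - 3/5 = 0.4. Under the weighting, the two links sharing target word 3 each count one half, which raises weighted precision to 0.75. Because `aer` only intersects sets, a whole test set can be scored in one call by adding the sentence id to each link tuple.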
Adam Lopez and Philip Resnik (2006):
Word-Based Alignment, Phrase-Based Translation: What's the Link?, 5th Conference of the Association for Machine Translation in the Americas (AMTA)
@InProceedings{Lopez:2006:AMTA,
author = {Adam Lopez and Philip Resnik},
title = {Word-Based Alignment, Phrase-Based Translation: What's the Link?},
url = {http://www.mt-archive.info/AMTA-2006-Lopez.pdf},
googlescholar = {16252070359942137861},
booktitle = {5th Conference of the Association for Machine Translation in the Americas (AMTA)},
month = {August},
address = {Boston, Massachusetts},
year = 2006
}
Lopez and Resnik (2006) show the impact of word alignment quality on phrase-based models.
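One reason alignment quality matters for phrase-based models lies in how phrase pairs are commonly read off a word alignment. A source span and a target span form a pair only if they contain at least one alignment point and no alignment point links a word inside the pair to a word outside it. The sketch below implements this consistency check under simplifying assumptions: the function name and `max_len` parameter are illustrative, positions are 0-based, and spans are not extended over unaligned boundary words as full extraction heuristics usually do.

```python
def extract_phrase_pairs(alignment, src_len, max_len=3):
    """Return (src_span, tgt_span) pairs consistent with a word alignment.

    alignment: set of (i, j) links, 0-based source index i and target index j.
    Spans are inclusive (start, end) pairs of at most max_len words.
    Simplification: unaligned boundary words are not added to spans.
    """
    pairs = []
    for s_start in range(src_len):
        for s_end in range(s_start, min(src_len, s_start + max_len)):
            # Target positions linked to any word in the source span.
            tgt = [j for i, j in alignment if s_start <= i <= s_end]
            if not tgt:
                continue  # a phrase pair needs at least one alignment point
            t_start, t_end = min(tgt), max(tgt)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span may link outside the source span.
            consistent = all(
                s_start <= i <= s_end
                for i, j in alignment
                if t_start <= j <= t_end
            )
            if consistent:
                pairs.append(((s_start, s_end), (t_start, t_end)))
    return pairs
```

For the alignment `{(0, 0), (1, 0), (1, 1)}` over a two-word source, only the pair `((0, 1), (0, 1))` is extracted. Source word 0 cannot form a pair on its own because target word 0 is also linked to source word 1. A single added or missing link can therefore change which phrase pairs enter the phrase table.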
Ayan, Necip Fazil and Dorr, Bonnie J. (2006):
Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
@InProceedings{ayan-dorr:2006:COLACL,
author = {Ayan, Necip Fazil and Dorr, Bonnie J.},
title = {Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {9--16},
url = {http://www.aclweb.org/anthology/P/P06/P06-1002},
year = 2006
}
Ayan and Dorr (2006) compare alignment quality and machine translation performance and also stress the interaction with the phrase extraction method. Although computationally very expensive, word alignment may also be optimized directly for machine translation performance
Lambert, Patrik and Banchs, Rafael E. and Crego, Josep M. (2007):
Discriminative Alignment Training without Annotated Data for Machine Translation, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
@InProceedings{lambert-banchs-crego:2007:ShortPapers,
author = {Lambert, Patrik and Banchs, Rafael E. and Crego, Josep M.},
title = {Discriminative Alignment Training without Annotated Data for Machine Translation},
booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {85--88},
url = {http://www.aclweb.org/anthology/N/N07/N07-2022},
year = 2007
}
(Lambert et al., 2007).
New Publications
Patrik Lambert and Simon Petitrenaud and Yanjun Ma and Andy Way (2012):
What types of word alignment improve statistical machine translation?, Machine Translation
@article{MTJ:2012:Lambert,
author = {Patrik Lambert and Simon Petitrenaud and Yanjun Ma and Andy Way},
title = {What types of word alignment improve statistical machine translation?},
pages = {289--323},
journal = {Machine Translation},
volume = {26},
number = {4},
month = {December},
year = 2012
}
Lambert et al. (2012)
Cyril Goutte and Marine Carpuat and George Foster (2012):
The Impact of Sentence Alignment Errors on Phrase-Based Machine Translation Performance, Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{AMTA-2012-Goutte,
author = {Cyril Goutte and Marine Carpuat and George Foster},
title = {The Impact of Sentence Alignment Errors on Phrase-Based Machine Translation Performance},
url = {http://www.mt-archive.info/AMTA-2012-Goutte.pdf},
booktitle = {Proceedings of the Tenth Conference of the Association for Machine Translation in the Americas (AMTA)},
address = {San Diego, California},
year = 2012
}
Goutte et al. (2012)
Xu, Jinxi and Chen, Jinying (2011):
How Much Can We Gain from Supervised Word Alignment?, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{xu-chen:2011:ACL-HLT2011,
author = {Xu, Jinxi and Chen, Jinying},
title = {How Much Can We Gain from Supervised Word Alignment?},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {165--169},
url = {http://www.aclweb.org/anthology/P11-2029},
year = 2011
}
Xu and Chen (2011)
Bodrumlu, Tugba and Knight, Kevin and Ravi, Sujith (2009):
A New Objective Function for Word Alignment, Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing
@InProceedings{bodrumlu-knight-ravi:2009:ILPNLP,
author = {Bodrumlu, Tugba and Knight, Kevin and Ravi, Sujith},
title = {A New Objective Function for Word Alignment},
booktitle = {Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {28--35},
url = {http://www.aclweb.org/anthology/W09-1804},
year = 2009
}
Bodrumlu et al. (2009)