Word Alignment with Linguistic Annotation
As with statistical machine translation models, most word alignment methods view sentences simply as strings of unique tokens, but linguistic annotation may be exploited to improve word alignment quality.
Word alignment with linguistic annotation is the main subject of 24 publications, 13 of which are discussed here.
Publications
Word alignment methods have been extended to exploit part-of-speech information
Jyun-Sheng Chang and Huey-Chyun Chen (1994):
Using Part-of-speech information in word alignment, Proceedings of the Conference of the Association for Machine Translation in the Americas
@Inproceedings{Chang:1994,
author = {Jyun-Sheng Chang and Huey-Chyun Chen},
title = {Using Part-of-speech information in word alignment},
url = {http://www.mt-archive.info/AMTA-1994-Chang.pdf},
googlescholar = {10523715937893251359},
booktitle = {Proceedings of the Conference of the Association for Machine Translation in the Americas},
year = 1994
}
(Chang and Chen, 1994;
Jörg Tiedemann (2003):
Combining Clues for Word Alignment, Proceedings of Meeting of the European Chapter of the Association of Computational Linguistics (EACL)
@InProceedings{Tiedemann:2003,
author = {J{\"o}rg Tiedemann},
title = {Combining Clues for Word Alignment},
url = {http://acl.ldc.upenn.edu/E/E03/E03-1026.pdf},
googlescholar = {12019303397660825090},
booktitle = {Proceedings of Meeting of the European Chapter of the Association of Computational Linguistics (EACL)},
year = 2003
}
Tiedemann, 2003) in constraint methods
Tiedemann, Jörg (2004):
Word to word alignment strategies, Proceedings of Coling 2004
@inproceedings{Tiedemann:2004,
author = {Tiedemann, J{\"o}rg},
title = {Word to word alignment strategies},
url = {http://acl.ldc.upenn.edu/C/C04/C04-1031.pdf},
googlescholar = {13686845012859288407},
booktitle = {Proceedings of Coling 2004},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {212--218},
year = 2004
}
(Tiedemann, 2004), translation divergences
Bonnie J. Dorr and Lisa Pearl and Rebecca Hwa and Nizar Habash (2002):
DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment, Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings
@inproceedings{Dorr:2002,
author = {Bonnie J. Dorr and Lisa Pearl and Rebecca Hwa and Nizar Habash},
title = {DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment},
url = {http://people.cs.pitt.edu/~hwa/amta02.ps},
googlescholar = {1449077559793433146},
editor = {Stephen D. Richardson},
booktitle = {Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {2499},
isbn = {3-540-44282-0},
bibsource = {DBLP, http://dblp.uni-trier.de},
year = 2002
}
(Dorr et al., 2002), compositionality constraints
Michel Simard and Philippe Langlais (2003):
Statistical Translation Alignment with Compositionality Constraints, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Simard:2003,
author = {Michel Simard and Philippe Langlais},
title = {Statistical Translation Alignment with Compositionality Constraints},
url = {http://acl.ldc.upenn.edu/W/W03/W03-0304.pdf},
googlescholar = {10069460103729287862},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
}
(Simard and Langlais, 2003), and syntactic constraints
Cherry, Colin and Lin, Dekang (2003):
A Probability Model to Improve Word Alignment, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
@inproceedings{Cherry:2003,
author = {Cherry, Colin and Lin, Dekang},
title = {A Probability Model to Improve Word Alignment},
booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics},
editor = {Erhard Hinrichs and Dan Roth},
url = {http://www.aclweb.org/anthology/P03-1012.pdf},
pages = {88--95},
year = 2003
}
(Cherry and Lin, 2003;
Dekang Lin and Colin Cherry (2003):
ProAlign: Shared Task System Description, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Lin:2003,
author = {Dekang Lin and Colin Cherry},
title = {ProAlign: Shared Task System Description},
url = {http://acl.ldc.upenn.edu/W/W03/W03-0302.pdf},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
}
Lin and Cherry, 2003;
Zhao, Bing and Vogel, Stephan (2003):
Word Alignment Based on Bilingual Bracketing, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Zhao:2003b,
author = {Zhao, Bing and Vogel, Stephan},
title = {Word Alignment Based on Bilingual Bracketing},
url = {http://acl.ldc.upenn.edu/W/W03/W03-0303.pdf},
googlescholar = {13620313122568916163},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
pages = {15--18},
year = 2003
}
Zhao and Vogel, 2003).
Fraser, Alexander and Marcu, Daniel (2005):
ISI's Participation in the Romanian-English Alignment Task, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{fraser-marcu:2005:WPT,
author = {Fraser, Alexander and Marcu, Daniel},
title = {{ISI}'s Participation in the {R}omanian-{E}nglish Alignment Task},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {91--94},
url = {http://www.aclweb.org/anthology/W/W05/W05-0814},
year = 2005
}
Fraser and Marcu (2005) improve word alignments by stemming words in both the input and output languages, thus generalizing over morphological variants. Syntactic constraints may derive from formal criteria for obtaining parallel tree structures, such as the ITG constraint, or from syntactic relationships between words on either side
Colin Cherry and Dekang Lin (2006):
A Comparison of Syntactically Motivated Word Alignment Spaces, Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics
@InProceedings{Cherry:2006:EACL,
author = {Colin Cherry and Dekang Lin},
title = {A Comparison of Syntactically Motivated Word Alignment Spaces},
url = {http://acl.ldc.upenn.edu/eacl2006/main/papers/10\_1\_cherrylin\_90.pdf},
googlescholar = {6384212915811870792},
booktitle = {Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Trento, Italy},
year = 2006
}
(Cherry and Lin, 2006).
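As a minimal illustration of the ITG constraint mentioned above, the following Python sketch tests whether a one-to-one alignment, written as a permutation of target positions, can be built from binary straight and inverted reorderings only. The function name `satisfies_itg` and the permutation encoding are assumptions made for this example and are not taken from the cited papers; alignments with unaligned words or many-to-one links would need additional handling.

```python
def satisfies_itg(permutation):
    """Return True if a one-to-one alignment obeys the ITG constraint.

    permutation[i] is the (0-based) target position aligned to source
    position i. Hypothetical helper written for illustration only.
    """
    stack = []  # each item is a (low, high) block of contiguous target positions
    for position in permutation:
        stack.append((position, position))
        # Merge the two topmost blocks while they are adjacent on the target side
        # (straight order: a then b; inverted order: b then a).
        while len(stack) >= 2:
            (low_a, high_a), (low_b, high_b) = stack[-2], stack[-1]
            if high_a + 1 == low_b or high_b + 1 == low_a:
                stack[-2:] = [(min(low_a, low_b), max(high_a, high_b))]
            else:
                break
    # Everything collapsed into one block: a binary ITG derivation exists.
    return len(stack) <= 1


monotone = satisfies_itg([0, 1, 2, 3])
local_swaps = satisfies_itg([1, 0, 3, 2])
inside_out_a = satisfies_itg([1, 3, 0, 2])
inside_out_b = satisfies_itg([2, 0, 3, 1])
```

The two "inside-out" permutations at the end are the shortest reorderings that this check rejects, while monotone order and local swaps are accepted.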
Linguistic constraints may be modeled as priors in the generative model
Deng, Yonggang and Gao, Yuqing (2007):
Guiding Statistical Word Alignment Models With Prior Knowledge, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{deng-gao:2007:ACLMain,
author = {Deng, Yonggang and Gao, Yuqing},
title = {Guiding Statistical Word Alignment Models With Prior Knowledge},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {1--8},
url = {http://www.aclweb.org/anthology/P/P07/P07-1001},
year = 2007
}
(Deng and Gao, 2007).
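The general idea of a prior can be sketched as follows. This is a simplified illustration, not the formulation of Deng and Gao (2007): it assumes an IBM Model 1 style lexical table and multiplies each lexical probability by a prior weight before normalizing. The names `alignment_posteriors` and `pos_prior`, the toy sentence pair, the part-of-speech tags, and the weight 0.1 are all hypothetical.

```python
def alignment_posteriors(source, target, t_prob, prior):
    """For each target word, return a distribution over source positions.

    t_prob maps (target_word, source_word) to a lexical probability;
    prior(i, j) returns a non-negative weight for linking source position i
    to target position j.
    """
    posteriors = []
    for j, f in enumerate(target):
        # Combine the lexical probability with the prior weight for each link.
        scores = [t_prob.get((f, e), 1e-9) * prior(i, j)
                  for i, e in enumerate(source)]
        total = sum(scores)
        posteriors.append([score / total for score in scores])
    return posteriors


# Toy data (assumed for illustration): uniform lexical probabilities,
# so only the prior distinguishes the candidate links.
source = ["the", "house"]
target = ["das", "haus"]
source_pos = ["DET", "NOUN"]
target_pos = ["DET", "NOUN"]
t_prob = {(f, e): 0.5 for f in target for e in source}


def pos_prior(i, j):
    # Prefer links between words that share a part-of-speech tag.
    return 1.0 if source_pos[i] == target_pos[j] else 0.1


posteriors = alignment_posteriors(source, target, t_prob, pos_prior)
```

With uniform lexical probabilities, the posterior mass moves toward the links the prior prefers, which is the effect a linguistic prior is meant to have early in training.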
Hermjakob, Ulf (2009):
Improved Word Alignment with Statistics and Linguistic Heuristics, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{hermjakob:2009:EMNLP,
author = {Hermjakob, Ulf},
title = {Improved Word Alignment with Statistics and Linguistic Heuristics},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {229--237},
url = {http://www.aclweb.org/anthology/D/D09/D09-1024},
year = 2009
}
Hermjakob (2009) proposes a number of hand-crafted linguistic rules to improve word alignments obtained with traditional statistical methods.
Riesa, Jason and Irvine, Ann and Marcu, Daniel (2011):
Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
@InProceedings{riesa-irvine-marcu:2011:EMNLP,
author = {Riesa, Jason and Irvine, Ann and Marcu, Daniel},
title = {Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Edinburgh, Scotland, UK},
publisher = {Association for Computational Linguistics},
pages = {497--507},
url = {http://www.aclweb.org/anthology/D11-1046},
year = 2011
}
Riesa et al. (2011) use syntactic features in a discriminative word aligner and stress that guidance from the parse structure makes search during training more manageable.
Benchmarks
Discussion
Related Topics
New Publications
Huang, Fei and Yates, Alexander (2014):
Improving Word Alignment Using Linguistic Code Switching Data, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
@InProceedings{huang-yates:2014:EACL,
author = {Huang, Fei and Yates, Alexander},
title = {Improving Word Alignment Using Linguistic Code Switching Data},
booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Gothenburg, Sweden},
publisher = {Association for Computational Linguistics},
pages = {1--9},
url = {http://www.aclweb.org/anthology/E14-1001},
year = 2014
}
Huang and Yates (2014)
Franck Burlot and François Yvon (2015):
Morphology-aware alignments for translation to and from a synthetic language, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{IWSLT-2015-Burlot,
author = {Franck Burlot and François Yvon},
title = {Morphology-aware alignments for translation to and from a synthetic language},
pages = {188--195},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Da Nang, Vietnam},
url = {http://www.mt-archive.info/15/IWSLT-2015-burlot.pdf},
month = {December},
year = 2015
}
Burlot and Yvon (2015)
Kondo, Shuhei and Duh, Kevin and Matsumoto, Yuji (2013):
Hidden Markov Tree Model for Word Alignment, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{kondo-duh-matsumoto:2013:WMT,
author = {Kondo, Shuhei and Duh, Kevin and Matsumoto, Yuji},
title = {Hidden {Markov} Tree Model for Word Alignment},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {503--511},
url = {http://www.aclweb.org/anthology/W13-2263},
year = 2013
}
Kondo et al. (2013)
Toshiaki Nakazawa and Sadao Kurohashi (2008):
Linguistically-motivated Tree-based Probabilistic Phrase Alignment, Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)
@inproceedings{amta08:Nakazawa,
author = {Toshiaki Nakazawa and Sadao Kurohashi},
title = {Linguistically-motivated Tree-based Probabilistic Phrase Alignment},
url = {http://www-nagao.kuee.kyoto-u.ac.jp/~nakazawa/pubdb/AMTA2008/AMTA2008.pdf},
googlescholar = {12802904449954363637},
pages = {163--171},
booktitle = {Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA)},
location = {Waikiki, Hawaii},
year = 2008
}
Nakazawa and Kurohashi (2008)
Søgaard, Anders and Kuhn, Jonas (2009):
Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation, Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
@InProceedings{sogaard-kuhn:2009:SSST,
author = {S{\o}gaard, Anders and Kuhn, Jonas},
title = {Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation},
booktitle = {Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {19--27},
url = {http://www.aclweb.org/anthology/W09-2303},
year = 2009
}
Søgaard and Kuhn (2009)
Søgaard, Anders (2009):
On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms, Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
@InProceedings{sogaard:2009:SSST,
author = {S{\o}gaard, Anders},
title = {On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms},
booktitle = {Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {60--68},
url = {http://www.aclweb.org/anthology/W09-2308},
year = 2009
}
Søgaard (2009)
Luong, Minh-Thang and Kan, Min-Yen (2010):
Enhancing Morphological Alignment for Translating Highly Inflected Languages, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
@InProceedings{luong-kan:2010:PAPERS,
author = {Luong, Minh-Thang and Kan, Min-Yen},
title = {Enhancing Morphological Alignment for Translating Highly Inflected Languages},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {743--751},
url = {http://www.aclweb.org/anthology/C10-1084},
year = 2010
}
Luong and Kan (2010)
Lee, Jae-Hee and Lee, Seung-Wook and Hong, Gumwon and Hwang, Young-Sook and Kim, Sang-Bum and Rim, Hae-Chang (2010):
A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches, Coling 2010: Posters
@InProceedings{lee-EtAl:2010:POSTERS1,
author = {Lee, Jae-Hee and Lee, Seung-Wook and Hong, Gumwon and Hwang, Young-Sook and Kim, Sang-Bum and Rim, Hae-Chang},
title = {A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches},
booktitle = {Coling 2010: Posters},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {623--629},
url = {http://www.aclweb.org/anthology/C10-2071},
year = 2010
}
Lee et al. (2010)
Jin-Xia Huang and Key-Sun Choi (2000):
Chinese-Korean Word Alignment Based on Linguistic Comparison, Proceedings of the 38th Annual Meeting of the Association of Computational Linguistics (ACL)
@InProceedings{Huang:2000,
author = {Jin-Xia Huang and Key-Sun Choi},
title = {{Chinese-Korean} Word Alignment Based on Linguistic Comparison},
url = {http://www.aclweb.org/anthology/P00-1050},
booktitle = {Proceedings of the 38th Annual Meeting of the Association of Computational Linguistics (ACL)},
year = 2000
}
Huang and Choi (2000)
Ozdowska, Sylwia (2005):
Using Bilingual Dependencies to Align Words in English/French Parallel Corpora, Proceedings of the ACL Student Research Workshop
@InProceedings{ozdowska:2005:Student,
author = {Ozdowska, Sylwia},
title = {Using Bilingual Dependencies to Align Words in {E}nglish/{F}rench Parallel Corpora},
booktitle = {Proceedings of the ACL Student Research Workshop},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {127--132},
url = {http://www.aclweb.org/anthology/P/P05/P05-2022},
year = 2005
}
Ozdowska (2005)
Grzegorz Kondrak (2005):
Cognates and Word Alignment in Bitexts, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
@InProceedings{Kondrak:2005:MTS,
author = {Grzegorz Kondrak},
title = {Cognates and Word Alignment in Bitexts},
url = {http://mt-archive.info/MTS-2005-Kondrak.pdf},
googlescholar = {10504796889953111683},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
}
Kondrak (2005)