Word Alignment Based on Co-Occurrence

While most current work on word alignment is model-based, more heuristic approaches rely on co-occurrence statistics.

Word alignment based on co-occurrence is the main subject of 20 publications, 16 of which are discussed here.

Publications

Early work in word alignment focused on co-occurrence statistics to find evidence for word associations (Kaji and Aizono, 1996). These methods may find evidence for aligning a word to multiple translations, a problem called indirect association, which may be overcome by enforcing one-to-one alignments (Melamed, 1996).
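
To make the idea of co-occurrence evidence concrete, the following minimal sketch scores word pairs by how often they appear in the same sentence pair. The Dice coefficient is used purely as an assumed, illustrative association measure; the publications cited above define their own statistics. The function name `dice_scores` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score each co-occurring (source, target) word pair with the Dice coefficient.

    Counts are taken per sentence pair: a word type is counted once per
    sentence, however often it repeats inside that sentence.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types, tgt_types = set(src.split()), set(tgt.split())
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        # Every source type co-occurs with every target type of the same pair.
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


# Toy parallel corpus (illustrative only).
corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]
scores = dice_scores(corpus)
print(scores[("the", "das")])    # 1.0
print(scores[("the", "buch")])   # 0.5
```

Even in this tiny corpus, "house" and "das" receive a non-zero score only because "the house" and "das haus" occur together. Spurious evidence of this kind is what a one-to-one constraint is meant to suppress.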

Kumano and Hirakawa (1994) augment this method with an existing bilingual dictionary. Sato and Nakanishi (1998) use a maximum entropy model for word associations. Ker and Chang (1996) group words together into sense classes from a thesaurus to improve word alignment accuracy.

Co-occurrence counts may also be used for phrase alignment, although this typically requires more efficient data structures for storing all phrases (Cromieres, 2006). Chatterjee and Agrawal (2006) extend a recency vector approach (Fung and McKeown, 1994) with additional constraints. Lardilleux and Lepage (2008) iteratively match the longest common subsequences from sentence pairs and align the remainder.

Heuristic word alignment methods may be extended into iterative algorithms, for instance the competitive linking algorithm of Melamed (1995, 1996, 1997, 2000) or bilingual bracketing (Wu, 1997). Tufiş (2002) extends a simple co-occurrence method to align words.
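
The greedy core of competitive linking can be sketched as follows. This is a simplified illustration and not Melamed's full procedure: it shows only the linking step for one sentence pair and omits the iterative re-estimation of association scores. The function name `competitive_link` and the score table are hypothetical.

```python
def competitive_link(src_words, tgt_words, score):
    """Greedily link the best-scoring word pairs under a one-to-one constraint.

    `score` maps (source_word, target_word) to an association score;
    missing pairs are treated as having score 0 and are never linked.
    Returns a sorted list of (source_index, target_index) links.
    """
    candidates = sorted(
        (
            (score.get((s, t), 0.0), i, j)
            for i, s in enumerate(src_words)
            for j, t in enumerate(tgt_words)
        ),
        # Highest score first; ties are broken by position for determinism.
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for value, i, j in candidates:
        if value <= 0.0:
            break  # all remaining candidates carry no evidence
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


# Hypothetical association scores for one sentence pair.
toy_scores = {
    ("the", "das"): 0.9,
    ("the", "haus"): 0.6,
    ("house", "das"): 0.6,
    ("house", "haus"): 0.8,
}
links = competitive_link(["the", "house"], ["das", "haus"], toy_scores)
print(links)  # [(0, 0), (1, 1)]
```

Because each word may take part in at most one link, the weaker pairs ("the", "haus") and ("house", "das") are discarded once the stronger links have claimed their words.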

Monolingual collocation may also be helpful for word alignment: Liu et al. (2010) use collocation statistics to help group words into cepts.

Benchmarks

Discussion

New Publications

Bai, Ming-Hong and You, Jia-Ming and Chen, Keh-Jiann and Chang, Jason S. (2009): Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{bai-EtAl:2009:EMNLP,
author = {Bai, Ming-Hong and You, Jia-Ming and Chen, Keh-Jiann and Chang, Jason S.},
title = {Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {478--486},
url = {http://www.aclweb.org/anthology/D/D09/D09-1050},
year = 2009
}
Bai et al. (2009)
Moore, Robert C. (2005): Association-Based Bilingual Word Alignment, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{moore:2005:WPT,
author = {Moore, Robert C.},
title = {Association-Based Bilingual Word Alignment},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {1--8},
url = {http://www.aclweb.org/anthology/W/W05/W05-0801},
year = 2005
}
Moore (2005)
Tiedemann, Jörg (2009): Evidence-Based Word Alignment, Proceedings of the Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning
@InProceedings{tiedemann:2009:NLPMCTLLL,
author = {Tiedemann, J\"{o}rg},
title = {Evidence-Based Word Alignment},
booktitle = {Proceedings of the Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning},
month = {September},
address = {Borovets, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {28--32},
url = {http://www.aclweb.org/anthology/W09-4205},
year = 2009
}
Tiedemann (2009)
Melamed, I. Dan (1995): Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons, Third Workshop on Very Large Corpora
@InProceedings{Melamed:1995,
author = {Melamed, I. Dan},
title = {Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons},
url = {http://acl.ldc.upenn.edu/W/W95/W95-0115.pdf},
booktitle = {Third Workshop on Very Large Corpora},
year = 1995
}
Melamed (1995)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
