Syntactic Coherence between Languages
The use of synchronous context free grammars assumes an isomorphism (modulo reordering) of the syntactic structure of the source and target sentence.
Syntactic Coherence is the main subject of 10 publications. 7 are discussed here.
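As a rough illustration of what "isomorphism modulo reordering" demands, the sketch below checks whether sibling constituents on the source side project, through a word alignment, onto non-overlapping target spans. It is a simplified check under stated assumptions and not the measure used in any of the publications listed here; the names `project` and `siblings_cross`, the span encoding, and the toy alignments are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source_index, target_index) pairs
Span = Tuple[int, int]            # inclusive (start, end) over source word indices


def project(span: Span, alignment: Alignment) -> Optional[Span]:
    """Smallest target span covering every word aligned to the source span."""
    targets = [t for s, t in alignment if span[0] <= s <= span[1]]
    if not targets:
        return None  # unaligned constituent: nothing to project
    return (min(targets), max(targets))


def siblings_cross(siblings: List[Span], alignment: Alignment) -> bool:
    """True if the projected target spans of any two sibling constituents overlap."""
    projected = sorted(
        p for p in (project(s, alignment) for s in siblings) if p is not None
    )
    # After sorting by start position, any overlap shows up between neighbours.
    return any(prev[1] >= cur[0] for prev, cur in zip(projected, projected[1:]))


# Two sibling constituents covering source words 0-1 and 2-3.
siblings = [(0, 1), (2, 3)]

# Pure reordering: the constituents swap places but stay contiguous.
swapped = {(0, 2), (1, 3), (2, 0), (3, 1)}

# Interleaving: words of the two constituents alternate on the target side.
interleaved = {(0, 0), (1, 2), (2, 1), (3, 3)}

print(siblings_cross(siblings, swapped))
print(siblings_cross(siblings, interleaved))
```

A swap of whole constituents is compatible with a synchronous grammar rule that reorders its children, whereas interleaved projections are the kind of case in which the assumed structural correspondence breaks down.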
Publications
Fox, Heidi (2002):
Phrasal Cohesion and Statistical Machine Translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
@inproceedings{Fox:2002,
author = {Fox, Heidi},
title = {Phrasal Cohesion and Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/W/W02/W02-1039.pdf},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {July},
address = {Philadelphia},
publisher = {Association for Computational Linguistics},
pages = {304--311},
year = 2002
}
Fox (2002);
Rebecca Hwa and Philip Resnik and Amy Weinberg and Okan Kolak (2002):
Evaluating Translational Correspondence using Annotation Projection, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)
@inproceedings{Hwa:2002,
author = {Rebecca Hwa and Philip Resnik and Amy Weinberg and Okan Kolak},
title = {Evaluating Translational Correspondence using Annotation Projection},
url = {http://drum.lib.umd.edu/bitstream/1903/1267/4/CS-TR-4455.pdf},
booktitle = {Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 2002
}
Hwa et al. (2002) examine how well the underlying assumption of syntactic coherence between languages holds up in practice. In phrase-based systems, syntactic constituents are not sufficient to map units between languages
Philipp Koehn and Franz Josef Och and Daniel Marcu (2003):
Statistical Phrase Based Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)
mentioned in Phrase Based Models and Syntactic Coherence
@InProceedings{Koehn:2003ph,
author = {Philipp Koehn and Franz Josef Och and Daniel Marcu},
title = {Statistical Phrase Based Translation},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
url = {http://acl.ldc.upenn.edu/N/N03/N03-1017.pdf},
year = 2003
}
(Koehn et al., 2003).
For establishing mappings in tree-based transfer models, syntactic parsing of each side introduces constraints that may make it more difficult to match up sentences
Zhang, Hao and Gildea, Daniel (2004):
Syntax-Based Alignment: Supervised or Unsupervised?, Proceedings of Coling 2004
@inproceedings{Zhang:2004,
author = {Zhang, Hao and Gildea, Daniel},
title = {Syntax-Based Alignment: Supervised or Unsupervised?},
url = {http://acl.ldc.upenn.edu/coling2004/MAIN/pdf/60-795.pdf},
googlescholar = {9558961265693794337},
booktitle = {Proceedings of Coling 2004},
editor = {{}},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {418--424},
year = 2004
}
(Zhang and Gildea, 2004). When studying actual parallel sentences, the syntactic transfer rules needed to match them up are often more complex than expected
Wellington, Benjamin and Waxmonsky, Sonjia and Melamed, I. Dan (2006):
Empirical Lower Bounds on the Complexity of Translational Equivalence, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
@InProceedings{wellington-waxmonsky-melamed:2006:COLACL,
author = {Wellington, Benjamin and Waxmonsky, Sonjia and Melamed, I. Dan},
title = {Empirical Lower Bounds on the Complexity of Translational Equivalence},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {977--984},
url = {http://www.aclweb.org/anthology/P/P06/P06-1123},
year = 2006
}
(Wellington et al., 2006).
Parallel treebanks are also a source of data for examining the parallelism between two languages, although the measured parallelism also depends on the annotation standards
Buch-Kromann, Matthias (2007):
Computing Translation Units and Quantifying Parallelism in Parallel Dependency Treebanks, Proceedings of the Linguistic Annotation Workshop
@InProceedings{buchkromann:2007:LAW,
author = {Buch-Kromann, Matthias},
title = {Computing Translation Units and Quantifying Parallelism in Parallel Dependency Treebanks},
booktitle = {Proceedings of the Linguistic Annotation Workshop},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {69--76},
url = {http://www.aclweb.org/anthology/W/W07/W07-1512},
year = 2007
}
(Buch-Kromann, 2007). Visualizing and searching such treebanks may also provide important insights
Volk, Martin and Lundborg, Joakim and Mettler, Maël (2007):
A Search Tool for Parallel Treebanks, Proceedings of the Linguistic Annotation Workshop
@InProceedings{volk-lundborg-mettler:2007:LAW,
author = {Volk, Martin and Lundborg, Joakim and Mettler, Ma\"{e}l},
title = {A Search Tool for Parallel Treebanks},
booktitle = {Proceedings of the Linguistic Annotation Workshop},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {85--92},
url = {http://www.aclweb.org/anthology/W/W07/W07-1514},
year = 2007
}
(Volk et al., 2007).
Benchmarks
Discussion
Related Topics
New Publications
S. Arnoult and K. Sima'an (2012):
Adjunct Alignment in Translation Data with an Application to Phrase Based Statistical Machine Translation, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{EAMT-2012-Arnoult,
author = {S. Arnoult and K. Sima'an},
title = {Adjunct Alignment in Translation Data with an Application to Phrase Based Statistical Machine Translation},
url = {http://www.mt-archive.info/EAMT-2012-Arnoult},
pages = {287--294},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
}
Arnoult and Sima'an (2012)
Feng, Minwei and Sun, Weiwei and Ney, Hermann (2012):
Semantic Cohesion Model for Phrase-Based SMT, Proceedings of COLING 2012
@InProceedings{feng-sun-ney:2012:PAPERS,
author = {Feng, Minwei and Sun, Weiwei and Ney, Hermann},
title = {Semantic Cohesion Model for Phrase-Based {SMT}},
booktitle = {Proceedings of COLING 2012},
month = {December},
address = {Mumbai, India},
publisher = {The COLING 2012 Organizing Committee},
pages = {867--878},
url = {http://www.aclweb.org/anthology/C12-1053},
year = 2012
}
Feng et al. (2012)
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro (2012):
Post-ordering by Parsing for Japanese-English Statistical Machine Translation, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{goto-utiyama-sumita:2012:ACL2012short,
author = {Goto, Isao and Utiyama, Masao and Sumita, Eiichiro},
title = {Post-ordering by Parsing for Japanese-English Statistical Machine Translation},
booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
address = {Jeju Island, Korea},
publisher = {Association for Computational Linguistics},
pages = {311--316},
url = {http://www.aclweb.org/anthology/P12-2061},
year = 2012
}
Goto et al. (2012)