Generative Syntax-Based Models

Instead of learning syntactic rules from parallel corpora that have been word-aligned by other means, generative models may be used to integrate grammar induction and word alignment.

Generative Syntax Models is the main subject of 15 publications. 8 are discussed here.


Publications

Inspired by the IBM models,

Kenji Yamada and Kevin Knight (2001): A Syntax-Based Statistical Translation Model, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL)

Yamada and Knight (2001) present a generative tree-based model that is trained using the EM algorithm, thus aligning the words in the parallel corpus while extracting syntactic transfer rules. Syntax trees are provided by automatically parsing the English side of the corpus in a pre-processing step. They also present a chart parsing algorithm for their model

Kenji Yamada and Kevin Knight (2002): A Decoder for Syntax-Based Statistical MT, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)

(Yamada and Knight, 2002). This model allows the integration of a syntactic language model

E. Charniak and Kevin Knight and Kenji Yamada (2003): Syntax-based Language Models for Statistical Machine Translation, Proceedings of the MT Summit IX

(Charniak et al., 2003).
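
As a rough illustration of the EM training idea that the tree-based model inherits from the IBM models, the following Python sketch estimates IBM Model 1-style lexical translation probabilities on a toy corpus. It is not the Yamada and Knight model: it has no parse trees, reordering, or insertion operations, and it omits the NULL word. The function names `train_model1` and `best_alignment`, the toy sentences, and the iteration count are assumptions made for this example.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """EM estimation of lexical translation probabilities t(f | e).

    corpus: list of (english_tokens, foreign_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability.
    """
    foreign_vocab = {f for _, fs in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (f, e) link
        total = defaultdict(float)  # expected count of each e
        # E-step: distribute each foreign word over the English words
        # of its sentence, in proportion to the current t(f | e).
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_alignment(es, fs, t):
    """For each foreign word, return the index of its most likely English word."""
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]


# Hypothetical toy corpus (English side, German side).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_model1(corpus)
alignment = best_alignment(["the", "house"], ["das", "Haus"], t)
```

The word alignment falls out of training as a by-product of the posteriors computed in the E-step, which is the sense in which such models align words while estimating their parameters.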

Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL)

Gildea (2003) introduces a clone operation to the model and extends it to dependency trees

Gildea, Daniel (2004): Dependencies vs. Constituents for Tree-Based Alignment, Proceedings of EMNLP 2004

(Gildea, 2004).

Relaxing the isomorphism between input and output trees leads to the idea of quasi-synchronous grammars (QG), which have been shown to produce better word alignment quality than IBM models, but not better than symmetrized IBM models

Smith, David A. and Eisner, Jason (2006): Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies, Proceedings on the Workshop on Statistical Machine Translation

(Smith and Eisner, 2006). A similar relaxation is to allow multiple neighboring head nodes in the rules

Zhang, Min and Jiang, Hongfei and Aw, Aiti and Li, Haizhou and Tan, Chew Lim and Li, Sheng (2008): A Tree Sequence Alignment-based Tree-to-Tree Translation Model, Proceedings of ACL-08: HLT

(Zhang et al., 2008;

Zhang, Min and Jiang, Hongfei and Li, Haizhou and Aw, Aiti and Li, Sheng (2008): Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Zhang et al., 2008b).

Benchmarks

Discussion

New Publications

Jonathan Graehl and Kevin Knight and Jonathan May (2008): Training Tree Transducers, Computational Linguistics
@Article{CL:2008-3004,
author = {Jonathan Graehl and Kevin Knight and Jonathan May},
title = {Training Tree Transducers},
journal = {Computational Linguistics},
volume = {34},
number = {3},
url = {http://aclweb.org/anthology-new/J/J08/J08-3004.pdf},
year = 2008
}
Graehl et al. (2008)
Cohn, Trevor and Blunsom, Phil (2009): A Bayesian Model of Syntax-Directed Tree to String Grammar Induction, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{cohn-blunsom:2009:EMNLP,
author = {Cohn, Trevor and Blunsom, Phil},
title = {A {Bayesian} Model of Syntax-Directed Tree to String Grammar Induction},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {352--361},
url = {http://www.aclweb.org/anthology/D/D09/D09-1037},
year = 2009
}
Cohn and Blunsom (2009)
May, Jonathan and Knight, Kevin and Vogler, Heiko (2010): Efficient Inference through Cascades of Weighted Tree Transducers, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
@InProceedings{may-knight-vogler:2010:ACL,
author = {May, Jonathan and Knight, Kevin and Vogler, Heiko},
title = {Efficient Inference through Cascades of Weighted Tree Transducers},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {1058--1066},
url = {http://www.aclweb.org/anthology/P10-1108},
year = 2010
}
May et al. (2010)
Dekai Wu (1995): An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words, Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Wu:1995,
author = {Dekai Wu},
title = {An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words},
url = {http://acl.ldc.upenn.edu/P/P95/P95-1033.pdf},
googlescholar = {4644540557608616671},
booktitle = {Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1995
}
Wu (1995)
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya (2011): An Unsupervised Model for Joint Phrase Alignment and Extraction, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{neubig-EtAl:2011:ACL-HLT2011,
author = {Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya},
title = {An Unsupervised Model for Joint Phrase Alignment and Extraction},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {632--641},
url = {http://www.aclweb.org/anthology/P11-1064},
year = 2011
}
Neubig et al. (2011)
Gimpel, Kevin and Smith, Noah A. (2011): Generative Models of Monolingual and Bilingual Gappy Patterns, Proceedings of the Sixth Workshop on Statistical Machine Translation
@InProceedings{gimpel-smith:2011:WMT,
author = {Gimpel, Kevin and Smith, Noah A.},
title = {Generative Models of Monolingual and Bilingual Gappy Patterns},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {512--522},
url = {http://www.aclweb.org/anthology/W11-2165},
year = 2011
}
Gimpel and Smith (2011)
Kenji Yamada (2002): A syntax-based translation model
@PhDThesis{Yamada:Thesis,
author = {Kenji Yamada},
title = {A syntax-based translation model},
school = {Department of Computer Science, University of Southern California, Los Angeles},
year = 2002
}
Yamada (2002)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
