Generating Rich Morphology

Rich morphology is especially a problem on the target side, since choosing the right morphological variant depends on various factors (agreement constraints, grammatical gender). Often the relevant information is distributed widely over the input sentence or missing altogether.

Generating Rich Morphology is the main subject of 22 publications. 15 are discussed here.

Publications

Minkov et al. (2007) use a maximum entropy model to generate rich Russian morphology and show improved performance over the standard approach of relying on the language model. Such a model may be used for statistical machine translation by adjusting the inflections in a post-processing stage (Toutanova et al., 2008). Similarly, Fraser et al. (2012) use a conditional random field model for each morphological feature of target-side lemmas in post-processing. Weller et al. (2013) show that prediction of the case of German noun phrases can be improved by learning subcategorization frames for verbs. Clifton and Sarkar (2011) overcome the need for morphological analyzers in this approach by using unsupervised morphology induction, with automatically generated suffix classes as tags.
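To illustrate the general idea of maximum entropy inflection prediction, the following is a minimal sketch and not the model of any of the cited papers. It scores each candidate inflection of a lemma with a log-linear model over context features and normalizes the scores into probabilities. The function name, the feature names, and the hand-set weights are hypothetical; a real system would learn the weights from data.

```python
import math

def maxent_probs(candidates, context_features, weights):
    """Return P(form | context) under a log-linear (maximum entropy) model.

    weights maps (feature, form) pairs to real-valued weights; pairs that
    are absent contribute 0 to the score.
    """
    scores = {
        form: sum(weights.get((feat, form), 0.0) for feat in context_features)
        for form in candidates
    }
    z = sum(math.exp(s) for s in scores.values())  # normalization constant
    return {form: math.exp(s) / z for form, s in scores.items()}

# Hypothetical example: choosing between nominative and accusative forms
# of the Russian lemma for "book" when the noun is the object of a verb.
weights = {("object_of_verb", "книгу"): 2.0}  # hand-set, for illustration only
probs = maxent_probs(["книга", "книгу"], ["object_of_verb"], weights)
best = max(probs, key=probs.get)
```

In a post-processing setup, the highest-scoring form would replace the lemma or the form produced by the translation system.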

Chahuneau et al. (2013) use a morphological prediction model to extend the phrase dictionary with inflected forms, initially for the insertion of determiners (Tsvetkov et al., 2013). This approach is available as a toolkit (Schlinger et al., 2013).

Translation between related, morphologically rich languages may model the lexical translation step as a morphological analysis, transfer, and generation process using finite-state tools (Tantug et al., 2007). Splitting words into stems and morphemes is also a valid strategy for translating into a language with rich morphology, as demonstrated for English–Turkish (Oflazer and El-Kahlout, 2007) and English–Arabic (Badr et al., 2008), and for translating between two highly inflected languages, as in the case of the Turkmen–Turkish language pair (Tantug et al., 2007).
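The idea of splitting words into a stem and morphemes can be sketched as follows. This is a toy illustration under strong assumptions, not the segmentation used in the cited work: it greedily strips suffixes from a small hypothetical inventory and marks them with a `+` prefix so that the split can be undone after translation. Real systems rely on morphological analyzers and handle ambiguity and phonological variation, which this sketch ignores.

```python
SUFFIXES = ["ler", "lar", "de", "da"]  # hypothetical toy suffix inventory

def segment(word, suffixes=SUFFIXES, min_stem=2):
    """Greedily strip known suffixes; return [stem, '+suffix', ...]."""
    morphs = []
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so a short suffix does not shadow a long one.
        for suf in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                morphs.insert(0, "+" + suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + morphs

def desegment(tokens):
    """Reattach '+suffix' tokens to the preceding token."""
    out = []
    for tok in tokens:
        if tok.startswith("+") and out:
            out[-1] += tok[1:]
        else:
            out.append(tok)
    return out

tokens = segment("evlerde")   # Turkish: ev + ler + de, "in the houses"
restored = desegment(tokens)
```

Here `segment("evlerde")` yields the stem `ev` followed by the marked suffixes `+ler` and `+de`, and `desegment` joins them back into the surface word.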

Stymne (2011) addresses the translation of definite noun phrases into Scandinavian languages, where definiteness is expressed either by determiners or by noun suffixes.

The translation of unknown morphological variants may be learned by analogy to other morphological spelling variations (Langlais and Patry, 2007).

For very closely related languages such as Catalan and Spanish, translating chunks of letters rather than chunks of words in a phrase-based approach achieves decent results and handles unknown words very well (Vilar et al., 2007).
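As a minimal sketch of the letter-level representation described above (not the authors' actual preprocessing), a sentence can be rewritten as a sequence of letter tokens before phrase-based training, with a reserved symbol marking word boundaries so that words can be restored from the output. The function names and the `_` boundary symbol are assumptions made for illustration.

```python
BOUNDARY = "_"  # assumed word-boundary symbol; must not occur in the text

def to_letters(sentence: str) -> str:
    """Rewrite a sentence as space-separated letters with boundary tokens."""
    words = sentence.split()
    return f" {BOUNDARY} ".join(" ".join(word) for word in words)

def from_letters(letter_seq: str) -> str:
    """Invert to_letters: join letters into words at boundary tokens."""
    words, current = [], []
    for tok in letter_seq.split():
        if tok == BOUNDARY:
            words.append("".join(current))
            current = []
        else:
            current.append(tok)
    words.append("".join(current))
    return " ".join(words)

letters = to_letters("la casa blanca")
restored_sentence = from_letters(letters)
```

Because the translation units are letters, a word never seen in training can still be translated piece by piece, which is why this representation helps with unknown words.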

Benchmarks

Discussion

New Publications

Kirchhoff, Katrin and Tam, Yik-Cheung and Richey, Colleen and Wang, Wen (2015): Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{kirchhoff-EtAl:2015:NAACL-HLT,
author = {Kirchhoff, Katrin and Tam, Yik-Cheung and Richey, Colleen and Wang, Wen},
title = {Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {995--1000},
url = {http://www.aclweb.org/anthology/N15-1102},
year = 2015
}
Kirchhoff et al. (2015)
Gandhe, Ankur and Gangadharaiah, Rashmi (2013): Hypothesis Refinement Using Agreement Constraints in Machine Translation, Proceedings of the Sixth International Joint Conference on Natural Language Processing
@InProceedings{gandhe-gangadharaiah:2013:IJCNLP,
author = {Gandhe, Ankur and Gangadharaiah, Rashmi},
title = {Hypothesis Refinement Using Agreement Constraints in Machine Translation},
booktitle = {Proceedings of the Sixth International Joint Conference on Natural Language Processing},
month = {October},
address = {Nagoya, Japan},
publisher = {Asian Federation of Natural Language Processing},
pages = {429--437},
url = {http://www.aclweb.org/anthology/I13-1049},
year = 2013
}
Gandhe and Gangadharaiah (2013)
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz (2013): Reversing Morphological Tokenization in English-to-Arabic SMT, Proceedings of the 2013 NAACL HLT Student Research Workshop
@InProceedings{salameh-cherry-kondrak:2013:SRW,
author = {Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz},
title = {Reversing Morphological Tokenization in English-to-Arabic SMT},
booktitle = {Proceedings of the 2013 NAACL HLT Student Research Workshop},
month = {June},
address = {Atlanta, Georgia},
publisher = {Association for Computational Linguistics},
pages = {47--53},
url = {http://www.aclweb.org/anthology/N13-2007},
year = 2013
}
Salameh et al. (2013)
Hassan Al-Haj and Alon Lavie (2010): The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-AlHaj,
author = {Hassan Al-Haj and Alon Lavie},
title = {The Impact of Arabic Morphological Segmentation on Broad-coverage {English-to-Arabic} Statistical Machine Translation},
url = {http://www.mt-archive.info/AMTA-2010-AlHaj.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Al-Haj and Lavie (2010)
Minwoo Jeong and Kristina Toutanova and Hisami Suzuki and Chris Quirk (2010): A Discriminative Lexicon Model for Complex Morphology, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Jeong,
author = {Minwoo Jeong and Kristina Toutanova and Hisami Suzuki and Chris Quirk},
title = {A Discriminative Lexicon Model for Complex Morphology},
url = {http://www.mt-archive.info/AMTA-2010-Jeong.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Jeong et al. (2010)
El Kholy, Ahmed and Habash, Nizar (2012): Rich Morphology Generation Using Statistical Machine Translation, INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference
@InProceedings{elkholy-habash:2012:INLG2012,
author = {El Kholy, Ahmed and Habash, Nizar},
title = {Rich Morphology Generation Using Statistical Machine Translation},
booktitle = {INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference},
month = {May},
address = {Utica, IL},
publisher = {Association for Computational Linguistics},
pages = {90--94},
url = {http://www.aclweb.org/anthology/W12-1514},
year = 2012
}
El Kholy and Habash (2012)
Jerneja Zganec Gros and Stanislav Gruden (2007): English-Slovenian Statistical Machine Translation: from a Lower- to a Highly-Inflected Language, Proceedings of the MT Summit XI
@inproceedings{Gros:2007:MTSummit,
author = {Jerneja Zganec Gros and Stanislav Gruden},
title = {{English-Slovenian} Statistical Machine Translation: from a Lower- to a Highly-Inflected Language},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
Gros and Gruden (2007)

MT Research Survey Wiki

A Comprehensive Survey of Neural and Statistical Machine Translation Research Publications
