Generating Rich Morphology
Rich morphology is a particular problem on the target side, since choosing the right morphological variant depends on various factors (agreement constraints, grammatical gender). Often the relevant information is distributed widely over the input sentence or missing altogether.
Generating Rich Morphology is the main subject of 22 publications. 15 are discussed here.
Publications
Minkov, Einat and Toutanova, Kristina and Suzuki, Hisami (2007):
Generating Complex Morphology for Machine Translation, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{minkov-toutanova-suzuki:2007:ACLMain,
author = {Minkov, Einat and Toutanova, Kristina and Suzuki, Hisami},
title = {Generating Complex Morphology for Machine Translation},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {128--135},
url = {
http://www.aclweb.org/anthology/P/P07/P07-1017},
year = 2007
}
Minkov et al. (2007) use a maximum entropy model to generate rich Russian morphology and show improved performance over the standard approach of relying on the language model. Such a model may be used for statistical machine translation by adjusting the inflections in a post-processing stage
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim (2008):
Applying Morphology Generation Models to Machine Translation, Proceedings of ACL-08: HLT
@InProceedings{toutanova-suzuki-ruopp:2008:ACLMain,
author = {Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim},
title = {Applying Morphology Generation Models to Machine Translation},
booktitle = {Proceedings of ACL-08: HLT},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {514--522},
url = {
http://www.aclweb.org/anthology/P/P08/P08-1059},
year = 2008
}
(Toutanova et al., 2008). Similarly,
Fraser, Alexander and Weller, Marion and Cahill, Aoife and Cap, Fabienne (2012):
Modeling Inflection and Word-Formation in SMT, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
mentioned in Generating Rich Morphology and Compounds
@InProceedings{fraser-EtAl:2012:EACL2012,
author = {Fraser, Alexander and Weller, Marion and Cahill, Aoife and Cap, Fabienne},
title = {Modeling Inflection and Word-Formation in SMT},
booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Avignon, France},
publisher = {Association for Computational Linguistics},
pages = {664--674},
url = {
http://www.aclweb.org/anthology/E12-1068},
year = 2012
}
Fraser et al. (2012) use a conditional random field model for each morphological feature for target-side lemmas in post-processing.
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine (2013):
Using subcategorization knowledge to improve case prediction for translation to German, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{weller-fraser-schulteimwalde:2013:ACL2013,
author = {Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine},
title = {Using subcategorization knowledge to improve case prediction for translation to German},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {593--603},
url = {
http://www.aclweb.org/anthology/P13-1058},
year = 2013
}
Weller et al. (2013) show that prediction of the case of German noun phrases can be improved by learning subcategorization frames for verbs.
Clifton, Ann and Sarkar, Anoop (2011):
Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{clifton-sarkar:2011:ACL-HLT2011,
author = {Clifton, Ann and Sarkar, Anoop},
title = {Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {32--42},
url = {
http://www.aclweb.org/anthology/P11-1004},
year = 2011
}
Clifton and Sarkar (2011) overcome the need for morphological analyzers in this approach by using unsupervised morphology induction, with automatically generated suffix classes serving as tags.
Chahuneau, Victor and Schlinger, Eva and Smith, Noah A. and Dyer, Chris (2013):
Translating into Morphologically Rich Languages with Synthetic Phrases, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
@InProceedings{chahuneau-EtAl:2013:EMNLP,
author = {Chahuneau, Victor and Schlinger, Eva and Smith, Noah A. and Dyer, Chris},
title = {Translating into Morphologically Rich Languages with Synthetic Phrases},
booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Seattle, Washington, USA},
publisher = {Association for Computational Linguistics},
pages = {1677--1687},
url = {
http://www.aclweb.org/anthology/D13-1174},
year = 2013
}
Chahuneau et al. (2013) use a morphological prediction model to extend the phrase dictionary with inflected forms, initially for the insertion of determiners
Tsvetkov, Yulia and Dyer, Chris and Levin, Lori and Bhatia, Archna (2013):
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{tsvetkov-EtAl:2013:WMT,
author = {Tsvetkov, Yulia and Dyer, Chris and Levin, Lori and Bhatia, Archna},
title = {Generating {English} Determiners in Phrase-Based Translation with Synthetic Translation Options},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {271--280},
url = {
http://www.aclweb.org/anthology/W13-2234},
year = 2013
}
(Tsvetkov et al., 2013). This approach is available as a toolkit
Eva Schlinger and Victor Chahuneau and Chris Dyer (2013):
Morphogen: Translation into Morphologically Rich Languages with Synthetic Phrases, The Prague Bulletin of Mathematical Linguistics
@article{pbml-100-schlinger-chahuneau-dyer,
author = {Eva Schlinger and Victor Chahuneau and Chris Dyer},
title = {Morphogen: Translation into Morphologically Rich Languages with Synthetic Phrases},
url = {
http://ufal.mff.cuni.cz/pbml/100/art-schlinger-chahuneau-dyer.pdf},
pages = {51--62},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {100},
year = 2013
}
(Schlinger et al., 2013).
Translation between related morphologically rich languages may model the lexical translation step as a morphological analysis, transfer and generation process using finite state tools
Tantug, Ahmet Cüneyd and Adali, Esref and Oflazer, Kemal (2007):
Machine Translation between Turkic Languages, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
@InProceedings{tantuvg-adali-oflazer:2007:PosterDemo,
author = {Tantu\v{g}, Ahmet C{\"u}neyd and Adali, Esref and Oflazer, Kemal},
title = {Machine Translation between Turkic Languages},
booktitle = {Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {189--192},
url = {
http://www.aclweb.org/anthology/P/P07/P07-2048},
year = 2007
}
(Tantug et al., 2007). Splitting words into stems and morphemes is also a valid strategy for translating into a language with rich morphology, as demonstrated for English–Turkish
Oflazer, Kemal and Durgar El-Kahlout, Ilknur (2007):
Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{oflazer-durgarelkahlout:2007:WMT,
author = {Oflazer, Kemal and Durgar El-Kahlout, Ilknur},
title = {Exploring Different Representational Units in {English-to-Turkish} Statistical Machine Translation},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {25--32},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0204},
year = 2007
}
(Oflazer and El-Kahlout, 2007) and English–Arabic
Badr, Ibrahim and Zbib, Rabih and Glass, James (2008):
Segmentation for English-to-Arabic Statistical Machine Translation, Proceedings of ACL-08: HLT, Short Papers
mentioned in Generating Rich Morphology and Factored Translation Models
@InProceedings{badr-zbib-glass:2008:ACLShort,
author = {Badr, Ibrahim and Zbib, Rabih and Glass, James},
title = {Segmentation for English-to-Arabic Statistical Machine Translation},
booktitle = {Proceedings of ACL-08: HLT, Short Papers},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {153--156},
url = {
http://www.aclweb.org/anthology/P/P08/P08-2039},
year = 2008
}
(Badr et al., 2008), and also for translating between two highly inflected languages, as in the case of the Turkmen–Turkish language pair
Ahmet Cüneyd Tantug and Esref Adali and Kemal Oflazer (2007):
A MT System from Turkmen to Turkish Employing Finite State and Statistical Methods, Proceedings of the MT Summit XI
@inproceedings{Tantug:2007:MTSummit,
author = {Ahmet C{\"u}neyd Tantu\v{g} and Esref Adali and Kemal Oflazer},
title = {A {MT} System from {Turkmen} to {Turkish} Employing Finite State and Statistical Methods},
url = {
http://research.sabanciuniv.edu/6395/1/MT\_Summit\_XI.pdf},
googlescholar = {7486422021981643377},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
(Tantug et al., 2007).
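The stem-and-morpheme representation mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy suffix inventory, the `+` marker on suffix tokens, and the function names `segment` and `desegment` are hypothetical, and the cited systems rely on proper morphological analyzers rather than greedy suffix stripping. The sketch only shows how a word can be split into translation units and joined back after translation.

```python
from typing import List

# Toy suffix inventory (assumption): a real system would use a morphological analyzer.
SUFFIXES = ["ler", "lar", "de", "da"]


def segment(word: str) -> List[str]:
    """Greedily strip known suffixes from the end of a word.

    Returns the stem followed by suffix tokens marked with a leading '+'.
    """
    stem = word
    morphemes: List[str] = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Keep at least two characters of stem so short words are not destroyed.
            if stem.endswith(suffix) and len(stem) > len(suffix) + 1:
                morphemes.insert(0, "+" + suffix)
                stem = stem[: len(stem) - len(suffix)]
                stripped = True
                break
    return [stem] + morphemes


def desegment(tokens: List[str]) -> str:
    """Attach '+'-marked suffix tokens to the preceding token to rebuild words."""
    words: List[str] = []
    for token in tokens:
        if token.startswith("+") and words:
            words[-1] += token[1:]
        else:
            words.append(token)
    return " ".join(words)
```

In this sketch, `segment("evlerde")` yields the stem `ev` followed by the marked suffixes `+ler` and `+de`, and `desegment` reverses the split. The translation model would then be trained on the segmented token stream instead of on full word forms.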
Sara Stymne (2011):
Definite Noun Phrases in Statistical Machine Translation into Scandinavian Languages, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{eamt11:Stymne,
author = {Sara Stymne},
title = {Definite Noun Phrases in Statistical Machine Translation into {S}candinavian Languages},
pages = {289--296},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
location = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
url = {
http://mt-archive.info/EAMT-2011-Stymne.pdf},
year = 2011
}
Stymne (2011) addresses the translation of definite noun phrases into Scandinavian languages, where definiteness is expressed either by determiners or by noun suffixes.
The translation of unknown morphological variants may be learned by analogy to other morphological spelling variations
Langlais, Philippe and Patry, Alexandre (2007):
Translating Unknown Words by Analogical Learning, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
@InProceedings{langlais-patry:2007:EMNLP-CoNLL2007,
author = {Langlais, Philippe and Patry, Alexandre},
title = {Translating Unknown Words by Analogical Learning},
booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages = {877--886},
url = {
http://www.aclweb.org/anthology/D/D07/D07-1092},
year = 2007
}
(Langlais and Patry, 2007).
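The notion of analogy between spelling variants can be made concrete with a small sketch. It solves equations of the form x : y :: z : ? only for the special case where x and y differ in their suffix, which is a strong simplification and not the general analogy solver of the cited work. The function name `solve_suffix_analogy` is hypothetical.

```python
from typing import Optional


def solve_suffix_analogy(x: str, y: str, z: str) -> Optional[str]:
    """Solve x : y :: z : ? when x and y differ only in their suffix.

    Returns None if z does not end in the suffix that x loses.
    """
    # Length of the longest common prefix of x and y.
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    suffix_x, suffix_y = x[k:], y[k:]
    if not z.endswith(suffix_x):
        return None
    # Replace the suffix of z in the same way x was changed into y.
    stem = z[: len(z) - len(suffix_x)]
    return stem + suffix_y
```

For instance, `solve_suffix_analogy("playing", "played", "jumping")` returns `"jumped"`. In a translation setting, one would look for such analogies among known source words and solve the corresponding equation over their known translations, a step this sketch leaves out.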
For very closely related languages such as Catalan and Spanish, translating chunks of letters rather than chunks of words in a phrase-based approach achieves decent results and addresses the problem of unknown words very well
Vilar, David and Peter, Jan-Thorsten and Ney, Hermann (2007):
Can We Translate Letters?, Proceedings of the Second Workshop on Statistical Machine Translation
@InProceedings{vilar-peter-ney:2007:WMT,
author = {Vilar, David and Peter, Jan-Thorsten and Ney, Hermann},
title = {Can We Translate Letters?},
booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {33--39},
url = {
http://www.aclweb.org/anthology/W/W07/W07-0205},
year = 2007
}
(Vilar et al., 2007).
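The letter-level representation can be sketched as a simple preprocessing and postprocessing step. The helper names `to_letters` and `from_letters` and the boundary symbol `_` are assumptions made for illustration, not details taken from the cited paper: each word is split into characters and word boundaries are marked explicitly, so that the output of a phrase-based system trained on letter sequences can be joined back into words.

```python
BOUNDARY = "_"  # hypothetical word-boundary token; assumed not to occur in the text


def to_letters(sentence: str) -> str:
    """Turn a sentence into a space-separated letter sequence with boundary tokens."""
    words = sentence.split()
    # Each word becomes its characters; a boundary token separates adjacent words.
    return f" {BOUNDARY} ".join(" ".join(word) for word in words)


def from_letters(letters: str) -> str:
    """Invert to_letters: join letters into words at each boundary token."""
    words = []
    current = []
    for token in letters.split():
        if token == BOUNDARY:
            if current:
                words.append("".join(current))
            current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return " ".join(words)
```

With this representation, a "phrase" in the phrase table is a sequence of letters, so an unseen word can still be covered by letter sequences that were seen inside other words.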
Benchmarks
Discussion
Related Topics
New Publications
Kirchhoff, Katrin and Tam, Yik-Cheung and Richey, Colleen and Wang, Wen (2015):
Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{kirchhoff-EtAl:2015:NAACL-HLT,
author = {Kirchhoff, Katrin and Tam, Yik-Cheung and Richey, Colleen and Wang, Wen},
title = {Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs},
booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {May--June},
address = {Denver, Colorado},
publisher = {Association for Computational Linguistics},
pages = {995--1000},
url = {
http://www.aclweb.org/anthology/N15-1102},
year = 2015
}
Kirchhoff et al. (2015)
Gandhe, Ankur and Gangadharaiah, Rashmi (2013):
Hypothesis Refinement Using Agreement Constraints in Machine Translation, Proceedings of the Sixth International Joint Conference on Natural Language Processing
@InProceedings{gandhe-gangadharaiah:2013:IJCNLP,
author = {Gandhe, Ankur and Gangadharaiah, Rashmi},
title = {Hypothesis Refinement Using Agreement Constraints in Machine Translation},
booktitle = {Proceedings of the Sixth International Joint Conference on Natural Language Processing},
month = {October},
address = {Nagoya, Japan},
publisher = {Asian Federation of Natural Language Processing},
pages = {429--437},
url = {
http://www.aclweb.org/anthology/I13-1049},
year = 2013
}
Gandhe and Gangadharaiah (2013)
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz (2013):
Reversing Morphological Tokenization in English-to-Arabic SMT, Proceedings of the 2013 NAACL HLT Student Research Workshop
@InProceedings{salameh-cherry-kondrak:2013:SRW,
author = {Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz},
title = {Reversing Morphological Tokenization in English-to-Arabic SMT},
booktitle = {Proceedings of the 2013 NAACL HLT Student Research Workshop},
month = {June},
address = {Atlanta, Georgia},
publisher = {Association for Computational Linguistics},
pages = {47--53},
url = {
http://www.aclweb.org/anthology/N13-2007},
year = 2013
}
Salameh et al. (2013)
Hassan Al-Haj and Alon Lavie (2010):
The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-AlHaj,
author = {Hassan Al-Haj and Alon Lavie},
title = {The Impact of Arabic Morphological Segmentation on Broad-coverage {English-to-Arabic} Statistical Machine Translation},
url = {
http://www.mt-archive.info/AMTA-2010-AlHaj.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Al-Haj and Lavie (2010)
Minwoo Jeong and Kristina Toutanova and Hisami Suzuki and Chris Quirk (2010):
A Discriminative Lexicon Model for Complex Morphology, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Jeong,
author = {Minwoo Jeong and Kristina Toutanova and Hisami Suzuki and Chris Quirk},
title = {A Discriminative Lexicon Model for Complex Morphology},
url = {
http://www.mt-archive.info/AMTA-2010-Jeong.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Jeong et al. (2010)
El Kholy, Ahmed and Habash, Nizar (2012):
Rich Morphology Generation Using Statistical Machine Translation, INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference
@InProceedings{elkholy-habash:2012:INLG2012,
author = {El Kholy, Ahmed and Habash, Nizar},
title = {Rich Morphology Generation Using Statistical Machine Translation},
booktitle = {INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference},
month = {May},
address = {Utica, IL},
publisher = {Association for Computational Linguistics},
pages = {90--94},
url = {
http://www.aclweb.org/anthology/W12-1514},
year = 2012
}
El Kholy and Habash (2012)
Jerneja Zganec Gros and Stanislav Gruden (2007):
English-Slovenian Statistical Machine Translation: from a Lower- to a Highly-Inflected Language, Proceedings of the MT Summit XI
@inproceedings{Gros:2007:MTSummit,
author = {Jerneja Zganec Gros and Stanislav Gruden},
title = {{E}nglish-{S}lovenian Statistical Machine Translation: from a Lower- to a Highly-Inflected Language},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
Gros and Gruden (2007)