Transliteration with Finite State Machines
Because transliteration is a monotone process, i.e. it involves no reordering, finite state machines are a natural fit and were used in early work.
Transliteration With FSM is the main subject of 9 publications. 8 are discussed here.
Publications
Kevin Knight and Jonathan Graehl (1997):
Machine Transliteration, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Knight:1997,
author = {Kevin Knight and Jonathan Graehl},
title = {Machine Transliteration},
url = {http://acl.ldc.upenn.edu/P/P97/P97-1017.pdf},
googlescholar = {7841149911590716739},
booktitle = {Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1997
}
Knight and Graehl (1997) present a model that maps between letters and their phoneme representations. Such models may be extended by using a larger Markov window during mapping, i.e. by conditioning each mapping on a larger context
Sung Young Jung and SungLim Hong and Eunok Paek (2000):
An English to Korean Transliteration Model of Extended Markov Window, Proceedings of the International Conference on Computational Linguistics (COLING)
@InProceedings{Jung:2000,
author = {Sung Young Jung and SungLim Hong and Eunok Paek},
title = {An {English} to {Korean} Transliteration Model of Extended {Markov} Window},
url = {http://www.newdesign.aclweb.org/anthology-new/C/C00/C00-1056.pdf},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2000
}
(Jung et al., 2000).
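The effect of a larger context window can be illustrated with a minimal sketch. It is not the model of Jung et al. (2000): the rule table `RULES`, the wildcard `*`, and the boundary marker `#` are hypothetical, and the lookup is deterministic, whereas a statistical model would attach probabilities to each contextual mapping. The sketch only shows how the output for a character can depend on its neighbours, backing off to narrower contexts when no wide-context rule applies.

```python
from typing import Dict, Tuple

# Hypothetical toy rules keyed by (previous, current, next) character.
# "*" matches any character; "#" marks a word boundary.
Rules = Dict[Tuple[str, str, str], str]

RULES: Rules = {
    ("*", "c", "e"): "s",
    ("*", "c", "i"): "s",
    ("*", "c", "*"): "k",
}


def transliterate(word: str, rules: Rules) -> str:
    """Map each character using a window of one character on either side."""
    padded = "#" + word + "#"
    out = []
    for i in range(1, len(padded) - 1):
        prev, cur, nxt = padded[i - 1], padded[i], padded[i + 1]
        # Try the widest context first, then back off to narrower ones.
        for key in ((prev, cur, nxt), ("*", cur, nxt),
                    (prev, cur, "*"), ("*", cur, "*")):
            if key in rules:
                out.append(rules[key])
                break
        else:
            out.append(cur)  # no rule applies: copy the character unchanged
    return "".join(out)


print(transliterate("cicada", RULES))  # sikada
print(transliterate("cell", RULES))    # sell
```

With a window of zero, the two occurrences of "c" in "cicada" could not be distinguished; the next-character context is what separates them here.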
Charles Schafer (2006):
Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling, 5th Conference of the Association for Machine Translation in the Americas (AMTA)
@InProceedings{Schafer:2006:AMTA,
author = {Charles Schafer},
title = {Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling},
url = {http://www.mt-archive.info/AMTA-2006-Schafer.pdf},
googlescholar = {10223822532682874230},
booktitle = {5th Conference of the Association for Machine Translation in the Americas (AMTA)},
month = {August},
address = {Boston, Massachusetts},
year = 2006
}
Schafer (2006) compares a number of different finite state transducer architectures. For closely related language pairs, such as Hindi–Urdu, deterministic finite state machines may suffice
Malik, M. G. Abbas and Boitet, Christian and Bhattacharyya, Pushpak (2008):
Hindi Urdu Machine Transliteration using Finite-State Transducers, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
@InProceedings{malik-boitet-bhattacharyya:2008:PAPERS,
author = {Malik, M. G. Abbas and Boitet, Christian and Bhattacharyya, Pushpak},
title = {Hindi Urdu Machine Transliteration using Finite-State Transducers},
booktitle = {Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)},
month = {August},
address = {Manchester, UK},
publisher = {Coling 2008 Organizing Committee},
pages = {537--544},
url = {http://www.aclweb.org/anthology/C08-1068},
year = 2008
}
(Malik et al., 2008).
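A deterministic finite state transducer can be sketched in a few lines. The example below is illustrative only and does not reproduce the transducers of Malik et al. (2008): the function `run_dfst`, the state names, and the toy Latin-alphabet rule ("ph" becomes "f") are assumptions chosen to keep the example readable. Each state has at most one transition per input symbol, so the output is fully determined by the input.

```python
from typing import Dict, Tuple

# (state, input symbol) -> (output string, next state)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]

# Toy machine: after reading "p" we wait one symbol to see whether "h" follows.
TRANSITIONS: Transitions = {
    ("q0", "p"): ("", "qp"),   # hold back the "p"
    ("qp", "h"): ("f", "q0"),  # "ph" is rewritten as "f"
}
# Output that a state still owes when it is left without an explicit arc
# or when the input ends in that state.
PENDING: Dict[str, str] = {"qp": "p"}


def run_dfst(word: str, transitions: Transitions,
             pending: Dict[str, str], start: str = "q0") -> str:
    state = start
    out = []
    for ch in word:
        if (state, ch) not in transitions:
            # No arc: flush what the state owes and re-read from the start state.
            out.append(pending.get(state, ""))
            state = start
        if (state, ch) in transitions:
            emitted, state = transitions[(state, ch)]
            out.append(emitted)
        else:
            out.append(ch)  # symbols without a rule are copied unchanged
    out.append(pending.get(state, ""))  # flush at end of input
    return "".join(out)


print(run_dfst("photo", TRANSITIONS, PENDING))  # foto
print(run_dfst("tap", TRANSITIONS, PENDING))    # tap
```

Because no probabilities or alternative paths are involved, such a machine fits only language pairs where the character correspondence is close to one-to-one.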
Transliteration may either use a phonetic representation to match characters of different writing systems
Kevin Knight and Jonathan Graehl (1998):
Machine Transliteration, Computational Linguistics
@Article{Knight:1998,
author = {Kevin Knight and Jonathan Graehl},
title = {Machine Transliteration},
url = {http://acl.ldc.upenn.edu/J/J98/J98-4003.pdf},
journal = {Computational Linguistics},
volume = {24},
number = {4},
year = 1998
}
(Knight and Graehl, 1998) or map characters directly
Zhang, Min and Li, Haizhou and Su, Jian (2004):
Direct Orthographical Mapping for Machine Transliteration, Proceedings of Coling 2004
@inproceedings{Zhang:2004m,
author = {Zhang, Min and Li, Haizhou and Su, Jian},
title = {Direct Orthographical Mapping for Machine Transliteration},
url = {http://acl.ldc.upenn.edu/coling2004/MAIN/pdf/103-239.pdf},
googlescholar = {9957851749002534275},
booktitle = {Proceedings of Coling 2004},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {716--722},
year = 2004
}
(Zhang et al., 2004). Phoneme and grapheme information may be combined
Bilac, Slaven and Tanaka, Hozumi (2004):
A hybrid back-transliteration system for Japanese, Proceedings of Coling 2004
@inproceedings{Bilac:2004,
author = {Bilac, Slaven and Tanaka, Hozumi},
title = {A hybrid back-transliteration system for {J}apanese},
url = {http://acl.ldc.upenn.edu/C/C04/C04-1086.pdf},
booktitle = {Proceedings of Coling 2004},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {597--603},
year = 2004
}
(Bilac and Tanaka, 2004). When training corpora are small, phonetic representations may be more robust
Yoon, Su-Youn and Kim, Kyoung-Young and Sproat, Richard (2007):
Multilingual Transliteration Using Feature based Phonetic Method, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{yoon-kim-sproat:2007:ACLMain,
author = {Yoon, Su-Youn and Kim, Kyoung-Young and Sproat, Richard},
title = {Multilingual Transliteration Using Feature based Phonetic Method},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {112--119},
url = {http://www.aclweb.org/anthology/P/P07/P07-1015},
year = 2007
}
(Yoon et al., 2007).
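The contrast between a phonetic pivot and direct character mapping can be sketched as follows. This is a toy illustration, not a method from any of the cited papers: the tables `G2P`, `P2G`, and `DIRECT`, the phoneme symbols, and the greedy longest-match segmentation are all assumptions. The direct table deliberately lacks an entry for "f" to mimic a small training corpus in which that spelling was never observed.

```python
from typing import Dict, List


def segment(word: str, table: Dict[str, str]) -> List[str]:
    """Greedy longest-match segmentation of word into keys of table."""
    max_len = max(len(key) for key in table)
    units, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            if word[i:i + n] in table:
                units.append(word[i:i + n])
                i += n
                break
        else:
            raise KeyError(f"no mapping for {word[i]!r}")
    return units


# Hypothetical toy tables (Latin source, Cyrillic target).
G2P = {"ph": "F", "f": "F", "o": "O", "t": "T"}  # source graphemes -> phonemes
P2G = {"F": "ф", "O": "о", "T": "т"}             # phonemes -> target graphemes
DIRECT = {"ph": "ф", "o": "о", "t": "т"}         # direct mapping, "f" unseen


def via_phonemes(word: str) -> str:
    phonemes = [G2P[unit] for unit in segment(word, G2P)]
    return "".join(P2G[p] for p in phonemes)


def direct(word: str) -> str:
    return "".join(DIRECT[unit] for unit in segment(word, DIRECT))


print(via_phonemes("photo"))  # фото
print(direct("photo"))        # фото
print(via_phonemes("foto"))   # фото
```

Both routes agree on "photo", but `direct("foto")` raises a `KeyError` because the spelling "f" was never paired with a target character. The phonetic route handles it because both spellings share one phoneme, which is one way to read the robustness claim for small corpora; it presupposes that the grapheme-to-phoneme table comes from a separate resource.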
Benchmarks
Discussion
Related Topics
New Publications
Knight, Kevin (2009):
Automata for Transliteration and Machine Translation, Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
@InProceedings{knight:2009:NEWS,
author = {Knight, Kevin},
title = {Automata for Transliteration and Machine Translation},
booktitle = {Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {27},
url = {http://www.aclweb.org/anthology/W/W09/W09-3503},
year = 2009
}
Knight (2009)