Transliteration with Other Methods
A number of different machine learning methods have been applied to the transliteration task.
Transliteration With Other Methods is the main subject of 31 publications. 12 are discussed here.
Publications
Improvements have been shown when including context information in a maximum entropy model
Isao Goto and Naoto Kato and N. Uratani and Terumasa Ehara (2003):
Transliteration Considering Context Information Based on the Maximum Entropy Method, Proceedings of the MT Summit IX
@inproceedings{Goto:2003,
author = {Isao Goto and Naoto Kato and N. Uratani and Terumasa Ehara},
title = {Transliteration Considering Context Information Based on the Maximum Entropy Method},
url = {http://www.mt-archive.info/MTS-2003-Goto-1.pdf},
googlescholar = {8281228346154003233},
booktitle = {Proceedings of the {MT} Summit IX},
year = 2003
}
(Goto et al., 2003;
Goto, Isao and Kato, Naoto and Ehara, Terumasa and Tanaka, Hideki (2004):
Back Transliteration from Japanese to English using Target English Context, Proceedings of Coling 2004
@inproceedings{Goto:2004,
author = {Goto, Isao and Kato, Naoto and Ehara, Terumasa and Tanaka, Hideki},
title = {Back Transliteration from {Japanese} to {English} using Target {English} Context},
url = {http://acl.ldc.upenn.edu/coling2004/MAIN/pdf/119-360.pdf},
booktitle = {Proceedings of Coling 2004},
month = {Aug 23--Aug 27},
address = {Geneva, Switzerland},
publisher = {COLING},
pages = {827--833},
year = 2004
}
Goto et al., 2004). Other machine learning methods such as discriminative training with the perceptron algorithm have been applied to transliteration
Zelenko, Dmitry and Aone, Chinatsu (2006):
Discriminative Methods for Transliteration, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
@InProceedings{zelenko-aone:2006:EMNLP,
author = {Zelenko, Dmitry and Aone, Chinatsu},
title = {Discriminative Methods for Transliteration},
booktitle = {Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {612--617},
url = {http://www.aclweb.org/anthology/W/W06/W06-1672},
year = 2006
}
(Zelenko and Aone, 2006).
Jong-Hoon Oh and Key-Sun Choi (2002):
An English-Korean Transliteration Model Using Pronunciation and Contextual Rules, Proceedings of the International Conference on Computational Linguistics (COLING)
@InProceedings{Oh:2002,
author = {Jong-Hoon Oh and Key-Sun Choi},
title = {An {English-Korean} Transliteration Model Using Pronunciation and Contextual Rules},
url = {http://acl.ldc.upenn.edu/coling2002/proceedings/data/area-17/co-264.pdf},
googlescholar = {4615448858746426369},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2002
}
Oh and Choi (2002) use a pipeline of different approaches for the various mapping stages.
Wei-Hao Lin and Hsin-Hsi Chen (2002):
Backward Machine Transliteration by Learning Phonetic Similarity, Proceedings of the Conference on Natural Language Learning (CoNLL)
@Inproceedings{Lin:2002,
author = {Wei-Hao Lin and Hsin-Hsi Chen},
title = {Backward Machine Transliteration by Learning Phonetic Similarity},
url = {http://acl.ldc.upenn.edu/W/W02/W02-2017.pdf},
googlescholar = {6290493633163939852},
booktitle = {Proceedings of the Conference on Natural Language Learning (CoNLL)},
year = 2002
}
Lin and Chen (2002) present a gradient descent method to map phoneme sequences. Using phoneme and letter chunks may lead to better performance
In-Ho Kang and Gil Chang Kim (2000):
English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks, Proceedings of the International Conference on Computational Linguistics (COLING)
@InProceedings{Kang:2000,
author = {In-Ho Kang and Gil Chang Kim},
title = {{English-to-Korean} Transliteration using Multiple Unbounded Overlapping Phoneme Chunks},
url = {http://ucrel.lancs.ac.uk/acl/C/C00/C00-1061.pdf},
googlescholar = {4360095556375493324},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2000
}
(Kang and Kim, 2000).
Li, Haizhou and Zhang, Min and Su, Jian (2004):
A Joint Source-Channel Model for Machine Transliteration, Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume
@inproceedings{Li:2004,
author = {Li, Haizhou and Zhang, Min and Su, Jian},
title = {A Joint Source-Channel Model for Machine Transliteration},
url = {http://acl.ldc.upenn.edu/acl2004/main/pdf/121\_pdf\_2-col.pdf},
booktitle = {Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume},
month = {July},
address = {Barcelona, Spain},
pages = {159--166},
year = 2004
}
Li et al. (2004) propose a method similar to the joint model for phrase-based machine translation. Such a substring mapping model has also been explored by others
Ekbal, Asif and Naskar, Sudip Kumar and Bandyopadhyay, Sivaji (2006):
A Modified Joint Source-Channel Model for Transliteration, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
@InProceedings{ekbal-naskar-bandyopadhyay:2006:POS,
author = {Ekbal, Asif and Naskar, Sudip Kumar and Bandyopadhyay, Sivaji},
title = {A Modified Joint Source-Channel Model for Transliteration},
booktitle = {Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {191--198},
url = {http://www.aclweb.org/anthology/P/P06/P06-2025},
year = 2006
}
(Ekbal et al., 2006), and may be implemented as a bi-stream HMM
Zhao, Bing and Bach, Nguyen and Lane, Ian and Vogel, Stephan (2007):
A Log-Linear Block Transliteration Model based on Bi-Stream HMMs, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference
@InProceedings{zhao-EtAl:2007:main,
author = {Zhao, Bing and Bach, Nguyen and Lane, Ian and Vogel, Stephan},
title = {A Log-Linear Block Transliteration Model based on Bi-Stream {HMM}s},
booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {364--371},
url = {http://www.aclweb.org/anthology/N/N07/N07-1046},
year = 2007
}
(Zhao et al., 2007) or by adapting a monotone phrase-based translation model
Sherif, Tarek and Kondrak, Grzegorz (2007):
Substring-Based Transliteration, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
@InProceedings{sherif-kondrak:2007:ACLMain2,
author = {Sherif, Tarek and Kondrak, Grzegorz},
title = {Substring-Based Transliteration},
booktitle = {Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
month = {June},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
pages = {944--951},
url = {http://www.aclweb.org/anthology/P/P07/P07-1119},
year = 2007
}
(Sherif and Kondrak, 2007). When transliterating between different scripts of the same language, rule-based approaches typically suffice
Malik, M. G. Abbas (2006):
Punjabi Machine Transliteration, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
@InProceedings{malik:2006:COLACL,
author = {Malik, M. G. Abbas},
title = {Punjabi Machine Transliteration},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {1137--1144},
url = {http://www.aclweb.org/anthology/P/P06/P06-1143},
year = 2006
}
(Malik, 2006). Ensemble learning, i.e., the combination of multiple transliteration engines, has been shown to be successful
Jong-Hoon Oh and Hitoshi Isahara (2007):
Machine Transliteration Using Multiple Transliteration Engines and Hypothesis Re-Ranking, Proceedings of the MT Summit XI
@inproceedings{Oh:2007:MTSummit,
author = {Jong-Hoon Oh and Hitoshi Isahara},
title = {Machine Transliteration Using Multiple Transliteration Engines and Hypothesis Re-Ranking},
url = {http://www.mt-archive.info/MTS-2007-Oh.pdf},
googlescholar = {7932891967968576433},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
(Oh and Isahara, 2007).
New Publications
Finch, Andrew and Liu, Lemao and Wang, Xiaolin and Sumita, Eiichiro (2015):
Neural Network Transduction Models in Transliteration Generation, Proceedings of the Fifth Named Entity Workshop
@InProceedings{finch-EtAl:2015:NEWS2015,
author = {Finch, Andrew and Liu, Lemao and Wang, Xiaolin and Sumita, Eiichiro},
title = {Neural Network Transduction Models in Transliteration Generation},
booktitle = {Proceedings of the Fifth Named Entity Workshop},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {61--66},
url = {http://www.aclweb.org/anthology/W15-3909},
year = 2015
}
Finch et al. (2015)
Ammar, Waleed and Dyer, Chris and Smith, Noah (2012):
Transliteration by Sequence Labeling with Lattice Encodings and Reranking, Proceedings of the 4th Named Entity Workshop (NEWS) 2012
@InProceedings{ammar-dyer-smith:2012:NEWS2012,
author = {Ammar, Waleed and Dyer, Chris and Smith, Noah},
title = {Transliteration by Sequence Labeling with Lattice Encodings and Reranking},
booktitle = {Proceedings of the 4th Named Entity Workshop (NEWS) 2012},
month = {July},
address = {Jeju, Korea},
publisher = {Association for Computational Linguistics},
pages = {66--70},
url = {http://www.aclweb.org/anthology/W12-4410},
year = 2012
}
Ammar et al. (2012)
Hagiwara, Masato and Sekine, Satoshi (2012):
Latent Semantic Transliteration using Dirichlet Mixture, Proceedings of the 4th Named Entity Workshop (NEWS) 2012
@InProceedings{hagiwara-sekine:2012:NEWS2012,
author = {Hagiwara, Masato and Sekine, Satoshi},
title = {Latent Semantic Transliteration using Dirichlet Mixture},
booktitle = {Proceedings of the 4th Named Entity Workshop (NEWS) 2012},
month = {July},
address = {Jeju, Korea},
publisher = {Association for Computational Linguistics},
pages = {30--37},
url = {http://www.aclweb.org/anthology/W12-4404},
year = 2012
}
Hagiwara and Sekine (2012)
Kuo, Chan-Hung and Liu, Shih-Hung and Jiang, Mike Tian-Jian and Lee, Cheng-Wei and Hsu, Wen-Lian (2012):
Cost-benefit Analysis of Two-Stage Conditional Random Fields based English-to-Chinese Machine Transliteration, Proceedings of the 4th Named Entity Workshop (NEWS) 2012
@InProceedings{kuo-EtAl:2012:NEWS2012,
author = {Kuo, Chan-Hung and Liu, Shih-Hung and Jiang, Mike Tian-Jian and Lee, Cheng-Wei and Hsu, Wen-Lian},
title = {Cost-benefit Analysis of Two-Stage Conditional Random Fields based English-to-Chinese Machine Transliteration},
booktitle = {Proceedings of the 4th Named Entity Workshop (NEWS) 2012},
month = {July},
address = {Jeju, Korea},
publisher = {Association for Computational Linguistics},
pages = {76--80},
url = {http://www.aclweb.org/anthology/W12-4412},
year = 2012
}
Kuo et al. (2012)
Bhargava, Aditya and Kondrak, Grzegorz (2011):
How do you pronounce your name? Improving G2P with transliterations, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{bhargava-kondrak:2011:ACL-HLT2011,
author = {Bhargava, Aditya and Kondrak, Grzegorz},
title = {How do you pronounce your name? Improving G2P with transliterations},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {399--408},
url = {http://www.aclweb.org/anthology/P11-1041},
year = 2011
}
Bhargava and Kondrak (2011)
Hagiwara, Masato and Sekine, Satoshi (2011):
Latent Class Transliteration based on Source Language Origin, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{hagiwara-sekine:2011:ACL-HLT2011,
author = {Hagiwara, Masato and Sekine, Satoshi},
title = {Latent Class Transliteration based on Source Language Origin},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {53--57},
url = {http://www.aclweb.org/anthology/P11-2010},
year = 2011
}
Hagiwara and Sekine (2011)
Ravi, Sujith and Knight, Kevin (2009):
Learning Phoneme Mappings for Transliteration without Parallel Data, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
@InProceedings{ravi-knight:2009:NAACLHLT09,
author = {Ravi, Sujith and Knight, Kevin},
title = {Learning Phoneme Mappings for Transliteration without Parallel Data},
booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = {June},
address = {Boulder, Colorado},
publisher = {Association for Computational Linguistics},
pages = {37--45},
url = {http://www.aclweb.org/anthology/N/N09/N09-1005},
year = 2009
}
Ravi and Knight (2009)
Cherry, Colin and Suzuki, Hisami (2009):
Discriminative Substring Decoding for Transliteration, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
@InProceedings{cherry-suzuki:2009:EMNLP,
author = {Cherry, Colin and Suzuki, Hisami},
title = {Discriminative Substring Decoding for Transliteration},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {1066--1075},
url = {http://www.aclweb.org/anthology/D/D09/D09-1111},
year = 2009
}
Cherry and Suzuki (2009)
Li, Haizhou and Kumaran, A and Pervouchine, Vladimir and Zhang, Min (2009):
Report of NEWS 2009 Machine Transliteration Shared Task, Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
@InProceedings{li-EtAl:2009:NEWS1,
author = {Li, Haizhou and Kumaran, A and Pervouchine, Vladimir and Zhang, Min},
title = {Report of NEWS 2009 Machine Transliteration Shared Task},
booktitle = {Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {1--18},
url = {http://www.aclweb.org/anthology/W/W09/W09-3501},
year = 2009
}
Li et al. (2009)
Li, Haizhou and Kumaran, A and Zhang, Min and Pervouchine, Vladimir (2009):
Whitepaper of NEWS 2009 Machine Transliteration Shared Task, Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
@InProceedings{li-EtAl:2009:NEWS2,
author = {Li, Haizhou and Kumaran, A and Zhang, Min and Pervouchine, Vladimir},
title = {Whitepaper of NEWS 2009 Machine Transliteration Shared Task},
booktitle = {Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)},
month = {August},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {19--26},
url = {http://www.aclweb.org/anthology/W/W09/W09-3502},
year = 2009
}
Li et al. (2009)
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut (2010):
Hindi-to-Urdu Machine Translation through Transliteration, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
@InProceedings{durrani-EtAl:2010:ACL,
author = {Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut},
title = {Hindi-to-Urdu Machine Translation through Transliteration},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {465--474},
url = {http://www.aclweb.org/anthology/P10-1048},
year = 2010
}
Durrani et al. (2010)
Andrew Finch and Eiichiro Sumita (2010):
A Bayesian Model of Bilingual Segmentation for Transliteration, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt10:TP:finch,
author = {Andrew Finch and Eiichiro Sumita},
title = {{A Bayesian Model of Bilingual Segmentation for Transliteration}},
url = {http://mt-archive.info/IWSLT-2010-Finch.pdf},
googlescholar = {17895819961096651529},
editor = {Marcello Federico and Ian Lane and Michael Paul and Fran\c{c}ois Yvon},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
pages = {259--266},
location = {Paris, France},
year = 2010
}
Finch and Sumita (2010)
Kirschenbaum, Amit and Wintner, Shuly (2009):
Lightly Supervised Transliteration for Machine Translation, Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
@InProceedings{kirschenbaum-wintner:2009:EACL,
author = {Kirschenbaum, Amit and Wintner, Shuly},
title = {Lightly Supervised Transliteration for Machine Translation},
booktitle = {Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)},
month = {March},
address = {Athens, Greece},
publisher = {Association for Computational Linguistics},
pages = {433--441},
url = {http://www.aclweb.org/anthology/E09-1050},
year = 2009
}
Kirschenbaum and Wintner (2009)
Li, Zhifei and Yarowsky, David (2008):
Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora, Proceedings of ACL-08: HLT
@InProceedings{li-yarowsky:2008:ACLMain,
author = {Li, Zhifei and Yarowsky, David},
title = {Unsupervised Translation Induction for {Chinese} Abbreviations using Monolingual Corpora},
booktitle = {Proceedings of ACL-08: HLT},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {425--433},
url = {http://www.aclweb.org/anthology/P/P08/P08-1049},
year = 2008
}
Li and Yarowsky (2008)
Eiji Aramaki and Takeshi Imai and Kengo Miyo and Kazuhiko Ohe (2008):
Orthographic Disambiguation Incorporating Transliterated Probability, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Aramaki:2008:IJCNLP,
author = {Eiji Aramaki and Takeshi Imai and Kengo Miyo and Kazuhiko Ohe},
title = {Orthographic Disambiguation Incorporating Transliterated Probability},
url = {http://luululu.sakura.ne.jp/paper/2008/ijcnlp-2008.pdf},
googlescholar = {16969379757232595440},
booktitle = {Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2008
}
Aramaki et al. (2008)
Jin-Shea Kuo and Haizhou Li (2008):
Multi-View Co-Training of Transliteration Model, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Kuo:2008:IJCNLP,
author = {Jin-Shea Kuo and Haizhou Li},
title = {Multi-View Co-Training of Transliteration Model},
url = {http://www.newdesign.aclweb.org/anthology/I/I08/I08-1049.pdf},
googlescholar = {6262636896883489275},
booktitle = {Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2008
}
Kuo and Li (2008)
Harshit Surana and Anil Kumar Singh (2008):
A More Discerning and Adaptable Multilingual Transliteration Mechanism for Indian Languages, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Surana:2008:IJCNLP,
author = {Harshit Surana and Anil Kumar Singh},
title = {A More Discerning and Adaptable Multilingual Transliteration Mechanism for {I}ndian Languages},
url = {http://oldsite.aclweb.org/anthology-new/I/I08/I08-1009.pdf},
googlescholar = {12688632788038861730},
booktitle = {Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2008
}
Surana and Singh (2008)
Liao, Shasha (2008):
Combining Source and Target Language Information for Name Tagging of Machine Translation Output, Proceedings of the ACL-08: HLT Student Research Workshop
@InProceedings{liao:2008:SRW,
author = {Liao, Shasha},
title = {Combining Source and Target Language Information for Name Tagging of Machine Translation Output},
booktitle = {Proceedings of the ACL-08: HLT Student Research Workshop},
month = {June},
address = {Columbus, Ohio},
publisher = {Association for Computational Linguistics},
pages = {19--24},
url = {http://www.aclweb.org/anthology/P/P08/P08-3004},
year = 2008
}
Liao (2008)
Andrew Finch and Eiichiro Sumita (2008):
Phrase-based Machine Transliteration, Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)
@inproceedings{Finch:2008:IJCNLP,
author = {Andrew Finch and Eiichiro Sumita},
title = {Phrase-based Machine Transliteration},
url = {http://www.mt-archive.info/IJCNLP-2008-Finch.pdf},
googlescholar = {2800070757021282496},
booktitle = {Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)},
year = 2008
}
Finch and Sumita (2008)