Phrase-Based Models
Phrase-based models, the workhorse of current statistical machine translation, go beyond word-based models by mapping sequences of words rather than single words.
Phrase Based Models and its 10 sub-topics are the main subject of 263 publications.
Publications
The modern statistical phrase-based models are rooted in work by
Franz Josef Och and Hans Weber (1998):
Improving Statistical Natural Language Translation with Categories and Rules, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Och:1998,
author = {Franz Josef Och and Hans Weber},
title = {Improving Statistical Natural Language Translation with Categories and Rules},
url = {http://acl.ldc.upenn.edu/P/P98/P98-2162.pdf},
googlescholar = {10889154217700208648},
booktitle = {Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1998
}
Och and Weber (1998);
Franz Josef Och and Christoph Tillmann and Hermann Ney (1999):
Improved Alignment Models for Statistical Machine Translation, Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC)
@Inproceedings{Och:1999Improved,
author = {Franz Josef Och and Christoph Tillmann and Hermann Ney},
title = {Improved Alignment Models for Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/W/W99/W99-0604.pdf},
googlescholar = {11280572681759537884},
pages = {20--28},
booktitle = {Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC)},
year = 1999
}
Och et al. (1999);
Och (2002);
Franz Josef Och and Hermann Ney (2004):
The Alignment Template Approach to Statistical Machine Translation, Computational Linguistics
@Article{Och:CL:2004,
author = {Franz Josef Och and Hermann Ney},
title = {The Alignment Template Approach to Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/J/J04/J04-4002.pdf},
googlescholar = {13320169434331885071},
journal = {Computational Linguistics},
volume = {30},
number = {4},
year = 2004
}
Och and Ney (2004) on alignment template models. These models defined phrases over word classes that were then instantiated with words.
Translating with the use of phrases in a statistical framework was also proposed by
I. Dan Melamed (1997):
A Portable Algorithm for Mapping Bitext Correspondence, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)
mentioned in Phrase Based Models and Sentence Alignment
@Inproceedings{Melamed:1997,
author = {I. Dan Melamed},
title = {A Portable Algorithm for Mapping Bitext Correspondence},
url = {http://acl.ldc.upenn.edu/P/P97/P97-1039.pdf},
googlescholar = {1974725806507584461},
booktitle = {Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1997
}
Melamed (1997);
Ye-Yi Wang and Alex Waibel (1998):
Modeling with Structures in Statistical Machine Translation, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)
mentioned in Phrase Based Models and Inversion Transduction Grammars
@Inproceedings{Wang:1998,
author = {Ye-Yi Wang and Alex Waibel},
title = {Modeling with Structures in Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/P/P98/P98-2221.pdf},
googlescholar = {12385920933414037529},
booktitle = {Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1998
}
Wang and Waibel (1998);
Venugopal, Ashish and Vogel, Stephan and Waibel, Alex (2003):
Effective Phrase Translation Extraction from Alignment Models, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
@inproceedings{Venugopal:2003,
author = {Venugopal, Ashish and Vogel, Stephan and Waibel, Alex},
title = {Effective Phrase Translation Extraction from Alignment Models},
booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics},
editor = {Erhard Hinrichs and Dan Roth},
url = {http://www.aclweb.org/anthology/P03-1041.pdf},
pages = {319--326},
year = 2003
}
Venugopal et al. (2003);
Watanabe, Taro and Sumita, Eiichiro and Okuno, Hiroshi G. (2003):
Chunk-Based Statistical Translation, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
@inproceedings{Watanabe:2003,
author = {Watanabe, Taro and Sumita, Eiichiro and Okuno, Hiroshi G.},
title = {Chunk-Based Statistical Translation},
booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics},
editor = {Erhard Hinrichs and Dan Roth},
url = {http://www.aclweb.org/anthology/P03-1039.pdf},
pages = {303--310},
year = 2003
}
Watanabe et al. (2003).
Daniel Marcu (2001):
Towards a Unified Approach to Memory- and Statistical-Based Machine Translation, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Marcu:2001,
author = {Daniel Marcu},
title = {Towards a Unified Approach to Memory- and Statistical-Based Machine Translation},
url = {http://acl.ldc.upenn.edu/acl2001/MAIN/MARCU.PDF},
booktitle = {Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 2001
}
Marcu (2001) proposes the use of phrases within word-based model decoding. The use of log-linear models was proposed by
Franz Josef Och and Hermann Ney (2002):
Discriminative Training and Maximum Entropy Models for Statistical Machine Translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)
@InProceedings{Och:2002,
author = {Franz Josef Och and Hermann Ney},
title = {Discriminative Training and Maximum Entropy Models for Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/acl2002/MAIN/pdfs/Main074.pdf},
googlescholar = {2845378992177918439},
booktitle = {Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 2002
}
Och and Ney (2002).
An influential description is presented by
Philipp Koehn and Franz Josef Och and Daniel Marcu (2003):
Statistical Phrase-Based Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)
mentioned in Phrase Based Models and Syntactic Coherence
@InProceedings{Koehn:2003ph,
author = {Philipp Koehn and Franz Josef Och and Daniel Marcu},
title = {Statistical Phrase-Based Translation},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
url = {http://acl.ldc.upenn.edu/N/N03/N03-1017.pdf},
year = 2003
}
Koehn et al. (2003), which is similar to the model by
Zens et al. (2002);
Richard Zens and Hermann Ney (2004):
Improvements in Phrase-Based Statistical Machine Translation, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)
@Inproceedings{Zens:2004,
author = {Richard Zens and Hermann Ney},
title = {Improvements in Phrase-Based Statistical Machine Translation},
url = {http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/90\_Paper.pdf},
googlescholar = {7560980352665557248},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
year = 2004
}
Zens and Ney (2004).
Alicia Tribble and Stephan Vogel and Alex Waibel (2003):
Overlapping Phrase-level Translation Rules in an SMT Engine, Proceedings of International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03)
@inproceedings{Tribble:2003,
author = {Alicia Tribble and Stephan Vogel and Alex Waibel},
title = {Overlapping Phrase-level Translation Rules in an {SMT} Engine},
url = {http://www.cs.cmu.edu/~atribble/Papers/CMU-Overlap-NLPKE03.pdf},
googlescholar = {5926642286844171853},
booktitle = {Proceedings of International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03)},
address = {Beijing, China},
year = 2003
}
Tribble et al. (2003) suggest the use of overlapping phrases.
Adam Lopez and Philip Resnik (2006):
Word-Based Alignment, Phrase-Based Translation: What's the Link?, 5th Conference of the Association for Machine Translation in the Americas (AMTA)
mentioned in Phrase Based Models and Word Alignment Evaluation
@InProceedings{Lopez:2006:AMTA,
author = {Adam Lopez and Philip Resnik},
title = {Word-Based Alignment, Phrase-Based Translation: What's the Link?},
url = {http://www.mt-archive.info/AMTA-2006-Lopez.pdf},
googlescholar = {16252070359942137861},
booktitle = {5th Conference of the Association for Machine Translation in the Americas (AMTA)},
month = {August},
address = {Boston, Massachusetts},
year = 2006
}
Lopez and Resnik (2006) show the contribution of the different components of a phrase-based model.
New Publications
Ojha, Atul Kr and Kumar, Ritesh and Bansal, Akanksha and Rani, Priya (2019):
Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019, Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
mentioned in Phrase Based Models and Neural Network Models
@inproceedings{ojha2019panlingua,
author = {Ojha, Atul Kr and Kumar, Ritesh and Bansal, Akanksha and Rani, Priya},
title = {Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019},
booktitle = {Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)},
pages = {213--218},
year = 2019
}
Ojha et al. (2019)
Bogoychev, Nikolay and Hoang, Hieu (2016):
Fast and highly parallelizable phrase table for statistical machine translation, Proceedings of the First Conference on Machine Translation
@InProceedings{bogoychev-hoang:2016:WMT,
author = {Bogoychev, Nikolay and Hoang, Hieu},
title = {Fast and highly parallelizable phrase table for statistical machine translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {102--109},
url = {http://www.aclweb.org/anthology/W/W16/W16-2211},
year = 2016
}
Bogoychev and Hoang (2016)
Nishino, Masaaki and Suzuki, Jun and Nagata, Masaaki (2016):
Phrase Table Pruning via Submodular Function Maximization, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
@InProceedings{nishino-suzuki-nagata:2016:P16-2,
author = {Nishino, Masaaki and Suzuki, Jun and Nagata, Masaaki},
title = {Phrase Table Pruning via Submodular Function Maximization},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {406--411},
url = {http://anthology.aclweb.org/P16-2066},
year = 2016
}
Nishino et al. (2016)
Cuong, Hoang and Sima'an, Khalil (2014):
Latent Domain Phrase-based Models for Adaptation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
mentioned in Phrase Based Models and Domain Adaptation
@InProceedings{cuong-simaan:2014:EMNLP2014,
author = {Cuong, Hoang and Sima'an, Khalil},
title = {Latent Domain Phrase-based Models for Adaptation},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {566--576},
url = {http://www.aclweb.org/anthology/D14-1062},
year = 2014
}
Cuong and Sima'an (2014)
Santanu Pal and Tanmoy Chakraborty and Sivaji Bandyopadhyay (2011):
Handling Multiword Expressions in Phrase-Based Statistical Machine Translation, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
@inproceedings{MTS-2011-Pal,
author = {Santanu Pal and Tanmoy Chakraborty and Sivaji Bandyopadhyay},
title = {Handling Multiword Expressions in Phrase-Based Statistical Machine Translation},
url = {http://www.mt-archive.info/MTS-2011-Pal.pdf},
pages = {215--224},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
}
Pal et al. (2011)
M. Junczys-Dowmunt (2012):
A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{EAMT-2012-Junczys-Dowmunt,
author = {M. Junczys-Dowmunt},
title = {A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression},
url = {http://www.mt-archive.info/EAMT-2012-Junczys-Dowmunt},
pages = {245--252},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
}
Junczys-Dowmunt (2012)
Wisniewski, Guillaume and Allauzen, Alexandre and Yvon, François (2010):
Assessing Phrase-Based Translation Models with Oracle Decoding, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
@InProceedings{wisniewski-allauzen-yvon:2010:EMNLP,
author = {Wisniewski, Guillaume and Allauzen, Alexandre and Yvon, Fran\c{c}ois},
title = {Assessing Phrase-Based Translation Models with Oracle Decoding},
booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Cambridge, MA},
publisher = {Association for Computational Linguistics},
pages = {933--943},
url = {http://www.aclweb.org/anthology/D/D10/D10-1091},
year = 2010
}
Wisniewski et al. (2010)
George Tambouratzis and Fotini Simistira and Sokratis Sofianopoulos and Nikos Tsimboukakis and Marina Vassiliou (2011):
A resource-light phrase scheme for language-portable MT, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{eamt11:Tambouratzis,
author = {George Tambouratzis and Fotini Simistira and Sokratis Sofianopoulos and Nikos Tsimboukakis and Marina Vassiliou},
title = {A resource-light phrase scheme for language-portable {MT}},
pages = {185--192},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
location = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
url = {http://mt-archive.info/EAMT-2011-Tambouratzis.pdf},
year = 2011
}
Tambouratzis et al. (2011)