Comparable Corpora
A comparable corpus is a pair of corpora in two different languages that come from the same domain.
Comparable Corpora is the main subject of 33 publications, 12 of which are discussed here.
Publications
Parallel sentences may also be mined from comparable corpora such as news stories written on the same topic in different languages.
Munteanu, Dragos Stefan and Marcu, Daniel (2002):
Processing Comparable Corpora With Bilingual Suffix Trees, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
mentioned in Comparable Corpora, Truecasing and Spelling Correction
@inproceedings{Munteanu:2002,
author = {Munteanu, Dragos Stefan and Marcu, Daniel},
title = {Processing Comparable Corpora With Bilingual Suffix Trees},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
url = {
http://aclweb.org/anthology//W/W02/W02-1037.pdf},
month = {July},
address = {Philadelphia},
publisher = {Association for Computational Linguistics},
pages = {289--295},
year = 2002
}
Munteanu and Marcu (2002) use suffix trees, and in later work log-likelihood ratios
Dragos Stefan Munteanu and Alexander Fraser and Daniel Marcu (2004):
Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora, Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)
@Inproceedings{Munteanu:2004,
author = {Dragos Stefan Munteanu and Alexander Fraser and Daniel Marcu},
title = {Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora},
url = {
http://acl.ldc.upenn.edu/hlt-naacl2004/main/pdf/93\_Paper.pdf},
googlescholar = {13931404674250458886},
booktitle = {Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
year = 2004
}
(Munteanu et al., 2004;
Dragos Stefan Munteanu and Daniel Marcu (2005):
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora, Computational Linguistics
@Article{Munteanu:CL:2005,
author = {Dragos Stefan Munteanu and Daniel Marcu},
title = {Improving Machine Translation Performance by Exploiting Non-Parallel Corpora},
url = {
http://acl.ldc.upenn.edu/J/J05/J05-4003.pdf?origin=publication\_detail},
googlescholar = {15197760803213593510},
journal = {Computational Linguistics},
volume = {31},
number = {4},
year = 2005
}
Munteanu and Marcu, 2005), to detect parallel sentences.
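A log-likelihood ratio of the kind mentioned above is commonly computed as Dunning's G² statistic over a 2x2 table of co-occurrence counts. How the cited papers apply it is not detailed here, so the following is only a minimal sketch of the statistic itself; the function name and the example counts are hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: contexts where both words occur (e.g. aligned sentence pairs)
    k12: contexts with the first word only
    k21: contexts with the second word only
    k22: contexts with neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    total = xlogx_sum(n)
    # G^2 = 2 * sum(O * ln(O / E)) with E = row_total * col_total / n
    return 2.0 * (cells - rows - cols + total)


if __name__ == "__main__":
    # Independent words give a score near zero.
    print(log_likelihood_ratio(10, 10, 10, 10))
    # Words that always co-occur give a large score.
    print(log_likelihood_ratio(50, 0, 0, 50))
```

A high score indicates that two words co-occur more (or less) often than chance would predict, which is why such scores can serve as evidence that two sentences share translated content.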
Abdul-Rauf, Sadaf and Schwenk, Holger (2009):
On the Use of Comparable Corpora to Improve SMT performance, Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
@InProceedings{abdulrauf-schwenk:2009:EACL,
author = {Abdul-Rauf, Sadaf and Schwenk, Holger},
title = {On the Use of Comparable Corpora to Improve {SMT} performance},
booktitle = {Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)},
month = {March},
address = {Athens, Greece},
publisher = {Association for Computational Linguistics},
pages = {16--23},
url = {
http://www.aclweb.org/anthology/E09-1003},
year = 2009
}
Abdul-Rauf and Schwenk (2009);
Abdul Rauf, Sadaf and Schwenk, Holger (2009):
Exploiting Comparable Corpora with TER and TERp, Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
@InProceedings{abdulrauf-schwenk:2009:BUCC,
author = {Abdul Rauf, Sadaf and Schwenk, Holger},
title = {Exploiting Comparable Corpora with TER and TERp},
booktitle = {Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {46--54},
url = {
http://www.aclweb.org/anthology/W/W09/W09-3109},
year = 2009
}
Rauf and Schwenk (2009);
Sadaf Abdul Rauf and Holger Schwenk (2011):
Parallel sentence generation from comparable corpora for improved SMT, Machine Translation
@article{MTJ:2011:Rauf,
author = {Sadaf Abdul Rauf and Holger Schwenk},
title = {Parallel sentence generation from comparable corpora for improved {SMT}},
pages = {341--375},
journal = {Machine Translation},
volume = {25},
number = {4},
month = {December},
year = 2011
}
Rauf and Schwenk (2011) translate one side of the comparable corpus into the other language, use information retrieval methods to find matching sentences and use the TER metric to measure their similarity.
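The pipeline described above (translate, retrieve candidates, score with TER) can be sketched roughly as follows. This is an illustration under strong assumptions, not the authors' implementation: the source side is assumed to be machine-translated already, retrieval is reduced to picking the target sentence with the largest word overlap, and the score is a simplified TER without block shifts (plain word-level edit distance divided by reference length). All function names and the threshold are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hyp, ref):
    """Edit distance normalised by reference length (real TER also allows shifts)."""
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


def mine_pairs(translated_source, target_sentences, max_ter=0.5):
    """Return (source_index, target_index, score) for pairs below the threshold."""
    target_tokens = [t.lower().split() for t in target_sentences]
    pairs = []
    if not target_tokens:
        return pairs
    for src_idx, translation in enumerate(translated_source):
        hyp = translation.lower().split()
        if not hyp:
            continue
        # Crude stand-in for information retrieval: largest word overlap.
        best = max(range(len(target_tokens)),
                   key=lambda k: len(set(hyp) & set(target_tokens[k])))
        score = simplified_ter(hyp, target_tokens[best])
        if score <= max_ter:
            pairs.append((src_idx, best, score))
    return pairs


if __name__ == "__main__":
    translated = ["the president visited paris on monday", "stocks fell sharply"]
    targets = ["the weather was fine",
               "the president visited paris monday",
               "markets were closed"]
    print(mine_pairs(translated, targets))
```

A lower threshold keeps fewer but cleaner pairs; a real system would replace the overlap heuristic with a proper retrieval engine and use a full TER implementation.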
D. Ştefănescu and R. Ion and S. Hunsicker (2012):
Hybrid Parallel Sentence Mining from Comparable Corpora, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
@inproceedings{EAMT-2012-Stefanescu,
author = {D. \c{S}tef\u{a}nescu and R. Ion and S. Hunsicker},
title = {Hybrid Parallel Sentence Mining from Comparable Corpora},
url = {
http://www.mt-archive.info/EAMT-2012-Stefanescu},
pages = {137--144},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
}
Ştefănescu et al. (2012) report improvements with a more complex sentence similarity measure.
Instead of full sentences, parallel sentence fragments may be extracted from comparable corpora
Munteanu, Dragos Stefan and Marcu, Daniel (2006):
Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
@InProceedings{munteanu-marcu:2006:COLACL,
author = {Munteanu, Dragos Stefan and Marcu, Daniel},
title = {Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {81--88},
url = {
http://www.aclweb.org/anthology/P/P06/P06-1011},
year = 2006
}
(Munteanu and Marcu, 2006). Methods have been proposed to extract matching phrases
Takaaki Tanaka (2002):
Measuring the Similarity between Compound Nouns in Different Languages Using Non-Parallel Corpora, Proceedings of the International Conference on Computational Linguistics (COLING)
@InProceedings{Tanaka:2002,
author = {Takaaki Tanaka},
title = {Measuring the Similarity between Compound Nouns in Different Languages Using Non-Parallel Corpora},
url = {
http://acl.ldc.upenn.edu/coling2002/proceedings/data/area-13/co-179.pdf},
googlescholar = {15794656224648594415},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2002
}
(Tanaka, 2002) or web pages
Smith, Noah A. (2002):
From Words to Corpora: Recognizing Translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
@inproceedings{Smith:2002,
author = {Smith, Noah A.},
title = {From Words to Corpora: Recognizing Translation},
url = {
http://acl.ldc.upenn.edu/W/W02/W02-1013.pdf},
googlescholar = {18379354085355431073},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {July},
address = {Philadelphia},
publisher = {Association for Computational Linguistics},
pages = {95--102},
year = 2002
}
(Smith, 2002) from such large collections.
Chris Quirk and Raghavendra Udupa and Arul Menezes (2007):
Generative Models of Noisy Translations with Applications to Parallel Fragment Extraction, Proceedings of the MT Summit XI
@inproceedings{Quirk:2007:MTSummit,
author = {Chris Quirk and Raghavendra Udupa and Arul Menezes},
title = {Generative Models of Noisy Translations with Applications to Parallel Fragment Extraction},
url = {
http://charlesneedham.com/en-us/projects/mt/mtsummit2007\_compcorp.pdf},
googlescholar = {9551046180396418551},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
}
Quirk et al. (2007) propose a generative model for the same task.
Hewavitharana, Sanjika and Vogel, Stephan (2011):
Extracting Parallel Phrases from Comparable Data, Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
@InProceedings{hewavitharana-vogel:2011:BUCC,
author = {Hewavitharana, Sanjika and Vogel, Stephan},
title = {Extracting Parallel Phrases from Comparable Data},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {61--68},
url = {
http://www.aclweb.org/anthology/W11-1209},
year = 2011
}
Hewavitharana and Vogel (2011) extract phrase pairs from comparable corpora, using a classifier approach.
Benchmarks
Discussion
Related Topics
The transition from parallel corpora, through noisy corpora that require cleaning, all the way to comparable corpora is fluid. A special topic is the extraction of bilingual dictionaries from comparable corpora. A comparable corpus always consists of two monolingual corpora: the target-side corpus may be used for training language models, and the source-side corpus may be used for some domain adaptation methods.
New Publications
Viktor Hangya and Fabienne Braune and Yuliya Kalasouskaya and Alexander Fraser (2018):
Unsupervised Parallel Sentence Extraction from Comparable Corpora, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{iwslt18-Unsupervised-Hangya,
author = {Viktor Hangya and Fabienne Braune and Yuliya Kalasouskaya and Alexander Fraser},
title = {Unsupervised Parallel Sentence Extraction from Comparable Corpora},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2018
}
Hangya et al. (2018)
Marie, Benjamin and Fujita, Atsushi (2017):
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation, Transactions of the Association for Computational Linguistics
@article{TACL1166,
author = {Marie, Benjamin and Fujita, Atsushi},
title = {Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
issn = {2307-387X},
url = {
https://transacl.org/ojs/index.php/tacl/article/view/1166},
pages = {487--500},
year = 2017
}
Marie and Fujita (2017)
Tufiş, Dan and Ion, Radu and Dumitrescu, Stefan and Stefanescu, Dan (2013):
Wikipedia as an SMT Training Corpus, Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
@InProceedings{tufics-EtAl:2013:RANLP-2013,
author = {Tufi\c{s}, Dan and Ion, Radu and Dumitrescu, Stefan and Stefanescu, Dan},
title = {Wikipedia as an {SMT} Training Corpus},
booktitle = {Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013},
month = {September},
address = {Hissar, Bulgaria},
publisher = {INCOMA Ltd. Shoumen, BULGARIA},
pages = {702--709},
url = {
http://www.aclweb.org/anthology/R13-1091},
year = 2013
}
Tufiş et al. (2013)
Rios, Miguel and Sharoff, Serge (2015):
Obtaining SMT dictionaries for related languages, Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
@InProceedings{rios-sharoff:2015:BUCC,
author = {Rios, Miguel and Sharoff, Serge},
title = {Obtaining {SMT} dictionaries for related languages},
booktitle = {Proceedings of the Eighth Workshop on Building and Using Comparable Corpora},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {68--73},
url = {
http://www.aclweb.org/anthology/W15-3410},
year = 2015
}
Rios and Sharoff (2015)
Seo, Hyeong-Won and Cheon, Minah and Kim, Jae-Hoon (2015):
Extracting Bilingual Lexica from Comparable Corpora Using Self-Organizing Maps, Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
@InProceedings{seo-cheon-kim:2015:BUCC,
author = {Seo, Hyeong-Won and Cheon, Minah and Kim, Jae-Hoon},
title = {Extracting Bilingual Lexica from Comparable Corpora Using Self-Organizing Maps},
booktitle = {Proceedings of the Eighth Workshop on Building and Using Comparable Corpora},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {62--67},
url = {
http://www.aclweb.org/anthology/W15-3409},
year = 2015
}
Seo et al. (2015)
Rapp, Reinhard (2015):
A Methodology for Bilingual Lexicon Extraction from Comparable Corpora, Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)
@InProceedings{rapp:2015:HyTra-4,
author = {Rapp, Reinhard},
title = {A Methodology for Bilingual Lexicon Extraction from Comparable Corpora},
booktitle = {Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)},
month = {July},
address = {Beijing},
publisher = {Association for Computational Linguistics},
pages = {46--55},
url = {
http://www.aclweb.org/anthology/W15-4108},
year = 2015
}
Rapp (2015)
Krzysztof Wolk and Krzysztof Marasek (2015):
Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
@inproceedings{IWSLT-2015-Wolk-2,
author = {Krzysztof Wolk and Krzysztof Marasek},
title = {Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents},
pages = {118--125},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
location = {Da Nang, Vietnam},
url = {
http://www.mt-archive.info/15/IWSLT-2015-wolk-2.pdf},
month = {December},
year = 2015
}
Wolk and Marasek (2015)
Krstovski, Kriste and Smith, David (2016):
Bootstrapping Translation Detection and Sentence Extraction from Comparable Corpora, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{krstovski-smith:2016:N16-1,
author = {Krstovski, Kriste and Smith, David},
title = {Bootstrapping Translation Detection and Sentence Extraction from Comparable Corpora},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {1127--1132},
url = {
http://www.aclweb.org/anthology/N16-1132},
year = 2016
}
Krstovski and Smith (2016)
Barrón-Cedeño, Alberto and España-Bonet, Cristina and Boldoba, Josu and Màrquez, Lluís (2015):
A Factory of Comparable Corpora from Wikipedia, Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
mentioned in Parallel Corpora and Comparable Corpora
@InProceedings{Barronetal:2015,
author = {{Barr\'on-Cede{\~n}o}, Alberto and {Espa{\~n}a-Bonet}, Cristina and {Boldoba}, Josu and {M\`arquez}, Llu\'{i}s},
title = {A Factory of Comparable Corpora from Wikipedia},
booktitle = {Proceedings of the Eighth Workshop on Building and Using Comparable Corpora},
pages = {3--13},
month = {July},
date = {30},
address = {Beijing, China},
language = {english},
url = {
http://www.aclweb.org/anthology/W15-3402},
year = 2015
}
Barrón-Cedeño et al. (2015)
Hazem, Amir and Morin, Emmanuel (2016):
Efficient Data Selection for Bilingual Terminology Extraction from Comparable Corpora, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{hazem-morin:2016:COLING,
author = {Hazem, Amir and Morin, Emmanuel},
title = {Efficient Data Selection for Bilingual Terminology Extraction from Comparable Corpora},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3401--3411},
url = {
http://aclweb.org/anthology/C16-1321},
year = 2016
}
Hazem and Morin (2016)
Zhang, Meng and Liu, Yang and Luan, Huanbo and Liu, Yiqun and Sun, Maosong (2016):
Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
@InProceedings{zhang-EtAl:2016:COLING6,
author = {Zhang, Meng and Liu, Yang and Luan, Huanbo and Liu, Yiqun and Sun, Maosong},
title = {Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization},
booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
month = {December},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
pages = {3188--3198},
url = {
http://aclweb.org/anthology/C16-1300},
year = 2016
}
Zhang et al. (2016)
Liu, Chunyang and Liu, Yang and Sun, Maosong and Luan, Huanbo and Yu, Heng (2016):
Agreement-based Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{liu-EtAl:2016:P16-11,
author = {Liu, Chunyang and Liu, Yang and Sun, Maosong and Luan, Huanbo and Yu, Heng},
title = {Agreement-based Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {1024--1033},
url = {
http://www.aclweb.org/anthology/P16-1097},
year = 2016
}
Liu et al. (2016)
Dou, Qing and Vaswani, Ashish and Knight, Kevin and Dyer, Chris (2015):
Unifying Bayesian Inference and Vector Space Models for Improved Decipherment, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
@InProceedings{dou-EtAl:2015:ACL-IJCNLP,
author = {Dou, Qing and Vaswani, Ashish and Knight, Kevin and Dyer, Chris},
title = {Unifying Bayesian Inference and Vector Space Models for Improved Decipherment},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {836--845},
url = {
http://www.aclweb.org/anthology/P15-1081},
year = 2015
}
Dou et al. (2015)
Nuhn, Malte and Schamper, Julian and Ney, Hermann (2015):
UNRAVEL---A Decipherment Toolkit, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
@InProceedings{nuhn-schamper-ney:2015:ACL-IJCNLP,
author = {Nuhn, Malte and Schamper, Julian and Ney, Hermann},
title = {UNRAVEL---A Decipherment Toolkit},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {July},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {549--553},
url = {
http://www.aclweb.org/anthology/P15-2090},
year = 2015
}
Nuhn et al. (2015)
Meiping Dong and Yang Liu and Huanbo Luan and Maosong Sun and Tatsuya Izuha and Dakun Zhang (2015):
Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)
@inproceedings{Dong:2015:ijcai,
author = {Meiping Dong and Yang Liu and Huanbo Luan and Maosong Sun and Tatsuya Izuha and Dakun Zhang},
title = {Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora},
pages = {1250--1256},
booktitle = {Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)},
url = {
http://ijcai.org/papers15/Papers/IJCAI15-180.pdf},
location = {Buenos Aires, Argentina},
year = 2015
}
Dong et al. (2015)
Chu, Chenhui and Nakazawa, Toshiaki and Kurohashi, Sadao (2013):
Accurate Parallel Fragment Extraction from Quasi-Comparable Corpora using Alignment Model and Translation Lexicon, Proceedings of the Sixth International Joint Conference on Natural Language Processing
@InProceedings{chu-nakazawa-kurohashi:2013:IJCNLP,
author = {Chu, Chenhui and Nakazawa, Toshiaki and Kurohashi, Sadao},
title = {Accurate Parallel Fragment Extraction from Quasi-Comparable Corpora using Alignment Model and Translation Lexicon},
booktitle = {Proceedings of the Sixth International Joint Conference on Natural Language Processing},
month = {October},
address = {Nagoya, Japan},
publisher = {Asian Federation of Natural Language Processing},
pages = {1144--1150},
url = {
http://www.aclweb.org/anthology/I13-1163},
year = 2013
}
Chu et al. (2013)
Fu, Xiaoyin and Wei, Wei and Lu, Shixiang and Chen, Zhenbiao and Xu, Bo (2013):
Phrase-based Parallel Fragments Extraction from Comparable Corpora, Proceedings of the Sixth International Joint Conference on Natural Language Processing
@InProceedings{fu-EtAl:2013:IJCNLP,
author = {Fu, Xiaoyin and Wei, Wei and Lu, Shixiang and Chen, Zhenbiao and Xu, Bo},
title = {Phrase-based Parallel Fragments Extraction from Comparable Corpora},
booktitle = {Proceedings of the Sixth International Joint Conference on Natural Language Processing},
month = {October},
address = {Nagoya, Japan},
publisher = {Asian Federation of Natural Language Processing},
pages = {972--976},
url = {
http://www.aclweb.org/anthology/I13-1129},
year = 2013
}
Fu et al. (2013)
McCrae, John Philip and Cimiano, Philipp (2013):
Mining translations from the web of open linked data, Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction
@InProceedings{mccrae-cimiano:2013:NLP-LOD-SWAIE,
author = {McCrae, John Philip and Cimiano, Philipp},
title = {Mining translations from the web of open linked data},
booktitle = {Proceedings of the Joint Workshop on NLP\&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction},
month = {September},
address = {Hissar, Bulgaria},
publisher = {INCOMA Ltd. Shoumen, BULGARIA},
pages = {8--11},
url = {
http://www.aclweb.org/anthology/W13-5203},
year = 2013
}
McCrae and Cimiano (2013)
Lapshinova-Koltunski, Ekaterina (2013):
VARTRA: A Comparable Corpus for Analysis of Translation Variation, Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
@InProceedings{lapshinovakoltunski:2013:BUCC,
author = {Lapshinova-Koltunski, Ekaterina},
title = {VARTRA: A Comparable Corpus for Analysis of Translation Variation},
booktitle = {Proceedings of the Sixth Workshop on Building and Using Comparable Corpora},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {77--86},
url = {
http://www.aclweb.org/anthology/W13-2510},
year = 2013
}
Lapshinova-Koltunski (2013)
Preiss, Judita (2012):
Identifying Comparable Corpora Using LDA, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
@InProceedings{preiss:2012:NAACL-HLT,
author = {Preiss, Judita},
title = {Identifying Comparable Corpora Using LDA},
booktitle = {Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Montr\'{e}al, Canada},
publisher = {Association for Computational Linguistics},
pages = {558--562},
url = {
http://www.aclweb.org/anthology/N12-1065},
year = 2012
}
Preiss (2012)
Toni Badia and Gemma Boleda and Maite Melero and Antoni Oliver (2005):
An n-gram Approach to Exploiting a Monolingual Corpus for Machine Translation, Proceedings of the Workshop on Example-based Machine Translation at MT Summit X
@InProceedings{Badia:2005:MTS,
author = {Toni Badia and Gemma Boleda and Maite Melero and Antoni Oliver},
title = {An n-gram Approach to Exploiting a Monolingual Corpus for Machine Translation},
url = {
http://mt-archive.info/MTS-2005-Badia.pdf},
googlescholar = {11888473321532340496},
booktitle = {Proceedings of the Workshop on Example-based Machine Translation at {MT} Summit X},
month = {September},
address = {Phuket, Thailand},
year = 2005
}
Badia et al. (2005)