Collecting Parallel Corpora
The web is the main source of parallel corpora today, although mining it requires a number of processing steps; other data resources have also been explored.
Parallel Corpora is the main subject of 84 publications. 30 are discussed here.
Philip Resnik (1999):
Mining the Web for Bilingual Text, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL)
author = {Philip Resnik},
title = {Mining the Web for Bilingual Text},
url = {},
googlescholar = {4360226935188574245},
booktitle = {Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1999
Resnik (1999) describes a method to automatically find parallel documents on the web.
Fukushima, Ken'ichi and Taura, Kenjiro and Chikayama, Takashi (2006):
A Fast and Accurate Method for Detecting English-Japanese Parallel Texts, Proceedings of the Workshop on Multilingual Language Resources and Interoperability
author = {Fukushima, Ken'ichi and Taura, Kenjiro and Chikayama, Takashi},
title = {A Fast and Accurate Method for Detecting {English-Japanese} Parallel Texts},
booktitle = {Proceedings of the Workshop on Multilingual Language Resources and Interoperability},
month = {July},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
pages = {60--67},
url = {},
year = 2006
Fukushima et al. (2006) use a dictionary to detect parallel documents, while
Bo Li and Juan Liu (2008):
Mining Chinese-English Parallel Corpora from the Web, Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)
author = {Bo Li and Juan Liu},
title = {Mining {C}hinese-{E}nglish Parallel Corpora from the Web},
url = {},
googlescholar = {8567864887980266695},
booktitle = {Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2008
Li and Liu (2008) use a number of criteria such as similarity of the URL and page content. Acquiring parallel corpora, however, typically requires some manual involvement
Philipp Koehn (2002):
Europarl: A Multilingual Corpus for Evaluation of Machine Translation
author = {Philipp Koehn},
title = {Europarl: A Multilingual Corpus for Evaluation of Machine Translation},
howpublished = {Unpublished, {\tt$\sim$koehn/europarl/}},
year = 2002
(Koehn, 2002;
Joel Martin and Howard Johnson and Benoit Farley and Anna Maclachlan (2003):
Aligning and Using an English-Inuktitut Parallel Corpus, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
author = {Joel Martin and Howard Johnson and Benoit Farley and Anna Maclachlan},
title = {Aligning and Using an {English-Inuktitut} Parallel Corpus},
url = {},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
Martin et al., 2003;
Philipp Koehn (2005):
Europarl: A Parallel Corpus for Statistical Machine Translation, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
author = {Philipp Koehn},
title = {Europarl: A Parallel Corpus for Statistical Machine Translation},
url = {},
googlescholar = {6985235632472432229},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
Koehn, 2005), including the matching of documents
Utiyama, Masao and Isahara, Hitoshi (2003):
Reliable Measures for Aligning Japanese-English News Articles and Sentences, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
author = {Utiyama, Masao and Isahara, Hitoshi},
title = {Reliable Measures for Aligning {Japanese-English} News Articles and Sentences},
booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics},
editor = {Erhard Hinrichs and Dan Roth},
url = {},
pages = {72--79},
year = 2003
(Utiyama and Isahara, 2003). A large collection of corpora is maintained at the
OPUS web site
Jörg Tiedemann (2012):
Parallel Data, Tools and Interfaces in OPUS, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012)
author = {J{\"o}rg Tiedemann},
title = {Parallel Data, Tools and Interfaces in OPUS},
url = {\_Paper.pdf},
note = {ACL Anthology Identifier: L12-1246},
booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012)},
month = {May},
address = {Istanbul, Turkey},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Mehmet U\u{g}ur Do\u{g}an and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
isbn = {978-2-9517408-7-7},
language = {English},
pages = {2214--2218},
year = 2012
(Tiedemann, 2012).
Masao Uchiyama and Hitoshi Isahara (2007):
A Japanese-English Patent Parallel Corpus, Proceedings of the MT Summit XI
author = {Masao Uchiyama and Hitoshi Isahara},
title = {A {J}apanese-{E}nglish Patent Parallel Corpus},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
Uchiyama and Isahara (2007) report on the efforts to build a Japanese-English patent corpus and
Lieve Macken and Julia Trushkina and Lidia Rura (2007):
Dutch Parallel Corpus: MT Corpus and translator's aid, Proceedings of the MT Summit XI
author = {Lieve Macken and Julia Trushkina and Lidia Rura},
title = {{D}utch Parallel Corpus: {MT} Corpus and translator's aid},
url = {},
googlescholar = {1625623404376163668},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
Macken et al. (2007) on efforts to build a broad-based Dutch-English corpus.
Wolfgang Täger (2011):
The Sentence-Aligned European Patent Corpus, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
author = {Wolfgang T{\"a}ger},
title = {The Sentence-Aligned European Patent Corpus},
url = {},
googlescholar = {8983346114011238566},
pages = {177--184},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
location = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
year = 2011
Täger (2011) describes the creation of the European patent corpus.
M. Cettolo and C. Girardi and M. Federico (2012):
WIT3: Web Inventory of Transcribed and Translated Talks, Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)
mentioned in Parallel Corpora and Evaluation Campaigns
author = {M. Cettolo and C. Girardi and M. Federico},
title = {WIT3: Web Inventory of Transcribed and Translated Talks},
url = {},
pages = {261--268},
booktitle = {Proceedings of the 16th International Conference of the European Association for Machine Translation (EAMT)},
location = {Trento, Italy},
editor = {Mauro Cettolo and Marcello Federico and Lucia Specia and Andy Way},
year = 2012
Cettolo et al. (2012) explain the creation of a multilingual parallel corpus of subtitles from the TED Talks website. A discussion of the pitfalls during the construction of parallel corpora is given by
Heiki-Jaan Kaalep and Kaarel Veskis (2007):
Comparing Parallel Corpora and Evaluating their Quality, Proceedings of the MT Summit XI
author = {Heiki-Jaan Kaalep and Kaarel Veskis},
title = {Comparing Parallel Corpora and Evaluating their Quality},
url = {},
googlescholar = {11072725916960369152},
booktitle = {Proceedings of the {MT} Summit XI},
year = 2007
Kaalep and Veskis (2007). A 200 million word Czech-English corpus from various sources was collected
Ondřej Bojar and Adam Liška and Zdeněk Žabokrtský (2010):
Evaluating Utility of Data Sources in a Large Parallel Czech-English Corpus CzEng 0.9, Proceedings of LREC2010
author = {Ond{\v{r}}ej Bojar and Adam Li\v{s}ka and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}}},
title = {Evaluating Utility of Data Sources in a Large Parallel {Czech-English} Corpus {CzEng} 0.9},
url = {\_Paper.pdf},
booktitle = {Proceedings of LREC2010},
year = 2010
(Bojar et al., 2010) and linguistically annotated
Ondřej Bojar and Zdeněk Žabokrtský and Ondřej Dušek and Petra Galuščáková and Martin Majliš and David Mareček and Jiří Maršík and Michal Novák and Martin Popel and Aleš Tamchyna (2012):
The Joy of Parallelism with CzEng 1.0, Proceedings of LREC2012
author = {Ond{\v{r}}ej Bojar and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}} and Ond{\v{r}}ej Du{\v{s}}ek and Petra Galu{\v{s}}{\v{c}}{\'{a}}kov{\'{a}} and Martin Majli{\v{s}} and David Mare{\v{c}}ek and Ji{\v{r}}{\'{\i}} Mar{\v{s}}{\'{\i}}k and Michal Nov{\'{a}}k and Martin Popel and Ale{\v{s}} Tamchyna},
title = {The Joy of Parallelism with CzEng 1.0},
booktitle = {Proceedings of LREC2012},
organization = {ELRA},
address = {Istanbul, Turkey},
month = {May},
url = {},
publisher = {European Language Resources Association},
year = 2012
(Bojar et al., 2012).
Uszkoreit, Jakob and Ponte, Jay and Popat, Ashok and Dubiner, Moshe (2010):
Large Scale Parallel Document Mining for Machine Translation, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
author = {Uszkoreit, Jakob and Ponte, Jay and Popat, Ashok and Dubiner, Moshe},
title = {Large Scale Parallel Document Mining for Machine Translation},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {1101--1109},
url = {},
year = 2010
Uszkoreit et al. (2010) address the problem of document alignment by translating all documents into English and then applying information retrieval methods.
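The translate-then-retrieve idea can be illustrated with a minimal sketch. It assumes the foreign documents have already been machine-translated into English (the translation step is not shown) and uses plain TF-IDF cosine similarity as the retrieval method. The function names and the whitespace tokenisation are illustrative assumptions, not the setup of Uszkoreit et al. (2010), whose system is built for web scale.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for tokenised documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: count * math.log((1 + n_docs) / (1 + df[term]))  # smoothed IDF
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_documents(english_docs, translated_docs):
    """For each English document, return the index of the most similar
    translated document. Assumes translated_docs is non-empty."""
    tokenised = [d.lower().split() for d in english_docs + translated_docs]
    vectors = tfidf_vectors(tokenised)
    en_vecs = vectors[:len(english_docs)]
    tr_vecs = vectors[len(english_docs):]
    return [
        max(range(len(tr_vecs)), key=lambda j: cosine(en_vec, tr_vecs[j]))
        for en_vec in en_vecs
    ]


# Hypothetical toy data: two English pages and two pages translated into English.
english = [
    "the parliament adopted the budget resolution",
    "the football match ended in a draw",
]
translated = [
    "football game ended with a draw",
    "parliament has adopted budget resolution",
]
matches = align_documents(english, translated)
```

The sketch compares every pair of documents, which is only feasible for small collections; a large-scale system would need an index to restrict the candidate pairs.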
With the increasing use of machine translation on the web, distinguishing between human and machine translated texts becomes a challenge.
Venugopal, Ashish and Uszkoreit, Jakob and Talbot, David and Och, Franz and Ganitkevitch, Juri (2011):
Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation., Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
mentioned in Parallel Corpora and Corpus Cleaning
author = {Venugopal, Ashish and Uszkoreit, Jakob and Talbot, David and Och, Franz and Ganitkevitch, Juri},
title = {Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {1363--1372},
url = {},
year = 2011
Venugopal et al. (2011) propose a method to watermark the output of machine translation systems to aid this distinction.
Antonova, Alexandra and Misyurev, Alexey (2011):
Building a Web-Based Parallel Corpus and Filtering Out Machine-Translated Text, Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
mentioned in Parallel Corpora and Corpus Cleaning
author = {Antonova, Alexandra and Misyurev, Alexey},
title = {Building a Web-Based Parallel Corpus and Filtering Out Machine-Translated Text},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {136--144},
url = {},
year = 2011
Antonova and Misyurev (2011) report that rule-based machine translation output can be detected due to certain word choices, and statistical machine translation output due to lack of reordering.
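To make the "lack of reordering" signal concrete, the following sketch scores how monotone a word alignment is. It is an illustrative measure only, not the detector of Antonova and Misyurev (2011): it assumes word alignment links between a sentence and its translation are already available, and the function name and the pairwise-concordance score are assumptions made for this example.

```python
from itertools import combinations


def monotonicity(alignment):
    """Fraction of alignment-link pairs whose source and target order agree.

    `alignment` is a list of (source_index, target_index) links.
    Returns 1.0 for a fully monotone alignment and lower values the more
    the target word order departs from the source word order.
    """
    concordant = 0
    comparable = 0
    for (s1, t1), (s2, t2) in combinations(alignment, 2):
        if s1 == s2 or t1 == t2:
            continue  # ties carry no ordering information
        comparable += 1
        if (s1 - s2) * (t1 - t2) > 0:
            concordant += 1
    # With no comparable pairs there is no evidence of reordering.
    return concordant / comparable if comparable else 1.0


# Hypothetical alignments: word-for-word output versus a fully reordered translation.
monotone_score = monotonicity([(0, 0), (1, 1), (2, 2)])
reordered_score = monotonicity([(0, 2), (1, 1), (2, 0)])
```

A sentence pair with a score close to 1.0 for a language pair that normally requires reordering would be one hint, among others, that the translation was produced by a system that reorders little.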
Spencer Rarrick and Chris Quirk and Will Lewis (2011):
MT Detection in Web-Scraped Parallel Corpora, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
mentioned in Parallel Corpora and Corpus Cleaning
author = {Spencer Rarrick and Chris Quirk and Will Lewis},
title = {MT Detection in Web-Scraped Parallel Corpora},
url = {},
pages = {422--430},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
Rarrick et al. (2011) train a classifier to learn the distinction and show that removing such data leads to better translation quality.
Parallel corpora may also be built by dedicated manual translation efforts
Ulrich Germann (2001):
Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?, Workshop on Data-Driven Machine Translation at 39th Annual Meeting of the Association for Computational Linguistics (ACL)
author = {Ulrich Germann},
title = {Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?},
url = {},
booktitle = {Workshop on Data-Driven Machine Translation at 39th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 2001
(Germann, 2001). It may be useful to focus on the most relevant new sentences, using methods such as active learning
Hemali Majithia and Philip Rennart and Evelyne Tzoukermann (2005):
Rapid Ramp-up for Statistical Machine Translation: Minimal Training for Maximal Coverage, Proceedings of the Tenth Machine Translation Summit (MT Summit X)
author = {Hemali Majithia and Philip Rennart and Evelyne Tzoukermann},
title = {Rapid Ramp-up for Statistical Machine Translation: Minimal Training for Maximal Coverage},
url = {},
googlescholar = {8162219579429246669},
booktitle = {Proceedings of the Tenth Machine Translation Summit (MT Summit X)},
month = {September},
address = {Phuket, Thailand},
year = 2005
(Majithia et al., 2005). Crowd-sourcing with inexperienced translators
Zaidan, Omar F. and Callison-Burch, Chris (2011):
Crowdsourcing Translation: Professional Quality from Non-Professionals, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
author = {Zaidan, Omar F. and Callison-Burch, Chris},
title = {Crowdsourcing Translation: Professional Quality from Non-Professionals},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {1220--1229},
url = {},
year = 2011
(Zaidan and Callison-Burch, 2011) may be used to reduce cost.
Post, Matt and Callison-Burch, Chris and Osborne, Miles (2012):
Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing, Proceedings of the Seventh Workshop on Statistical Machine Translation
author = {Post, Matt and Callison-Burch, Chris and Osborne, Miles},
title = {Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing},
booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation},
month = {June},
address = {Montreal, Canada},
publisher = {Association for Computational Linguistics},
pages = {154--162},
url = {},
year = 2012
Post et al. (2012) follow this approach to create parallel corpora for 6 Indian languages.
Translation memories may also be a useful training resource
Philippe Langlais and Michel Simard (2002):
Merging Example-Based and Statistical Machine Translation: An Experiment, Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings
author = {Philippe Langlais and Michel Simard},
title = {Merging Example-Based and Statistical Machine Translation: An Experiment},
url = {},
editor = {Stephen D. Richardson},
booktitle = {Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {2499},
isbn = {3-540-44282-0},
bibsource = {DBLP,},
year = 2002
(Langlais and Simard, 2002).
Other methods focus on fishing the web for the translation of particular terms
Nagata, Masaaki and Saito, Teruka and Suzuki, Kenji (2001):
Using the Web as a Bilingual Dictionary, Workshop on Data-Driven Machine Translation at 39th Annual Meeting of the Association for Computational Linguistics (ACL)
author = {Nagata, Masaaki and Saito, Teruka and Suzuki, Kenji},
title = {Using the Web as a Bilingual Dictionary},
url = {},
googlescholar = {8038620055339305841},
booktitle = {Workshop on Data-Driven Machine Translation at 39th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 2001
(Nagata et al., 2001) or phrases
Yunbo Cao and Hang Li (2002):
Base Noun Phrase Translation Using Web Data and the EM Algorithm, Proceedings of the International Conference on Computational Linguistics (COLING)
author = {Yunbo Cao and Hang Li},
title = {Base Noun Phrase Translation Using Web Data and the {EM} Algorithm},
url = {},
googlescholar = {10844118442717667840},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2002
(Cao and Li, 2002). Related is the targeted crawling for in-domain parallel corpora
Pavel Pecina and Antonio Toral and Andy Way and Vassilis Papavassiliou and Prokopis Prokopidis and Maria Giagkou (2011):
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation, Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)
author = {Pavel Pecina and Antonio Toral and Andy Way and Vassilis Papavassiliou and Prokopis Prokopidis and Maria Giagkou},
title = {Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation},
url = {\_Using\_Web-Crawled\_Data\_for\_Domain\_Adaptation\_in\_Statistical\_Machine\_Translation.pdf},
googlescholar = {15625079164006063045},
pages = {297--304},
booktitle = {Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT)},
location = {Leuven, Belgium},
editor = {Mikel L. Forcada and Heidi Depraetere and Vincent Vandeghinste},
year = 2011
(Pecina et al., 2011).
It is not clear whether it matters in which translation direction the parallel corpus was constructed, or whether both sides were translated from a third language.
van Halteren, Hans (2008):
Source Language Markers in EUROPARL Translations, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
author = {van Halteren, Hans},
title = {Source Language Markers in {EUROPARL} Translations},
booktitle = {Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)},
month = {August},
address = {Manchester, UK},
publisher = {Coling 2008 Organizing Committee},
pages = {937--944},
url = {},
year = 2008
van Halteren (2008) shows that it is possible to reliably detect the source language in English texts from the European Parliament proceedings, so the original source language does have some effect.
New Publications
Hai Long Trieu and Le Minh Nguyen (2017):
A Multilingual Parallel Corpus for Improving Machine Translation on Southeast Asian Languages, Machine Translation Summit XVI
author = {Hai Long Trieu and Le Minh Nguyen},
title = {A Multilingual Parallel Corpus for Improving Machine Translation on Southeast Asian Languages},
booktitle = {Machine Translation Summit XVI},
location = {Nagoya, Japan},
year = 2017
Trieu and Nguyen (2017)
Abate, Solomon Teferra and Melese, Michael and Tachbelie, Martha Yifiru and Meshesha, Million and Atinafu, Solomon and Mulugeta, Wondwossen and Assabie, Yaregal and Abera, Hafte and Ephrem, Binyam and Abebe, Tewodros and Tsegaye, Wondimagegnhue and Lemma, Amanuel and Andargie, Tsegaye and Shifaw, Seifedin (2018):
Parallel Corpora for bi-lingual English-Ethiopian Languages Statistical Machine Translation, Proceedings of the 27th International Conference on Computational Linguistics
author = {Abate, Solomon Teferra and Melese, Michael and Tachbelie, Martha Yifiru and Meshesha, Million and Atinafu, Solomon and Mulugeta, Wondwossen and Assabie, Yaregal and Abera, Hafte and Ephrem, Binyam and Abebe, Tewodros and Tsegaye, Wondimagegnhue and Lemma, Amanuel and Andargie, Tsegaye and Shifaw, Seifedin},
title = {Parallel Corpora for bi-lingual English-Ethiopian Languages Statistical Machine Translation},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {},
pages = {3102--3111},
year = 2018
Abate et al. (2018)
Teferra Abate, Solomon and Melese, Michael and Yifiru Tachbelie, Martha and Meshesha, Million and Atinafu, Solomon and Mulugeta, Wondwossen and Assabie, Yaregal and Abera, Hafte and Ephrem, Binyam and Abebe, Tewodros and Tsegaye, Wondimagegnhue and Lemma, Amanuel and Andargie, Tsegaye and Shifaw, Seifedin (2018):
Parallel Corpora for bi-Directional Statistical Machine Translation for Seven Ethiopian Language Pairs, Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
author = {Teferra Abate, Solomon and Melese, Michael and Yifiru Tachbelie, Martha and Meshesha, Million and Atinafu, Solomon and Mulugeta, Wondwossen and Assabie, Yaregal and Abera, Hafte and Ephrem, Binyam and Abebe, Tewodros and Tsegaye, Wondimagegnhue and Lemma, Amanuel and Andargie, Tsegaye and Shifaw, Seifedin},
title = {Parallel Corpora for bi-Directional Statistical Machine Translation for Seven Ethiopian Language Pairs},
booktitle = {Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing},
month = {aug},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {},
pages = {83--90},
year = 2018
Abate et al. (2018)
Deng, Dun and Xue, Nianwen (2014):
Building a Hierarchically Aligned Chinese-English Parallel Treebank, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
author = {Deng, Dun and Xue, Nianwen},
title = {Building a Hierarchically Aligned Chinese-English Parallel Treebank},
booktitle = {Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers},
month = {August},
address = {Dublin, Ireland},
publisher = {Dublin City University and Association for Computational Linguistics},
pages = {1511--1520},
url = {},
year = 2014
Deng and Xue (2014)
Hieber, Felix and Jehl, Laura and Riezler, Stefan (2013):
Task Alternation in Parallel Sentence Retrieval for Twitter Translation, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
author = {Hieber, Felix and Jehl, Laura and Riezler, Stefan},
title = {Task Alternation in Parallel Sentence Retrieval for Twitter Translation},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {323--327},
url = {},
year = 2013
Hieber et al. (2013)
Wenjun Du and Wuying Liu and Junting Yu and Mianzhu Yi (2015):
Russian-Chinese Sentence-level Aligned News Corpus, Proceedings of the 18th Annual Conference of the European Association for Machine Translation
author = {Wenjun Du and Wuying Liu and Junting Yu and Mianzhu Yi},
title = {Russian-Chinese Sentence-level Aligned News Corpus},
booktitle = {Proceedings of the 18th Annual Conference of the European Association for Machine Translation},
month = {May},
address = {Antalya, Turkey},
url = {},
editor = {{\.I}lknur Durgar El-Kahlout and Mehmed {\"O}zkan and Felipe S{\'a}nchez-Mart{\'i}nez and Gema Ram{\'i}rez-S{\'a}nchez and Fred Hollowood and Andy Way},
pages = {213},
year = 2015
Du et al. (2015)
Resnik, Philip and Smith, Noah A (2003):
The web as a parallel corpus, Computational Linguistics
author = {Resnik, Philip and Smith, Noah A},
title = {The web as a parallel corpus},
journal = {Computational Linguistics},
volume = {29},
number = {3},
pages = {349--380},
publisher = {MIT Press},
year = 2003
Resnik and Smith (2003)
Shi, Lei and Niu, Cheng and Zhou, Ming and Gao, Jianfeng (2006):
A dom tree alignment model for mining parallel data from the web, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
author = {Shi, Lei and Niu, Cheng and Zhou, Ming and Gao, Jianfeng},
title = {A dom tree alignment model for mining parallel data from the web},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics},
pages = {489--496},
organization = {Association for Computational Linguistics},
url = {},
year = 2006
Shi et al. (2006)
Font Llitjós, Ariadna (2006):
Can the Internet help improve Machine Translation?, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium
author = {Font Llitj\'{o}s, Ariadna},
title = {Can the Internet help improve Machine Translation?},
booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium},
month = {June},
address = {New York City, USA},
publisher = {Association for Computational Linguistics},
pages = {219--222},
url = {},
year = 2006
Llitjós (2006)
Germann, Ulrich (2016):
Bilingual Document Alignment with Latent Semantic Indexing, Proceedings of the First Conference on Machine Translation
author = {Germann, Ulrich},
title = {Bilingual Document Alignment with Latent Semantic Indexing},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {692--696},
url = {},
year = 2016
Germann (2016)
Gomes, Luís and Pereira Lopes, Gabriel (2016):
First Steps Towards Coverage-Based Document Alignment, Proceedings of the First Conference on Machine Translation
author = {Gomes, Lu\'{i}s and Pereira Lopes, Gabriel},
title = {First Steps Towards Coverage-Based Document Alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {697--702},
url = {},
year = 2016
Gomes and Lopes (2016)
Jakubina, Laurent and Langlais, Phillippe (2016):
BAD LUC@WMT 2016: a Bilingual Document Alignment Platform Based on Lucene, Proceedings of the First Conference on Machine Translation
author = {Jakubina, Laurent and Langlais, Phillippe},
title = {BAD LUC$@$WMT 2016: a Bilingual Document Alignment Platform Based on Lucene},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {703--709},
url = {},
year = 2016
Jakubina and Langlais (2016)
Dara, Aswarth Abhilash and Lin, Yiu-Chang (2016):
YODA System for WMT16 Shared Task: Bilingual Document Alignment, Proceedings of the First Conference on Machine Translation
author = {Dara, Aswarth Abhilash and Lin, Yiu-Chang},
title = {YODA System for WMT16 Shared Task: Bilingual Document Alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {679--684},
url = {},
year = 2016
Dara and Lin (2016)
Esplà-Gomis, Miquel and Forcada, Mikel and Ortiz Rojas, Sergio and Ferrández-Tordera, Jorge (2016):
Bitextor's participation in WMT'16: shared task on document alignment, Proceedings of the First Conference on Machine Translation
author = {Espl\`{a}-Gomis, Miquel and Forcada, Mikel and Ortiz Rojas, Sergio and Ferr\'{a}ndez-Tordera, Jorge},
title = {Bitextor's participation in WMT'16: shared task on document alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {685--691},
url = {},
year = 2016
Esplà-Gomis et al. (2016)
Le, Thanh and Vu, Hoa Trong and Oberländer, Jonathan and Bojar, Ondřej (2016):
Using Term Position Similarity and Language Modeling for Bilingual Document Alignment, Proceedings of the First Conference on Machine Translation
author = {Le, Thanh and Vu, Hoa Trong and Oberl\"{a}nder, Jonathan and Bojar, Ond\v{r}ej},
title = {Using Term Position Similarity and Language Modeling for Bilingual Document Alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {710--716},
url = {},
year = 2016
Le et al. (2016)
Medveď, Marek and Jakubíček, Miloš and Kovář, Vojtěch (2016):
English-French Document Alignment Based on Keywords and Statistical Translation, Proceedings of the First Conference on Machine Translation
author = {Medve{\v{d}}, Marek and Jakub\'{i}{\v{c}}ek, Milo\v{s} and Kov\'{a}{\v{r}}, Vojt{\v{e}}ch},
title = {English-French Document Alignment Based on Keywords and Statistical Translation},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {728--732},
url = {},
year = 2016
Medveď et al. (2016)
Azpeitia, Andoni and Etchegoyhen, Thierry (2016):
DOCAL - Vicomtech's Participation in the WMT16 Shared Task on Bilingual Document Alignment, Proceedings of the First Conference on Machine Translation
author = {Azpeitia, Andoni and Etchegoyhen, Thierry},
title = {DOCAL - Vicomtech's Participation in the WMT16 Shared Task on Bilingual Document Alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {666--671},
url = {},
year = 2016
Azpeitia and Etchegoyhen (2016)
Papavassiliou, Vassilis and Prokopidis, Prokopis and Piperidis, Stelios (2016):
The ILSP/ARC submission to the WMT 2016 Bilingual Document Alignment Shared Task, Proceedings of the First Conference on Machine Translation
author = {Papavassiliou, Vassilis and Prokopidis, Prokopis and Piperidis, Stelios},
title = {The ILSP/ARC submission to the WMT 2016 Bilingual Document Alignment Shared Task},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {733--739},
url = {},
year = 2016
Papavassiliou et al. (2016)
Lohar, Pintu and Afli, Haithem and Liu, Chao-Hong and Way, Andy (2016):
The ADAPT Bilingual Document Alignment system at WMT16, Proceedings of the First Conference on Machine Translation
author = {Lohar, Pintu and Afli, Haithem and Liu, Chao-Hong and Way, Andy},
title = {The ADAPT Bilingual Document Alignment system at WMT16},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {717--723},
url = {},
year = 2016
Lohar et al. (2016)
Mahata, Sainik and Das, Dipankar and Pal, Santanu (2016):
WMT2016: A Hybrid Approach to Bilingual Document Alignment, Proceedings of the First Conference on Machine Translation
author = {Mahata, Sainik and Das, Dipankar and Pal, Santanu},
title = {WMT2016: A Hybrid Approach to Bilingual Document Alignment},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {724--727},
url = {},
year = 2016
Mahata et al. (2016)
Shchukin, Vadim and Khristich, Dmitry and Galinskaya, Irina (2016):
Word Clustering Approach to Bilingual Document Alignment (WMT 2016 Shared Task), Proceedings of the First Conference on Machine Translation
author = {Shchukin, Vadim and Khristich, Dmitry and Galinskaya, Irina},
title = {Word Clustering Approach to Bilingual Document Alignment (WMT 2016 Shared Task)},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {740--744},
url = {},
year = 2016
Shchukin et al. (2016)
Buck, Christian and Koehn, Philipp (2016):
Findings of the WMT 2016 Bilingual Document Alignment Shared Task, Proceedings of the First Conference on Machine Translation
author = {Buck, Christian and Koehn, Philipp},
title = {Findings of the WMT 2016 Bilingual Document Alignment Shared Task},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {554--563},
url = {},
year = 2016
Buck and Koehn (2016)
Buck, Christian and Koehn, Philipp (2016):
Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance, Proceedings of the First Conference on Machine Translation
author = {Buck, Christian and Koehn, Philipp},
title = {Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {672--678},
url = {},
year = 2016
Buck and Koehn (2016)
Wang Ling and Luís Marujo and Chris Dyer and Alan W. Black and Isabel Trancoso (2016):
Mining Parallel Corpora from Sina Weibo and Twitter, Computational Linguistics
author = {Wang Ling and Lu\'{i}s Marujo and Chris Dyer and Alan W. Black and Isabel Trancoso},
title = {Mining Parallel Corpora from Sina Weibo and Twitter},
journal = {Computational Linguistics},
volume = {42},
number = {2},
month = {June},
year = 2016
Ling et al. (2016)
Barrón-Cedeño, Alberto and España-Bonet, Cristina and Boldoba, Josu and Màrquez, Lluís (2015):
A Factory of Comparable Corpora from Wikipedia, Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
mentioned in Parallel Corpora and Comparable Corpora
@InProceedings{Barronetal:2015,
author = {{Barr\'on-Cede{\~n}o}, Alberto and {Espa{\~n}a-Bonet}, Cristina and {Boldoba}, Josu and {M\`arquez}, Llu\'{i}s},
title = {A Factory of Comparable Corpora from Wikipedia},
booktitle = {Proceedings of the Eighth Workshop on Building and Using Comparable Corpora},
pages = {3--13},
month = {July},
date = {30},
address = {Beijing, China},
language = {english},
url = {},
year = 2015
Barrón-Cedeño et al. (2015)
Jalili Sabet, Masoud and Negri, Matteo and Turchi, Marco and Barbu, Eduard (2016):
An Unsupervised Method for Automatic Translation Memory Cleaning, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
author = {Jalili Sabet, Masoud and Negri, Matteo and Turchi, Marco and Barbu, Eduard},
title = {An Unsupervised Method for Automatic Translation Memory Cleaning},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {August},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {287--292},
url = {},
year = 2016
Jalili Sabet et al. (2016)
Xiaoyi Ma and Mark Y. Liberman (1999):
BITS: A method for bilingual text search over the web, Proceedings of the Machine Translation Summit VII
author = {Xiaoyi Ma and Mark Y. Liberman},
title = {BITS: A method for bilingual text search over the web},
booktitle = {Proceedings of the Machine Translation Summit VII},
url = {},
year = 1999
Ma and Liberman (1999)
Ieva Zariņa and Pēteris Ņikiforovs and Raivis Skadiņš (2015):
Word Alignment Based Parallel Corpora Evaluation and Cleaning Using Machine Learning Techniques, Proceedings of the 18th Annual Conference of the European Association for Machine Translation
author = {Ieva Zari\c{n}a and P\={e}teris \c{N}ikiforovs and Raivis Skadi\c{n}\v{s}},
title = {Word Alignment Based Parallel Corpora Evaluation and Cleaning Using Machine Learning Techniques},
booktitle = {Proceedings of the 18th Annual Conference of the European Association for Machine Translation},
month = {May},
address = {Antalya, Turkey},
url = {},
editor = {\.{I}lknur Durgar El-Kahlout and Mehmed \"{O}zkan and Felipe S\'{a}nchez-Mart\'{i}nez and Gema Ram\'{i}rez-S\'{a}nchez and Fred Hollowood and Andy Way},
pages = {185--192},
year = 2015
Zariņa et al. (2015)
Francisco Guzman and Hassan Sajjad and Stephan Vogel and Ahmed Abdelali (2013):
The AMARA corpus: building resources for translating the web's educational content, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
author = {Francisco Guzman and Hassan Sajjad and Stephan Vogel and Ahmed Abdelali},
title = {The {AMARA} corpus: building resources for translating the web's educational content},
url = {},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2013
Guzman et al. (2013)
Antonio Toral and Raphael Rubino and Miquel Esplà-Gomis and Tommi Pirinen and Andy Way and Gema Ramírez-Sánchez (2014):
Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain, Proceedings of 17th Annual conference of the European Association for Machine Translation
author = {Antonio Toral and Raphael Rubino and Miquel Espl\`{a}-Gomis and Tommi Pirinen and Andy Way and Gema Ram\'{i}rez-S\'{a}nchez},
title = {Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain},
booktitle = {Proceedings of 17th Annual conference of the European Association for Machine Translation},
pages = {221--224},
url = {},
location = {Dubrovnik, Croatia},
year = 2014
Toral et al. (2014)
Haddow, Barry and Hernandez, Adolfo and Neubarth, Friedrich and Trost, Harald (2013):
Corpus development for machine translation between standard and dialectal varieties, Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants
author = {Haddow, Barry and Hernandez, Adolfo and Neubarth, Friedrich and Trost, Harald},
title = {Corpus development for machine translation between standard and dialectal varieties},
booktitle = {Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants},
month = {September},
address = {Hissar, Bulgaria},
publisher = {INCOMA Ltd. Shoumen, BULGARIA},
pages = {7--14},
url = {},
year = 2013
Haddow et al. (2013)
Ling, Wang and Marujo, Luis and Dyer, Chris and Black, Alan W and Trancoso, Isabel (2014):
Crowdsourcing High-Quality Parallel Data Extraction from Twitter, Proceedings of the Ninth Workshop on Statistical Machine Translation
author = {Ling, Wang and Marujo, Luis and Dyer, Chris and Black, Alan W and Trancoso, Isabel},
title = {Crowdsourcing High-Quality Parallel Data Extraction from Twitter},
booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
month = {June},
address = {Baltimore, Maryland, USA},
publisher = {Association for Computational Linguistics},
pages = {426--436},
url = {},
year = 2014
Ling et al. (2014)
Matthias Eck and Yury Zemlyanskiy and Joy Zhang and Alex Waibel (2014):
Extracting Translation Pairs from Social Network Content, Proceedings of the International Workshop on Spoken Language Translation (IWSLT)
author = {Matthias Eck and Yury Zemlyanskiy and Joy Zhang and Alex Waibel},
title = {Extracting Translation Pairs from Social Network Content},
pages = {200--205},
booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)},
year = 2014
Eck et al. (2014)
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel (2013):
Microblogs as Parallel Corpora, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
author = {Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel},
title = {Microblogs as Parallel Corpora},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {176--186},
url = {},
year = 2013
Ling et al. (2013)
Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam (2013):
Dirt Cheap Web-Scale Parallel Text from the Common Crawl, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
author = {Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam},
title = {Dirt Cheap Web-Scale Parallel Text from the Common Crawl},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {1374--1383},
url = {},
year = 2013
Smith et al. (2013)
Francis Bond and Shan Wang (2014):
Issues in building English-Chinese parallel corpora with WordNets, Proceedings of the Seventh Global Wordnet Conference
author = {Francis Bond and Shan Wang},
title = {Issues in building English-Chinese parallel corpora with WordNets},
booktitle = {Proceedings of the Seventh Global Wordnet Conference},
url = {},
pages = {391--399},
editor = {Heili Orav and Christiane Fellbaum and Piek Vossen},
address = {Tartu, Estonia},
year = 2014
Bond and Wang (2014)
Papavassiliou, Vassilis and Prokopidis, Prokopis and Thurmair, Gregor (2013):
A modular open-source focused crawler for mining monolingual and bilingual corpora from the web, Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
author = {Papavassiliou, Vassilis and Prokopidis, Prokopis and Thurmair, Gregor},
title = {A modular open-source focused crawler for mining monolingual and bilingual corpora from the web},
booktitle = {Proceedings of the Sixth Workshop on Building and Using Comparable Corpora},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {43--51},
url = {},
year = 2013
Papavassiliou et al. (2013)
Eisele, Andreas (2005):
First Steps towards Multi-Engine Machine Translation, Proceedings of the ACL Workshop on Building and Using Parallel Texts
mentioned in Parallel Corpora and System Combination
@InProceedings{eisele:2005:WPT,
author = {Eisele, Andreas},
title = {First Steps towards Multi-Engine Machine Translation},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {155--158},
url = {},
year = 2005
Eisele (2005)
Victoria Arranz and Olivier Hamon and Karim Boudahmane and Martine Garnier-Rizet (2011):
Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)
author = {Victoria Arranz and Olivier Hamon and Karim Boudahmane and Martine Garnier-Rizet},
title = {Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems},
url = {},
pages = {129--135},
editor = {Marcello Federico and Mei-Yuh Hwang and Margit R{\"o}dder and Sebastian St{\"u}ker},
booktitle = {Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT)},
location = {San Francisco, USA},
year = 2011
Arranz et al. (2011)
Bin Lu and Ka Po Chow and Benjamin K. Tsou (2011):
The Cultivation of a Chinese-English-Japanese Trilingual Parallel Corpus from Comparable Patents, Proceedings of the 13th Machine Translation Summit (MT Summit XIII)
author = {Bin Lu and Ka Po Chow and Benjamin K. Tsou},
title = {The Cultivation of a {Chinese-English-Japanese} Trilingual Parallel Corpus from Comparable Patents},
url = {},
pages = {472--479},
booktitle = {Proceedings of the 13th Machine Translation Summit (MT Summit XIII)},
publisher = {International Association for Machine Translation},
location = {Xiamen, China},
year = 2011
Lu et al. (2011)
Gascó, Guillem and Rocha, Martha-Alicia and Sanchis-Trilles, Germán and Andrés-Ferrer, Jesús and Casacuberta, Francisco (2012):
Does more data always yield better translations?, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
author = {Gasc\'{o}, Guillem and Rocha, Martha-Alicia and Sanchis-Trilles, Germ\'{a}n and Andr\'{e}s-Ferrer, Jes\'{u}s and Casacuberta, Francisco},
title = {Does more data always yield better translations?},
booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
address = {Avignon, France},
publisher = {Association for Computational Linguistics},
pages = {152--161},
url = {},
year = 2012
Gascó et al. (2012)
Tatsuya Ishisaka and Masao Utiyama and Eiichiro Sumita and Kazuhide Yamamoto (2009):
Development of a Japanese-English Software Manual Parallel Corpus, Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)
author = {Tatsuya Ishisaka and Masao Utiyama and Eiichiro Sumita and Kazuhide Yamamoto},
title = {Development of a {J}apanese-{E}nglish Software Manual Parallel Corpus},
url = {\_Development\_of\_a\_Japanese-English\_Software\_Manual\_Paralell\_Corpus/file/e0b4951bdc04b06036.pdf},
googlescholar = {16239386671948484399},
booktitle = {Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)},
publisher = {International Association for Machine Translation},
location = {Ottawa, Ontario, Canada},
year = 2009
Ishisaka et al. (2009)
Alexandre Rafalovitch and Robert Dale (2009):
United Nations General Assembly Resolutions: A Six-Language Parallel Corpus, Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)
author = {Alexandre Rafalovitch and Robert Dale},
title = {United {N}ations {G}eneral {A}ssembly Resolutions: A Six-Language Parallel Corpus},
url = {\_Dale\_MT\_Summit\_2009.pdf},
googlescholar = {10722155333156234579},
booktitle = {Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)},
publisher = {International Association for Machine Translation},
location = {Ottawa, Ontario, Canada},
year = 2009
Rafalovitch and Dale (2009)
Masao Utiyama and Daisuke Kawahara and Keiji Yasuda and Eiichiro Sumita (2009):
Mining Parallel Texts from Mixed-Language Web Pages, Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)
author = {Masao Utiyama and Daisuke Kawahara and Keiji Yasuda and Eiichiro Sumita},
title = {Mining Parallel Texts from Mixed-Language Web Pages},
url = {},
googlescholar = {162748147626911829},
booktitle = {Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)},
publisher = {International Association for Machine Translation},
location = {Ottawa, Ontario, Canada},
year = 2009
Utiyama et al. (2009)
Qibo Zhu and Diana Inkpen and Ash Asudeh (2009):
Inducing translations from officially published materials in Canadian government websites, Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)
author = {Qibo Zhu and Diana Inkpen and Ash Asudeh},
title = {Inducing translations from officially published materials in {C}anadian government websites},
booktitle = {Proceedings of the Twelfth Machine Translation Summit (MT Summit XII)},
publisher = {International Association for Machine Translation},
location = {Ottawa, Ontario, Canada},
year = 2009
Zhu et al. (2009)
Hong, Gumwon and Li, Chi-Ho and Zhou, Ming and Rim, Hae-Chang (2010):
An Empirical Study on Web Mining of Parallel Data, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
author = {Hong, Gumwon and Li, Chi-Ho and Zhou, Ming and Rim, Hae-Chang},
title = {An Empirical Study on Web Mining of Parallel Data},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {474--482},
url = {},
year = 2010
Hong et al. (2010)
Han, Xiwu and Li, Hanzhang and Zhao, Tiejun (2009):
Train the Machine with What It Can Learn---Corpus Selection for SMT, Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
author = {Han, Xiwu and Li, Hanzhang and Zhao, Tiejun},
title = {Train the Machine with What It Can Learn---Corpus Selection for SMT},
booktitle = {Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {27--33},
url = {},
year = 2009
Han et al. (2009)
Donghua Xu and Chew Lim Tan (1999):
Alignment and Matching of Bilingual English-Chinese News Texts, Machine Translation
author = {Donghua Xu and Chew Lim Tan},
title = {Alignment and Matching of Bilingual {E}nglish-{C}hinese News Texts},
pages = {1--33},
journal = {Machine Translation},
volume = {14},
number = {1},
month = {March},
year = 1999
Xu and Tan (1999)
Miquel Esplà-Gomis (2009):
Bitextor: a Free/Open-source Software to Harvest Translation Memories from Multilingual Websites, MT Summit Workshop on New Tools for Translators
author = {Miquel Espl\`{a}-Gomis},
title = {Bitextor: a Free/Open-source Software to Harvest Translation Memories from Multilingual Websites},
booktitle = {MT Summit Workshop on New Tools for Translators},
publisher = {International Association for Machine Translation},
location = {Ottawa, Ontario, Canada},
year = 2009
Esplà-Gomis (2009)
Ambati, Vamshi and Vogel, Stephan (2010):
Can Crowds Build Parallel Corpora for Machine Translation Systems?, Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
author = {Ambati, Vamshi and Vogel, Stephan},
title = {Can Crowds Build Parallel Corpora for Machine Translation Systems?},
booktitle = {Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk},
month = {June},
address = {Los Angeles},
publisher = {Association for Computational Linguistics},
pages = {62--65},
url = {},
year = 2010
Ambati and Vogel (2010)
Hu, Chang and Resnik, Philip and Kronrod, Yakov and Eidelman, Vladimir and Buzek, Olivia and Bederson, Benjamin B. (2011):
The Value of Monolingual Crowdsourcing in a Real-World Translation Scenario: Simulation using Haitian Creole Emergency SMS Messages, Proceedings of the Sixth Workshop on Statistical Machine Translation
mentioned in Parallel Corpora and Sparse Data
@InProceedings{hu-EtAl:2011:WMT,
author = {Hu, Chang and Resnik, Philip and Kronrod, Yakov and Eidelman, Vladimir and Buzek, Olivia and Bederson, Benjamin B.},
title = {The Value of Monolingual Crowdsourcing in a Real-World Translation Scenario: Simulation using Haitian Creole Emergency SMS Messages},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {399--404},
url = {},
year = 2011
Hu et al. (2011)
Krstovski, Kriste and Smith, David A. (2011):
A Minimally Supervised Approach for Detecting and Ranking Document Translation Pairs, Proceedings of the Sixth Workshop on Statistical Machine Translation
author = {Krstovski, Kriste and Smith, David A.},
title = {A Minimally Supervised Approach for Detecting and Ranking Document Translation Pairs},
booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation},
month = {July},
address = {Edinburgh, Scotland},
publisher = {Association for Computational Linguistics},
pages = {207--216},
url = {},
year = 2011
Krstovski and Smith (2011)
Cartoni, Bruno and Zufferey, Sandrine and Meyer, Thomas and Popescu-Belis, Andrei (2011):
How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives, Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
author = {Cartoni, Bruno and Zufferey, Sandrine and Meyer, Thomas and Popescu-Belis, Andrei},
title = {How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {78--86},
url = {},
year = 2011
Cartoni et al. (2011)
Gahbiche-Braham, Souhir and Bonneau-Maynard, Hélène and Yvon, François (2011):
Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation, Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
author = {Gahbiche-Braham, Souhir and Bonneau-Maynard, H\'{e}l\`{e}ne and Yvon, Fran\c{c}ois},
title = {Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {44--51},
url = {},
year = 2011
Gahbiche-Braham et al. (2011)
Patry, Alexandre and Langlais, Philippe (2011):
Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia, Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
author = {Patry, Alexandre and Langlais, Philippe},
title = {Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia},
booktitle = {Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
month = {June},
address = {Portland, Oregon},
publisher = {Association for Computational Linguistics},
pages = {87--95},
url = {},
year = 2011
Patry and Langlais (2011)
John Fry (2005):
Assembling a Parallel Corpus from RSS News Feeds, Proceedings of the Workshop on Example-based Machine Translation at MT Summit X
author = {John Fry},
title = {Assembling a Parallel Corpus from {RSS} News Feeds},
url = {},
googlescholar = {12617274806825000713},
booktitle = {Proceedings of the Workshop on Example-based Machine Translation at {MT} Summit X},
month = {September},
address = {Phuket, Thailand},
year = 2005
Fry (2005)