Sentence Alignment
Translated texts are often found in the form of whole translated documents or web pages. Since sentences are not always translated one-to-one, sentence alignment methods are needed to recover which sentences correspond to each other.
Sentence Alignment is the main subject of 46 publications. 28 are discussed here.
Publications
Sentence alignment was a very active field of research in the early days of statistical machine translation. An influential early method is based on sentence length, measured in words
Peter F. Brown and Jennifer C. Lai and Robert L. Mercer (1991):
ALIGNING SENTENCES IN PARALLEL CORPORA, Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Brown:1991al,
author = {Peter F. Brown and Jennifer C. Lai and Robert L. Mercer},
title = {ALIGNING SENTENCES IN PARALLEL CORPORA},
booktitle = {Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1991
}
(Brown et al., 1991;
William A. Gale and Kenneth Ward Church (1991):
A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA, Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Gale:1991,
author = {William A. Gale and Kenneth Ward Church},
title = {A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA},
url = {
http://acl.ldc.upenn.edu/H/H91/H91-1026.pdf},
booktitle = {Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1991
}
Gale and Church, 1991;
William A. Gale and Kenneth Ward Church (1993):
A program for aligning sentences in bilingual corpora, Computational Linguistics
@Article{Gale:1993,
author = {William A. Gale and Kenneth Ward Church},
title = {A program for aligning sentences in bilingual corpora},
url = {
http://acl.ldc.upenn.edu/J/J93/J93-1004.pdf},
journal = {Computational Linguistics},
volume = {19},
number = {1},
year = 1993
}
Gale and Church, 1993) or characters
Kenneth Ward Church (1993):
Char_align: A Program for Aligning Parallel Texts at the Character Level, Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Church:1993,
author = {Kenneth Ward Church},
title = {Char\_align: A Program for Aligning Parallel Texts at the Character Level},
url = {
http://acl.ldc.upenn.edu/P/P93/P93-1001.pdf},
googlescholar = {63602749800220446},
booktitle = {Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1993
}
(Church, 1993). Other methods may use alignment chains
I. Dan Melamed (1996):
A Geometric Approach to Mapping Bitext Correspondence, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
mentioned in Sentence Alignment and Word Alignment Based On Coocurrence
@Inproceedings{Melamed:1996,
author = {I. Dan Melamed},
title = {A Geometric Approach to Mapping Bitext Correspondence},
url = {
http://acl.ldc.upenn.edu/W/W96/W96-0201.pdf},
googlescholar = {598522478706255108},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = 1996
}
(Melamed, 1996;
I. Dan Melamed (1999):
Bitext Maps and Alignment via Pattern Recognition, Computational Linguistics
@Article{Melamed:1999,
author = {I. Dan Melamed},
title = {Bitext Maps and Alignment via Pattern Recognition},
url = {
http://acl.ldc.upenn.edu/J/J99/J99-1003.pdf?origin=publication\_detail},
googlescholar = {12764996208636055149},
journal = {Computational Linguistics},
volume = {25},
number = {1},
pages = {107--130},
year = 1999
}
Melamed, 1999), model omissions
I. Dan Melamed (1996):
Automatic Detection of Omissions in Translations, Proceedings of the 16th International Conference on Computational Linguistics (COLING)
@InProceedings{Melamed:1996b,
author = {I. Dan Melamed},
title = {Automatic Detection of Omissions in Translations},
url = {
http://acl.ldc.upenn.edu/C/C96/C96-2129.pdf},
googlescholar = {11384947881851115088},
booktitle = {Proceedings of the 16th International Conference on Computational Linguistics (COLING)},
year = 1996
}
(Melamed, 1996), distinguish between large-scale segmentation of text and detailed sentence alignment
(Simard and Plamondon, 1996), apply line detection methods from image processing to detect large-scale alignment patterns
Jason S. Chang and Mathis H. Chen (1997):
An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Chang:1997,
author = {Jason S. Chang and Mathis H. Chen},
title = {An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques},
url = {
http://acl.ldc.upenn.edu/P/P97/P97-1038.pdf},
googlescholar = {12222328424950527864},
booktitle = {Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1997
}
(Chang and Chen, 1997;
I. Dan Melamed (1997):
A Portable Algorithm for Mapping Bitext Correspondence, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)
mentioned in Phrase Based Models and Sentence Alignment
@Inproceedings{Melamed:1997,
author = {I. Dan Melamed},
title = {A Portable Algorithm for Mapping Bitext Correspondence},
url = {
http://acl.ldc.upenn.edu/P/P97/P97-1039.pdf},
googlescholar = {1974725806507584461},
booktitle = {Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1997
}
Melamed, 1997).
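The length-based idea introduced at the start of this paragraph can be illustrated with a small dynamic program. The Python sketch below is a simplification, not the probabilistic model of the cited papers: it scores each candidate group of sentences (a "bead") by the absolute difference in length plus a fixed penalty per bead type. The function name `align_by_length`, the `BEADS` table, and the penalty values are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Allowed bead shapes (source sentences, target sentences) mapped to a
# hypothetical penalty. The values are illustrative, not taken from any paper.
BEADS = {(1, 1): 0, (1, 0): 50, (0, 1): 50, (2, 1): 10, (1, 2): 10}

Span = Tuple[int, int]  # half-open range of sentence indices


def align_by_length(src_lens: List[int], tgt_lens: List[int]) -> List[Tuple[Span, Span]]:
    """Align two sentence sequences using only their lengths."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Length mismatch of the sentences grouped into this bead.
                mismatch = abs(sum(src_lens[i:ni]) - sum(tgt_lens[j:nj]))
                candidate = cost[i][j] + mismatch + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end of both documents to the start.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads


# Three source sentences against two target sentences (lengths in characters):
# the first pair aligns 1-1 and the last two source sentences merge into one.
print(align_by_length([20, 30, 10], [21, 41]))
# [((0, 1), (0, 1)), ((1, 3), (1, 2))]
```

Lengths can be counted in words or in characters; the sketch is agnostic to the unit as long as both sides use the same one.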
Kay and Röscheisen (1993) propose an iterative algorithm that uses spelling similarity and word co-occurrences to drive sentence alignment. Several researchers proposed including lexical information
Stanley F. Chen (1993):
ALIGNING SENTENCES IN BILINGUAL CORPORA USING LEXICAL INFORMATION , Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Chen:1993,
author = {Stanley F. Chen},
title = {ALIGNING SENTENCES IN BILINGUAL CORPORA USING LEXICAL INFORMATION },
url = {
http://acl.ldc.upenn.edu/P/P93/P93-1002.pdf},
googlescholar = {5311346999303906403},
booktitle = {Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1993
}
(Chen, 1993;
Ido Dagan and Kenneth Ward Church and William A. Gale (1993):
Robust Bilingual Word Alignment for Machine Aided Translation, Proceedings of the Workshop on Very Large Corpora (VLC)
@Inproceedings{Dagan:1993,
author = {Ido Dagan and Kenneth Ward Church and William A. Gale},
title = {Robust Bilingual Word Alignment for Machine Aided Translation},
url = {
http://acl.ldc.upenn.edu/W/W93/W93-0301.pdf},
googlescholar = {831185247678677342},
booktitle = {Proceedings of the Workshop on Very Large Corpora (VLC)},
year = 1993
}
Dagan et al., 1993;
Takehito Utsuro and Hiroshi Ikeda and Masaya Yamane and Yuji Matsumoto and Makoto Nagao (1994):
Bilingual Text Matching using Bilingual Dictionary and Statistics, Proceedings of the 15th International Conference on Computational Linguistics (COLING)
@InProceedings{Utsuro:1994,
author = {Takehito Utsuro and Hiroshi Ikeda and Masaya Yamane and Yuji Matsumoto and Makoto Nagao},
title = {Bilingual Text Matching using Bilingual Dictionary and Statistics},
url = {
http://acl.ldc.upenn.edu/C/C94/C94-2175.pdf},
googlescholar = {8093274267969998069},
booktitle = {Proceedings of the 15th International Conference on Computational Linguistics (COLING)},
year = 1994
}
Utsuro et al., 1994;
Dekai Wu (1994):
ALIGNING A PARALLEL ENGLISH-CHINESE CORPUS STATISTICALLY WITH LEXICAL CRITERIA, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Wu:1994,
author = {Dekai Wu},
title = {ALIGNING A PARALLEL ENGLISH-CHINESE CORPUS STATISTICALLY WITH LEXICAL CRITERIA},
url = {
http://acl.ldc.upenn.edu/P/P94/P94-1012.pdf},
googlescholar = {5390460009631001793},
booktitle = {Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1994
}
Wu, 1994;
Masahiko Haruno and Takefumi Yamazaki (1996):
High-Performance Bilingual Text Alignment Using Statistical and Dictionary Information, Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Haruno:1996,
author = {Masahiko Haruno and Takefumi Yamazaki},
title = {High-Performance Bilingual Text Alignment Using Statistical and Dictionary Information},
url = {
http://acl.ldc.upenn.edu/P/P96/P96-1018.pdf},
googlescholar = {17248445556749412280},
booktitle = {Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1996
}
Haruno and Yamazaki, 1996;
Thomas C. Chuang and Jason S. Chang (2002):
Adaptive Sentence Alignment based on Length and Lexical Information, Proceedings of the ACL-02 Demonstration Session
@inproceedings{Chuang:2002,
author = {Thomas C. Chuang and Jason S. Chang},
title = {Adaptive Sentence Alignment based on Length and Lexical Information},
url = {
http://acl.ldc.upenn.edu/acl2002/DEMOS/pdfs/DEMO005.pdf},
booktitle = {Proceedings of the ACL-02 Demonstration Session},
year = 2002
}
Chuang and Chang, 2002;
Tz-Liang Kueng and Keh-Yih Su (2002):
A Robust Cross-Style Bilingual Sentence Alignment Model, Proceedings of the International Conference on Computational Linguistics (COLING)
@InProceedings{Kueng:2002,
author = {Tz-Liang Kueng and Keh-Yih Su},
title = {A Robust Cross-Style Bilingual Sentence Alignment Model},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
year = 2002
}
Kueng and Su, 2002;
Robert C. Moore (2002):
Fast and Accurate Sentence Alignment of Bilingual Corpora, Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings
@inproceedings{Moore:2002,
author = {Robert C. Moore},
title = {Fast and Accurate Sentence Alignment of Bilingual Corpora},
url = {
http://research.microsoft.com/pubs/68886/sent-align2-amta-final.pdf},
googlescholar = {13221034929964728422},
editor = {Stephen D. Richardson},
booktitle = {Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {2499},
isbn = {3-540-44282-0},
bibsource = {DBLP,
http://dblp.uni-trier.de},
year = 2002
}
Moore, 2002;
Stephen Nightingale and Hideki Tanaka (2003):
Comparing the Sentence Alignment Yield from Two News Corpora Using a Dictionary-Based Alignment System, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Nightingale:2003,
author = {Stephen Nightingale and Hideki Tanaka },
title = {Comparing the Sentence Alignment Yield from Two News Corpora Using a Dictionary-Based Alignment System},
url = {
http://acl.ldc.upenn.edu/W/W03/W03-0321.pdf},
googlescholar = {13696946875396536456},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
}
Nightingale and Tanaka, 2003;
Aswani, Niraj and Gaizauskas, Robert (2005):
A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{aswani-gaizauskas:2005:WPT1,
author = {Aswani, Niraj and Gaizauskas, Robert},
title = {A Hybrid Approach to Align Sentences and Words in {E}nglish-{H}indi Parallel Corpora},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {57--64},
url = {
http://www.aclweb.org/anthology/W/W05/W05-0808},
year = 2005
}
Aswani and Gaizauskas, 2005), content words
Harris Papageorgiou and Lambros Cranias and Stelios Piperidis (1994):
AUTOMATIC ALIGNMENT IN PARALLEL CORPORA, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL)
@Inproceedings{Papageorgiou:1994,
author = {Harris Papageorgiou and Lambros Cranias and Stelios Piperidis},
title = {AUTOMATIC ALIGNMENT IN PARALLEL CORPORA},
url = {
http://acl.ldc.upenn.edu/P/P94/P94-1051.pdf},
googlescholar = {5654702234864199495},
booktitle = {Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL)},
year = 1994
}
(Papageorgiou et al., 1994), numbers and n-grams
Mark W. Davis and Ted E. Dunning and William C. Ogden (1995):
Text Alignment in the Real World: Improving Alignments of Noisy Translations Using Common Lexical Features, String Matching Strategies and N-Gram Comparisons, Proceedings of Meeting of the European Chapter of the Association for Computational Linguistics (EACL)
@InProceedings{Davis:1995,
author = {Mark W. Davis and Ted E. Dunning and William C. Ogden},
title = {Text Alignment in the Real World: Improving Alignments of Noisy Translations Using Common Lexical Features, String Matching Strategies and N-Gram Comparisons},
url = {
http://acl.ldc.upenn.edu/E/E95/E95-1010.pdf},
googlescholar = {9685012630949139307},
booktitle = {Proceedings of Meeting of the European Chapter of the Association for Computational Linguistics (EACL)},
year = 1995
}
(Davis et al., 1995). Sentence alignment may also be improved by a third language in multilingual corpora
Michel Simard (1999):
Text-Translation Alignment: Three Languages Are Better Than Two, Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC)
@Inproceedings{Simard:1999,
author = {Michel Simard},
title = {Text-Translation Alignment: Three Languages Are Better Than Two},
url = {
http://acl.ldc.upenn.edu/W/W99/W99-0602.pdf},
googlescholar = {10216192400680370515},
booktitle = {Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC)},
year = 1999
}
(Simard, 1999). More effort is needed to align very noisy corpora
Bing Zhao and Klaus Zechner and Stephan Vogel and Alex Waibel (2003):
Efficient Optimization for Bilingual Sentence Alignment Based on Linear Regression, HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
@inproceedings{Zhao:2003c,
author = {Bing Zhao and Klaus Zechner and Stephan Vogel and Alex Waibel },
title = {Efficient Optimization for Bilingual Sentence Alignment Based on Linear Regression},
url = {
http://acl.ldc.upenn.edu/W/W03/W03-0315.pdf},
googlescholar = {16634099164253930600},
booktitle = {HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond},
editor = {Rada Mihalcea and Ted Pedersen},
month = {May 31},
address = {Edmonton, Alberta, Canada},
publisher = {Association for Computational Linguistics},
year = 2003
}
(Zhao et al., 2003). Different sentence alignment methods are compared by
Singh, Anil Kumar and Husain, Samar (2005):
Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{singh-husain:2005:WPT,
author = {Singh, Anil Kumar and Husain, Samar},
title = {Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {99--106},
url = {
http://www.aclweb.org/anthology/W/W05/W05-0816},
year = 2005
}
Singh and Husain (2005).
Xu, Jia and Zens, Richard and Ney, Hermann (2006):
Partitioning Parallel Documents Using Binary Segmentation, Proceedings on the Workshop on Statistical Machine Translation
@InProceedings{xu-zens-ney:2006:WMT,
author = {Xu, Jia and Zens, Richard and Ney, Hermann},
title = {Partitioning Parallel Documents Using Binary Segmentation},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation},
month = {June},
address = {New York City},
publisher = {Association for Computational Linguistics},
pages = {78--85},
url = {
http://www.aclweb.org/anthology/W/W06/W06-3111},
year = 2006
}
Xu et al. (2006) propose a method that iteratively performs binary splits of a document to obtain a sentence alignment.
Enright, Jessica and Kondrak, Grzegorz (2007):
A Fast Method for Parallel Document Identification, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
@InProceedings{enright-kondrak:2007:ShortPapers,
author = {Enright, Jessica and Kondrak, Grzegorz},
title = {A Fast Method for Parallel Document Identification},
booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers},
month = {April},
address = {Rochester, New York},
publisher = {Association for Computational Linguistics},
pages = {29--32},
url = {
http://www.aclweb.org/anthology/N/N07/N07-2008},
year = 2007
}
Enright and Kondrak (2007) use a simple and fast method for document alignment that relies on the overlap of rare but identically spelled words, which are mostly cognates, names, and numbers.
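The overlap idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: "rare" is taken here to mean a word that occurs at most once in its document, tokenization is plain whitespace splitting, and the helper names `rare_words` and `match_documents` are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def rare_words(text: str, max_count: int = 1) -> Set[str]:
    """Return the lowercased tokens that occur at most max_count times."""
    counts = Counter(text.lower().split())
    return {word for word, count in counts.items() if count <= max_count}


def match_documents(src_docs: Dict[str, str], tgt_docs: Dict[str, str]) -> Dict[str, str]:
    """Pair each source document with the target sharing the most rare words."""
    tgt_rare = {name: rare_words(text) for name, text in tgt_docs.items()}
    matches = {}
    for name, text in src_docs.items():
        src_rare = rare_words(text)
        # Score a target by the number of identically spelled rare words.
        best = max(tgt_rare, key=lambda t: len(src_rare & tgt_rare[t]), default=None)
        if best is not None:
            matches[name] = best
    return matches


english = {
    "en1": "Ottawa meeting 1994 about fisheries",
    "en2": "budget debate Montreal 2001 taxes",
}
french = {
    "fr1": "débat budget Montréal 2001 impôts",
    "fr2": "réunion Ottawa 1994 sur les pêches",
}
print(match_documents(english, french))
# {'en1': 'fr2', 'en2': 'fr1'}
```

In the toy data the matches are driven entirely by a shared name, a shared cognate, and shared numbers, which is the kind of evidence the description above refers to.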
Benchmarks
Discussion
Related Topics
Extracting parallel sentences from comparable corpora is a much harder challenge.
New Publications
Éva Mújdricza-Maydt and Huiqin Körkel-Qu and Stefan Riezler and Sebastian Padó (2013):
High-Precision Sentence Alignment by Bootstrapping from Wood Standard Annotations, The Prague Bulletin of Mathematical Linguistics
@article{pbml-99-mujdricza-maydt-et-al,
author = {{\'E}va M{\'u}jdricza-Maydt and Huiqin K{\"o}rkel-Qu and Stefan Riezler and Sebastian Pad{\'o}},
title = {High-Precision Sentence Alignment by Bootstrapping from Wood Standard Annotations},
url = {
http://ufal.mff.cuni.cz/pbml/99/art-mujdricza-maydt-et-al.pdf},
pages = {5--16},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {99},
year = 2013
}
Mújdricza-Maydt et al. (2013)
Quan, Xiaojun and Kit, Chunyu and Song, Yan (2013):
Non-Monotonic Sentence Alignment via Semisupervised Learning, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
@InProceedings{quan-kit-song:2013:ACL2013,
author = {Quan, Xiaojun and Kit, Chunyu and Song, Yan},
title = {Non-Monotonic Sentence Alignment via Semisupervised Learning},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {622--630},
url = {
http://www.aclweb.org/anthology/P13-1061},
year = 2013
}
Quan et al. (2013)
Kutuzov, Andrey (2013):
Improving English-Russian sentence alignment through POS tagging and Damerau-Levenshtein distance, Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing
@InProceedings{kutuzov:2013:BSNLP,
author = {Kutuzov, Andrey},
title = {Improving English-Russian sentence alignment through POS tagging and Damerau-Levenshtein distance},
booktitle = {Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {63--68},
url = {
http://www.aclweb.org/anthology/W13-2410},
year = 2013
}
Kutuzov (2013)
Krstovski, Kriste and Smith, David A. (2013):
Online Polylingual Topic Models for Fast Document Translation Detection, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{krstovski-smith:2013:WMT,
author = {Krstovski, Kriste and Smith, David A.},
title = {Online Polylingual Topic Models for Fast Document Translation Detection},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {252--261},
url = {
http://www.aclweb.org/anthology/W13-2232},
year = 2013
}
Krstovski and Smith (2013)
Zaidan, Omar and Chowdhary, Vishal (2013):
Evaluating (and Improving) Sentence Alignment under Noisy Conditions, Proceedings of the Eighth Workshop on Statistical Machine Translation
@InProceedings{zaidan-chowdhary:2013:WMT,
author = {Zaidan, Omar and Chowdhary, Vishal},
title = {Evaluating (and Improving) Sentence Alignment under Noisy Conditions},
booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {484--493},
url = {
http://www.aclweb.org/anthology/W13-2261},
year = 2013
}
Zaidan and Chowdhary (2013)
Plamada, Magdalena and Volk, Martin (2013):
Mining for Domain-specific Parallel Text from Wikipedia, Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
@InProceedings{plamada-volk:2013:BUCC,
author = {Plamada, Magdalena and Volk, Martin},
title = {Mining for Domain-specific Parallel Text from Wikipedia},
booktitle = {Proceedings of the Sixth Workshop on Building and Using Comparable Corpora},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {112--120},
url = {
http://www.aclweb.org/anthology/W13-2514},
year = 2013
}
Plamada and Volk (2013)
Zhang, Chengzhi and Yao, Xuchen and Kit, Chunyu (2013):
Finding More Bilingual Webpages with High Credibility via Link Analysis, Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
@InProceedings{zhang-yao-kit:2013:BUCC,
author = {Zhang, Chengzhi and Yao, Xuchen and Kit, Chunyu},
title = {Finding More Bilingual Webpages with High Credibility via Link Analysis},
booktitle = {Proceedings of the Sixth Workshop on Building and Using Comparable Corpora},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {138--143},
url = {
http://www.aclweb.org/anthology/W13-2517},
year = 2013
}
Zhang et al. (2013)
Fethi Lamraoui and Philippe Langlais (2013):
Yet Another Fast and Robust and Open Source Sentence Aligner. Time to Reconsider Sentence Alignment, Machine Translation Summit XIV
@inproceedings{MTS2013-Lamraoui,
author = {Fethi Lamraoui and Philippe Langlais},
title = {Yet Another Fast and Robust and Open Source Sentence Aligner. {Time} to Reconsider Sentence Alignment},
url = {
http://www.mt-archive.info/10/MTS-2013-Lamraoui.pdf},
pages = {77--84},
booktitle = {Machine Translation Summit XIV},
year = 2013
}
Lamraoui and Langlais (2013)
Stymne, Sara and Hardmeier, Christian and Tiedemann, Jörg and Nivre, Joakim (2013):
Feature Weight Optimization for Discourse-Level SMT, Proceedings of the Workshop on Discourse in Machine Translation
@InProceedings{stymne-EtAl:2013:DiscoMT,
author = {Stymne, Sara and Hardmeier, Christian and Tiedemann, J\"{o}rg and Nivre, Joakim},
title = {Feature Weight Optimization for Discourse-Level SMT},
booktitle = {Proceedings of the Workshop on Discourse in Machine Translation},
month = {August},
address = {Sofia, Bulgaria},
publisher = {Association for Computational Linguistics},
pages = {60--69},
url = {
http://www.aclweb.org/anthology/W13-3308},
year = 2013
}
Stymne et al. (2013)
Rico Sennrich and Martin Volk (2010):
MT-based Sentence Alignment for OCR-generated Parallel Texts, Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas
@inproceedings{AMTA-2010-Sennrich,
author = {Rico Sennrich and Martin Volk},
title = {{MT}-based Sentence Alignment for {OCR}-generated Parallel Texts},
url = {
http://www.mt-archive.info/AMTA-2010-Sennrich.pdf},
booktitle = {Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas},
location = {Denver, Colorado},
year = 2010
}
Sennrich and Volk (2010)
Shi, Lei and Zhou, Ming (2008):
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
@InProceedings{shi-zhou:2008:EMNLP,
author = {Shi, Lei and Zhou, Ming},
title = {Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model},
booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing},
month = {October},
address = {Honolulu, Hawaii},
publisher = {Association for Computational Linguistics},
pages = {505--513},
url = {
http://www.aclweb.org/anthology/D08-1053},
year = 2008
}
Shi and Zhou (2008)
Mamitimin, Samat and Hou, Min (2009):
Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences, Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
@InProceedings{mamitimin-hou:2009:BUCC,
author = {Mamitimin, Samat and Hou, Min},
title = {Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences},
booktitle = {Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora},
month = {August},
address = {Singapore},
publisher = {Association for Computational Linguistics},
pages = {38--45},
url = {
http://www.aclweb.org/anthology/W/W09/W09-3108},
year = 2009
}
Mamitimin and Hou (2009)
Braune, Fabienne and Fraser, Alexander (2010):
Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora, Coling 2010: Posters
@InProceedings{braune-fraser:2010:POSTERS,
author = {Braune, Fabienne and Fraser, Alexander},
title = {Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora},
booktitle = {Coling 2010: Posters},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {81--89},
url = {
http://www.aclweb.org/anthology/C10-2010},
year = 2010
}
Braune and Fraser (2010)
Li, Peng and Sun, Maosong and Xue, Ping (2010):
Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm, Coling 2010: Posters
@InProceedings{li-sun-xue:2010:POSTERS,
author = {Li, Peng and Sun, Maosong and Xue, Ping},
title = {Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm},
booktitle = {Coling 2010: Posters},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {710--718},
url = {
http://www.aclweb.org/anthology/C10-2081},
year = 2010
}
Li et al. (2010)
Slayden, Glenn and Hwang, Mei-Yuh and Schwartz, Lee (2010):
Thai Sentence-Breaking for Large-Scale SMT, Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing
@InProceedings{slayden-hwang-schwartz:2010:SSANLP,
author = {Slayden, Glenn and Hwang, Mei-Yuh and Schwartz, Lee},
title = {Thai Sentence-Breaking for Large-Scale SMT},
booktitle = {Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing},
month = {August},
address = {Beijing, China},
publisher = {Coling 2010 Organizing Committee},
pages = {8--16},
url = {
http://www.aclweb.org/anthology/W10-3602},
year = 2010
}
Slayden et al. (2010)
Vilar, Juan Miguel (2005):
Experiments Using MAR for Aligning Corpora, Proceedings of the ACL Workshop on Building and Using Parallel Texts
@InProceedings{vilar:2005:WPT,
author = {Vilar, Juan Miguel},
title = {Experiments Using {MAR} for Aligning Corpora},
booktitle = {Proceedings of the ACL Workshop on Building and Using Parallel Texts},
month = {June},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
pages = {95--98},
url = {
http://www.aclweb.org/anthology/W/W05/W05-0815},
year = 2005
}
Vilar (2005)
Thomas C. Chuang and Jiang-Cheng Wu and Tracy Lin and Wen-Chie Shei and Jason S. Chang (2004):
Bilingual Sentence Alignment Based on Punctuation Statistics and Lexicon, Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP)
@inproceedings{Chuang:2004,
author = {Thomas C. Chuang and Jiang-Cheng Wu and Tracy Lin and Wen-Chie Shei and Jason S. Chang},
title = {Bilingual Sentence Alignment Based on Punctuation Statistics and Lexicon},
booktitle = {Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP)},
year = 2004
}
Chuang et al. (2004)
David D. Palmer and Marti A. Hearst (1997):
Adaptive Multilingual Sentence Boundary Disambiguation, Computational Linguistics
@Article{Palmer:1997,
author = {David D. Palmer and Marti A. Hearst},
title = {Adaptive Multilingual Sentence Boundary Disambiguation},
url = {
http://acl.ldc.upenn.edu/J/J97/J97-2002.pdf?origin=publication\_detail},
googlescholar = {10610553735381302170},
journal = {Computational Linguistics},
volume = {23},
number = {3},
year = 1997
}
Palmer and Hearst (1997)