Project leaders: Patrik Lambert, Holger Schwenk and Loïc Barrault
In many scenarios we need to evaluate the quality of a translated sentence, but no reference translation is available. For example, in the "wikitrans" scenario (http://statmt.org/mtm4/?n=Main.WikiTrans), in which users can propose corrections to automatic translations, it is useful to determine whether a proposed correction is valid. Likewise, automatic translations can be added to a parallel corpus to adapt it to another domain (Schwenk 2008); in this scenario, some confidence in the quality of each translation is useful for discarding incorrect translations. Several confidence metrics have been proposed. The aim of the project is to implement some of them.
Team: Raphael Payen, Haithem Afli, Andreas Kirkedal, Loïc Barrault
DAY1: Survey
S. Raybaud, C. Lavecchia, D. Langlois, K. Smaïli
Word- and Sentence-level Confidence Measures for Machine Translation
http://www.mt-archive.info/EAMT-2009-Raybaud.pdf
N. Ueffing and H. Ney
Word-Level Confidence Estimation for Machine Translation
http://portal.acm.org/citation.cfm?id=1245137
J. Blatz, E. Fitzgerald, G. Foster, S. Gandrabur, C. Goutte, A. Kulesza, A. Sanchis, N. Ueffing
Confidence Estimation for Machine Translation
http://portal.acm.org/citation.cfm?id=1220401
C. B. Quirk
Training a Sentence-Level Machine Translation Confidence Measure
http://research.microsoft.com/apps/pubs/?id=68968
L. Specia
Estimating the Sentence-Level Quality of Machine Translation Systems
http://clg.wlv.ac.uk/papers/Specia_MTSummit2009.pdf
DAY2: Ideas
Got some code from S. Raybaud. There is no documentation, so we first need to understand that code!
DAY3:
DAY4:
- tokenizing
- reversing (thanks Andreas!)
- nothing (thanks Loïc)
DAY5: Commands
train_cmi -e corpus/news-train08.tok.en -f corpus/news-train08.tok.fr --mi mifile -N
train_imi -t corpus/news-train08.tok.en --mi mifile -N
cm-ngram -lm LMs/news-en.news-en.4g.kn-int.1-1-1-1.sblm -text corpus/news-train08.tok.en
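Since the tools above come without documentation, the following is only a rough sketch of what a cross-lingual mutual information measure might compute, assuming that "cmi" refers to the cross-lingual mutual information feature described in Raybaud et al. (listed in the survey). It is not the code behind train_cmi. The function names train_cross_lingual_mi and word_confidence, the sentence-level co-occurrence counting, and the toy corpus are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter
from itertools import product


def train_cross_lingual_mi(pairs):
    """Estimate MI(e, f) from sentence-level co-occurrence (an assumption).

    pairs: iterable of (target_tokens, source_tokens) for aligned sentences.
    Returns a dict mapping (e, f) to p(e,f) * log(p(e,f) / (p(e) * p(f))).
    """
    n = 0
    count_e, count_f, count_ef = Counter(), Counter(), Counter()
    for tgt, src in pairs:
        n += 1
        tgt_set, src_set = set(tgt), set(src)  # count each word once per sentence
        count_e.update(tgt_set)
        count_f.update(src_set)
        count_ef.update(product(tgt_set, src_set))
    mi = {}
    for (e, f), c in count_ef.items():
        p_ef = c / n
        p_e, p_f = count_e[e] / n, count_f[f] / n
        mi[(e, f)] = p_ef * math.log(p_ef / (p_e * p_f))
    return mi


def word_confidence(word, source_tokens, mi):
    """Hypothetical word-level score: average MI with the source words."""
    src = set(source_tokens)
    if not src:
        return 0.0
    # Pairs never seen together in training contribute 0.
    return sum(mi.get((word, f), 0.0) for f in src) / len(src)


# Toy sentence-aligned corpus (invented for illustration).
corpus = [
    ("the house".split(), "la maison".split()),
    ("the car".split(), "la voiture".split()),
    ("a house".split(), "une maison".split()),
    ("a car".split(), "une voiture".split()),
]
mi = train_cross_lingual_mi(corpus)
good = word_confidence("house", "la maison".split(), mi)
bad = word_confidence("car", "la maison".split(), mi)
print(good > bad)
```

In this sketch a target word that tends to co-occur with the source words gets a higher score than one that does not, so a low score could flag a suspicious word. The real tools may use different counts, smoothing, or normalization.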