Project leaders: Yashar Mehdad, Matteo Negri, Marcello Federico
Evaluation is becoming an increasingly important component of Machine Translation (MT), as in other areas of Natural Language Processing (NLP). So far, the quality of MT systems can be evaluated either by human judges or automatically, using metrics such as BLEU (Papineni et al., 2002), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005) and TER (Snover et al., 2006), each with different associated resources and requirements. Due to time and cost constraints, automatic evaluation has received considerable attention in recent workshops and shared tasks (Callison-Burch et al., 2008), which aim to overcome the shortcomings of the methods currently employed for evaluating MT technology. However, several problems remain in current MT evaluation technology:
This highlights the need to overcome the aforementioned problems by developing systems and algorithms that can judge the adequacy of MT output without any form of reference translation. Without more suitable approaches to address these difficulties, improving MT technology and broadening its applicability will remain out of reach.
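To make the contrast with reference-based evaluation concrete, the following is a minimal, illustrative sketch of the kind of metric mentioned above: a toy BLEU-style score (clipped n-gram precision with a brevity penalty, single reference, no smoothing). The function name and simplifications are ours for illustration; real evaluations use the full metric implementations cited in the text.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions,
    multiplied by a brevity penalty. Single reference, no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

The key point for this project is the `reference` argument: such metrics are unusable when no human reference translation exists, which is exactly the setting addressed here.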
In addition, two main features draw a border line between this project and both Confidence Estimation (CE) and MT technology. First, the challenge is distinct from MT because the quality score is assigned to the output of the MT system without any information about the expected output or the MT system's core algorithm. Besides providing additional information about the output, this score can facilitate efficient MT evaluation with no manual effort. Likewise, by avoiding the complexity imposed by SMT technology (e.g. search), there is more room for exploiting semantic features such as Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL). Second, in contrast to CE, which focuses more on the overall quality of the MT output, this score can focus on adequacy. This brings along more interesting issues concerning the semantic and structural aspects of MT. Moreover, the resulting technology can benefit other applications, including cross-language semantic similarity, cross-lingual textual entailment (Mehdad et al., 2010), lexical choice in machine translation (Bangalore et al., 2007), and cross-lingual content synchronization and merging. In general, moving in this direction can integrate more semantic information into MT, which in principle can help improve this technology.
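A reference-free adequacy scorer of the kind described above would typically start from simple surface features of the source/output pair before adding richer semantic signals such as WSD or SRL output. The sketch below is purely illustrative: the function name and the particular feature set are our assumptions, not part of the project's design.

```python
def adequacy_features(source_tokens, target_tokens):
    """Illustrative reference-free features for a source/MT-output pair.
    Hypothetical baseline set; a real system would add semantic features
    (e.g. WSD, SRL) on top of such surface signals."""
    src_len = len(source_tokens)
    tgt_len = len(target_tokens)
    return {
        # Ratio of output length to source length; large deviations
        # from ~1 often indicate dropped or hallucinated content.
        "length_ratio": tgt_len / src_len if src_len else 0.0,
        "src_len": src_len,
        "tgt_len": tgt_len,
        # Lexical diversity of the output.
        "type_token_ratio_tgt": len(set(target_tokens)) / tgt_len if tgt_len else 0.0,
    }
```

Note that none of these features consult a reference translation, which is what separates this setting from metrics like BLEU or TER.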
With this project, we hope to partially solve this problem. We also hope that such a project makes it easier for researchers to become more active in bridging semantics and MT.
================================================
Project meeting day 1:
Attendees: Marcello Federico, Daniele Pighin, Hanna Bechara, Angeliki Lazaridou, Nikos Engonopoulos, Alina Petrova, Jose Camargo De Souza, Yashar Mehdad
Some literature to study:
The issues discussed:
Some solutions and discussions:
Conclusion:
Project meeting day 2
Attendees: Marcello Federico, Daniele Pighin, Hanna Bechara, Angeliki Lazaridou, Nikos Engonopoulos, Jose Camargo De Souza, Yashar Mehdad, Marco Turchi, Antonio Valerio
The issues discussed:
Proposed discussion:
Tasks:
- Feature extraction:
- Dataset preparation: Daniele and Yashar
I’ve received an email from Eleftherios Avramidis, and he mentioned another DFKI paper which is very related to our work: http://www.statmt.org/wmt11/pdf/WMT04.pdf He kindly offered to help us in this project. He has already sent me the data (WMT 2008, 2009 and 2010), which they worked on for ranking.
=============================================================
- What has been done:
- Things to be done:
- Preliminary result