All competitions are now available on Codabench; see the SUBMISSION LINKS section below.
The test data can be downloaded from: github.com/WMT-QE-Task/wmt-qe-2024-data
New DEADLINES for the shared task! See below
OVERVIEW
This shared task focuses on automatic methods for estimating the quality of neural machine translation output at run-time, without relying on reference translations. As in previous years, we propose subtasks that address quality estimation at the sentence and word levels. This year, we introduce a new subtask where participants are required to provide explanations for the detected errors and to suggest corrections.
Languages covered
The list below provides the language pairs covered this year and the annotation available for each language pair:
- English to German (MQM)
- English to Spanish (MQM)
- English to Hindi (MQM, DA, post-edits)
- English to Gujarati (DA)
- English to Telugu (DA)
- English to Tamil (DA, post-edits)
This year, no new training or validation datasets will be released for Task 1 (sentence-level quality estimation) or Task 2 (fine-grained error detection). Participants should use the datasets from the previous year’s shared task, available at wmt-qe-task.github.io/wmt-qe-2023/. For the new task on quality-informed automatic post-editing (QEAPE), we release training and validation data containing post-edits for the English-Hindi and English-Tamil language pairs. In addition, synthetic data to further support the new Task 3 is available on the GitHub repository, in the train_dev folders for each language pair: github.com/WMT-QE-Task/wmt-qe-2024-data/tree/main/train_dev/Task_3.
Tasks organised
Task 1: Sentence-level quality estimation
As in previous years, we organise a sentence-level quality estimation subtask where the goal is to predict a quality score for each source-target sentence pair. Depending on the language pair, participants will be asked to predict either the direct assessment (DA) score or the Multidimensional Quality Metrics (MQM) score. For English-Hindi, participants can predict both scores. Detailed information about the language pairs, the annotation specifics, and the available training and development resources is available at wmt-qe-task.github.io/.
Find more details about the task in the Sentence-level quality estimation task description.
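For orientation, sentence-level scores in the style of this task can be produced with an off-the-shelf reference-free QE model such as CometKiwi 2022, which also serves as the Task 1 baseline (see the baselines list below). A minimal sketch, assuming the unbabel-comet package and access to the gated checkpoint on Hugging Face:

```python
# Minimal sentence-level QE sketch using the Task 1 baseline model.
# Assumes `pip install unbabel-comet` and that you have accepted the
# model licence on Hugging Face (the checkpoint is gated) and logged
# in, e.g. via `huggingface-cli login`.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)

# Reference-free QE: each item needs only the source and the MT output.
data = [{"src": "The cat sat on the mat.", "mt": "Die Katze saß auf der Matte."}]

output = model.predict(data, batch_size=8, gpus=0)  # gpus=1 if a GPU is available
print(output.scores)  # one quality score per source-target pair
```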
Task 2: Fine-grained error detection
This is a word-level subtask where the goal is to predict translation error spans, as opposed to the binary OK/BAD tags normally used in word-level quality estimation tasks. For this subtask we will use the error spans obtained from the MQM annotations. Participants will be asked to predict both the error span (start and end indices) and the error severity (major or minor) for each segment.
Find more details about the task in the Fine-grained error detection task description.
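To make the prediction target concrete, the sketch below shows one way a predicted error span could be represented. The exact character-indexing convention and submission format are defined in the task description, so the representation here is an illustrative assumption only:

```python
from dataclasses import dataclass

@dataclass
class ErrorSpan:
    start: int     # index of the first character of the span (assumed convention)
    end: int       # index one past the last character (assumed convention)
    severity: str  # "major" or "minor"

# Toy example: the German output renders "mat" as "Tisch" (table),
# so a system would flag that span, e.g. as a major error.
mt = "Die Katze saß auf dem Tisch."
token = "Tisch"
i = mt.index(token)
prediction = ErrorSpan(start=i, end=i + len(token), severity="major")
print(prediction)  # ErrorSpan(start=22, end=27, severity='major')
```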
Task 3: Quality-informed automatic post-editing (QEAPE)
This new task combines quality estimation and automatic post-editing in order to correct the output of machine translation. We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text together with the corresponding corrected output. The objective is to explore how quality estimates (possibly at different levels of granularity) can inform post-editing: global sentence-level QE annotations may guide more or less radical post-editing strategies, while word-level annotations can drive fine-grained, pinpointed corrections. We also encourage approaches that leverage quality explanations generated by large language models. Although the task focuses on quality-informed APE, participants may also submit APE output without QE predictions, to help us understand the impact of their QE system (submissions that do not contain the corrected output are not allowed). Submissions without QE predictions will also be considered official. The task covers two language pairs: English-Hindi and English-Tamil.
For more details see the QEAPE task description.
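One possible shape of such a system, as a sketch: sentence-level QE decides whether to edit at all, and word-level spans focus the corrections. All components below (run_sentence_qe, run_word_qe, rewrite_spans) are hypothetical stubs standing in for real models, not part of the task infrastructure:

```python
def run_sentence_qe(src: str, mt: str) -> float:
    """Placeholder for a sentence-level QE model (e.g. CometKiwi)."""
    return 0.5  # stub score

def run_word_qe(src: str, mt: str) -> list[tuple[int, int, str]]:
    """Placeholder for a word-level error span detector (Task 2 style)."""
    return [(22, 27, "major")]  # stub (start, end, severity) span

def rewrite_spans(src: str, mt: str, spans: list[tuple[int, int, str]]) -> str:
    """Placeholder for an editing model, e.g. an LLM prompted with the spans."""
    return mt  # no-op stub

def quality_informed_ape(src: str, mt: str, threshold: float = 0.8) -> str:
    """If sentence-level QE deems the MT good enough, do nothing;
    otherwise post-edit, focusing on the spans flagged by word-level QE."""
    if run_sentence_qe(src, mt) >= threshold:
        return mt  # "do nothing": keep the hypothesis
    spans = run_word_qe(src, mt)
    return rewrite_spans(src, mt, spans)

print(quality_informed_ape("The cat sat on the mat.", "Die Katze saß auf dem Tisch."))
```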
DEADLINES
Submission opening | 22nd July 2024 (updated deadline)
Submission deadline | 31st July 2024 (updated deadline)
Paper submission deadline to WMT | TBA (follows EMNLP)
WMT Notification of acceptance | TBA (follows EMNLP)
WMT Camera-ready deadline | TBA (follows EMNLP)
Conference | 15-16 November 2024
All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2024.
SUBMISSION LINKS
Notes
- You need to register for each of the competitions to be able to submit your predictions. Clicking on the “My Submissions” tab will prompt you to register, and we will approve your registration.
- As a reminder, please use a single account or organisation per team.
- If your submissions are stuck in the “Submitted” phase at any point, please notify us instead of resubmitting multiple times. We can delete stuck submissions and try to handle the issue, as Codabench occasionally faces issues with remote workers. See also: github.com/codalab/codabench/issues/1546
How to check your performance
During the competition:
Once the submission state is “Finished”, you can click on it and:
- For Task 1 and Task 2: download the “Output from scoring step”.
- For Task 3: check the stdout of the scoring logs.
In both cases, you can also see the baseline performance. For the baselines we used:
- Task 1: CometKiwi 2022: huggingface.co/Unbabel/wmt22-cometkiwi-da
- Task 2: a word-level adaptation of CometKiwi: huggingface.co/Unbabel/WMT24-QE-task2-baseline
- Task 3: a “do nothing” baseline: we used the MT hypothesis as the post-edit
You can also download your submission file itself by clicking on the desired submission at any stage.
After the competition:
The leaderboard will become visible, showing each team’s submission with the best average score (primary metric) across language pairs; a sketch of this selection rule is given below.
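Read as a selection rule, this picks the submission whose per-language-pair primary-metric scores have the highest mean; a small sketch with made-up scores:

```python
# Sketch of the leaderboard selection rule described above: pick the
# submission with the best average primary-metric score across
# language pairs. The runs and scores here are invented for illustration.
submissions = {
    "run_A": {"en-de": 0.52, "en-es": 0.48, "en-hi": 0.61},
    "run_B": {"en-de": 0.55, "en-es": 0.44, "en-hi": 0.63},
}

def average(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)

best = max(submissions, key=lambda name: average(submissions[name]))
print(best, average(submissions[best]))
```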
ORGANIZERS
- Giuseppe Attanasio, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa
- Pushpak Bhattacharyya, Indian Institute of Technology Bombay, India
- Frédéric Blain, Tilburg University, Netherlands
- Rajen Chatterjee, Apple Inc, US
- Sourabh Deoghare, Indian Institute of Technology Bombay, India
- Catarina Farinha, Unbabel, Portugal
- Markus Freitag, Google, US
- Nuno M Guerreiro, Instituto de Telecomunicações, Unbabel, Instituto Superior Técnico, Centrale Supélec – Université Paris-Saclay
- Diptesh Kanojia, University of Surrey, UK
- Andre Martins, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit), Unbabel
- Matteo Negri, FBK, Italy
- Constantin Orasan, University of Surrey, UK
- Ricardo Rei, Unbabel, Portugal
- José GC de Souza, Unbabel
- Marco Turchi, Zoom, US
- Chrysoula Zerva, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit)
STAY IN TOUCH
- Please register for our Google group to stay in touch.
- Follow us on Twitter: @qe_task