EMNLP 2024

NINTH CONFERENCE ON
MACHINE TRANSLATION (WMT24)

November 15-16, 2024
Miami, Florida, USA
 

All competitions are now available on Codabench.

The test data can be downloaded from: github.com/WMT-QE-Task/wmt-qe-2024-data

New DEADLINES for the shared task! See below

OVERVIEW

This shared task focuses on automatic methods for estimating the quality of neural machine translation output at run-time, without relying on reference translations. As in previous years, we propose subtasks that address quality estimation at the sentence and word levels. This year, we introduce a new subtask where participants are required to provide explanations for the detected errors as well as suggest corrections.

Languages covered

The list below provides the language pairs covered this year and the annotation available for each language pair:

  • English to German (MQM)

  • English to Spanish (MQM)

  • English to Hindi (MQM, DA, Post-edits)

  • English to Gujarati (DA)

  • English to Telugu (DA)

  • English to Tamil (DA, Post-edits)

This year, no training or validation datasets will be released for Task 1: Sentence-level quality estimation and Task 2: Fine grained error detection. Participants should use the datasets from the previous year’s shared task, available at wmt-qe-task.github.io/wmt-qe-2023/. For the new task on quality-informed automatic post-editing (QEAPE), we release training and validation data containing post-edits for the English-Hindi and English-Tamil language pairs. In addition, synthetic data is available in the GitHub repository to further support the new Task 3. You can find it in the train_dev folder for each language pair: github.com/WMT-QE-Task/wmt-qe-2024-data/tree/main/train_dev/Task_3.

Tasks organised

Task 1: Sentence-level quality estimation

Following the trend of previous years, we organise a sentence-level quality estimation subtask where the goal is to predict the quality score for each source-target sentence pair. Depending on the language pair, participants will be asked to predict either the direct assessment (DA) score or the multi-dimensional quality metrics (MQM) score. In the case of English-Hindi, participants can predict both scores. Detailed information about the language pairs, the annotation specifics, and the available training and development resources is available at wmt-qe-task.github.io/.

Find more details about the task in the Sentence-level quality estimation task description.

Task 2: Fine grained error detection

This is a word-level subtask where the goal is to predict translation error spans, as opposed to the binary OK/BAD labels that are normally the case with word-level quality estimation tasks. For this subtask we will use the error spans obtained from the MQM annotations. Participants will be asked to predict both the error span (start and end indices) and the error severity (major or minor) for each segment.

Find more details about the task in the Fine grained error detection task description.
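As an illustration of the prediction unit described above (a span with start and end indices plus a severity), the following is a minimal sketch of how a system might represent and sanity-check its predictions before submission. The class and function names are hypothetical, and the indexing convention (0-based character offsets with an exclusive end) is an assumption; the official submission format is defined in the task description and may differ.

```python
from dataclasses import dataclass
from typing import List

# Severity labels named in the task overview.
SEVERITIES = {"minor", "major"}


@dataclass(frozen=True)
class ErrorSpan:
    """One predicted error span (hypothetical representation)."""
    start: int      # assumed: 0-based character offset of the span start
    end: int        # assumed: character offset one past the span end
    severity: str   # "minor" or "major"


def validate_spans(target: str, spans: List[ErrorSpan]) -> None:
    """Raise ValueError if a span has an unknown severity or bad indices."""
    for span in spans:
        if span.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {span.severity!r}")
        if not (0 <= span.start < span.end <= len(target)):
            raise ValueError(f"span out of range: {span}")


def span_text(target: str, span: ErrorSpan) -> str:
    """Return the part of the translation covered by the span."""
    return target[span.start:span.end]


if __name__ == "__main__":
    translation = "Das ist ein Haus"
    predicted = [ErrorSpan(start=8, end=11, severity="minor")]
    validate_spans(translation, predicted)
    print(span_text(translation, predicted[0]))
```

Checking spans against the translation length before submitting can catch off-by-one mistakes that arise when converting between token-level and character-level indices.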

Task 3: Quality-informed automatic post-editing (QEAPE)

This is a new task that proposes combining quality estimation and automatic post-editing in order to correct the output of machine translation. We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text together with the corresponding output corrections. The objective is to explore how quality estimates (possibly at different levels of granularity) can inform post-editing. For instance, global sentence-level QE annotations may guide more or less radical post-editing strategies, while word-level annotations can be used for fine-grained, pinpointed corrections. We also encourage approaches that leverage quality explanations generated by large language models. Although the task is focused on quality-informed APE, we also allow participants to submit APE output without QE predictions, so that they can understand the impact of their QE system (outputs that do not contain the correction are not accepted). Submissions without QE predictions will also be considered official. The task focuses on two language pairs: English-Hindi and English-Tamil.

For more details see the QEAPE task description.
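The idea that sentence-level estimates can steer how radical the post-edit is, while word-level spans allow pinpointed fixes, can be sketched as a simple routing function. This is only an illustrative sketch, not the organisers' method: the function name, the three strategy labels, the thresholds, and the assumption that the sentence score lies in [0, 1] with higher meaning better are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) indices of a predicted error


def choose_ape_strategy(
    sentence_score: float,
    error_spans: List[Span],
    keep_threshold: float = 0.8,     # assumed cut-off, not from the task
    rewrite_threshold: float = 0.4,  # assumed cut-off, not from the task
) -> str:
    """Pick a post-editing strategy from sentence- and word-level QE output.

    Returns one of:
      "rewrite"  - quality is low (or errors are not localised): regenerate
      "targeted" - errors are localised: edit only the flagged spans
      "keep"     - quality is high and nothing is flagged: leave the MT as is
    """
    if sentence_score < rewrite_threshold:
        return "rewrite"
    if error_spans:
        return "targeted"
    if sentence_score >= keep_threshold:
        return "keep"
    # Middling score with no localised errors: nothing to target, so rewrite.
    return "rewrite"


if __name__ == "__main__":
    print(choose_ape_strategy(0.9, []))
    print(choose_ape_strategy(0.7, [(8, 11)]))
    print(choose_ape_strategy(0.2, [(0, 3), (8, 11)]))
```

In practice the thresholds would need tuning on the development data, and the "rewrite" and "targeted" branches would call whatever APE model a team uses.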

DEADLINES

Submission opening: 22nd July 2024 (updated deadline)

Submission deadline: 31st July 2024 (updated deadline)

Paper submission deadline to WMT: TBA (follows EMNLP)

WMT Notification of acceptance: TBA (follows EMNLP)

WMT Camera-ready deadline: TBA (follows EMNLP)

Conference: 15-16 November, 2024

All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2024.

Notes

  1. You need to register for each of the competitions to be able to submit your predictions. Clicking on the “My Submissions” tab will prompt you to register and we will approve your registration.

  2. As a reminder, please use a single account or organisation per team.

  3. If your submissions are stuck in the “Submitted” phase at any point, please notify us instead of resubmitting multiple times. We can delete stuck submissions and try to handle the issue, as Codabench occasionally faces issues with remote workers. See also: github.com/codalab/codabench/issues/1546

How to check your performance

During the competition:

Once the submission state is “Finished”, you can click on the submission and:

  • For Task 1 and Task 2: Download the “Output from scoring step”.

  • For Task 3: Check the stdout log in the scoring logs.

In both cases, you can also see the baseline performance. For the baselines we used:

You can also download your submission file itself by clicking on the desired submission at any stage.

After the competition:

The leaderboard will be visible. Your submission with the best average score (primary metric) across language pairs will be used to guide the selection.
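To make the selection rule concrete, here is a minimal sketch of picking the submission with the best average primary-metric score across language pairs. The function name, the data layout, and the example scores are hypothetical. The sketch also assumes that every submission covers the same language pairs and that higher is better by default; how the organisers handle missing language pairs is not stated here.

```python
from statistics import mean
from typing import Dict


def best_submission(
    scores: Dict[str, Dict[str, float]],
    higher_is_better: bool = True,
) -> str:
    """Return the submission name with the best mean score over language pairs.

    `scores` maps a submission name to {language_pair: primary_metric_score}.
    """
    averages = {name: mean(by_lp.values()) for name, by_lp in scores.items()}
    pick = max if higher_is_better else min
    return pick(averages, key=averages.get)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    example = {
        "run_a": {"en-de": 0.40, "en-es": 0.50},
        "run_b": {"en-de": 0.45, "en-es": 0.47},
    }
    print(best_submission(example))
```

Because the average is taken over language pairs, a run that is strongest on one pair can still lose to a run that is more balanced across all of them.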

ORGANIZERS

  • Giuseppe Attanasio, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa

  • Pushpak Bhattacharyya, Indian Institute of Technology Bombay, India

  • Frédéric Blain, Tilburg University, Netherlands

  • Rajen Chatterjee, Apple Inc, US

  • Sourabh Deoghare, Indian Institute of Technology Bombay, India

  • Catarina Farinha, Unbabel, Portugal

  • Markus Freitag, Google, US

  • Nuno M Guerreiro, Instituto de Telecomunicações, Unbabel, Instituto Superior Técnico, Centrale Supélec – Université Paris-Saclay

  • Diptesh Kanojia, University of Surrey, UK

  • Andre Martins, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit), Unbabel

  • Matteo Negri, FBK, Italy

  • Constantin Orasan, University of Surrey, UK

  • Ricardo Rei, Unbabel, Portugal

  • José GC de Souza, Unbabel

  • Marco Turchi, Zoom, US

  • Chrysoula Zerva, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit)
