EMNLP 2024


November 15-16, 2024
Miami, Florida, USA

Chat Task website


Translating conversational text, in particular customer support chats, is an important and challenging application for machine translation technology. This type of content has not been extensively explored in prior MT research, largely due to the lack of publicly available data sets. Prior related work has mostly focused on movie subtitles and European Parliament speeches. Unlike news stories, software manuals, biomedical text, and similar content, in which the text is carefully authored and well formatted, chat conversations are less planned, more informal, and often ungrammatical. Further, such conversations are usually characterized by shorter and simpler sentences and contain more pronouns. In effect, translating chat conversations can be regarded as a two-in-one task that models both dialogue and translation at the same time.

Machine translation systems trained for chat conversations are expected to deal with the task’s inherent challenges and characteristics, such as (among others):

  • The importance of using extended context for translating the segments and modelling dialogue. For example, agreement and anaphora resolution require inter-sentential modelling: "I had a flight with AirLiberty for next Saturday from Lisbon to Abu Dhabi. Could you please change it to next Monday?"

  • Robustness to noisy input. Chat text is usually noisier, containing misspelled words, wrong casings, incomplete sentences, etc.

  • Consistent and coherent translation throughout the entire conversation.

  • Modeling of all the speakers and language directions involved in the conversation, where each can be regarded as a different sub-domain (depending on the task).
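One way to expose extended context and speaker information to a translation model is to prefix each segment with the preceding turns. The following is a minimal sketch of that idea, not the method prescribed by the task: the `Turn` class, the `build_input` helper, the `[speaker]` tags, and the `<sep>` separator are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # hypothetical labels: "customer" or "agent"
    lang: str     # language code of the turn, e.g. "en" or "pt_br"
    text: str


def build_input(turns: List[Turn], index: int, window: int = 2,
                sep: str = " <sep> ") -> str:
    """Prefix the turn at `index` with up to `window` preceding turns.

    Each turn is tagged with its speaker so a model could, in principle,
    treat customer and agent text as different sub-domains.
    """
    start = max(0, index - window)
    context = [f"[{t.speaker}] {t.text}" for t in turns[start:index]]
    current = f"[{turns[index].speaker}] {turns[index].text}"
    return sep.join(context + [current])


chat = [
    Turn("customer", "en",
         "I had a flight with AirLiberty for next Saturday from Lisbon to Abu Dhabi."),
    Turn("customer", "en", "Could you please change it to next Monday?"),
]

# The second turn now carries the sentence that "it" refers to.
print(build_input(chat, 1))
```

With `window=0` the helper returns only the tagged current turn, which gives a sentence-level baseline to compare against.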

The primary goal of this Chat shared task is to encourage participants to train and test models specific to bilingual chat conversations, using a corpus composed of original bilingual customer support conversations.

Language Pairs

  • en⇔de

  • en⇔fr

  • en⇔pt_br

  • en⇔ko

  • en⇔nl

We encourage participants to use the bilingual context in their submitted translation models.

Have questions or suggestions? Feel free to Contact Us!


  • Training set ready to download: March 2024

  • Validation set ready to download: May 2024

  • Test set ready to download: 18th July 2024

  • Submission deadline for Chat task: 26th July 2024

  • Paper submission deadline to WMT: 20th August 2024

  • WMT Notification of acceptance: 20th September 2024

  • WMT Camera-ready deadline: 3rd October 2024

  • Conference (EMNLP 2024): 15-16 November 2024


The goals of the chat translation shared task are to provide common ground for:

  • Studying the applicability of machine translation systems for translating conversational text;

  • Investigating the impact of context in a conversation’s translation;

  • Studying the feasibility of an all-in-one multi-lingual system.

Task Description

A critical challenge faced by international companies today is delivering customer support in several different languages. One solution to this challenge is centralizing support with English-speaking agents and having a translation layer in the middle to translate from the customer’s language into the agent’s (English) and vice versa.
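The translation layer described above has to pick a direction for every turn depending on who wrote it. A minimal sketch of that routing step follows; the function name `translation_direction` and the speaker labels `"customer"` and `"agent"` are assumptions for illustration and are not taken from the released data format.

```python
from typing import Tuple

AGENT_LANG = "en"  # the task description states that agents write in English


def translation_direction(speaker: str, customer_lang: str) -> Tuple[str, str]:
    """Return (source_lang, target_lang) for a turn written by `speaker`."""
    if speaker == "customer":
        # Customer text is translated into the agent's language.
        return customer_lang, AGENT_LANG
    if speaker == "agent":
        # Agent text is translated into the customer's language.
        return AGENT_LANG, customer_lang
    raise ValueError(f"unknown speaker: {speaker!r}")


print(translation_direction("customer", "pt_br"))
print(translation_direction("agent", "pt_br"))
```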


The data used in this shared task is composed of genuine bilingual customer support conversations. One of the main challenges of this domain is the lack of real bilingual training data available.

Please note that all data released for the WMT24 Chat Translation task is licensed under CC-BY-NC-4.0 and can be freely used for research purposes only. As the license states, no commercial use of this corpus is permitted. We ask only that you cite the WMT24 Chat Translation Task overview paper. Any other use is not permitted without prior written authorization from Unbabel.


The training sets for all language pairs are available in the GitHub repository. The files contain the bilingual conversations between a customer and an agent, each in their original language.

Note: for training and validation purposes you can use the training data of the general task (including data from previous editions), the data of the other tracks (e.g. biomedical) if you find them useful for this task, other corpora (either parallel or monolingual) that are publicly available for research purposes, such as most of the corpora available on OPUS, and the data from the previous edition of the Chat Translation Task.

Baseline Scores on DevSets

Baseline scores on Dev data will be available by May 2024.

Test Sets (Evaluation Data)

Test Sets will be available by June/July 2024.

Submission Format



System performance will be evaluated along two dimensions:

1) Human evaluation

  • Human evaluation that utilizes the conversational context to rate the translation quality on a direct assessment scale of [0-100].

2) Automatic evaluation

  • Overall translation quality via chrF, BLEU, COMET-22 and Contextual-Comet-QE.

  • Accuracy (F1 scores) of the system’s outputs compared to the reference for words tagged with context-dependent phenomena (lexical cohesion, formality, pronoun resolution and verb forms).
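To illustrate the F1 score over tagged words, here is a simplified sketch. It assumes that words exhibiting a given phenomenon have already been extracted from each hypothesis and reference by some tagger, and it counts a hypothesis word as correct when the same word is tagged in the reference. The function `tagged_word_f1` and this matching rule are assumptions for illustration; the official scoring procedure may differ.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tagged_word_f1(pairs: Iterable[Tuple[List[str], List[str]]]) -> float:
    """Micro-averaged F1 over (hypothesis_tagged_words, reference_tagged_words)."""
    tp = hyp_total = ref_total = 0
    for hyp_words, ref_words in pairs:
        hyp_counts, ref_counts = Counter(hyp_words), Counter(ref_words)
        # Multiset intersection: each reference word can be matched once.
        tp += sum((hyp_counts & ref_counts).values())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    if tp == 0:
        return 0.0
    precision = tp / hyp_total
    recall = tp / ref_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical formality-tagged words for two en->de segments.
segments = [
    (["Sie", "Ihre"], ["Sie", "Ihren"]),  # one of two tagged words matches
    (["du"], ["Sie"]),                    # formality mismatch, no match
]
print(round(tagged_word_f1(segments), 3))
```

Micro-averaging pools the counts over all segments before computing precision and recall, so segments with more tagged words weigh more than segments with few.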

The document-level human evaluation will be performed only on the primary submission and will be used for the official rankings of the participating teams, accounting for both directions. The automatic evaluation will be used as the secondary metric.


  • Wafaa Mohammed, UvA

  • Ana C. Farinha, Unbabel

  • M. Amin Farajian, Unbabel

  • José Souza, Unbabel

  • Sweta Agrawal, Instituto de Telecomunicações

  • Vera Cabarrão, Unbabel

  • Bryan Eikema, UvA

Previous Versions