EMNLP 2025

TENTH CONFERENCE ON
MACHINE TRANSLATION (WMT25)

November 5-9, 2025
Suzhou, China
 
TRANSLATION TASKS: GENERAL MT (NEWS) • INDIC MT • TERMINOLOGY • CREOLE MT • MODEL COMPRESSION
EVALUATION TASKS: MT TEST SUITES • (UNIFIED) MT EVALUATION
OTHER TASKS: OPEN DATA
MULTILINGUAL TASKS: MULTILINGUAL INSTRUCTION • LIMITED RESOURCES SLAVIC LLM

ANNOUNCEMENTS

  • 2025-05-19: Submission guidelines announced

  • 2025-04-30: The shared task announced

DESCRIPTION

Driven by growing attention to sustainable AI, the Model Compression task focuses on making Large Language Models (LLMs) suitable for deployment in translation within resource-constrained environments. Its primary goal is to assess the potential of model compression techniques to reduce the size of large, inherently general-purpose LLMs, thereby ensuring a balance between practical deployability and strong performance in targeted translation scenarios.

Overall objectives of the task include:

  • Fostering research towards efficient, accessible, and sustainable LLM deployment for MT.

  • Establishing a shared evaluation framework to track advances in LLM model compression for MT across a broad range of languages.

  • Allowing for sound comparison with state-of-the-art MT solutions through shared evaluation protocols.

Despite its focus on model compression, the task is naturally aligned with the General MT shared task in terms of the language directions covered, the test data, and the protocols used for automatic MT quality evaluation. Additionally, the task follows the same timeline as the flagship WMT task.

We encourage participation from academic teams and industry players willing either to apply existing model compression methods to MT or to experiment with novel, groundbreaking techniques.

Language pairs

The first round of the task will concentrate on the same language pairs covered by the General MT task, namely:

  • Czech to Ukrainian

  • Czech to German

  • Japanese to Chinese

  • Bhojpuri to English

  • Maasai to English

  • English to Arabic

  • English to Chinese

  • English to Czech

  • English to Estonian

  • English to Icelandic

  • English to Japanese

  • English to Korean

  • English to Russian

  • English to Serbian

  • English to Ukrainian

Depending on the track chosen for participation, participants can either focus on a small, pre-defined subset of language pairs (constrained track) or on any of the language pairs in the above list (unconstrained track).

IMPORTANT DATES

All dates are at the end of the day for Anywhere on Earth (AoE)

  • Finalized task details: mid May 2025

  • Test data released: 26th June 2025

  • Translation submission deadline: 3rd July 2025

  • System description abstract paper: 10th July 2025

  • System description submission: 14th August 2025

TASK DESCRIPTION

The task is to effectively reduce the size of a general-purpose LLM while maintaining a balance between compactness and MT performance. Accordingly, the key evaluation criteria are model size (defined by memory usage), translation quality (automatically measured with the same LLM-as-a-judge framework used in the General MT task), and inference speed (the total time required to process the test data).

The evaluation will take all three criteria into account, with particular attention to the trade-offs between translation quality and model size, and between translation quality and inference speed. Participants are therefore invited to optimize for speed, model size, or ideally both, while maintaining high translation quality.

Participants are offered two tracks: constrained and unconstrained. The constrained track ensures a level playing field by focusing on a specific model and language setting, allowing for directly comparable assessments. The unconstrained track provides the freedom to explore the compression of any model across the preferred language setting(s).

Constrained/Unconstrained track

The constrained track establishes uniform conditions for all participants, focusing on compressing a specific model while targeting a limited set of predefined languages from which participants can choose. The original model to be compressed is Aya Expanse 8B, chosen for its permissive license (CC-BY-NC 4.0) and its already advantageous trade-off between size and performance across the selected language combinations. Any model compression technique (e.g., pruning, quantization, distillation) is allowed, provided that the final compressed model remains closely derived from the original model. For instance, in the case of distillation, “student” models must be derived from Aya Expanse 8B (e.g., through pruning) to be eligible for the constrained track.
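As an illustration of an eligible constrained-track approach, the following is a minimal sketch of post-training 4-bit quantization of the original model using Hugging Face transformers and bitsandbytes. The model identifier, output path, and configuration below are assumptions for illustration only, not part of the task definition.

# Minimal sketch (not an official baseline): 4-bit NF4 quantization of the
# original model with bitsandbytes; the Hugging Face model id is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/aya-expanse-8b"  # assumed identifier

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Saving 4-bit weights requires a recent transformers/bitsandbytes version;
# the target path mirrors the /model/$submission_id layout described below.
model.save_pretrained("/model/my_submission")   # hypothetical submission id
tokenizer.save_pretrained("/model/my_submission")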

The selected language combinations are:

  • Czech to German

  • Japanese to Chinese

  • English to Arabic

Submissions are allowed for any of these language combinations, with separate system rankings published for each.

The unconstrained track allows participation with any model and any of the 15 language directions covered by the General MT task. As in the constrained track, separate rankings will be published for each language direction.

DATA

Data usage policies align with those of the General MT task. Participants are thus allowed to calibrate and fine-tune their compressed models using the publicly available data released for this year’s round, as well as test sets from past WMT editions.

EVALUATION

The evaluation will be carried out separately for each track (constrained/unconstrained) and language, and will consider three key dimensions:

  1. Quality, automatically assessed by a panel of LLMs acting as judges (LLM-as-a-judge) to minimize metric bias.

  2. Size, measured as the model's footprint on disk and its VRAM usage.

  3. Speed of decoding while processing the test data (a minimal measurement sketch for size and speed follows this list).
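
For orientation, here is a minimal local measurement sketch for the size and speed dimensions; the official figures will come from the organizers' evaluation harness and hardware, and the paths and generation call below are placeholders.

# Rough local measurement of model size and decoding speed (illustrative only).
import time
from pathlib import Path
import torch

def disk_size_gb(model_dir):
    # Total size of all files under the model directory, in GB.
    return sum(f.stat().st_size for f in Path(model_dir).rglob("*") if f.is_file()) / 1e9

def timed_decode(generate_fn, segments):
    # Wall-clock decoding time and peak VRAM (CUDA assumed) over the test data.
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    outputs = [generate_fn(s) for s in segments]
    elapsed = time.perf_counter() - start
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1e9
    return outputs, elapsed, peak_vram_gb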

Results will be reported in a table showing all metrics. In addition, the quality-speed and quality-size trade-offs between systems will be presented using Pareto frontier ranking, i.e., graphs highlighting the models that are not outperformed in both dimensions at once.
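
As an illustration of this ranking logic, a minimal Pareto frontier sketch over (quality, cost) pairs, where quality is higher-is-better and cost (decoding time or model size) is lower-is-better:

# A system is on the Pareto frontier if no other system is at least as good in
# both dimensions and strictly better in at least one.
def pareto_frontier(systems):
    # systems: list of (name, quality, cost) tuples
    frontier = []
    for name, q, c in systems:
        dominated = any(
            q2 >= q and c2 <= c and (q2 > q or c2 < c)
            for _, q2, c2 in systems
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical scores: "B" is dominated by "A" (lower quality, higher cost).
print(pareto_frontier([("A", 82.0, 4.1), ("B", 79.5, 5.0), ("C", 76.0, 2.3)]))
# -> ['A', 'C']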

Systems that achieve competitive performance compared to those in the General MT task may also be considered for inclusion in an additional human evaluation process.

Multiple submissions are allowed for each track and language direction. If participants submit multiple systems for the same track and language direction, they must explicitly designate one as the PRIMARY submission; all others will be considered CONTRASTIVE submissions. If no submission is marked as PRIMARY, the most recent one (determined by the file timestamp) will automatically be used as the PRIMARY submission.

PARTICIPATION FORMAT

A submission to the shared task requires both a Dockerfile and a Docker image containing all necessary software and model files for translation, following the requirements below.

  • Include your team’s short name (no spaces) in both the Dockerfile and image name, e.g.: $Team-Dockerfile and $Team-dockerimage.tar

  • The image must contain a model directory at /model/$submission_id with all required files (model, vocabulary, etc.). Please keep $submission_id short as it will appear in reports.

  • You may include additional files, but do not use any paths starting with /wmt; these are reserved for the task evaluation.

  • Each model directory must include a run.sh script with the following interface (an illustrative sketch of a script it could wrap is given after this list):

/model/$submission_id/run.sh $lang_pair $batch_size < input.txt > output.txt
  - $lang_pair: Language pair in the format `ces-deu`
  - $batch_size: Positive integer
  - The script must run without accessing the Internet.
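
To make the interface concrete, here is a hedged sketch of a Python script that a one-line run.sh wrapper could call, reading source segments from stdin and writing translations to stdout. The submission id, language-code mapping, prompt format, and generation settings are assumptions, not official requirements.

#!/usr/bin/env python3
# Illustrative translate.py; a run.sh wrapper could invoke it as:
#   python3 /model/$submission_id/translate.py "$1" "$2"
import sys
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed language codes for the constrained-track pairs
LANG_NAMES = {"ces": "Czech", "deu": "German", "jpn": "Japanese",
              "zho": "Chinese", "eng": "English", "arb": "Arabic"}

def main():
    lang_pair, batch_size = sys.argv[1], int(sys.argv[2])
    src, tgt = lang_pair.split("-")
    model_dir = "/model/my_submission"  # hypothetical submission id
    tokenizer = AutoTokenizer.from_pretrained(model_dir, padding_side="left")
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

    lines = [line.rstrip("\n") for line in sys.stdin]
    for i in range(0, len(lines), batch_size):
        batch = lines[i:i + batch_size]
        prompts = [f"Translate the following {LANG_NAMES[src]} text into "
                   f"{LANG_NAMES[tgt]}:\n{seg}\nTranslation:" for seg in batch]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=256)
        # With left padding, generated tokens start after the prompt length.
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        for ids in new_tokens:
            print(tokenizer.decode(ids, skip_special_tokens=True).strip().replace("\n", " "))

if __name__ == "__main__":
    main()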

Example Usage

# Load the submitted image and capture its name
image_name="$(docker load -i ${image_file_path} | cut -d ' ' -f 3)"
# Start a container with the evaluation memory limit and swap disabled
container_id="$(docker run -itd ${opt_memory} --memory-swap=0 ${image_name} bash)"
# Time the translation of the test input inside the container
(time docker exec -i "${container_id}" /model/$submission_id/run.sh $lang_pair $batch_size < input.txt > output.txt 2> stderr.txt)

Baseline models and a Dockerfile for the constrained track are provided at github.com/thammegowda/wmt25-model-compression. The Dockerfile demonstrates an example submission for the original model (bf16) and two compressed models (8-bit and 4-bit), and it can be modified to create your own submission.

Submission details will be collected via a web form. The link to the form will be provided here closer to the submission deadline.

CONTACT

For queries, please use the mailing list or contact Marco Gaido and Matteo Negri.

Organizers

  • Marco Gaido - mgaido@fbk.eu

  • Matteo Negri - negri@fbk.eu

  • Roman Grundkiewicz

  • TG Gowda