EMNLP 2026

ELEVENTH CONFERENCE ON
MACHINE TRANSLATION (WMT26)

November 2026
Budapest, Hungary

ANNOUNCEMENTS

  • 2026-04-10: The shared task has been announced

OVERVIEW

Driven by growing attention to sustainable AI, the Model Compression task focuses on making Large Language Models (LLMs) suitable for machine translation (MT) deployment in resource-constrained environments. Its primary goal is to assess how well model compression techniques can reduce the size of large, general-purpose LLMs while balancing practical deployability with strong performance in targeted translation scenarios.

Overall objectives of the task include:

  • Fostering research towards efficient, accessible, and sustainable LLM deployment for MT.

  • Establishing a shared evaluation framework to track advances in LLM model compression for MT across a broad range of languages.

  • Allowing for a sound comparison with state-of-the-art MT solutions through shared evaluation protocols.

Despite its focus on model compression, the task is closely aligned with the General MT shared task in terms of the language directions covered, the test data, and the protocols used for automatic MT quality evaluation. Additionally, the task follows the same timeline as the flagship WMT task.

We encourage participation from academic teams and industry players willing either to apply existing model compression methods to MT or to experiment with novel, groundbreaking techniques.

Key Changes for 2026

  • Submission Format: Participants are required to submit their complete systems (including source code, environment installation instructions, and the model itself). These will be evaluated by processing the selected test sets on a predefined GPU hardware configuration hosted on the task organizers' premises.

  • Tracks: The constrained and unconstrained tracks share the same language directions but differ regarding the models subject to compression (Gemma 3 12B for the constrained track; any model with an original size below 20B parameters for the unconstrained track).

Language pairs

This round of the task will concentrate on a subset of the languages covered by the General MT task, namely:

  • Czech to German

  • English to Chinese (Simplified)

  • English to Arabic (Egyptian)

Participants can focus on any of the above language pairs.

IMPORTANT DATES

All dates are at the end of the day Anywhere on Earth (AoE).

  • Finalized task details: April 2026

  • Test data released: June 18, 2026

  • Model submission deadline: July 2, 2026

  • System description paper submission to WMT26: in line with WMT26

  • Camera-ready submission to WMT26: in line with WMT26

  • WMT 2026 Conference in Budapest, Hungary: November 2026

TASK DESCRIPTION

The task is to effectively reduce the size of a general-purpose LLM while maintaining a balance between compactness and MT performance. Accordingly, the key evaluation criteria are model size (defined by memory usage), translation quality (automatically measured in different ways, e.g., COMET, MetricX, and an LLM-as-a-judge framework), and inference speed (the total time required to process the test data).

The evaluation will take all three criteria into account, with particular attention to the trade-offs between translation quality and model size, and between translation quality and inference speed. Participants are therefore invited to optimize for speed, model size, or ideally both, while maintaining high translation quality.

Participants are offered two tracks: constrained and unconstrained. While both tracks share the same language directions, they differ regarding the models subject to compression and the data usable for calibration and fine-tuning.

Constrained/Unconstrained track

Both tracks focus on the following language combinations:

  • Czech to German

  • English to Chinese (Simplified)

  • English to Arabic (Egyptian)

Submissions to each track are allowed for any of these language combinations, with separate system rankings published for each.

The CONSTRAINED track establishes uniform conditions for all participants by focusing on compressing a specific model, and using a predefined pool of data for calibration and fine-tuning (if needed). The original model to be compressed is Gemma 3 12B, chosen for its permissive license and its strong trade-off between size and performance across the selected language combinations. Any model compression technique (e.g., pruning, quantization, distillation) is allowed, provided that the final compressed model remains directly derived from the original model. For instance, in the case of distillation, “student” models must be derived from Gemma 3 12B (e.g., through pruning) to be eligible for the constrained track. Data usage policies align with those of the General MT task. Participants are thus allowed to calibrate and fine-tune their compressed models using the publicly available data released for this year’s round, as well as test sets from past WMT editions.
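As an illustration of one eligible technique, the sketch below shows symmetric per-tensor int8 quantization in plain Python. This is a minimal, self-contained toy example, not part of the official task tooling; real submissions would typically rely on established compression libraries and apply such schemes per layer to actual model weights.

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map floats onto integers in [-127, 127],
    # using a single scale derived from the largest absolute weight.
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        return [0] * len(weights), 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(quantized, scale):
    # Recover approximate float weights from the int8 representation.
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight lies within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The memory saving comes from storing one byte per weight plus a single float scale, instead of four bytes per float32 weight; the assertion bounds the reconstruction error introduced by rounding.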

The UNCONSTRAINED track maintains the same language setting but offers the freedom to: 1) explore the compression of any model, provided that its original size is below 20B parameters, and 2) use any additional data for calibration and fine-tuning. A non-exhaustive list of suggested LLMs falling into this category includes: Aya Expanse 8B, Cohere R 7B, GPT-OSS 20B, Llama 3.1 8B, Qwen 2.5 7B, Ministral 3 14B, Mistral 7B, EuroLLM.

EVALUATION

The evaluation will be carried out separately for each track (constrained/unconstrained) and language, considering three key dimensions:

  1. Quality, automatically assessed in different ways, e.g., COMET, MetricX, and an LLM-as-a-judge framework.

  2. Size, measured as the model's footprint on disk and its VRAM usage.

  3. Speed, measured as the total decoding time while processing the test data.

All evaluations will be performed on a standardized hardware environment comprising a single NVIDIA H100 GPU with up to 80 GB of VRAM, hosted on an Intel x86_64 system running Ubuntu 24.04. The task organizers will run the submitted models on the language pairs selected by the participants using this predefined GPU configuration. Results will be reported in a table showing automatic metric scores. In addition, differences between systems in terms of quality-speed and quality-size trade-offs will be presented using Pareto frontier ranking, with quality-speed and quality-size graphs highlighting the models that are not outperformed in both dimensions simultaneously.
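To make the ranking criterion concrete, the sketch below computes a Pareto frontier over hypothetical (quality, decoding time) points, with quality higher-better and decoding time lower-better. The system names and scores are invented for illustration; the organizers' actual ranking scripts may differ.

```python
def pareto_front(systems):
    # systems: list of (name, quality, seconds), where quality is higher-better
    # and total decoding time is lower-better. A system is on the frontier if
    # no other system is at least as good in both dimensions and strictly
    # better in at least one.
    front = []
    for name, q, t in systems:
        dominated = any(
            q2 >= q and t2 <= t and (q2 > q or t2 < t)
            for n2, q2, t2 in systems
            if n2 != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical submissions: (name, quality score, total decoding seconds).
systems = [("A", 0.86, 1200), ("B", 0.84, 700), ("C", 0.80, 900)]
# C is dominated by B (lower quality and slower), so only A and B remain.
assert pareto_front(systems) == ["A", "B"]
```

A system like C above can thus be excluded from the frontier even without any weighting of quality against speed, which is why Pareto ranking suits a multi-criteria evaluation.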

Multiple submissions are allowed for each track and language direction. If participants submit multiple systems for the same track and language, they must explicitly designate one as the PRIMARY submission, while all others will be considered CONTRASTIVE submissions. If no submission is marked as PRIMARY, the most recent one (determined by the file timestamp) will automatically be selected as the PRIMARY submission.

PARTICIPATION FORMAT

A valid submission to the shared task requires:

  1. Filling out a web form with all details about the submission (a separate form must be submitted for each submission when participating in multiple tracks, language directions, or with multiple systems);

  2. Sharing the compressed models to be run in the evaluation environment provided by the organizers. Specifically, for each submission, participants must provide the source code, environment installation instructions (e.g., a requirements.txt file), and the model itself (e.g., via a Hugging Face model repository link).
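Pending the full submission guidelines, a package satisfying these requirements might be laid out as follows; all file and directory names here are hypothetical, not an official requirement:

```
submission/
  README.md           <- environment installation and run instructions
  requirements.txt    <- dependencies for the inference code
  run_translation.py  <- source code: reads a test set, writes translations
  MODEL_LINK.txt      <- e.g., a Hugging Face model repository link
```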

The submission form can be found at TBA.

More details about baselines and submission format will be provided soon.

CONTACT

For queries, please use the mailing list or contact Marco Gaido and Matteo Negri.

Organizers

  • Marco Gaido - mgaido@fbk.eu

  • Matteo Negri - negri@fbk.eu

  • Roman Grundkiewicz

  • TG Gowda