EMNLP 2025

TENTH CONFERENCE ON
MACHINE TRANSLATION (WMT25)

November 5-9, 2025
Suzhou, China
 
TRANSLATION TASKS: GENERAL MT (NEWS) • INDIC MT • TERMINOLOGY • CREOLE MT • MODEL COMPRESSION
EVALUATION TASKS: MT TEST SUITES • (UNIFIED) MT EVALUATION
OTHER TASKS: OPEN DATA
MULTILINGUAL TASKS: MULTILINGUAL INSTRUCTION • LIMITED RESOURCES SLAVIC LLM

ANNOUNCEMENTS

  • 2025-05-19: Submission guidelines announced

  • 2025-04-30: The shared task announced

DESCRIPTION

Driven by growing attention to sustainable AI, the Model Compression task focuses on making Large Language Models (LLMs) suitable for deployment in translation within resource-constrained environments. Its primary goal is to assess the potential of model compression techniques to reduce the size of large, inherently general-purpose LLMs, thereby ensuring a balance between practical deployability and strong performance in targeted translation scenarios.

Overall objectives of the task include:

  • Fostering research towards efficient, accessible, and sustainable LLM deployment for MT.

  • Establishing a shared evaluation framework to track advances in LLM model compression for MT across a broad range of languages.

  • Allowing for sound comparison with state-of-the-art MT solutions through shared evaluation protocols.

Despite its focus on model compression, the task is naturally aligned with the General MT shared task in terms of the language directions covered, the test data, and the protocols used for automatic MT quality evaluation. Additionally, the task follows the same timeline as the flagship WMT task.

We encourage participation from academic teams and industry players willing either to apply existing model compression methods to MT or to experiment with novel, groundbreaking techniques.

Language pairs

The first round of the task will concentrate on the same language pairs covered by the General MT task, namely:

  • Czech to Ukrainian

  • Czech to German

  • Japanese to Chinese

  • Bhojpuri to English

  • Maasai to English

  • English to Arabic

  • English to Chinese

  • English to Czech

  • English to Estonian

  • English to Icelandic

  • English to Japanese

  • English to Korean

  • English to Russian

  • English to Serbian

  • English to Ukrainian

Depending on the track chosen for participation, participants can either focus on a small, pre-defined subset of language pairs (constrained track) or on any of the language pairs in the above list (unconstrained track).

IMPORTANT DATES

All dates are at the end of the day for Anywhere on Earth (AoE)

  • Finalized task details: mid May 2025

  • Test data released: 26th June 2025

  • Translation submission deadline: 3rd July 2025

  • System description abstract paper: 10th July 2025

  • System description submission: 14th August 2025

TASK DESCRIPTION

The task is to effectively reduce the size of a general-purpose LLM while maintaining a balance between compactness and MT performance. Accordingly, the key evaluation criteria are model size (defined by memory usage), translation quality (automatically measured with the same LLM-as-a-judge framework used in the General MT task), and inference speed (the total time required to process the test data).

The evaluation will take all three criteria into account, with particular attention to the trade-offs between translation quality and model size, and between translation quality and inference speed. Participants are therefore invited to optimize for speed, model size, or ideally both, while maintaining high translation quality.

Participants are offered two tracks: constrained and unconstrained. The constrained track ensures a level playing field by focusing on a specific model and language setting, allowing for directly comparable assessments. The unconstrained track provides the freedom to explore the compression of any model across the preferred language setting(s).

Constrained/Unconstrained track

The constrained track establishes uniform conditions for all participants, focusing on compressing a specific model while targeting a limited set of predefined languages from which participants can choose. The original model to be compressed is Aya Expanse 8B, chosen for its permissive license (CC-BY-NC 4.0) and its already advantageous trade-off between size and performance across the selected language combinations. Any model compression technique (e.g., pruning, quantization, distillation) is allowed, provided that the final compressed model remains closely derived from the original model. For instance, in the case of distillation, “student” models must be derived from Aya Expanse 8B (e.g., through pruning) to be eligible for the constrained track.
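As an illustration of an eligible constrained-track approach, the following is a minimal sketch of post-training 4-bit quantization of the original model using Hugging Face transformers and bitsandbytes. The model identifier, output path, and configuration below are assumptions for illustration only, not part of the task definition.

# Minimal sketch (not an official baseline): 4-bit NF4 quantization of the
# original model with bitsandbytes; the Hugging Face model id is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/aya-expanse-8b"  # assumed identifier

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Saving 4-bit weights requires a recent transformers/bitsandbytes version;
# the target path mirrors the /model/$submission_id layout described below.
model.save_pretrained("/model/my_submission")   # hypothetical submission id
tokenizer.save_pretrained("/model/my_submission")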

The selected language combinations are:

  • Czech to German

  • Japanese to Chinese

  • English to Arabic

Submissions are allowed for any of these language combinations, with separate system rankings published for each.

The unconstrained track allows participation with any model and any of the 15 language directions covered by the General MT task. As in the constrained track, separate rankings will be published for each language direction.

DATA

Data usage policies align with those of the General MT task. Participants are thus allowed to calibrate and fine-tune their compressed models using the publicly available data released for this year’s round, as well as test sets from past WMT editions.

EVALUATION

The evaluation will be carried out separately for each track (constrained/unconstrained) and language, and will consider three key dimensions:

  1. Quality, automatically assessed by a panel of LLMs acting as judges (LLM-as-a-judge) to minimize metric bias.

  2. Size, measured as the model's footprint on disk and its VRAM usage.

  3. Speed of decoding while processing the test data (a minimal measurement sketch for size and speed follows this list).
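
For orientation, here is a minimal local measurement sketch for the size and speed dimensions; the official figures will come from the organizers' evaluation harness and hardware, and the paths and generation call below are placeholders.

# Rough local measurement of model size and decoding speed (illustrative only).
import time
from pathlib import Path
import torch

def disk_size_gb(model_dir):
    # Total size of all files under the model directory, in GB.
    return sum(f.stat().st_size for f in Path(model_dir).rglob("*") if f.is_file()) / 1e9

def timed_decode(generate_fn, segments):
    # Wall-clock decoding time and peak VRAM (CUDA assumed) over the test data.
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    outputs = [generate_fn(s) for s in segments]
    elapsed = time.perf_counter() - start
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1e9
    return outputs, elapsed, peak_vram_gb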

Results will be reported in a table showing all metrics. In addition, the quality-speed and quality-size trade-offs between systems will be presented using Pareto frontier ranking, i.e., graphs highlighting the models that are not outperformed in both dimensions at once.
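
As an illustration of this ranking logic, a minimal Pareto frontier sketch over (quality, cost) pairs, where quality is higher-is-better and cost (decoding time or model size) is lower-is-better:

# A system is on the Pareto frontier if no other system is at least as good in
# both dimensions and strictly better in at least one.
def pareto_frontier(systems):
    # systems: list of (name, quality, cost) tuples
    frontier = []
    for name, q, c in systems:
        dominated = any(
            q2 >= q and c2 <= c and (q2 > q or c2 < c)
            for _, q2, c2 in systems
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical scores: "B" is dominated by "A" (lower quality, higher cost).
print(pareto_frontier([("A", 82.0, 4.1), ("B", 79.5, 5.0), ("C", 76.0, 2.3)]))
# -> ['A', 'C']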

Systems that achieve competitive performance compared to those in the General MT task may also be considered for inclusion in an additional human evaluation process.

Multiple submissions are allowed for each track and language direction. If participants submit multiple systems for the same track and language direction, they must explicitly designate one as the PRIMARY submission; all others will be considered CONTRASTIVE submissions. If no submission is marked as PRIMARY, the most recent one (determined by the file timestamp) will automatically be used as the PRIMARY submission.

PARTICIPATION FORMAT

A submission to the shared task requires both a Dockerfile and a Docker image containing all necessary software and model files for translation, following the requirements below.

  • Include your team’s short name (no spaces) in both the Dockerfile and image name, e.g.: $Team-Dockerfile and $Team-dockerimage.tar

  • The image must contain a model directory at /model/$submission_id with all required files (model, vocabulary, etc.). Please keep $submission_id short as it will appear in reports.

  • You may include additional files, but do not use any paths starting with /wmt; these are reserved for the task evaluation.

  • Each model directory must include a run.sh script with the following interface (an illustrative sketch of a script it could wrap is given after this list):

/model/$submission_id/run.sh $lang_pair $batch_size < input.txt > output.txt
  - $lang_pair: Language pair in the format `ces-deu`
  - $batch_size: Positive integer
  - The script must run without accessing the Internet.
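
To make the interface concrete, here is a hedged sketch of a Python script that a one-line run.sh wrapper could call, reading source segments from stdin and writing translations to stdout. The submission id, language-code mapping, prompt format, and generation settings are assumptions, not official requirements.

#!/usr/bin/env python3
# Illustrative translate.py; a run.sh wrapper could invoke it as:
#   python3 /model/$submission_id/translate.py "$1" "$2"
import sys
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed language codes for the constrained-track pairs
LANG_NAMES = {"ces": "Czech", "deu": "German", "jpn": "Japanese",
              "zho": "Chinese", "eng": "English", "arb": "Arabic"}

def main():
    lang_pair, batch_size = sys.argv[1], int(sys.argv[2])
    src, tgt = lang_pair.split("-")
    model_dir = "/model/my_submission"  # hypothetical submission id
    tokenizer = AutoTokenizer.from_pretrained(model_dir, padding_side="left")
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

    lines = [line.rstrip("\n") for line in sys.stdin]
    for i in range(0, len(lines), batch_size):
        batch = lines[i:i + batch_size]
        prompts = [f"Translate the following {LANG_NAMES[src]} text into "
                   f"{LANG_NAMES[tgt]}:\n{seg}\nTranslation:" for seg in batch]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=256)
        # With left padding, generated tokens start after the prompt length.
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        for ids in new_tokens:
            print(tokenizer.decode(ids, skip_special_tokens=True).strip().replace("\n", " "))

if __name__ == "__main__":
    main()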

Example Usage

# Load the submitted image and capture its name
image_name="$(docker load -i ${image_file_path} | cut -d ' ' -f 3)"
# Start a container with the evaluation memory limit and swap disabled
container_id="$(docker run -itd ${opt_memory} --memory-swap=0 ${image_name} bash)"
# Time the translation of the test input inside the container
(time docker exec -i "${container_id}" /model/$submission_id/run.sh $lang_pair $batch_size < input.txt > output.txt 2> stderr.txt)

Baseline models and a Dockerfile for the constrained track are provided at github.com/thammegowda/wmt25-model-compression. The Dockerfile demonstrates an example submission for the original model (bf16) and two compressed models (8-bit and 4-bit), and it can be modified to create your own submission.

Submission details will be collected via a web form. The link to the form will be provided here closer to the submission deadline.

CONTACT

For queries, please use the mailing list or contact Marco Gaido and Matteo Negri.

Organizers

  • Marco Gaido - mgaido@fbk.eu

  • Matteo Negri - negri@fbk.eu

  • Roman Grundkiewicz

  • TG Gowda