Multilingual Instruction Shared Task

EMNLP 2026

ELEVENTH CONFERENCE ON MACHINE TRANSLATION (WMT26)

28-29 October, 2026
Budapest, Hungary

Announcements

January 2026: The Multilingual Instruction Shared Task is under preparation and will take place at WMT 2026.

Important Dates

Train data release: coming soon
Dev/sample data release: late May
Test release: late June
Submission deadline: early August
Paper deadline: mid September

Description

The Multilingual Instruction Shared Task (MIST) focuses on evaluating and advancing models that can follow instructions across multiple languages and diverse task types. One of its main objectives is to establish a comprehensive evaluation framework that assesses multilingual models on a range of instruction-following capabilities.

The selected tasks are:

Comprehension: context-based question answering; summarization into English
Generation: open-ended generation; summarization into target language
Cross-lingual capabilities: cross-lingual summarization

In general, participants will be given fixed prompts and will submit the outputs their LLM produces for them. The list of languages is being finalized and will be announced soon.
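As a rough illustration of the "fixed prompts in, model outputs out" format, the sketch below runs a list of prompts through a generation function and serializes the results. The official submission format has not been announced, so the field names (`id`, `prompt`, `output`), the JSON Lines layout, and the `dummy_generate` stand-in are all assumptions made for this example.

```python
import json
from typing import Callable, Iterable


def run_fixed_prompts(
    prompts: Iterable[dict],
    generate: Callable[[str], str],
) -> list[dict]:
    """Apply `generate` to every prompt without modifying the prompt text."""
    results = []
    for item in prompts:
        results.append({
            "id": item["id"],          # hypothetical field name
            "prompt": item["prompt"],  # hypothetical field name
            "output": generate(item["prompt"]),
        })
    return results


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def dummy_generate(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt length."""
    return f"[{len(prompt)} chars received]"


# Prompts taken from the Aya Dataset examples on this page; the ids are invented.
demo_prompts = [
    {"id": "fr-001", "prompt": "Quels président des États-Unis ne s’est jamais marié ?"},
    {"id": "tr-001", "prompt": "Yayın balığının tadı nasıl?"},
]
demo_results = run_fixed_prompts(demo_prompts, dummy_generate)
print(to_jsonl(demo_results))
```

Replacing `dummy_generate` with a call to an actual model is the only change needed to reuse the loop; the prompts themselves are passed through unchanged.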

We encourage participation from both research groups and industry practitioners.

Tracks

There is a single constrained track: models must have fewer than 10B parameters.
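A quick way to sanity-check a model against this limit is to sum the sizes of its weight tensors. The sketch below does this for a dictionary of tensor shapes. How parameters will officially be counted (for example, whether embeddings or inactive expert weights are included) is not stated here, so the counting rule and the toy shapes are assumptions.

```python
PARAM_LIMIT = 10_000_000_000  # "fewer than 10B parameters" from the track description


def count_parameters(shapes: dict[str, tuple[int, ...]]) -> int:
    """Sum the number of elements over all named weight tensors."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


def fits_constrained_track(num_params: int, limit: int = PARAM_LIMIT) -> bool:
    """Strict comparison, matching the wording 'fewer than 10B'."""
    return num_params < limit


# Hypothetical toy model: these shapes are invented for illustration only.
toy_shapes = {
    "embed.weight": (32_000, 4_096),
    "layer0.attn.weight": (4_096, 4_096),
    "layer0.mlp.weight": (4_096, 11_008),
}
toy_total = count_parameters(toy_shapes)
print(toy_total, fits_constrained_track(toy_total))
```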

Evaluation

Submissions will be evaluated using a mix of automatic metrics and human evaluation. Further details will be announced closer to the test release.

Submission Format

Details on the submission format will be announced soon.

Individual Subtasks

Task: Context-based Question Answering

Given a document in language X, the model is asked questions about the content of that document, also in language X. This task tests whether the model can comprehend and reason over text in a given language.

Train and dev data will be prepared from BELEBELE, a machine reading comprehension dataset spanning 122 language variants, where each question is linked to a short passage.

Example 1. Example from BELEBELE (Arabic)

Passage: وحصل الفلم اللي شاركو بي رايان غوسلينغ وإيما ستون، ترشيحات بجميع الفئات الرئيسية. حصل جوسلينغ وستون ترشيحات لأفضل ممثل وممثلة على التوالي.

Question: أي جائزة ترشحت إلها أيما ستون؟

Answer: افضل ممثلة
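A minimal sketch of how a passage and question like the ones above might be combined into a single prompt is shown below. The template wording and the record keys (`passage`, `question`) are assumptions made for illustration; the official prompts are fixed by the organizers and have not been released.

```python
def build_qa_prompt(passage: str, question: str) -> str:
    """Combine a passage and a question into one prompt string.

    The template wording is an assumption, not the official task prompt.
    """
    return (
        "Read the passage and answer the question in the same language "
        "as the passage.\n\n"
        f"Passage: {passage.strip()}\n\n"
        f"Question: {question.strip()}\n\n"
        "Answer:"
    )


# Placeholder record; the keys are hypothetical, not a dataset schema.
demo_record = {
    "passage": "  <passage text in language X>  ",
    "question": "<question in language X>",
}
demo_prompt = build_qa_prompt(demo_record["passage"], demo_record["question"])
print(demo_prompt)
```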

Task: Summarization into English

The model is provided a document in a non-English language and asked to summarize it in English. This task tests whether the model can understand content written in another language.

Train and dev data will be prepared from CrossSum, a large-scale cross-lingual summarization dataset comprising 1.70 million article-summary pairs in 1,500+ language pairs covering 45 languages.

Example 2. Example from CrossSum (Japanese source → English summary)

Source (Japanese): 英オックスフォード大学の研究チームによると、低用量のデキサメタゾンは新型ウイルスとの戦いで画期的な突破口になる。新型コロナウイルスに対し、様々な既存の治療法の効果を試す世界的規模の臨床試験の一貫として、デキサメタゾンが試された。その結果、人工呼吸器を必要とする重症患者の致死率が3割下がり…

Target summary (English): Scientists say the cheap and widely available drug dexamethasone will help save the lives of patients who are seriously ill with coronavirus.

Task: Open-ended Generation

Systems will be tested on miscellaneous prompts in various languages. The evaluation targets instruction following, whether the response is in the intended output language, and the overall usefulness and naturalness of the response. Outputs will be evaluated by both humans and automatic metrics.
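One crude way to spot responses that are clearly in the wrong output language is to check which writing system dominates the text. The sketch below is such a heuristic and is not the evaluation method of this task: it only looks at Unicode character names, so it cannot tell apart languages that share a script (for example French and Turkish), and the 0.8 threshold is an arbitrary assumption.

```python
import unicodedata


def script_share(text: str, script_prefix: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with `script_prefix`."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


def looks_on_target(text: str, script_prefix: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the letters belong to the expected script."""
    return script_share(text, script_prefix) >= threshold


# The Arabic answer from Example 1 and a mixed-script response.
print(looks_on_target("افضل ممثلة", "ARABIC"))
print(looks_on_target("Answer: افضل", "ARABIC"))
```

A real language-identification model would be needed to separate languages within the same script; this check only catches gross mismatches such as a Latin-script reply to an Arabic prompt.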

Train and dev data will be prepared from the Aya Dataset, a multilingual instruction fine-tuning dataset containing 204k human-annotated prompt-completion pairs across 65 languages.

Example 3. Example from Aya Dataset (French)

Prompt: Quels président des États-Unis ne s’est jamais marié ?

Response: James Buchanan est le seul président qui ne s’est jamais marié.

Example 4. Example from Aya Dataset (Turkish)

Prompt: Yayın balığının tadı nasıl?

Response: Suni göllerde de üretilen yayın balığı yenilen bir balıktır. Ama yaşlı balıkların eti tatlı olmadığı için belli bir yaşa kadar yenilmesi gerekir; o zaman tadı dana etini andırır.

Task: Summarization into Target Language

The model is asked to summarize the provided content in a specified target language. This task tests whether the model can produce fluent content in the target language.

Train and dev data will be prepared from CrossSum.

Example 5. Example from CrossSum (English source → Arabic summary)

Source (English): Scientists say the cheap and widely available drug dexamethasone will help save the lives of patients who are seriously ill with coronavirus. The results showed the death rate of critically ill patients on ventilators fell by 30%, and by 20% for those requiring oxygen…

Target summary (Arabic): يقول العلماء إن عقار ديكساميثازون الرخيص وسهل الوصول سيساعد في إنقاذ حياة المرضى الذين يعانون من مرض خطير بسبب فيروس كورونا.

Task: Cross-lingual Summarization

The model is provided a document in one language and asked to summarize it in a different language. This task tests both comprehension of the source language and fluent, faithful generation in the target language.
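The sketch below shows one way to represent such a request, with the source and target languages kept explicit so that a same-language pair can be detected. The class name, field names, and prompt wording are assumptions made for illustration and do not reflect the official prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummarizationRequest:
    """A document plus the language it is in and the language to summarize into."""
    document: str
    source_lang: str
    target_lang: str

    def is_cross_lingual(self) -> bool:
        """True when the summary language differs from the document language."""
        return self.source_lang.lower() != self.target_lang.lower()

    def to_prompt(self) -> str:
        """Build a prompt string; the wording is hypothetical."""
        return (
            f"Summarize the following {self.source_lang} document "
            f"in {self.target_lang}.\n\n{self.document.strip()}\n\nSummary:"
        )


# Placeholder document text; the language pair mirrors Example 6 on this page.
demo_request = SummarizationRequest(
    document="<Japanese source document>",
    source_lang="Japanese",
    target_lang="Vietnamese",
)
print(demo_request.is_cross_lingual())
print(demo_request.to_prompt())
```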

Train and dev data will be prepared from CrossSum and MCIF. MCIF is a multilingual, human-annotated benchmark based on scientific talks and designed to evaluate instruction following in cross-lingual settings; it spans English, German, Italian, and Chinese.

Example 6. Example from CrossSum (Japanese source → Vietnamese summary)

Source (Japanese): 英オックスフォード大学の研究チームによると、低用量のデキサメタゾンは新型ウイルスとの戦いで画期的な突破口になる…

Target summary (Vietnamese): Các nhà khoa học cho biết loại thuốc rẻ tiền và dễ kiếm dexamethasone sẽ giúp cứu sống những bệnh nhân đang trong tình trạng nguy kịch vì virus corona.

Contact

For questions, please contact Patrícia Schmidtová at schmidtova@ufal.mff.cuni.cz

Organizers

Pinzhen Chen
Patrícia Schmidtová
Katia Artemova
Seth Aycock
Niyati Bafna
Tom Kocmi
Philipp Koehn
Danni Liu
Nam Luu
Sara Papi
Eduardo Sánchez
Mariya Shmatova
Vilém Zouhar
