Announcements
- January 2026: The Multilingual Instruction Shared Task is under preparation and will take place with WMT 2026.
Important Dates
Train data release | coming soon
Dev/sample data release | late May
Test release | late June
Submission deadline | early August
Paper deadline | mid September
Description
The Multilingual Instruction Shared Task (MIST) focuses on evaluating and advancing models that can follow instructions across multiple languages and diverse task types. One objective is to establish a comprehensive evaluation framework that assesses multilingual models on a range of instruction-following capabilities.
The selected tasks are:
Comprehension
- Context-based question answering
- Summarization into English
Generation
- Open-ended generation
- Summarization into target language
Cross-lingual capabilities
- Cross-lingual summarization
In general, participants will provide LLM outputs for fixed prompts. The list of languages is being finalized and will be announced soon.
We encourage participation from both research groups and industry practitioners.
Tracks
There is a single constrained track: models must have fewer than 10B parameters.
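As a rough illustration of the size constraint, the sketch below counts parameters from a mapping of tensor names to shapes and compares the total against the 10B limit. The helper names and the toy shapes are hypothetical, and the sketch assumes that every stored weight counts toward the limit; the organizers have not specified how parameters are counted (for example, for mixture-of-experts models).

```python
from math import prod

# Limit taken from the track description: fewer than 10B parameters.
PARAM_LIMIT = 10_000_000_000


def count_parameters(shapes):
    """Sum the number of elements over all tensors (hypothetical helper)."""
    return sum(prod(shape) for shape in shapes.values())


def is_within_limit(shapes, limit=PARAM_LIMIT):
    """Return True if the total is strictly below the limit."""
    return count_parameters(shapes) < limit


# Toy example with made-up tensor names and shapes, not a real model.
toy_shapes = {
    "embed.weight": (32000, 4096),
    "lm_head.weight": (32000, 4096),
}
total = count_parameters(toy_shapes)
print(total, is_within_limit(toy_shapes))
```

With a real checkpoint, the same mapping could be built from the tensor shapes stored in the model's weight files.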
Evaluation
Submissions will be evaluated using a mix of automatic metrics and human evaluation. Further details will be announced closer to the test release.
Submission Format
Details on the submission format will be announced soon.
Individual Subtasks
Task: Context-based Question Answering
Given a document in language X, the model is asked questions about the content of that document, also in language X. This task tests whether the model can comprehend and reason over text in a given language.
Train and dev data will be prepared from BELEBELE, a machine reading comprehension dataset spanning 122 language variants, where each question is linked to a short passage.
Passage: وحصل الفلم اللي شاركو بي رايان غوسلينغ وإيما ستون، ترشيحات بجميع الفئات الرئيسية. حصل جوسلينغ وستون ترشيحات لأفضل ممثل وممثلة على التوالي.
Question: أي جائزة ترشحت إلها أيما ستون؟
Answer: افضل ممثلة
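When preparing training data from passage and question pairs such as the one above, one option is to render each pair into a single prompt string. The template, the field names, and the `build_qa_prompt` helper below are assumptions made for illustration only; the official fixed prompts have not been released and may look quite different.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One passage/question pair; the field names are hypothetical."""
    passage: str
    question: str
    language: str


# Hypothetical template, not the official shared task prompt.
QA_TEMPLATE = (
    "Passage ({language}):\n{passage}\n\n"
    "Question:\n{question}\n\n"
    "Answer in {language}:"
)


def build_qa_prompt(example):
    """Fill the template with the passage, question, and language name."""
    return QA_TEMPLATE.format(
        language=example.language,
        passage=example.passage,
        question=example.question,
    )


example = QAExample(
    passage="حصل جوسلينغ وستون ترشيحات لأفضل ممثل وممثلة على التوالي.",
    question="أي جائزة ترشحت إلها أيما ستون؟",
    language="Arabic",
)
prompt = build_qa_prompt(example)
print(prompt)
```

Keeping the template in one constant makes it easy to swap in the official prompt wording once it is announced.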
Task: Summarization into English
The model is provided a document in a non-English language and asked to summarize it in English. This task tests whether the model can understand content written in another language.
Train and dev data will be prepared from CrossSum, a large-scale cross-lingual summarization dataset comprising 1.70 million article-summary pairs across 1,500+ language pairs covering 45 languages.
Source (Japanese): 英オックスフォード大学の研究チームによると、低用量のデキサメタゾンは新型ウイルスとの戦いで画期的な突破口になる。新型コロナウイルスに対し、様々な既存の治療法の効果を試す世界的規模の臨床試験の一貫として、デキサメタゾンが試された。その結果、人工呼吸器を必要とする重症患者の致死率が3割下がり…
Target summary (English): Scientists say the cheap and widely available drug dexamethasone will help save the lives of patients who are seriously ill with coronavirus.
Task: Open-ended Generation
Systems will be tested on miscellaneous prompts in various languages, targeting instruction following, on-targetness of the output language, and overall usefulness and naturalness of the response. Outputs will be evaluated by both humans and automatic metrics.
Train and dev data will be prepared from the Aya Dataset, a multilingual instruction fine-tuning dataset containing 204k human-annotated prompt-completion pairs across many languages.
Prompt: Quels président des États-Unis ne s’est jamais marié ?
Response: James Buchanan est le seul président qui ne s’est jamais marié.
Prompt: Yayın balığının tadı nasıl?
Response: Suni göllerde de üretilen yayın balığı yenilen bir balıktır. Ama yaşlı balıkların eti tatlı olmadığı için belli bir yaşa kadar yenilmesi gerekir; o zaman tadı dana etini andırır.
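One cheap sanity check for whether a response is in the intended language is to measure how much of it is written in the expected script. The sketch below is a crude proxy built on the Python standard library, not the evaluation the organizers will use; the `script_share` helper is hypothetical. It can only tell scripts apart (for example Arabic versus Latin), so it cannot distinguish French from Turkish.

```python
import unicodedata


def script_share(text, script_prefix):
    """Fraction of alphabetic characters whose Unicode name starts with
    script_prefix (e.g. "ARABIC", "LATIN"). Returns 0.0 if no letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


# An Arabic answer checked against the Arabic and Latin scripts.
print(script_share("افضل ممثلة", "ARABIC"))
print(script_share("افضل ممثلة", "LATIN"))
```

For languages that share a script, a dedicated language identification model would be needed instead.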
Task: Summarization into Target Language
The model is asked to summarize provided content in a specified target language. This task tests whether the model can fluently articulate and produce content in the target language.
Train and dev data will be prepared from CrossSum.
Source (English): Scientists say the cheap and widely available drug dexamethasone will help save the lives of patients who are seriously ill with coronavirus. The results showed the death rate of critically ill patients on ventilators fell by 30%, and by 20% for those requiring oxygen…
Target summary (Arabic): يقول العلماء إن عقار ديكساميثازون الرخيص وسهل الوصول سيساعد في إنقاذ حياة المرضى الذين يعانون من مرض خطير بسبب فيروس كورونا.
Task: Cross-lingual Summarization
The model is provided a document in one language and asked to summarize it in a different language. This task tests both comprehension of the source language and fluent, faithful generation in the target language.
Train and dev data will be prepared from CrossSum and from MCIF, a multilingual human-annotated benchmark based on scientific talks. MCIF is designed to evaluate instruction following in cross-lingual settings and spans English, German, Italian, and Chinese.
Source (Japanese): 英オックスフォード大学の研究チームによると、低用量のデキサメタゾンは新型ウイルスとの戦いで画期的な突破口になる…
Target summary (Vietnamese): Các nhà khoa học cho biết loại thuốc rẻ tiền và dễ kiếm dexamethasone sẽ giúp cứu sống những bệnh nhân đang trong tình trạng nguy kịch vì virus corona.
Contact
For questions, please contact Patrícia Schmidtová at schmidtova@ufal.mff.cuni.cz
Organizers
- Pinzhen Chen
- Patrícia Schmidtová
- Katia Artemova
- Seth Aycock
- Niyati Bafna
- Tom Kocmi
- Philipp Koehn
- Danni Liu
- Nam Luu
- Sara Papi
- Eduardo Sánchez
- Mariya Shmatova
- Vilém Zouhar