EMNLP 2026

ELEVENTH CONFERENCE ON
MACHINE TRANSLATION (WMT26)

November 2026
Budapest, Hungary

TRANSLATION TASKS: GENERAL MT • INDIC MT • ARABIC-ASIAN MT • CHINESE-SOUTHEAST ASIAN MT • TERMINOLOGY • MODEL COMPRESSION • CREOLE MT • VIDEO SUBTITLE TRANSLATION
EVALUATION TASKS: MT TEST SUITES • AUTOMATED MT EVALUATION
OTHER TASKS: OPEN DATA • MULTILINGUAL INSTRUCTION • LIMITED RESOURCES LLM

Announcements

  • 2026-05-08 Shared task details published

  • 2026-04-30 Shared task summary published

  • 2026-03-21 Shared task announced

Summary

This Shared Task follows the WMT2025 Shared Task on LLMs with Limited Resources for Slavic Languages: MT and QA. We add three new tasks to last year’s edition: spell checking, grammar checking, and maths reasoning. The released datasets will be available in the GitHub repository.

The novelties are as follows:

  • More tasks to perform jointly: Machine Translation, Question Answering, Spell Checking, Grammar Checking, and Maths Reasoning

  • Spell checking: finding the spelling error (i.e., a typo) in a sentence and correcting it (there might also be no error in the sentence)

  • Grammar checking: finding the grammatical error in a sentence and correcting it (there might also be no error in the sentence)

  • Maths Reasoning: finding the correct answer to maths problems from two difficulty levels of Qwen PolyMath

  • Sorbian MT specific: one model for all six translation directions of the three language pairs: Upper Sorbian–German, Lower Sorbian–German, and Upper Sorbian–Lower Sorbian

  • Merging the two Sorbian language tracks for a unified model for both languages

  • Upper Sorbian QA specific: new hidden QA dataset for the test phase

  • Ukrainian QA specific: additionally considering the Massive Multitask Language Understanding (MMLU) in Ukrainian

  • Updated model from the Qwen family: we restrict the model to Qwen3.5 2B to remain below the 3B threshold.

  • Submission of the model: either publicly on HuggingFace (recommended, when possible) or privately to us for evaluation on the hidden datasets (we will not publish it).

What does not change from last year:

  • Three Slavic languages: Ukrainian, Upper Sorbian, and Lower Sorbian

  • All previous MT language pairs are kept: English to Ukrainian, Czech to Ukrainian, German to Upper Sorbian, and German to Lower Sorbian

  • Additional data can be used for training, if publicly available (for reproducibility).

Main Task Description

We present a shared task to train LLMs with Limited Resources for three Slavic languages: Ukrainian (uk), Upper Sorbian (hsb), and Lower Sorbian (dsb).

Ukrainian has roughly 40 million first-language (L1) speakers worldwide and is a mid-resource language in NLP. Upper and Lower Sorbian are very low-resource Slavic minority languages, spoken in the eastern part of Germany, with 30k and 7k L1 speakers, respectively. This task aims to test and improve the performance of LLMs on these languages.

Models will be tested jointly on five tasks: machine translation, multiple-choice question answering, spell checking, grammar checking, and maths reasoning. Thus, we want to observe the synergy between the different tasks in LLMs: How does training for MT impact performance on auxiliary tasks, such as question answering? Is it possible to improve machine translation while keeping stable capabilities on other tasks? To what extent is linguistic knowledge gained from one task (e.g., grammar checking) beneficial to machine translation?

We set this shared task in a restricted context with limited resources: the base LLM is fixed to the Qwen 3.5 model with 2B parameters.

Submission tracks

Participants may submit outputs for one or both leaderboards: (i) Ukrainian track (uk) and (ii) Sorbian languages track (hsb & dsb).

For each track, you must submit answers for all five tasks: MT, QA, SC, GC, and MR. You may NOT submit outputs for only a subset of the tasks (e.g., for MT alone). Only submissions that follow this rule will count towards the final leaderboard.

The submissions for the MT, QA, SC, GC, and MR tasks must be generated by the same model. To enable participation even with limited computational resources, we constrain the base model to the Qwen 3.5 family with 2B parameters.

You are also permitted to use any quantized or Unsloth’d versions of these models, provided they have 2B parameters or fewer.

Tasks

Machine Translation

Ukrainian

As in the previous edition, the MT task for Ukrainian focuses on the English→Ukrainian and Czech→Ukrainian directions. The development data is identical to last year’s: a compilation of WMT datasets from the 2022–2024 editions.

Upper and Lower Sorbian

This year, the MT task will cover both translation directions of three language pairs: German ↔ Upper Sorbian, German ↔ Lower Sorbian, and Upper Sorbian ↔ Lower Sorbian. All three language pairs were considered in previous editions of the WMT Shared Tasks on Unsupervised MT and Very Low Resource Supervised MT.

The pages of those previous editions provide additional contextual information and resources.

Since some links may be broken, you can also find the previously published data in this repository: github.com/mariondimarco/WMT22_UnsupVeryLowResMT_Data/tree/main .

Datasets released in the previous edition (WMT2025) are also allowed, including monolingual corpora for both Sorbian languages from the Leipzig Corpora Collection (Goldhahn et al., 2012).

We provide new corpora for this Shared Task edition thanks to the Witaj-Sprachzentrum for all three language pairs: Upper Sorbian–German, Lower Sorbian–German, and Upper Sorbian–Lower Sorbian. We also release monolingual corpora for both Sorbian languages. More details are accessible at: github.com/TUM-NLP/llms-limited-resources2026/tree/main/Sorbian/MT.

As the two Sorbian languages belong to the West Slavic language family, Czech (for Upper Sorbian) and Polish (for Lower Sorbian) are two closely related, better-resourced languages. Both Czech and Polish have featured in previous MT Shared Tasks.

Question Answering

Ukrainian

For Ukrainian, we use the same dataset as in the previous edition (from the UNLP2024 Shared Task). More details are available in last year’s edition.

The questions are taken from the Ukrainian External Independent Evaluation (called ЗНО/ZNO in Ukrainian) and cover various subjects: language, literature, history, geography, and other general knowledge. The training and development data is compiled from the UNLP2024 Shared Task: huggingface.co/datasets/osyvokon/zno

In this edition, we additionally include the Massive Multitask Language Understanding (MMLU) dataset for Ukrainian: huggingface.co/datasets/INSAIT-Institute/mmlu_ukr.

Upper and Lower Sorbian

For both Sorbian languages, we consider the language certificate questions from last year’s edition. More details on the datasets are available in last year’s edition.

Spell Checking 🆕

The goal of the task is to identify a spelling mistake in a sentence (e.g., a typo). Each sentence can contain up to two mistakes within a single word. A sentence may also contain no mistake, in which case the model should leave it unchanged. If there is a mistake, the word should be identified and the correct form given.

The task will follow a standardised format for all three languages, as follows:

Input sentence: 30.000 opozicionelnych bcuhu zajeći a do lěhwow dowjezeni, statysacy ćeknychu do wukraja.

Wrong word (detection): bcuhu

Correct word (correction): buchu

We provide a development dataset based on sentences from the MT development dataset of the WMT2025 edition. The test set will be of equal size but drawn from different sources.
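To make the detection/correction convention above concrete, here is a minimal Python sketch that applies a (wrong word, correct word) pair to the example sentence. The function name and the use of an empty string to mean "no error" are illustrative assumptions, not the official submission format.

```python
# Minimal sketch of the spell-checking convention described above.
# Assumption: an empty `wrong` string signals "no error in the sentence".

def apply_correction(sentence: str, wrong: str, correct: str) -> str:
    """Replace the flagged word with its correction.

    If `wrong` is empty, the sentence contains no error and is
    returned unchanged.
    """
    if not wrong:
        return sentence
    # Replace only the first occurrence, mirroring the one-flagged-word setup.
    return sentence.replace(wrong, correct, 1)

# The Upper Sorbian example from the task description:
sentence = ("30.000 opozicionelnych bcuhu zajeći a do lěhwow dowjezeni, "
            "statysacy ćeknychu do wukraja.")
fixed = apply_correction(sentence, wrong="bcuhu", correct="buchu")
print(fixed)  # prints the sentence with "bcuhu" replaced by "buchu"
```

Real submissions should of course follow the standardised format released with the data; this sketch only illustrates the detection and correction roles of the two fields.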

Grammar Checking 🆕

The goal of this task is to identify a grammatical mistake in a sentence (e.g., wrong case or number). Each sentence can contain at most one mistaken word. A sentence may also contain no mistake, in which case the model should leave it unchanged. If there is a mistake, the word should be identified and the correct form given.

The task will follow a standardised format for all three languages, as follows:

Input sentence: Ze synowym powjazom wuwjazana kruwje ležeše při tym na dworje.

Wrong word (detection): kruwje

Correct word (correction): kruwa

We will provide a development dataset based on sentences from the MT development dataset of the WMT2025 edition. The test set will be of equal size but drawn from different sources.

Maths Reasoning 🆕

This task aims to assess the LLM’s capability in solving maths problems of two difficulty levels: low and medium. Our evaluation dataset is a translated and manually verified version of the Qwen PolyMath benchmark (huggingface.co/datasets/Qwen/PolyMath).

During the development phase, we will provide 12 questions covering the low and medium difficulty levels. The test phase will use the full dataset at both levels.

For fair evaluation of the models’ performance in our shared task, we kindly ask participants to avoid using the original PolyMath benchmark, or any translation of it, for training or inference.

Evaluation Methods

We will use chrF++ to evaluate machine translation. For consistency with the previous WMT 2025 Shared Task, we also report BLEU for MT.

For the question answering and maths reasoning tasks, we use standard accuracy.

Finally, for spell checking and grammar checking, we use the F1-score to assess both detection (finding the incorrect word) and correction (outputting the correct word).

The final ranking in the leaderboard will consider the scores from all five tasks equally.
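As a rough illustration of these metrics, the sketch below computes accuracy (QA and maths reasoning) and a sentence-level detection F1 in plain Python. It is not the official scoring script: how "no error" sentences are counted is an assumption here, and chrF++/BLEU would in practice come from a library such as sacreBLEU rather than be implemented by hand.

```python
# Illustrative scoring sketch -- NOT the official evaluation script.
# Assumptions: one flagged word per sentence, with "" meaning "no error";
# detection F1 is computed over sentences with a (correctly) flagged word.

def accuracy(preds, golds):
    """Fraction of exact matches (QA and maths reasoning)."""
    assert len(preds) == len(golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def detection_f1(pred_words, gold_words):
    """F1 over flagged words; '' (empty string) means 'no error detected'."""
    tp = sum(p == g != "" for p, g in zip(pred_words, gold_words))
    fp = sum(p != "" and p != g for p, g in zip(pred_words, gold_words))
    fn = sum(g != "" and p != g for p, g in zip(pred_words, gold_words))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(accuracy(["A", "C", "B"], ["A", "B", "B"]))  # 2 of 3 correct
    # One true positive, one false positive, one false negative:
    print(detection_f1(["bcuhu", "", "x"], ["bcuhu", "", "y"]))
```

Correction F1 would be computed analogously over the corrected word forms; the exact label conventions will be those of the released evaluation script.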

Important Dates

All dates are Anywhere-on-Earth (AoE).

  • Registration open (via Google Group): 30/04/2026

  • Finalised task details: 30/04/2026

  • Release of training and development data: beginning of May 2026

  • Release of test data: end of June (29/06/2026?)

  • Output submission deadline: beginning of July

  • System description paper submission: TBD

  • Camera-ready due: check with MT

  • MT workshop co-located with EMNLP2026: check with MT

Contact/Organisers

Main contact: join our Google group

All names are sorted in alphabetical order.

TUM NLP

  • Daryna Dementieva

  • Marion Di Marco

  • Lukas Edman

  • Alexander Fraser

  • Kathy Hämmerl

  • Shu Okabe

Witaj-Sprachzentrum (for both Upper and Lower Sorbian)

  • Beate Brězan

  • Anita Hendrichowa

  • Marko Měškank

  • Tomaš Šołta (language certificate)

Acknowledgements

We express our deepest gratitude to the Ukrainian Shared Task 2024 team (Roman Kyslyi, Mariana Romanyshyn, Oleksiy Syvokon) for kindly sharing the Ukrainian QA resources. Please acknowledge their work by citing the following paper:

Mariana Romanyshyn, Oleksiy Syvokon, and Roman Kyslyi. 2024. The UNLP 2024 Shared Task on Fine-Tuning Large Language Models for Ukrainian. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 67–74, Torino, Italia. ELRA and ICCL.


We also thank the INSAIT Institute team, especially Hanna Yukhymenko, who work on technologies for underrepresented languages, including Ukrainian. Please cite the following work that introduced MMLU_UKR:

Hanna Yukhymenko, Anton Alexandrov, and Martin Vechev. Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets. In ACL 2026.


This work was partly funded by the European Union (ERC, EPICAL, 101141712). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.