LLMs with Limited Resources for Slavic Languages: MT and QA

EMNLP 2025

TENTH CONFERENCE ON
MACHINE TRANSLATION (WMT25)

November 8-9, 2025
Suzhou, China

HOME •︎ PROGRAM •︎ PAPERS •︎ AUTHORS

TRANSLATION TASKS:	GENERAL MT (NEWS) •︎ INDIC MT •︎ TERMINOLOGY •︎ CREOLE MT •︎ MODEL COMPRESSION
EVALUATION TASKS:	MT TEST SUITES •︎ (UNIFIED) MT EVALUATION
OTHER TASKS:	OPEN DATA
MULTILINGUAL TASKS:	MULTILINGUAL INSTRUCTION •︎ LIMITED RESOURCES SLAVIC LLM

Announcements

2025-07-28 Results: each team’s primary submissions
2025-07-22 Extended submission deadline
2025-07-18 Outputs submission deadline
2025-06-30 Release of the test data
2025-05-13 Release of new Sorbian data
2025-04-02 Shared task announced
2025-04-02 Registration open via Google Group

Overview

We present a shared task to train LLMs under limited data and compute resources for three Slavic languages: Ukrainian (uk), Upper Sorbian (hsb) and Lower Sorbian (dsb).

The objective of this Shared Task is to develop and improve LLMs for these languages. We consider two tasks that are to be evaluated jointly: Machine Translation (MT) and Multiple-Choice Question Answering (QA).

Ukrainian has roughly 40 million first language (L1) speakers spread all over the world and is a mid-resource language in NLP.

Upper and Lower Sorbian are very low-resource, Slavic minority languages, spoken in the Eastern part of Germany with 30k and 7k L1 speakers, respectively.

In this task, we aim to test and improve the performance of LLMs on these languages.

Task Description

Models will be tested jointly on two tasks: machine translation and multiple-choice question answering. Our main goal is to observe the synergy between the two different tasks in LLM: How does training for MT impact performance on a secondary task, here QA? Is it possible to improve machine translation while keeping question-answering capabilities stable?

We set this shared task in a restricted context with limited resources: the base LLM is fixed to the Qwen 2.5 family and a maximum of 3B parameters.

For Machine Translation, we focus on the following directions, which are currently favoured by the respective communities:

English to Ukrainian (en→uk; subset of the general MT test set)
Czech to Ukrainian (cs→uk; subset of the general MT test set)
German to Upper Sorbian (de→hsb)
German to Lower Sorbian (de→dsb)

For Question Answering, we selected multiple choice datasets from education and language certification:

For Ukrainian QA, we will use multiple-choice exam questions from the UNLP 2024 Shared Task on LLM Instruction-Tuning for Ukrainian which is compiled from school graduation examinations on various subjects: language, literature, history, and other general topics.
For Upper and Lower Sorbian QA, we base our evaluation on the actual language certificate, which follows the CEFR scheme. We will test the models on questions from A1 up to C1.

Submissions Tracks

Submissions are language-specific; submit to one or multiple language tracks

Ukrainian MT & QA (translating both en→uk and cs→uk)
Upper Sorbian MT & QA
Lower Sorbian MT & QA

You MAY NOT submit outputs only for MT or only for QA. Only submissions that follow this rule will count towards the final leaderboard.

The submissions for the MT and QA tasks must be generated from the same model per language.

In order to enable participation even with few computational resources, we constrain the base models to a maximum of 3B parameters. Base models are from the Qwen 2.5 family:

You are also permitted to use any of the quantized versions or unsloth’d versions found here (provided they are 3B or less):

Training Data

We provide the following datasets:

Upper and Lower Sorbian Machine Translation Data

This year, the MT task will focus on two translation directions: German→Upper Sorbian and German→Lower Sorbian. Both language pairs were considered in the previous editions of WMT Shared Tasks on Unsupervised MT and Very Low Resource Supervised MT.

For Upper and Lower Sorbian-German Machine Translation, we provide the data from the WMT2022 Unsupervised MT and Very Low Resource Supervised MT Shared Task for both languages.

The WMT22 datasets (including dev and validation sets) can be found here: Upper and Lower Sorbian MT data (parallel and monolingual data)

In addition, we provide the following new datasets thanks to the Witaj-Sprachzentrum:

HSB and DE-HSB parallel datasets: github.com/TUM-NLP/llms-limited-resources2025/tree/main/Sorbian/hsb/MT
DSB and DE-DSB parallel datasets: github.com/TUM-NLP/llms-limited-resources2025/tree/main/Sorbian/dsb/MT

More details on the datasets are available on: github.com/TUM-NLP/llms-limited-resources2025/

The Leipzig Corpora Collection (Goldhahn et al., 2012) contains monolingual corpora for both Sorbian languages:

Upper Sorbian news (1999): wortschatz-leipzig.de/en/download/Upper%20Sorbian#hsb_news_1999
Upper Sorbian mixed (2012): wortschatz-leipzig.de/en/download/Upper%20Sorbian#hsb_mixed_2012
Upper Sorbian Wikipedia (2021): corpora.uni-leipzig.de/en?corpusId=hsb_wikipedia_2021 (may contain other languages and other noise)
Lower Sorbian Wikipedia (2021): corpora.uni-leipzig.de/en?corpusId=dsb_wikipedia_2021 (may contain other languages and other noise)

As the two Sorbian languages are from the West Slavic language family, Czech (for Upper Sorbian) and Polish (Lower Sorbian) are two closely related, better-resourced languages. Both Czech and Polish have been languages in previous MT Shared Tasks; cs→de is one of the language pairs in this year’s general MT task.

Upper and Lower Sorbian QA Dataset

The Witaj-Sprachzentrum provides language certificates for both Upper and Lower Sorbian, from the A1 to C1 levels, according to the CEFR (Common European Framework of Reference for Languages) scheme. We will use a mix of questions from all five levels (A1, A2, B1, B2, and C1, from beginner to advanced) for our task.

These language certificates assess a candidate’s language proficiency according to four pillars: listening comprehension, reading comprehension, written expression, and oral expression. For this Shared Task, we will use questions from the reading and listening parts using reference transcription of the audio material for the latter.

For each language level, there are different formats of exercises; this means that the question types in A1 are different from those in C1, for instance. If the beginner levels have true or false questions regarding a small text, the advanced exercises consist of multiple-choice questions with longer texts and statements.

For QA, the datasets are available at the following links:

Ukrainian Machine Translation Data

Machine Translation: The MT task for Ukrainian will be focused on the English→Ukrainian and Czech→Ukrainian directions. The development data is the compilation of WMT datasets from 2022-2024 editions. For training, we recommend you refer to the WMT2025 main track setup with the details on various datasets useful for WMT systems fine-tuning.

Ukrainian QA Dataset

Question Answering: The questions are taken from the Ukrainian External Independent Evaluation (called ЗНО/ZNO in Ukrainian) from various subjects: language, literature, history, geography, and other general knowledge. The training and development data is compiled from UNLP2024 Share Task: huggingface.co/datasets/osyvokon/zno

Additional Datasets

External datasets can be used on top of the provided corpora. For fairness and reproducibility, they should, however, be publicly available.

Test Data

For the test sets, please check our GitHub repo: github.com/TUM-NLP/llms-limited-resources2025

The outputs submission deadline will be July 18.

Details on how to submit will follow soon.

Evaluation Methods

We will use chrF++ to evaluate machine translation and accuracy to evaluate question answering.

The final ranking in the leaderboard will consider the scores from MT and QA equally.

For consistency with the previous WMT 2022 Shared Task, we also report BLEU for MT.

For information purposes, we provide the baseline results of Qwen2.5-3B-Instruct on all the development datasets (MT & QA) in the GitHub repository. We additionally provide the following repository to help with the evaluation: github.com/TUM-NLP/wmt25-lrsl-evaluation . It is a fork of lm-evaluation-harness and can be used to reproduce the baseline results and evaluate other models. More details are available on the GitHub repository.

Submission Procedures

Output Format

The output format for the Shared Task is a standardised JSONL file across languages and tasks, with four fields per instance:

For MT, each output should contain: dataset_id ('wmtslavicllm2025_{lang_pair}'), sent_id, source, and pred
For QA, each output should contain: dataset_id ('wmtslavicllm2025_qa_{lang}'), question_id, question, and pred

Below is the description of the four fields:

dataset_id: this field is needed to assign your submission to the right track (i.e., language+task). It consists of ‘wmtslavicllm2025_’ with the language pair for MT (de-{hsb|dsb} or {cs|en}-uk) and the language for QA (qa_{hsb|dsb|uk})
sent_id or question_id: this is a unique ID per instance. In some datasets, it is already present (Upper and Lower Sorbian QA). For the others, it is simply an ascending ID
source or question: this comes from the input file with the source sentence (for MT) or the question (for QA) to check the correspondence between inputs and outputs
pred: the output from your system (string or integer for Sorbian QA)

Moreover, please note that the Upper and Lower Sorbian QA outputs should be concatenated into one file in order of difficulty (A1, A2, B1, B2, C1).

To make the output format conversion easier, we provide two resources. First, dummy outputs are present in the following folder of our GitHub repository: github.com/TUM-NLP/llms-limited-resources2025/tree/main/dummy_submission . Second, if you are using our fork of lm-evaluation-harness (see above), the conversion script has been updated to the output format: github.com/TUM-NLP/wmt25-lrsl-evaluation/blob/main/testphase-eval/convert_output_formats.py .

Summary of the modifications to make in the output:

Ukrainian MT: changing the dataset_id for our shared task (wmtslavicllm2025_{cs-uk|en-uk}), adding a simple sent_id (e.g., cs-uk-XXXXX), changing the ‘src_text’ field name into ‘source’
Ukrainian QA: adding the dataset_id (wmtslavicllm2025_qa_ukr) and question_id (e.g., question-XXXX); the prediction should be a string (not a list)
Upper and Lower Sorbian MT: adding the dataset_id (wmtslavicllm2025_de-{hsb|dsb}) and sent_id (e.g., de-hsb-XXXXX)
Upper and Lower Sorbian QA: adding the dataset_id (wmtslavicllm2025_qa_{hsb|dsb}), concatenating all QA files in increasing order of difficulty (A1, A2, B1, B2, C1)

For a full participation in our Shared Task, there will be seven files: 3 files for the Ukrainian track (CS-UK, EN-UK MT & UK QA), 2 files each for Upper and Lower Sorbian (DE-DSB|HSB MT & DSB|HSB QA).

Submission platform

Thanks to the main Shared Task organisers, the submissions for our Shared Task can also be handled by the OCELoT platform: ocelot-wmt.azurewebsites.net/

Instructions:

After selecting our Shared Task, please register your team (yellow button). You need a team name and an email. You will then receive an email with a unique token to use (akin to a password).
Output files can be uploaded using the ‘create submission’ button (green button). Please select the corresponding test file (i.e., task + language) for your submission. You can choose whether this is your primary submission or not (it can be changed later).
Each output file must be submitted separately. Please remember that a submission is valid for us only when both MT and QA outputs are uploaded.
Once you have submitted your files, you will see a publication details section to fill in (with the institution name, the system name, and a small description of the system). The platform also needs a short paragraph describing your system, which we will use for our findings article. Please detail the models, datasets, and main techniques that you relied on.

To check whether your submissions are correctly taken into account by the system, OCELoT displays some automatic metrics: BLEU and chrF for MT and accuracy for QA (except for CS-UK MT). Please note that these leaderboards are not the final ones; we will provide the final ranking shortly after the submissions phase is closed.

For better reproducibility, we highly recommend providing a link to your model in the system description paragraph by uploading it to HuggingFace (and mentioning external public datasets, when used).

Shared Task Results

We show the results for each team’s primary submissions: github.com/TUM-NLP/llms-limited-resources2025/blob/main/results.md

System description details

The paper submission process is managed on SoftConf: softconf.com/emnlp2025/wmt2025/

Please follow the general paper guidelines from the conference: www2.statmt.org/wmt25/index.html#_paper_submission_information

Important Dates

Finalized task details

end of April, 2025

Release of training data for shared tasks

end of April, 2025

Release of test data

30th June 2025

Outputs submission deadline

22nd July 2025

System description paper submission

14th August 2025

WMT camera-ready deadline

25th September 2025

All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2025.

Contact/Organisers

Main contact: join our google group

TUM Heilbronn:

Daryna Dementieva
Lukas Edman
Alexander Fraser
Kathy Hämmerl
Marion Di Marco
Shu Okabe

Witaj-Sprachzentrum (for both Upper and Lower Sorbian):

Beate Brězan
Anita Hendrichowa
Marko Měškank
Tomaš Šołta (language certificate)

Acknowledgements

We thank the UNLP 2024 Shared Task 2024 team

Roman Kyslyi
Mariana Romanyshyn
Oleksiy Syvokon

for kindly sharing the Ukrainian QA resources. Please, acknowledge their work by citing the following paper:

Mariana Romanyshyn, Oleksiy Syvokon, and Roman Kyslyi. 2024. The UNLP 2024 Shared Task on Fine-Tuning Large Language Models for Ukrainian. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 67–74, Torino, Italia. ELRA and ICCL.

TENTH CONFERENCE ON MACHINE TRANSLATION (WMT25)