Announcements
-
January 2026: Terminology shared task under preparation, will take place with WMT 2026
-
June 2026: We have published the main information about the shared task! Fill in our pre-registration form, and we’ll remind you of the important dates!
More information will be announced soon.
Important Dates
All deadlines are at the end of the day, Anywhere on Earth (AoE).
Pre-registration form released | 1st June 2026
Data snippets and evaluation measures released | 5th June 2026
Task details finalized | 15th June 2026
Test data released and task starts | 25th June 2026
Validation code released, submission form opened | 25th June 2026
Submission deadline | 20th July 2026
Paper submission to WMT26 | in line with WMT26
Camera-ready submission to WMT26 | in line with WMT26
Conference in Suzhou, China | 28-29 October 2026
Overview
While general-purpose MT systems have shown nearly human performance, at least for high-resource languages, translation of terminology-heavy texts (science, technology, legal texts) is still far from saturation. The results of last year’s shared task showed that it is particularly hard to find a trade-off between general MT quality and accurate terminology translation; additionally, while systems show nearly perfect performance at the sentence level, document-level terminology-aware MT leaves much to be desired. To assess progress in the field of terminology-aware machine translation, we are organizing the fourth Terminology Shared Task, which will take place at WMT26. Compared to the previous three iterations (in 2021, 2023 and 2025), this year’s competition expands to lower-resourced languages with rich morphology (including Polish and Basque). For continuity with last year’s shared task, one of the competition tracks follows the same setup (document-level MT with an explicit dictionary); at the same time, we have replaced the simpler sentence-level MT task with a more challenging "terminology retrieval + MT" track.
Task Description
We are organizing two tracks that differ in the format of the input data and in the number of steps required from the participants.
Track №1: Document-Level Translation with Explicit Dictionary
You will be provided with chunks of input text (each chunk is approx. 2000 words, i.e., a short document) together with terminology dictionaries that cover the whole set of input texts (i.e., they are corpus-level). A data snippet is shown below. Your system is expected to work both with and without explicit terminology from the dictionary: you are expected to run it in three modes, namely given no terminology, given proper terminology, and given random terminology (see the explanation below).
Data Snippet:
[
{
"en": "1.0 Primary Actuation and Structural Support The powertrain regulates the foundational kinetic element of the entire air-fuel exchange mechanism with a camshaft.\nThe geometry of the camshaft governs torque delivery, peak horsepower, and idle stability by strictly controlling valve events.\nEngineers extract precise valve lifts from an eccentric cam, which is machined directly into the camshaft at meticulously calculated intervals...",
"pl": "1.1 Główne uruchomienie i wsparcie strukturalne Układ napędowy kontroluje podstawowy element kinetyczny w całym mechanizmie wymiany mieszanki powietrzno- paliwowej za pomocą wału rozrządu.\nGeometria wału rozrządu wpływa na sposób przekazania momentu obrotowego, maksymalną moc silnika oraz stabilność pracy na biegu jałowym poprzez dokładne sterowanie pracą zaworów.\nInżynierowie uzyskują dokładne uniesienie zaworów dzięki mimośrodowej krzywce, która jest skrawana bezpośrednio w wale rozrządu w dokładnie obliczonych odstępach...",
"proper_terms": [
{"en": "camshaft", "pl": "wał rozrządu"},
{"en": "cam", "pl_base": "krzywka", "pl_inflected": "krzywce"},
...
],
"random_terms": [
{"en": "entire", "pl": "cały"},
{"en": "by", "pl": "poprzez"},
{"en": "Structural Support", "pl": "wsparcie strukturalne"},
...
]
},
...
]
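The snippet shows two shapes of dictionary entry: one with a single target field ("pl") and one with a base form plus an inflected form ("pl_base", "pl_inflected"). The following is a minimal sketch of how both shapes could be read into one uniform structure. It assumes exactly the key names shown in the snippet, which may still change before the data release; the helper name `normalize_term` and the shortened example texts are hypothetical.

```python
import json

def normalize_term(entry, src="en", tgt="pl"):
    """Return (source, base_form, inflected_form) for one dictionary entry.

    Assumption: an entry holds either a single target field (e.g. "pl") or
    a "<tgt>_base" / "<tgt>_inflected" pair, as in the snippet above.
    """
    source = entry[src]
    if tgt in entry:
        # Only one target form is given; treat it as the base form.
        return source, entry[tgt], None
    return source, entry[f"{tgt}_base"], entry.get(f"{tgt}_inflected")

# A shortened, well-formed stand-in for the snippet (texts cut down by hand).
raw = """
[
  {
    "en": "The geometry of the camshaft governs torque delivery.",
    "pl": "Geometria wału rozrządu wpływa na sposób przekazania momentu obrotowego.",
    "proper_terms": [
      {"en": "camshaft", "pl": "wał rozrządu"},
      {"en": "cam", "pl_base": "krzywka", "pl_inflected": "krzywce"}
    ],
    "random_terms": [
      {"en": "entire", "pl": "cały"}
    ]
  }
]
"""

documents = json.loads(raw)
proper = [normalize_term(t) for t in documents[0]["proper_terms"]]
for source, base, inflected in proper:
    print(f"{source} -> {base} (inflected: {inflected})")
```

Keeping the base and inflected forms separate may be useful later, since the evaluation also looks at the grammaticality of terms in context.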
Track №2: Document-Level Term Extraction + Translation
In this track, you will not be provided with an explicit term dictionary. Instead, you will be provided with two sets of sentences: the first set consists of "seed" bitexts from which your system should extract relevant terms, and the second set is to be translated using the terms extracted from the first set.
Data Snippet - Term Extraction:
[
{
"en": "1.0 Primary Actuation and Structural Support The powertrain regulates the foundational kinetic element of the entire air-fuel exchange mechanism with a camshaft.",
"pl": "1.1 Główne uruchomienie i wsparcie strukturalne Układ napędowy kontroluje podstawowy element kinetyczny w całym mechanizmie wymiany mieszanki powietrzno- paliwowej za pomocą wału rozrządu."
},
...
]
Data Snippet - Term-Aware Translation:
NB: the "extracted_terms" dictionary is created by the participants (it’s an output from the first step).
[
{
"en": "The geometry of the camshaft governs torque delivery, peak horsepower, and idle stability by strictly controlling valve events.\nEngineers extract precise valve lifts from an eccentric cam, which is machined directly into the camshaft at meticulously calculated intervals...",
"pl": "Geometria wału rozrządu wpływa na sposób przekazania momentu obrotowego, maksymalną moc silnika oraz stabilność pracy na biegu jałowym poprzez dokładne sterowanie pracą zaworów.\nInżynierowie uzyskują dokładne uniesienie zaworów dzięki mimośrodowej krzywce, która jest skrawana bezpośrednio w wale rozrządu w dokładnie obliczonych odstępach...",
"extracted_terms": [
{"en": "camshaft", "pl": "wał rozrządu"},
{"en": "cam", "pl_base": "krzywka", "pl_inflected": "krzywce"},
...
]
},
...
]
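The two steps of this track can be pictured as a small data flow: extract terms from the seed bitexts, then attach them to the texts to be translated. The sketch below only illustrates that flow. The extractor is a toy stand-in with a hard-coded candidate list, not a proposed method, and the names `toy_extractor` and `attach_terms` are hypothetical; only the field names "en", "pl" and "extracted_terms" come from the snippets above.

```python
def toy_extractor(seed_bitexts):
    """Toy stand-in for a real term extractor (illustration only).

    It keeps a hard-coded candidate pair only if both sides occur in the
    same seed bitext. A real system would propose candidates itself and
    would likely lemmatize the target side.
    """
    candidates = [("camshaft", "wału rozrządu"), ("valve", "zawór")]
    found = []
    for src_term, tgt_term in candidates:
        if any(src_term in b["en"] and tgt_term in b["pl"] for b in seed_bitexts):
            found.append({"en": src_term, "pl": tgt_term})
    return found

def attach_terms(documents, terms):
    """Add an "extracted_terms" list to each document to be translated,
    keeping only terms whose source side occurs in that document."""
    prepared = []
    for doc in documents:
        text = doc["en"].lower()
        relevant = [t for t in terms if t["en"].lower() in text]
        prepared.append({**doc, "extracted_terms": relevant})
    return prepared

# Step 1: term extraction from the seed bitexts (shortened example).
seed = [{
    "en": "The powertrain regulates the mechanism with a camshaft.",
    "pl": "Układ napędowy kontroluje mechanizm za pomocą wału rozrządu.",
}]
terms = toy_extractor(seed)

# Step 2: the extracted terms accompany the texts to be translated.
to_translate = [{"en": "The geometry of the camshaft governs torque delivery."}]
prepared = attach_terms(to_translate, terms)
print(prepared[0]["extracted_terms"])
```

Passing the extractor output through a separate attachment step keeps the second step identical in shape to Track 1 with proper terminology.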
Data Description
We tried to align the data between the two tracks as much as possible.
Language Pairs
The language pairs are the same for both tracks:
-
es-eu (Spanish → Basque)
-
en-pl (English → Polish)
-
en-zh-Hant (English → Traditional Chinese)
-
zh-Hant-en (Traditional Chinese → English)
Domains
The domain sample varies between the two tracks, but the overall set of domains is as follows:
-
es-eu: Engineering and Technology
-
en-pl: Engineering and Technology, Medicine
-
en-zh-Hant, zh-Hant-en: Finance
Evaluation
Terminology Modes
To estimate the causal effect of the proper terminology, we distinguish between three modes of translation of the terminology-heavy texts:
1. No terminology: the system is only provided with input sentences/documents.
2. Proper terminology: the system is provided with the input texts (same as in 1.) and dictionaries of the format {source_term: target_term}.
3. Random terminology: the system is provided with the input texts and translation dictionaries of the same format as in 2. The difference is that the dictionary items are not special terms but words randomly drawn from the input texts. This mode is of special interest because we want to measure to what extent proper term translations improve system performance (as in 2.), as opposed to arbitrary additional input that does not contain domain-specific terminology.
For Track 1, you will be provided with both proper and random terminology dictionaries in the JSON files. Thus, for mode 1, you need to ignore them, and for modes 2 and 3, you need to use the corresponding dictionary. For Track 2, you are not provided with terminology dictionaries, so only modes 1 and 2 are compared.
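For Track 1, the three modes differ only in which dictionary accompanies the source text. A minimal sketch of preparing the three inputs from one document follows. It assumes the field names from the Track 1 data snippet ("en", "proper_terms", "random_terms") and falls back to "pl_base" when an entry has no "pl" field; the function name and the mode labels are hypothetical.

```python
def build_mode_inputs(document, src="en", tgt="pl"):
    """Pair the source text with the dictionary to use in each mode."""
    def as_dict(terms):
        # Assumption: {source_term: target_term}, taking the plain target
        # field when present and the base form otherwise.
        return {t[src]: t.get(tgt, t.get(f"{tgt}_base")) for t in terms}

    text = document[src]
    return {
        "no_terminology": (text, {}),  # mode 1: dictionaries are ignored
        "proper_terminology": (text, as_dict(document["proper_terms"])),
        "random_terminology": (text, as_dict(document["random_terms"])),
    }

# Shortened example document in the shape of the Track 1 snippet.
document = {
    "en": "The geometry of the camshaft governs torque delivery.",
    "proper_terms": [
        {"en": "camshaft", "pl": "wał rozrządu"},
        {"en": "cam", "pl_base": "krzywka", "pl_inflected": "krzywce"},
    ],
    "random_terms": [
        {"en": "entire", "pl": "cały"},
        {"en": "by", "pl": "poprzez"},
    ],
}

modes = build_mode_inputs(document)
for mode, (text, dictionary) in modes.items():
    print(mode, dictionary)
```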
Metrics
The submissions will be evaluated based on:
-
Overall Translation Quality: we will evaluate the general aspects of machine translation outputs such as fluency (incl. grammaticality of terms) and adequacy. This includes two aspects of evaluation:
-
General translation quality: we will measure this with general automatic MT metrics such as BLEU or COMET.
-
Grammaticality of term usage in context: since the target languages in our samples have rich morphology, we will assess how grammatically coherent the occurrences of the term translations are in the texts.
-
Terminology-Oriented Metrics: This group of metrics assesses the ability of the system to accurately translate technical terms given the specialized vocabulary. We will assess two aspects of it:
-
Terminology Success Rate: this measures the percentage of correct term translations in the target texts. It will be computed by comparing the occurrences of the correct term translations (i.e., the ones present in the dictionary) to the output terms. A higher success rate indicates better adherence to the dictionary translations.
-
Terminology Consistency: for domains such as science or legal texts, the consistent use of an introduced term throughout the text is crucial. In other words, we want a system not only to pick a correct term in the target language, but also to use it consistently once it is chosen. This will be evaluated by comparing all translations of a given source term in a text and measuring the percentage of deviations from the most consistent translation.
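To make the two terminology-oriented metrics concrete, here is a rough sketch of how they could be computed. It is not the official implementation, which is still being revised. It assumes exact string matching against a list of accepted target forms per source term, and it reads "the most consistent translation" as the most frequent one; the function names and the example sentences are hypothetical.

```python
import re
from collections import Counter

def term_success_rate(source, hypothesis, dictionary):
    """Share of dictionary terms occurring in the source for which one of
    the accepted target forms appears in the hypothesis.

    dictionary maps a source term to a list of accepted target forms.
    Returns None when no dictionary term occurs in the source.
    """
    hyp = hypothesis.lower()
    relevant = [
        forms for term, forms in dictionary.items()
        # Word boundaries keep "cam" from matching inside "camshaft".
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", source.lower())
    ]
    if not relevant:
        return None
    hits = sum(1 for forms in relevant if any(f.lower() in hyp for f in forms))
    return hits / len(relevant)

def deviation_rate(observed_translations):
    """Share of occurrences that differ from the most frequent translation
    of one source term (assumes forms were already normalized)."""
    if not observed_translations:
        return 0.0
    top_count = Counter(observed_translations).most_common(1)[0][1]
    return 1 - top_count / len(observed_translations)

dictionary = {
    "camshaft": ["wał rozrządu"],
    "cam": ["krzywka", "krzywce"],
    "valve lift": ["uniesienie zaworów"],  # does not occur in the source below
}
source = "The camshaft controls the valves with an eccentric cam."
good = "Wał rozrządu steruje zaworami dzięki mimośrodowej krzywce."
bad = "Wałek steruje zaworami dzięki mimośrodowej krzywce."

print(term_success_rate(source, good, dictionary))  # both terms matched
print(term_success_rate(source, bad, dictionary))   # "camshaft" term missed
print(deviation_rate(["wał rozrządu", "wał rozrządu", "wałek rozrządu"]))
```

Exact matching is the main limitation of this sketch: in morphologically rich target languages a correct but inflected form is missed unless it is listed, which is one reason the snippet dictionaries carry both base and inflected forms.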
Using the two groups of metrics makes the comparison multidimensional. To reduce the dimensionality, we plan to use the Pareto frontier between Overall Translation Quality and Terminology Success Rate: the systems that end up on the frontier will be considered optimal.
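As an illustration of the Pareto criterion, the sketch below selects the systems that are not dominated on two scores. It assumes that both scores are "higher is better" and that each system has one aggregate value per score; the system names and numbers are made up.

```python
def pareto_frontier(systems):
    """Return the names of systems not dominated on (quality, term_success).

    A system is dominated if another system is at least as good on both
    scores and strictly better on at least one of them.
    """
    frontier = []
    for name, (quality, success) in systems.items():
        dominated = any(
            q >= quality and s >= success and (q > quality or s > success)
            for other, (q, s) in systems.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (overall quality, terminology success rate) pairs.
scores = {
    "A": (0.85, 0.60),
    "B": (0.80, 0.90),
    "C": (0.78, 0.70),  # dominated by B
    "D": (0.85, 0.55),  # dominated by A
}
print(pareto_frontier(scores))  # ['A', 'B']
```

Under this criterion, a system with the best translation quality and a system with the best term adherence can both be optimal, with no single ranking imposed between them.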
Participation
As terminology translation is a highly applicable task, we encourage participation from both academic researchers and industrial practitioners. You may participate in any translation direction with any modeling approach you prefer, including but not limited to:
-
lexically constrained decoding,
-
large language model fine-tuning and/or prompting,
-
translation editing and refinement,
-
multi-agent approaches.
Participants will have the option to publish a system description paper (4 to 6 pages) at WMT. In this case, the participants are expected to submit their system descriptions according to the WMT guidelines (see the main page).
Otherwise, we kindly ask you to provide a brief description of your approach with your test submission.
Data
-
there is no dev data for this shared task.
-
the test data (for both tracks) will be published here soon.
-
the validation code will be published here soon.
Submission Guidelines
0. Please notify us about your participation prior to submission (optional)
This step is not required, but we would greatly appreciate it if you contacted us once your team decides to participate in the competition. This will give us a better understanding of our workload after submission, and we will be able to send you a gentle reminder before the deadline. The easiest way to do so is through our short Google Form, but you can also contact the organizers via email.
1. Check your submission files with the validation script
The validation script will be published soon. Systems whose outputs are not compatible with the validation script will be desk-rejected.
2. Write a description of your system (optional)
First, in the submission form (see 3) you will be required to provide a short text description of your system (4-6 sentences).
Additionally, we’d appreciate more detailed descriptions of your systems. You have several options for this:
-
If you are already submitting a long paper about your system to WMT, please mention your submission name; the WMT organizers will provide us with it.
-
If you have already published something about the system, feel free to attach links to those documents; you do not have to tailor your detailed description to our shared task specifically.
-
If you wish to submit a description on its own AND you want it to be published in the WMT proceedings, you are invited to submit a short system description paper (4 to 6 pages) to WMT describing your system. Please submit it according to the guidelines of the main conference (i.e., with respect to all deadlines, formats, etc.), as we are NOT responsible for handling the publications.
-
If you do not wish to publish your system details but still have something to say about your system, you are very welcome to attach a PDF in free format. We will carefully consider it for our analysis, but we will NOT publish it in the WMT proceedings (see the point above).
3. Submit your system via Google Forms
The Google Form with all necessary submission details will be published soon.
Organizers (in alphabetical order)
-
Nathaniel Berger (Amazon)
-
Adrian Charkiewicz (Laniqo & Adam Mickiewicz University in Poznań)
-
Pinzhen Chen (Queen’s University Belfast & Aveni.ai)
-
Thierry Etchegoyhen (Vicomtech)
-
Harritxu Gete Ugarte (Vicomtech)
-
Kamil Guttmann (Laniqo & Adam Mickiewicz University in Poznań)
-
Xu Huang (Nanjing University)
-
David Ponce (Vicomtech)
-
Artur Nowakowski (Laniqo & Adam Mickiewicz University in Poznań)
-
Frédéric Odermatt (44ai)
-
Arturo Oncevay (independent)
-
Kirill Semenov (University of Zurich), main contact: firstname.lastname@uzh.ch
-
Dawei Zhu (Amazon)
-
Vilém Zouhar (ETH Zurich)
F.A.Q.
1. Do I have to submit results for all language pairs?
No, but we highly encourage it as it makes comparisons fairer. If you already have a system for a specific language pair, replicating it (without optimizing hyperparameters) for another language pair should be very easy and would be a good demonstration of your method. Also, since terminology consistency is an under-studied field, we are especially interested in understanding how well systems’ performance replicates across languages from different families and writing systems. Thus, by submitting results for all language pairs, you have a chance to contribute something new to the general understanding of terminology-assisted translation!
2. Do I have to register for the competition or follow any other resource?
You only need to follow this website; all updates (such as data releases) will be noted at the top of the webpage. For your (and our) convenience, we kindly ask you to fill in the pre-registration form, so that we can notify you about important dates. When the test data is published, we will also attach the Google Form for registration in the competition.
3. I cannot see the dev set for this shared task.
Unfortunately, we are unable to publish a dev set for this task. Only the test set will be released, and it will be available soon.
4. Can you share the code or the exact guidelines on how to compute the metrics for the submitted files?
We are in the process of updating the metrics compared to last year (for example, we are re-examining the Terminology Success Rate computation and developing a metric for term grammaticality evaluation). In the meantime, you can look at last year’s metric implementations.
5. Do we have constraints on data or models that we can use for our submitted systems?
No. Contrary to the General MT task, we do not impose restrictions on the use of particular resources or tools within our task. However, for the sake of comparability and interpretability of systems, we’d appreciate it if you explicitly noted the data that you used for your submission.