ANNOUNCEMENTS
- February 2026 - The shared task is announced
- April 2026 - Adjustments to post-editing policies and quality estimation data
- August 2026 - Deadline for submitting a paper
DEADLINES
Paper and data submission deadline | August
Notification of acceptance | September
Camera-ready deadline | September (follows WMT/EMNLP)
Conference | November (follows WMT/EMNLP)
All deadlines are in AoE (Anywhere on Earth).
INTRODUCTION
The Open Language Data Initiative (OLDI) aims to empower language communities to contribute to key datasets. These datasets are essential for expanding the reach of language technology to more language varieties.
Progress in translation quality has largely been directed at high-resource languages. Recently, focus has started to shift to under-served languages, and foundational datasets such as FLORES and NTREX have made it easier to develop and evaluate machine translation (MT) models for an increasing number of languages. The high impact of these resources has left some in the research community wondering: how do we add more languages to these existing open-source datasets?
Machine translation increasingly relies on automatic quality evaluation; example uses include filtering training data, ranking multiple translation candidates, reinforcement learning, and benchmarking MT models. Automatic MT evaluation methods (especially reference-free ones) are data-hungry, but human-annotated datasets of translation quality exist only for a limited number of languages, and we do not know how well existing quality estimation methods generalize to new languages.
GOALS
The primary goal of this shared task is to expand OLDI's open datasets to more languages. We are soliciting contributions to the following:
- FLORES+, an evaluation benchmark for machine translation
- OLDI Seed, a seed machine translation training dataset
Contributions may consist of the addition of entirely new languages, varieties or dialects to the above datasets, substantial improvements to existing datasets, or the creation of entirely new massively multilingual open translation datasets.
Additionally, to encourage creation of machine translation evaluation datasets in new languages, we are accepting contributions of new or extended datasets of translation quality annotations covering one or more under-served languages. Such datasets include source texts, their machine or human translations, and human judgements of quality of these translations.
To describe and publicise their contributions, task participants will be asked to submit a 4-6 page paper to be presented at the WMT 2026 conference.
TASK DESCRIPTION
To help us gauge interest and coordinate efforts, we ask prospective participants to email the organisers at <info@oldi.org>.
FLORES+ and Seed contribution guidelines
- Workflow: Contributing a new language to FLORES+ and Seed typically involves starting from the original English data and translating it. Starting from a different language is also possible, but this choice should be clearly documented. Translations should be performed, wherever possible, by qualified native speakers of the target language. We strongly encourage verification of the data by at least one additional native speaker.
- Dataset card: Dataset cards should be attached to new data submissions, detailing precise language information and the translation workflow that was employed. In particular, we ask participants to identify the language with both an ISO 639-3 individual language tag and a Glottocode. The script should be identified with an ISO 15924 script code. (A sketch of what such a card might record follows this list.)
- Use of MT:
  - The FLORES+ dataset is used to evaluate MT systems. For this reason, new contributions require human translation. Using or even referencing machine translation output is generally forbidden, and this includes post-editing. In some exceptional cases, we may allow submitting post-edited machine translations, but all such cases should be discussed with us in advance (via email or our Discord chat).
  - For Seed data, the use of post-edited machine translated content is allowed, as long as all data is manually verified. Raw, unverified machine translated outputs are not allowed. If using MT, you must ensure that the terms of service of the model you use allow reusing its outputs to train other machine translation models (for example, popular commercial systems such as DeepL, Google Translate and ChatGPT disallow this).
  - When post-editing MT outputs, we encourage participants to release the traces of this process, such as the original machine translations and human judgements of their quality (as was done, e.g., with ACReFOSC, a French translation of the Seed dataset). Such data is valuable for preference optimization and automatic post-editing.
- Data validation: Participants are strongly encouraged to provide experimental validation of the quality of the data they are submitting. For Seed data contributions, where applicable, this may include training a simple MT model and evaluating it on FLORES+.
- License: As both FLORES+ and Seed are open datasets released under CC BY-SA 4.0, new contributions must also be released under this same license. By contributing data to this shared task, participants agree to have this data released under these terms.
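To make the metadata requirements concrete, here is a minimal sketch of the kind of information a dataset card could record, expressed as a Python dict purely for illustration. The field names are our own assumptions, not a prescribed OLDI schema; the language codes shown are the standard ones for Ligurian, used here only as an example.

```python
# Hypothetical dataset card for a FLORES+/Seed contribution; the field names
# are illustrative assumptions, not an official OLDI schema.
dataset_card = {
    "language": {
        "name": "Ligurian",
        "iso_639_3": "lij",          # ISO 639-3 individual language tag
        "glottocode": "ligu1248",    # Glottolog code for the variety
        "script": "Latn",            # ISO 15924 script code
    },
    "source_language": "eng_Latn",   # translated from the original English data
    "workflow": {
        "translators": 2,            # qualified native speakers
        "independent_reviewers": 1,  # at least one extra native-speaker check
        "mt_post_editing": False,    # forbidden for FLORES+ without prior approval
    },
    "license": "CC BY-SA 4.0",       # required for all contributions
}
```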
For further information, please consult the OLDI translation guidelines.
Other massively parallel datasets
We also accept extensions and improvements to other foundational multilingual datasets, e.g. WMT24++, SMOL, BOUQuET. In keeping with the aims of OLDI, these datasets should be massively parallel, open source, and useful to under-served language communities. The contribution workflow should follow that for FLORES+ and Seed as closely as possible to ensure data quality and documentation. If you want to make this kind of contribution, please contact the organisers to discuss.
Datasets of translation quality evaluation
- Languages: For the translation quality annotation data track, we accept extensions of existing datasets or entirely new datasets that include languages previously underrepresented in such data.
- Openness: The source texts and human quality annotations should be open source and, importantly, the translation systems involved should allow reusing their outputs for developing other systems.
- Parallelism: The "multiway parallel data" requirement is not mandatory for this track, as the main phenomenon of interest, variation in translation quality, is language-specific in any case.
- Annotation: The protocols for human annotation can be based on DA (direct assessment), MQM (multidimensional quality metrics), ESA (error span annotation), XSTS (cross-lingual semantic text similarity), or another methodology. This methodology should be clearly documented and, if the contribution extends an existing dataset, the annotation protocol should be consistent with its original approach.
As a source of inspiration, we suggest exploring datasets such as SiniticMTError (translation into 4 Sinitic languages), IndicMT-Eval (translation into 5 Indic languages), AfriMTE (13 translation directions with African languages), or Met-BOUQuET (159 translation directions).
Note that, unlike the "Challenge Set" subtask of the Evaluation shared task, we require the inclusion of under-served languages but do not require that the contributed datasets "break" existing evaluation systems.
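For concreteness, below is a minimal sketch of what one record in such a dataset might look like. The field names are hypothetical rather than a required format, and the scores assume a DA-style 0-100 scale, one of the protocols listed above.

```python
import json

# Hypothetical record from a translation quality annotation dataset.
# Field names are illustrative, not a required OLDI format; the scores
# assume a DA-style (direct assessment) 0-100 scale.
record = {
    "source_lang": "eng_Latn",
    "target_lang": "lij_Latn",
    "source_text": "The committee will meet on Tuesday.",
    "translation": "<target-language translation under evaluation>",
    "translation_system": "example-open-mt",  # must permit reuse of its outputs
    "annotations": [
        {"annotator_id": "a1", "protocol": "DA", "score": 78},
        {"annotator_id": "a2", "protocol": "DA", "score": 83},
    ],
}

# Such datasets are commonly distributed as JSON Lines, one record per line.
print(json.dumps(record, ensure_ascii=False))
```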
Submission format
Submissions consist of two main components, both of which should be submitted by early September 2026 (exact deadline to be confirmed):
- A dataset accompanied by its corresponding data card, following the guidelines above and the in-depth instructions on the OLDI website. These should be shared with the organisers via email at <info@oldi.org>.
- A system paper, prepared and submitted according to the WMT instructions.
Papers should cover the following topics:
- Language overview. Provide background information on the language, highlighting dialectal and spelling variations, as well as the specific variety used in the submission. If applicable, mention any particular attributes which need special consideration when developing NLP applications, e.g. morphology, commonly confused languages, or writing system(s). For under-resourced languages, include a review of existing datasets, reference materials, and other relevant resources.
- Data collection. Offer an in-depth description of the data acquisition process. For example, when submitting a translated dataset, provide details such as the source language, the number of translators, their expertise (native speakers, proficiency, professional experience), and whether any portion was independently reviewed by third parties.
- Experimental validation. Provide experimental validation of the submitted data's quality. For a seed translation dataset, this might involve training a translation model on the new data and evaluating it on existing benchmark data, comparing it to pre-existing models or to models trained on pre-existing data (where applicable). For benchmark data, this could include computing automatic metrics of data quality that are known to correlate with human judgements, e.g. applying machine translation quality estimation metrics to human translation outputs; a sketch of such a check follows this list.
- Data sample. Provide a short excerpt of the data available in the dataset to demonstrate its format and content. This should take up no more than half a page. If possible, provide a translation into English.
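As one concrete possibility for the benchmark-data case, a reference-free quality estimation model can be run over the human translations as a sanity check. The sketch below assumes the `unbabel-comet` package and the `Unbabel/wmt22-cometkiwi-da` checkpoint (a gated model whose license must be accepted on Hugging Face); it is an illustration, not a prescribed validation procedure.

```python
# Sketch: scoring human translations with a reference-free QE metric.
# Assumes `pip install unbabel-comet` and access to the gated
# Unbabel/wmt22-cometkiwi-da checkpoint on Hugging Face.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))

# Source sentences paired with the human translations being validated.
data = [
    {"src": "The committee will meet on Tuesday.", "mt": "<human translation 1>"},
    {"src": "Rainfall was below average this year.", "mt": "<human translation 2>"},
]

output = model.predict(data, batch_size=8, gpus=0)
print("corpus-level score:", output.system_score)
for item, score in zip(data, output.scores):
    # Unusually low-scoring segments are candidates for a second human review.
    print(f"{score:.3f}", item["src"])
```

Keep in mind the caveat from the introduction: QE metrics may themselves generalize poorly to new languages, so low scores should prompt human inspection rather than automatic rejection.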
CONTACT
OLDI Organisers <info@oldi.org>
ORGANIZERS
- Idris Abdulmumin (University of Pretoria)
- Antonios Anastasopoulos (George Mason University)
- Laurie Burchell (Common Crawl Foundation)
- Isaac Caswell (Google)
- David Dale (Meta FAIR)
- Jean Maillard (Meta FAIR)
- Philipp Koehn (Johns Hopkins University)
- Skyler Wang (McGill University)