EMNLP 2026

ELEVENTH CONFERENCE ON
MACHINE TRANSLATION (WMT26)

November, 2026
Budapest, Hungary

This document lists the WMT26 General MT task datasets for the constrained track and explains how to download them using mtdata.

ANNOUNCEMENTS

  • 2026-04-05: Initial WMT26 constrained recipe release.

MTData

Setup

mtdata 0.5.0-dev is not yet released on PyPI, so please install it from the develop branch:

pip install "git+https://github.com/thammegowda/mtdata.git@develop"   # Python 3.10+

Recipes Config File

Download the recipe config file for the constrained track:

wget https://www.statmt.org/wmt26/mtdata/mtdata.recipes.wmt26-constrained.yml

By default, mtdata loads files matching mtdata.recipes*.yml from the current working directory. If you prefer to keep the recipe file elsewhere, set:

export MTDATA_RECIPES=/path/to/recipesdir
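Before running mtdata, it can help to confirm that the recipe file sits where it will be looked up. The following is a minimal sketch, not part of mtdata: list_recipe_files is a hypothetical helper that applies the same mtdata.recipes*.yml pattern to a directory given as an argument, falling back to $MTDATA_RECIPES and then to the current directory.

```shell
# Hypothetical helper (not an mtdata command): list recipe files matching
# the mtdata.recipes*.yml pattern in a directory.
# Directory: first argument, else $MTDATA_RECIPES, else the current directory.
list_recipe_files() {
  local dir="${1:-${MTDATA_RECIPES:-.}}"
  local f found=1
  for f in "$dir"/mtdata.recipes*.yml; do
    # An unmatched glob stays literal, so keep only paths that exist.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
      found=0
    fi
  done
  # Exit status 0 if at least one recipe file was found, 1 otherwise.
  return "$found"
}
```

For example, list_recipe_files || echo "no recipe files found" >&2 reports a missing or misplaced recipe file before any mtdata command is run.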

List All Recipes

mtdata list-recipe -id | grep '^wmt26-'

Download Recipes

Download one recipe:

mtdata get-recipe -ri wmt26-eng-ces -o wmt26-eng-ces --compress --no-merge -j 8

Download all currently supported WMT26 recipes:

# optional: prefetch/cache datasets referenced by all WMT26 recipes
# this step parallelizes caching across all recipes and reduces total time
mtdata -no-pb cache -j 8 -ri "wmt26-*"

# materialize every supported WMT26 recipe into its own directory
for id in $(mtdata list-recipe -id | grep '^wmt26-'); do
  mtdata get-recipe -ri "$id" -o "$id" --compress --no-merge -j 8
done

  • mtdata stores its cache under $HOME/.mtdata by default. To use a different location, run export MTDATA=/path/to/cache before calling mtdata.

  • If pigz is available in PATH, mtdata will use it for faster compression and decompression.

  • To inspect datasets before downloading, use mtdata list -id -l <lang1>-<lang2> for parallel data and mtdata list -id -l <lang> for monolingual data.
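The notes above can be turned into a quick pre-flight check. This is a sketch under stated assumptions: mtdata_cache_dir and have_pigz are hypothetical helper names, not mtdata commands, and the cache path is derived only from the MTDATA variable and the $HOME/.mtdata default described above.

```shell
# Print the cache directory mtdata is expected to use: $MTDATA when set,
# otherwise the documented default $HOME/.mtdata.
mtdata_cache_dir() {
  printf '%s\n' "${MTDATA:-$HOME/.mtdata}"
}

# Succeed if pigz is available in PATH (hypothetical helper).
have_pigz() {
  command -v pigz >/dev/null 2>&1
}

echo "cache directory: $(mtdata_cache_dir)"
if have_pigz; then
  echo "pigz: found"
else
  echo "pigz: not found"
fi
```

Running it before a large download shows which disk the cache will land on and whether pigz will be picked up.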

WMT26 Recipe IDs

  • wmt26-ces-ukr

  • wmt26-ces-deu

  • wmt26-jpn-zho

  • wmt26-eng-ara

  • wmt26-eng-zho

  • wmt26-eng-ces

  • wmt26-eng-est

  • wmt26-eng-isl

  • wmt26-eng-jpn

  • wmt26-eng-kor

  • wmt26-ces-vie

  • wmt26-eng-hye

  • wmt26-eng-bel

  • wmt26-eng-zho_TW

  • wmt26-eng-deu

  • wmt26-eng-ind

  • wmt26-eng-kaz

  • wmt26-eng-lld

  • wmt26-eng-lij_Latn

  • wmt26-eng-sme

  • wmt26-eng-tha
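Each recipe ID above follows the pattern wmt26-<src>-<tgt>, where a language code may carry a region or script suffix (zho_TW, lij_Latn). As an illustration only, the sketch below splits an ID into its two language codes with shell parameter expansion; recipe_langs is a hypothetical helper name, not an mtdata command.

```shell
# Split a recipe ID of the form wmt26-<src>-<tgt> into its language codes.
# Prints "<src> <tgt>"; returns non-zero for IDs that do not fit the pattern.
recipe_langs() {
  local id="$1" rest src tgt
  case "$id" in
    wmt26-*-*) ;;
    *) return 1 ;;
  esac
  rest="${id#wmt26-}"   # drop the prefix, e.g. eng-zho_TW
  src="${rest%%-*}"     # text before the first hyphen
  tgt="${rest#*-}"      # text after the first hyphen
  printf '%s %s\n' "$src" "$tgt"
}

recipe_langs wmt26-eng-zho_TW   # prints: eng zho_TW
recipe_langs wmt26-ces-ukr      # prints: ces ukr
```

The resulting pair can be joined with a hyphen and passed to mtdata list -id -l <lang1>-<lang2> to inspect the parallel datasets for that direction.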

Constrained Task Datasets

The selected dataset IDs for the constrained track are as follows:

# Setup: pip install "git+https://github.com/thammegowda/mtdata.git@develop"
# To list all the available datasets, use the following commands
#   mtdata list -id -l <lang1>-<lang2>   # parallel
#   mtdata list -id -l <lang>            # monolingual
# To get a dataset
#   mtdata echo <data_id>

########## CES-UKR ##########
- id: wmt26-ces-ukr
  langs: ces-ukr
  train:
    - Facebook-wikimatrix-1-ces-ukr
    - ELRC-acts_ukrainian-1-ces-ukr
    - OPUS-ccmatrix-v1-ces-ukr
    - OPUS-elrc_5179_acts_ukrainian-v1-ces-ukr
    - OPUS-elrc_wikipedia_health-v1-ces-ukr
    - OPUS-eubookshop-v2-ces-ukr
    - OPUS-gnome-v1-ces-ukr
    - OPUS-kde4-v2-ces-ukr
    - OPUS-multiccaligned-v1.1-ces-ukr
    - OPUS-multiparacrawl-v9b-ces-ukr
    - OPUS-opensubtitles-v2024-ces-ukr
    - OPUS-qed-v2.0a-ces-ukr
    - OPUS-ted2020-v1-ces-ukr
    - OPUS-ubuntu-v14.10-ces-ukr
    - OPUS-bible_uedin-v1-ces-ukr
    - OPUS-multihplt-v3-ces-ukr
    - OPUS-neulab_tedtalks-v1-ces-ukr
    - OPUS-nllb-v1-ces-ukr
    - OPUS-tatoeba-v20230412-ces-ukr
    - OPUS-tldr_pages-v20251124-ces-ukr
    - OPUS-wikimedia-v20230407-ces-ukr
    - OPUS-xlent-v1.2-ces-ukr
  mono_train:
    - Statmt-news_crawl-2023-ukr
    - LangUk-news-1-ukr
    - LangUk-wiki_dump-1-ukr
    - LangUk-fiction-1-ukr
    - LangUk-ubercorpus-1-ukr
    - LangUk-laws-1-ukr
    - Leipzig-news-2022_1m-ukr
    - Leipzig-newscrawl-2018_1m-ukr
    - Leipzig-web-2019_1m-ukr_UA
    - Leipzig-wikipedia-2021_1m-ukr

########## CES-DEU ##########
- id: wmt26-ces-deu
  langs: ces-deu
  train:
    - Statmt-news_commentary-18.1-ces-deu
    - Tilde-eesc-2017-ces-deu
    - Tilde-ema-2016-ces-deu
    - Tilde-ecb-2017-ces-deu
    - Tilde-rapid-2016-ces-deu
    - Facebook-wikimatrix-1-ces-deu
    - LinguaTools-wikititles-2014-ces-deu
    - OPUS-ccmatrix-v1-ces-deu
    - OPUS-dgt-v4-ces-deu
    - OPUS-ecb-v1-ces-deu
    - OPUS-ecdc-v20160316-ces-deu
    - OPUS-elitr_eca-v1-ces-deu
    - OPUS-elrc_417_swedish_work_environ-v1-ces-deu
    - OPUS-elrc_ec_europa-v1-ces-deu
    - OPUS-elrc_emea-v1-ces-deu
    - OPUS-elrc_euipo_2017-v1-ces-deu
    - OPUS-elrc_europarl_covid-v1-ces-deu
    - OPUS-elrc_eur_lex-v1-ces-deu
    - OPUS-elrc_eu_publications-v1-ces-deu
    - OPUS-elrc_information_portal-v1-ces-deu
    - OPUS-elrc_antibiotic-v1-ces-deu
    - OPUS-elrc_presscorner_covid-v1-ces-deu
    - OPUS-elrc_vaccination-v1-ces-deu
    - OPUS-elrc_wikipedia_health-v1-ces-deu
    - OPUS-emea-v3-ces-deu
    - OPUS-eubookshop-v2-ces-deu
    - OPUS-euconst-v1-ces-deu
    - OPUS-gnome-v1-ces-deu
    - OPUS-globalvoices-v2018q4-ces-deu
    - OPUS-jrc_acquis-v3.0-ces-deu
    - OPUS-kde4-v2-ces-deu
    - OPUS-multiccaligned-v1.1-ces-deu
    - OPUS-multiparacrawl-v9b-ces-deu
    - OPUS-nllb-v1-ces-deu
    - OPUS-neulab_tedtalks-v1-ces-deu
    - OPUS-opensubtitles-v2024-ces-deu
    - OPUS-php-v1-ces-deu
    - OPUS-qed-v2.0a-ces-deu
    - OPUS-ted2020-v1-ces-deu
    - OPUS-tanzil-v1-ces-deu
    - OPUS-tatoeba-v20230412-ces-deu
    - OPUS-tildemodel-v2018-ces-deu
    - OPUS-ubuntu-v14.10-ces-deu
    - OPUS-xlent-v1.2-ces-deu
    - OPUS-bible_uedin-v1-ces-deu
    - OPUS-wikimedia-v20230407-ces-deu
    - OPUS-tldr_pages-v20251124-ces-deu
  mono_train:
    - Statmt-news_crawl-2023-deu
    - Statmt-europarl-10-deu
    - Statmt-news_commentary-18.1-deu
    - Statmt-commoncrawl-wmt22-deu
    - Leipzig-wikipedia-2021_1m-deu
    - Leipzig-comweb-2021_1m-deu
    - Leipzig-mixed_typical-2011_1m-deu
    - Leipzig-news-2022_30k-deu
    - Leipzig-newscrawl-2020_1m-deu
    - Leipzig-web-2021_100k-deu_DE

########## JPN-ZHO ##########
- id: wmt26-jpn-zho
  langs: jpn-zho
  train:
    - Statmt-news_commentary-18.1-jpn-zho
    - KECL-paracrawl-2wmt24-zho-jpn
    - Facebook-wikimatrix-1-jpn-zho
    - Neulab-tedtalks_train-1-jpn-zho
    - LinguaTools-wikititles-2014-jpn-zho
    - OPUS-ccmatrix-v1-jpn-zho
    - OPUS-php-v1-jpn-zho
    - OPUS-qed-v2.0a-jpn-zho
    - OPUS-ted2020-v1-jpn-zho
    - OPUS-tanzil-v1-jpn-zho
    - OPUS-ubuntu-v14.10-jpn-zho
    - OPUS-bible_uedin-v1-jpn-zho
    - OPUS-alt-v20191206-jpn-zho
    - OPUS-eubookshop-v2-jpn-zho
    - OPUS-jparacrawl-v3.0-jpn-zho
    - OPUS-nllb-v1-jpn-zho
    - OPUS-opensubtitles-v2016-jpn-zho
    - OPUS-tldr_pages-v20251124-jpn-zho
    - OPUS-wikimedia-v20230407-jpn-zho
    - OPUS-xlent-v1.2-jpn-zho
  mono_train:
    - Statmt-news_crawl-2023-zho
    - Statmt-news_commentary-18.1-zho
    - Statmt-commoncrawl-wmt22-zho
    - Leipzig-wikipedia-2018_1m-zho
    - Leipzig-web-2016_1m-zho_MO
    - Leipzig-tradnewscrawl-2011_1m-zho
    - Leipzig-news-2020_300k-zho

########## ENG-ARA ##########
- id: wmt26-eng-ara
  langs: eng-ara
  train:
    - Statmt-news_commentary-18.1-ara-eng
    - Statmt-tedtalks-2_clean-eng-ara
    - Statmt-ccaligned-1-ara_AR-eng
    - Facebook-wikimatrix-1-ara-eng
    - LinguaTools-wikititles-2014-ara-eng
    - OPUS-ccmatrix-v1-ara-eng
    - OPUS-elrc_3083_wikipedia_health-v1-ara-eng
    - OPUS-elrc_wikipedia_health-v1-ara-eng
    - OPUS-elrc_2922-v1-ara-eng
    - OPUS-eubookshop-v2-ara-eng
    - OPUS-gnome-v1-ara-eng
    - OPUS-globalvoices-v2018q4-ara-eng
    - OPUS-hplt-v2-ara-eng
    - OPUS-kde4-v2-ara-eng
    - OPUS-multiccaligned-v1-ara-eng
    - OPUS-multihplt-v2-ara-eng
    - OPUS-multiun-v1-ara-eng
    - OPUS-nllb-v1-ara-eng
    - OPUS-opensubtitles-v2024-ara-eng
    - OPUS-qed-v2.0a-ara-eng
    - OPUS-ted2020-v1-ara-eng
    - OPUS-tatoeba-v20230412-ara-eng
    - OPUS-ubuntu-v14.10-ara-eng
    - OPUS-wikipedia-v1.0-ara-eng
    - OPUS-xlent-v1.2-ara-eng
    - OPUS-bible_uedin-v1-ara-eng
    - OPUS-infopankki-v1-ara-eng
    - OPUS-tico_19-v20201028-ara-eng
    - OPUS-wikimedia-v20230407-ara-eng
    - OPUS-neulab_tedtalks-v1-ara-eng
    - OPUS-opus100_train-1-ara-eng
    - OPUS-tanzil-v1-ara-eng
    - OPUS-ted2013-v1.1-ara-eng
    - OPUS-tldr_pages-v20251124-ara-eng
    - OPUS-unpc-v1.0-ara-eng
  mono_train:
    - Statmt-news_crawl-2023-ara
    - Statmt-news_commentary-18.1-ara
    - Leipzig-news-2020_1m-ara
    - Leipzig-wikipedia-2021_1m-ara

########## ENG-ZHO ##########
- id: wmt26-eng-zho
  langs: eng-zho
  train:
    - Statmt-news_commentary-18.1-eng-zho
    - Statmt-wikititles-3-zho-eng
    - Statmt-ccaligned-1-eng-zho_CN
    - ParaCrawl-paracrawl-1_bonus-eng-zho
    - Facebook-wikimatrix-1-eng-zho
    - Neulab-tedtalks_train-1-eng-zho
    - ELRC-wikipedia_health-1-eng-zho
    - ELRC-hrw_dataset_v1-1-eng-zho
    - LinguaTools-wikititles-2014-eng-zho
    - OPUS-ccmatrix-v1-eng-zho
    - OPUS-elrc_3056_wikipedia_health-v1-eng-zho
    - OPUS-elrc_wikipedia_health-v1-eng-zho
    - OPUS-elrc_2922-v1-eng-zho
    - OPUS-eubookshop-v2-eng-zho
    - OPUS-multiun-v1-eng-zho
    - OPUS-nllb-v1-eng-zho
    - OPUS-php-v1-eng-zho
    - OPUS-qed-v2.0a-eng-zho
    - OPUS-spc-v1-eng-zho
    - OPUS-ted2020-v1-eng-zho
    - OPUS-tanzil-v1-eng-zho
    - OPUS-ubuntu-v14.10-eng-zho
    - OPUS-xlent-v1.2-eng-zho
    - OPUS-bible_uedin-v1-eng-zho
    - OPUS-infopankki-v1-eng-zho
    - OPUS-tico_19-v20201028-eng-zho
    - OPUS-wikimedia-v20230407-eng-zho
    - OPUS-alt-v20191206-eng-zho
    - OPUS-opensubtitles-v2016-eng-zho
    - OPUS-opus100_train-1-eng-zho
    - OPUS-paracrawl_bonus-v9-eng-zho
    - OPUS-ted2013-v1.1-eng-zho
    - OPUS-tldr_pages-v20251124-eng-zho
    - OPUS-unpc-v1.0-eng-zho
  mono_train:
    - Statmt-news_crawl-2023-zho
    - Statmt-news_commentary-18.1-zho
    - Statmt-commoncrawl-wmt22-zho
    - Leipzig-wikipedia-2018_1m-zho
    - Leipzig-web-2016_1m-zho_MO
    - Leipzig-tradnewscrawl-2011_1m-zho
    - Leipzig-news-2020_300k-zho

########## ENG-CES ##########
- id: wmt26-eng-ces
  langs: eng-ces
  train:
    - Statmt-commoncrawl_wmt13-1-ces-eng
    - Statmt-news_commentary-18.1-ces-eng
    - Statmt-wikititles-3-ces-eng
    - Statmt-europarl-10-ces-eng
    - Statmt-ccaligned-1-ces_CZ-eng
    - ParaCrawl-paracrawl-9-eng-ces
    - Tilde-eesc-2017-ces-eng
    - Tilde-ema-2016-ces-eng
    - Tilde-ecb-2017-ces-eng
    - Tilde-rapid-2019-ces-eng
    - Facebook-wikimatrix-1-ces-eng
    - Neulab-tedtalks_train-1-eng-ces
    - ELRC-information_portal_czech_president_czech_castle-1-ces-eng
    - ELRC-electronic_exchange_social_security_information-1-ces-eng
    - ELRC-euipo_2017-1-ces-eng
    - ELRC-czech_supreme_audit_office_2018_reports-1-ces-eng
    - ELRC-czech_supreme_audit_office_2008_2017_reports-1-ces-eng
    - ELRC-czech_supreme_audit_office_2003_2017_press_releases-1-ces-eng
    - ELRC-czech_supreme_audit_office_2018_press_releases-1-ces-eng
    - ELRC-emea-1-ces-eng
    - ELRC-vaccination-1-ces-eng
    - ELRC-eu_publications_medical_v2-1-ces-eng
    - ELRC-wikipedia_health-1-ces-eng
    - ELRC-antibiotic-1-ces-eng
    - ELRC-europarl_covid-1-ces-eng
    - ELRC-ec_europa_covid-1-ces-eng
    - ELRC-eur_lex_covid-1-ces-eng
    - ELRC-presscorner_covid-1-ces-eng
    - ELRC-scipar-1-ces-eng
    - ELRC-web_acquired_data_related_to_scientific_research-1-eng-ces
    - ELRC-hrw_dataset_v1-1-eng-ces
    - ELRC-cef_data_marketplace-1-eng-ces
    - EU-ecdc-1-eng-ces
    - EU-eac_forms-1-ces-eng
    - EU-eac_reference-1-ces-eng
    - EU-dcep-1-ces-eng
    - LinguaTools-wikititles-2014-ces-eng
    - OPUS-ccmatrix-v1-ces-eng
    - OPUS-dgt-v4-ces-eng
    - OPUS-ecb-v1-ces-eng
    - OPUS-ecdc-v20160316-ces-eng
    - OPUS-elitr_eca-v1-ces-eng
    - OPUS-elrc_2012_euipo_2017-v1-ces-eng
    - OPUS-elrc_2404_czech_supreme_audit-v1-ces-eng
    - OPUS-elrc_2405_czech_supreme_audit-v1-ces-eng
    - OPUS-elrc_2406_czech_supreme_audit-v1-ces-eng
    - OPUS-elrc_2407_czech_supreme_audit-v1-ces-eng
    - OPUS-elrc_2713_emea-v1-ces-eng
    - OPUS-elrc_2749_vaccination-v1-ces-eng
    - OPUS-elrc_2874_eu_publications_medi-v1-ces-eng
    - OPUS-elrc_3062_wikipedia_health-v1-ces-eng
    - OPUS-elrc_3201_antibiotic-v1-ces-eng
    - OPUS-elrc_3292_europarl_covid-v1-ces-eng
    - OPUS-elrc_3463_ec_europa_covid-v1-ces-eng
    - OPUS-elrc_3564_eur_lex_covid-v1-ces-eng
    - OPUS-elrc_3605_presscorner_covid-v1-ces-eng
    - OPUS-elrc_40_information_portal_c-v1-ces-eng
    - OPUS-elrc_427_electronic_exchange_-v1-ces-eng
    - OPUS-elrc_5067_scipar-v1-ces-eng
    - OPUS-elrc_ec_europa-v1-ces-eng
    - OPUS-elrc_emea-v1-ces-eng
    - OPUS-elrc_euipo_2017-v1-ces-eng
    - OPUS-elrc_europarl_covid-v1-ces-eng
    - OPUS-elrc_eur_lex-v1-ces-eng
    - OPUS-elrc_eu_publications-v1-ces-eng
    - OPUS-elrc_information_portal-v1-ces-eng
    - OPUS-elrc_antibiotic-v1-ces-eng
    - OPUS-elrc_presscorner_covid-v1-ces-eng
    - OPUS-elrc_vaccination-v1-ces-eng
    - OPUS-elrc_wikipedia_health-v1-ces-eng
    - OPUS-elrc_2682-v1-ces-eng
    - OPUS-elrc_2922-v1-ces-eng
    - OPUS-elrc_2923-v1-ces-eng
    - OPUS-elrc_3382-v1-ces-eng
    - OPUS-emea-v3-ces-eng
    - OPUS-eubookshop-v2-ces-eng
    - OPUS-euconst-v1-ces-eng
    - OPUS-gnome-v1-ces-eng
    - OPUS-globalvoices-v2018q4-ces-eng
    - OPUS-jrc_acquis-v3.0-ces-eng
    - OPUS-kde4-v2-ces-eng
    - OPUS-multiccaligned-v1-ces-eng
    - OPUS-multiparacrawl-v7.1-ces-eng
    - OPUS-nllb-v1-ces-eng
    - OPUS-opensubtitles-v2024-ces-eng
    - OPUS-php-v1-ces-eng
    - OPUS-qed-v2.0a-ces-eng
    - OPUS-ted2020-v1-ces-eng
    - OPUS-tanzil-v1-ces-eng
    - OPUS-tatoeba-v20230412-ces-eng
    - OPUS-tildemodel-v2018-ces-eng
    - OPUS-ubuntu-v14.10-ces-eng
    - OPUS-wikipedia-v1.0-ces-eng
    - OPUS-xlent-v1.2-ces-eng
    - OPUS-bible_uedin-v1-ces-eng
    - OPUS-wikimedia-v20230407-ces-eng
    - OPUS-hplt-v3-ces-eng
    - OPUS-multihplt-v3-ces-eng
    - OPUS-opus100_train-1-ces-eng
    - OPUS-tldr_pages-v20251124-ces-eng
  mono_train:
    - Statmt-news_crawl-2023-ces
    - Statmt-europarl-10-ces
    - Statmt-news_commentary-18.1-ces
    - Statmt-commoncrawl-wmt22-ces
    - Leipzig-news-2022_1m-ces
    - Leipzig-newscrawl-2019_1m-ces
    - Leipzig-wikipedia-2021_1m-ces
    - Leipzig-web_public-2019_1m-ces_CZ

########## ENG-EST ##########
- id: wmt26-eng-est
  langs: eng-est
  train:
    - Statmt-europarl-7-est-eng
    - Statmt-ccaligned-1-eng-est_EE
    - ParaCrawl-paracrawl-9-eng-est
    - Tilde-eesc-2017-eng-est
    - Tilde-ema-2016-eng-est
    - Tilde-airbaltic-1-eng-est
    - Tilde-ecb-2017-eng-est
    - Tilde-rapid-2016-eng-est
    - Facebook-wikimatrix-1-eng-est
    - Neulab-tedtalks_train-1-eng-est
    - ELRC-estonian_cabinet_ministers-1-eng-est
    - ELRC-bank_estonia-1-eng-est
    - ELRC-legal_estonian_justice-1-eng-est
    - ELRC-estonian_foreign_affairs-1-eng-est
    - ELRC-parliament_estonia-1-eng-est
    - ELRC-finnish_information_bank-1-eng-est
    - ELRC-national_security_defence-1-eng-est
    - ELRC-akadeemia.ee-1-eng-est
    - ELRC-vp1992_2001.president.ee-1-eng-est
    - ELRC-vp2001_2006.president.ee-1-eng-est
    - ELRC-vp2006_2016.president.ee-1-eng-est
    - ELRC-president.ee-1-eng-est
    - ELRC-www.visitestonia.com-1-eng-est
    - ELRC-euipo_2017-1-eng-est
    - ELRC-estonian_classification_economic_activities-1-eng-est
    - ELRC-press_releases_foreign_affairs_estonia-1-eng-est
    - ELRC-emea-1-eng-est
    - ELRC-vaccination-1-eng-est
    - ELRC-eu_publications_medical_v2-1-eng-est
    - ELRC-wikipedia_health-1-eng-est
    - ELRC-antibiotic-1-eng-est
    - ELRC-europarl_covid-1-eng-est
    - ELRC-ec_europa_covid-1-eng-est
    - ELRC-www.kriis.ee-1-eng-est
    - ELRC-eur_lex_covid-1-eng-est
    - ELRC-presscorner_covid-1-eng-est
    - ELRC-nteu_tiera-1-eng-est
    - ELRC-nteu_tierb-1-eng-est
    - ELRC-scipar-1-eng-est
    - ELRC-web_acquired_data_related_to_scientific_research-1-eng-est
    - EU-ecdc-1-eng-est
    - EU-eac_forms-1-eng-est
    - EU-eac_reference-1-eng-est
    - EU-dcep-1-eng-est
    - OPUS-ccmatrix-v1-eng-est
    - OPUS-dgt-v4-eng-est
    - OPUS-ecb-v1-eng-est
    - OPUS-ecdc-v20160316-eng-est
    - OPUS-elitr_eca-v1-eng-est
    - OPUS-elra_w0154-v1-eng-est
    - OPUS-elra_w0167-v1-eng-est
    - OPUS-elra_w0168-v1-eng-est
    - OPUS-elra_w0215-v1-eng-est
    - OPUS-elra_w0218-v1-eng-est
    - OPUS-elra_w0265-v1-eng-est
    - OPUS-elrc_1129_www.visitestonia.com-v1-eng-est
    - OPUS-elrc_2016_euipo_2017-v1-eng-est
    - OPUS-elrc_2457_estonian_classificat-v1-eng-est
    - OPUS-elrc_2461_press_releases_forei-v1-eng-est
    - OPUS-elrc_2723_emea-v1-eng-est
    - OPUS-elrc_2751_vaccination-v1-eng-est
    - OPUS-elrc_2882_eu_publications_medi-v1-eng-est
    - OPUS-elrc_3079_wikipedia_health-v1-eng-est
    - OPUS-elrc_3211_antibiotic-v1-eng-est
    - OPUS-elrc_3300_europarl_covid-v1-eng-est
    - OPUS-elrc_3471_ec_europa_covid-v1-eng-est
    - OPUS-elrc_3554_www.kriis.ee-v1-eng-est
    - OPUS-elrc_3572_eur_lex_covid-v1-eng-est
    - OPUS-elrc_3613_presscorner_covid-v1-eng-est
    - OPUS-elrc_393_estonian_cabinet_min-v1-eng-est
    - OPUS-elrc_411_bank_estonia-v1-eng-est
    - OPUS-elrc_4271_nteu_tiera-v1-eng-est
    - OPUS-elrc_429_legal_estonian_justi-v1-eng-est
    - OPUS-elrc_431_estonian_foreign_aff-v1-eng-est
    - OPUS-elrc_5067_scipar-v1-eng-est
    - OPUS-elrc_714_parliament_estonia-v1-eng-est
    - OPUS-elrc_717_finnish_information_-v1-eng-est
    - OPUS-elrc_770_national_security_de-v1-eng-est
    - OPUS-elrc_919_akadeemia.ee-v1-eng-est
    - OPUS-elrc_937_vp1992_2001.presiden-v1-eng-est
    - OPUS-elrc_938_vp2001_2006.presiden-v1-eng-est
    - OPUS-elrc_939_vp2006_2016.presiden-v1-eng-est
    - OPUS-elrc_940_president.ee-v1-eng-est
    - OPUS-elrc_ec_europa-v1-eng-est
    - OPUS-elrc_emea-v1-eng-est
    - OPUS-elrc_euipo_2017-v1-eng-est
    - OPUS-elrc_europarl_covid-v1-eng-est
    - OPUS-elrc_eur_lex-v1-eng-est
    - OPUS-elrc_eu_publications-v1-eng-est
    - OPUS-elrc_finnish_information-v1-eng-est
    - OPUS-elrc_antibiotic-v1-eng-est
    - OPUS-elrc_presscorner_covid-v1-eng-est
    - OPUS-elrc_vaccination-v1-eng-est
    - OPUS-elrc_wikipedia_health-v1-eng-est
    - OPUS-elrc_www.visitestonia.com-v1-eng-est
    - OPUS-elrc_2682-v1-eng-est
    - OPUS-elrc_2922-v1-eng-est
    - OPUS-elrc_2923-v1-eng-est
    - OPUS-elrc_3382-v1-eng-est
    - OPUS-emea-v3-eng-est
    - OPUS-eopc-v2022-eng-est
    - OPUS-eubookshop-v2-eng-est
    - OPUS-euconst-v1-eng-est
    - OPUS-gnome-v1-eng-est
    - OPUS-jrc_acquis-v3.0-eng-est
    - OPUS-kde4-v2-eng-est
    - OPUS-kdedoc-v1-eng_GB-est
    - OPUS-multiccaligned-v1-eng-est
    - OPUS-multiparacrawl-v7.1-eng-est
    - OPUS-nllb-v1-eng-est
    - OPUS-qed-v2.0a-eng-est
    - OPUS-ted2020-v1-eng-est
    - OPUS-tatoeba-v20230412-eng-est
    - OPUS-tildemodel-v2018-eng-est
    - OPUS-ubuntu-v14.10-eng-est
    - OPUS-xlent-v1.2-eng-est
    - OPUS-bible_uedin-v1-eng-est
    - OPUS-infopankki-v1-eng-est
    - OPUS-wikimedia-v20230407-eng-est
    - OPUS-hplt-v3-eng-est
    - OPUS-multihplt-v3-eng-est
    - OPUS-opensubtitles-v2024-eng-est
    - OPUS-opus100_train-1-eng-est
  mono_train:
    - Statmt-news_crawl-2023-est
    - Leipzig-web-2015_1m-est_EE
    - Leipzig-news-2020_300k-est
    - Leipzig-newscrawl-2017_1m-est

########## ENG-ISL ##########
- id: wmt26-eng-isl
  langs: eng-isl
  train:
    - Statmt-wikititles-3-isl-eng
    - Statmt-ccaligned-1-eng-isl_IS
    - ParaCrawl-paracrawl-9-eng-isl
    - Tilde-eesc-2017-eng-isl
    - Tilde-ema-2016-eng-isl
    - Tilde-rapid-2016-eng-isl
    - Facebook-wikimatrix-1-eng-isl
    - ParIce-eea_train-20.05-eng-isl
    - ParIce-ema_train-20.05-eng-isl
    - EU-ecdc-1-eng-isl
    - EU-eac_forms-1-eng-isl
    - EU-eac_reference-1-eng-isl
    - OPUS-ccmatrix-v1-eng-isl
    - OPUS-elrc_2718_emea-v1-eng-isl
    - OPUS-elrc_3206_antibiotic-v1-eng-isl
    - OPUS-elrc_4295_www.malfong.is-v1-eng-isl
    - OPUS-elrc_4324_government_offices_i-v1-eng-isl
    - OPUS-elrc_4327_government_offices_i-v1-eng-isl
    - OPUS-elrc_4334_rkiskaup_2020-v1-eng-isl
    - OPUS-elrc_4338_university_iceland-v1-eng-isl
    - OPUS-elrc_502_icelandic_financial_-v1-eng-isl
    - OPUS-elrc_504_www.iceida.is-v1-eng-isl
    - OPUS-elrc_505_www.pfs.is-v1-eng-isl
    - OPUS-elrc_506_www.lanamal.is-v1-eng-isl
    - OPUS-elrc_5067_scipar-v1-eng-isl
    - OPUS-elrc_508_tilde_statistics_ice-v1-eng-isl
    - OPUS-elrc_509_gallery_iceland-v1-eng-isl
    - OPUS-elrc_510_harpa_reykjavik_conc-v1-eng-isl
    - OPUS-elrc_511_bokmenntaborgin_is-v1-eng-isl
    - OPUS-elrc_516_icelandic_medicines-v1-eng-isl
    - OPUS-elrc_517_icelandic_directorat-v1-eng-isl
    - OPUS-elrc_597_www.nordisketax.net-v1-eng-isl
    - OPUS-elrc_718_statistics_iceland-v1-eng-isl
    - OPUS-elrc_728_www.norden.org-v1-eng-isl
    - OPUS-elrc_emea-v1-eng-isl
    - OPUS-elrc_antibiotic-v1-eng-isl
    - OPUS-elrc_www.norden.org-v1-eng-isl
    - OPUS-elrc_www.nordisketax.net-v1-eng-isl
    - OPUS-eubookshop-v2-eng-isl
    - OPUS-multiccaligned-v1-eng-isl
    - OPUS-multiparacrawl-v7.1-eng-isl
    - OPUS-opensubtitles-v2024-eng-isl
    - OPUS-ted2020-v1-eng-isl
    - OPUS-ubuntu-v14.10-eng-isl
    - OPUS-bible_uedin-v1-eng-isl
    - OPUS-ecdc-v20160316-eng-isl
    - OPUS-gnome-v1-eng-isl
    - OPUS-hplt-v3-eng-isl
    - OPUS-kde4-v2-eng-isl
    - OPUS-macocu-v2-eng-isl
    - OPUS-multihplt-v3-eng-isl
    - OPUS-multimacocu-v2-eng-isl
    - OPUS-nllb-v1-eng-isl
    - OPUS-opus100_train-1-eng-isl
    - OPUS-parice-v1-eng-isl
    - OPUS-qed-v2.0a-eng-isl
    - OPUS-tatoeba-v20230412-eng-isl
    - OPUS-tildemodel-v2018-eng-isl
    - OPUS-wikimedia-v20230407-eng-isl
    - OPUS-xlent-v1.2-eng-isl
  mono_train:
    - Statmt-news_crawl-2023-isl
    - Leipzig-web-2020_1m-isl_IS
    - Leipzig-web_public-2019_1m-isl_IS
    - Leipzig-news-2020_30k-isl
    - Leipzig-newscrawl-2019_300k-isl
    - Leipzig-wikipedia-2021_100k-isl

########## ENG-JPN ##########
- id: wmt26-eng-jpn
  langs: eng-jpn
  train:
    - Statmt-news_commentary-18.1-eng-jpn
    - Statmt-wikititles-3-jpn-eng
    - Statmt-ted-wmt20-eng-jpn
    - Statmt-ccaligned-1-eng-jpn
    - KECL-paracrawl-3-eng-jpn
    - Facebook-wikimatrix-1-eng-jpn
    - Phontron-kftt_train-1-eng-jpn
    - StanfordNLP-jesc_train-1-eng-jpn
    - Neulab-tedtalks_train-1-eng-jpn
    - LinguaTools-wikititles-2014-eng-jpn
    - OPUS-ccmatrix-v1-eng-jpn
    - OPUS-eubookshop-v2-eng-jpn
    - OPUS-gnome-v1-eng-jpn
    - OPUS-globalvoices-v2018q4-eng-jpn
    - OPUS-hplt-v2-eng-jpn
    - OPUS-kde4-v2-eng-jpn
    - OPUS-mdn_web_docs-v20230925-eng-jpn
    - OPUS-multiccaligned-v1-eng-jpn
    - OPUS-multihplt-v2-eng-jpn
    - OPUS-nllb-v1-eng-jpn
    - OPUS-openoffice-v3-eng_GB-jpn
    - OPUS-opensubtitles-v2024-eng-jpn
    - OPUS-php-v1-eng-jpn
    - OPUS-qed-v2.0a-eng-jpn
    - OPUS-ted2020-v1-eng-jpn
    - OPUS-tanzil-v1-eng-jpn
    - OPUS-tatoeba-v20230412-eng-jpn
    - OPUS-ubuntu-v14.10-eng-jpn
    - OPUS-xlent-v1.2-eng-jpn
    - OPUS-bible_uedin-v1-eng-jpn
    - OPUS-wikimedia-v20230407-eng-jpn
    - OPUS-alt-v20191206-eng-jpn
    - OPUS-jesc-v20191205-eng-jpn
    - OPUS-jparacrawl-v3.0-eng-jpn
    - OPUS-kftt-v1.0-eng-jpn
    - OPUS-openoffice-v2-eng-jpn
    - OPUS-opus100_train-1-eng-jpn
    - OPUS-tldr_pages-v20251124-eng-jpn
  mono_train:
    - Statmt-news_crawl-2023-jpn
    - Statmt-news_commentary-18.1-jpn
    - Statmt-commoncrawl-wmt22-jpn
    - Leipzig-web-2020_1m-jpn_JP
    - Leipzig-comweb-2018_1m-jpn
    - Leipzig-web_public-2019_1m-jpn_JP
    - Leipzig-news-2020_100k-jpn
    - Leipzig-newscrawl-2019_1m-jpn
    - Leipzig-wikipedia-2021_1m-jpn

########## ENG-KOR ##########
- id: wmt26-eng-kor
  langs: eng-kor
  train:
    - Statmt-ccaligned-1-eng-kor_KR
    - ParaCrawl-paracrawl-1_bonus-eng-kor
    - Facebook-wikimatrix-1-eng-kor
    - Neulab-tedtalks_train-1-eng-kor
    - ELRC-wikipedia_health-1-eng-kor
    - ELRC-hrw_dataset_v1-1-eng-kor
    - LinguaTools-wikititles-2014-eng-kor
    - OPUS-ccmatrix-v1-eng-kor
    - OPUS-elrc_3070_wikipedia_health-v1-eng-kor
    - OPUS-elrc_wikipedia_health-v1-eng-kor
    - OPUS-elrc_2922-v1-eng-kor
    - OPUS-gnome-v1-eng-kor
    - OPUS-globalvoices-v2018q4-eng-kor
    - OPUS-hplt-v2-eng-kor
    - OPUS-multihplt-v2-eng-kor
    - OPUS-kde4-v2-eng-kor
    - OPUS-mdn_web_docs-v20230925-eng-kor
    - OPUS-multiccaligned-v1-eng-kor
    - OPUS-nllb-v1-eng-kor
    - OPUS-opensubtitles-v2024-eng-kor
    - OPUS-php-v1-eng-kor
    - OPUS-qed-v2.0a-eng-kor
    - OPUS-ted2020-v1-eng-kor
    - OPUS-tanzil-v1-eng-kor
    - OPUS-tatoeba-v20230412-eng-kor
    - OPUS-ubuntu-v14.10-eng-kor
    - OPUS-xlent-v1.2-eng-kor
    - OPUS-bible_uedin-v1-eng-kor
    - OPUS-wikimedia-v20230407-eng-kor
    - OPUS-opus100_train-1-eng-kor
    - OPUS-tldr_pages-v20251124-eng-kor
    - OPUS-translatewiki-v20250101-eng-kor
  mono_train:
    - Statmt-news_crawl-2023-kor
    - Leipzig-web-2020_1m-kor_KR
    - Leipzig-news-2020_1m-kor
    - Leipzig-wikipedia-2021_1m-kor

########## CES-VIE (new) ##########
- id: wmt26-ces-vie
  langs: ces-vie
  train:
    - Facebook-wikimatrix-1-ces-vie
    - Neulab-tedtalks_train-1-vie-ces
    - OPUS-bible_uedin-v1-ces-vie
    - OPUS-ccmatrix-v1-ces-vie
    - OPUS-elrc_wikipedia_health-v1-ces-vie
    - OPUS-gnome-v1-ces-vie
    - OPUS-kde4-v2-ces-vie
    - OPUS-multiccaligned-v1.1-ces-vie
    - OPUS-nllb-v1-ces-vie
    - OPUS-opensubtitles-v2024-ces-vie
    - OPUS-qed-v2.0a-ces-vie
    - OPUS-tatoeba-v20230412-ces-vie
    - OPUS-ted2020-v1-ces-vie
    - OPUS-ubuntu-v14.10-ces-vie
    - OPUS-wikimedia-v20230407-ces-vie
    - OPUS-xlent-v1.2-ces-vie
  mono_train:
    - Leipzig-web-2013_10k-vie_KH
    - Leipzig-mixed-2014_1m-vie
    - Leipzig-news-2020_1m-vie
    - Leipzig-newscrwal-2011_1m-vie
    - Leipzig-web-2015_1m-vie_VN
    - Leipzig-wikipedia-2021_1m-vie

########## ENG-HYE (new) ##########
- id: wmt26-eng-hye
  langs: eng-hye
  train:
    - Neulab-tedtalks_train-1-eng-hye
    - OPUS-bible_uedin-v1-eng-hye
    - OPUS-gnome-v1-eng-hye
    - OPUS-kde4-v2-eng-hye
    - OPUS-multiccaligned-v1-eng-hye
    - OPUS-nllb-v1-eng-hye
    - OPUS-opensubtitles-v2024-eng-hye
    - OPUS-opus100_train-1-eng-hye
    - OPUS-paracrawl_bonus-v9-eng-hye
    - OPUS-qed-v2.0a-eng-hye
    - OPUS-tatoeba-v20230412-eng-hye
    - OPUS-ted2020-v1-eng-hye
    - OPUS-ubuntu-v14.10-eng-hye
    - OPUS-wikimedia-v20230407-eng-hye
    - OPUS-xlent-v1.2-eng-hye
    - Statmt-ccaligned-1-eng-hye_AM
  mono_train:
    - Leipzig-web-2017_1m-hye_AM
    - Leipzig-community-2017-hy
    - Leipzig-news-2021_30k-hye
    - Leipzig-wikipedia-2021_1m-hye

########## ENG-BEL (new) ##########
- id: wmt26-eng-bel
  langs: eng-bel
  train:
    - ELRC-wikipedia_health-1-bel-eng
    - Facebook-wikimatrix-1-bel-eng
    - Neulab-tedtalks_train-1-eng-bel
    - OPUS-ccmatrix-v1-bel-eng
    - OPUS-elrc_2922-v1-bel-eng
    - OPUS-elrc_3046_wikipedia_health-v1-bel-eng
    - OPUS-elrc_wikipedia_health-v1-bel-eng
    - OPUS-eubookshop-v2-bel-eng
    - OPUS-gnome-v1-bel-eng
    - OPUS-hplt-v2-bel-eng
    - OPUS-kde4-v2-bel-eng
    - OPUS-kde4-v2-bel-eng_GB
    - OPUS-multiccaligned-v1-bel-eng
    - OPUS-multihplt-v2-bel-eng
    - OPUS-nllb-v1-bel-eng
    - OPUS-opensubtitles-v2024-bel-eng
    - OPUS-opus100_train-1-bel-eng
    - OPUS-qed-v2.0a-bel-eng
    - OPUS-tatoeba-v20230412-bel-eng
    - OPUS-ted2020-v1-bel-eng
    - OPUS-ubuntu-v14.10-bel-eng
    - OPUS-wikimedia-v20230407-bel-eng
    - OPUS-xlent-v1.2-bel-eng
    - Statmt-ccaligned-1-bel_BY-eng
  mono_train:
    - Leipzig-web-2013_1m-bel_BY
    - Leipzig-web-2015_300k-bel_BY
    - Leipzig-news-2020_100k-bel
    - Leipzig-newscrawl-2015_1m-bel
    - Leipzig-newscrawl-2017_300k-bel
    - Leipzig-wikipedia-2021_300k-bel

########## ENG-ZHO_TW (new) ##########
- id: wmt26-eng-zho_TW
  langs: eng-zho_TW
  train:
    - ELRC-hrw_dataset_v1-1-eng-zho
    - ELRC-wikipedia_health-1-eng-zho
    - Facebook-wikimatrix-1-eng-zho
    - LinguaTools-wikititles-2014-eng-zho
    - Neulab-tedtalks_train-1-eng-zho
    - OPUS-bible_uedin-v1-eng-zho
    - OPUS-ccmatrix-v1-eng-zho
    - OPUS-elrc_2922-v1-eng-zho
    - OPUS-elrc_3056_wikipedia_health-v1-eng-zho
    - OPUS-elrc_wikipedia_health-v1-eng-zho
    - OPUS-eubookshop-v2-eng-zho
    - OPUS-infopankki-v1-eng-zho
    - OPUS-multiun-v1-eng-zho
    - OPUS-nllb-v1-eng-zho
    - OPUS-opensubtitles-v2016-eng-zho
    - OPUS-opus100_train-1-eng-zho
    - OPUS-paracrawl_bonus-v9-eng-zho
    - OPUS-php-v1-eng-zho
    - OPUS-qed-v2.0a-eng-zho
    - OPUS-spc-v1-eng-zho
    - OPUS-tanzil-v1-eng-zho
    - OPUS-ted2013-v1.1-eng-zho
    - OPUS-ted2020-v1-eng-zho
    - OPUS-tico_19-v20201028-eng-zho
    - OPUS-ubuntu-v14.10-eng-zho
    - OPUS-unpc-v1.0-eng-zho
    - OPUS-wikimedia-v20230407-eng-zho
    - OPUS-xlent-v1.2-eng-zho
    - ParaCrawl-paracrawl-1_bonus-eng-zho
    #- Statmt-backtrans_enzh-wmt20-eng-zho
    - Statmt-ccaligned-1-eng-zho_TW
    - Statmt-news_commentary-18.1-eng-zho
    - Statmt-wikititles-3-zho-eng
    - OPUS-alt-v20191206-eng-zho
    - OPUS-gnome-v1-eng-zho_TW
    - OPUS-gnome-v1-eng_AU-zho_TW
    - OPUS-gnome-v1-eng_CA-zho_TW
    - OPUS-gnome-v1-eng_GB-zho_TW
    - OPUS-gnome-v1-eng_NZ-zho_TW
    - OPUS-gnome-v1-eng_US-zho_TW
    - OPUS-kde4-v2-eng-zho_TW
    - OPUS-kde4-v2-eng_GB-zho_TW
    - OPUS-kdedoc-v1-eng_GB-zho_TW
    - OPUS-mdn_web_docs-v20230925-eng-zho_TW
    - OPUS-multiccaligned-v1-eng-zho_TW
    - OPUS-nllb-v1-eng-zho_TW
    - OPUS-opensubtitles-v2024-eng-zho_TW
    - OPUS-php-v1-eng-zho_TW
    - OPUS-ted2020-v1-eng-zho_TW
    - OPUS-tldr_pages-v20251124-eng-zho
    - OPUS-ubuntu-v14.10-eng-zho_TW
    - OPUS-ubuntu-v14.10-eng_AU-zho_TW
    - OPUS-ubuntu-v14.10-eng_CA-zho_TW
    - OPUS-ubuntu-v14.10-eng_GB-zho_TW
    - OPUS-ubuntu-v14.10-eng_NZ-zho_TW
    - OPUS-ubuntu-v14.10-eng_US-zho_TW
    - OPUS-wikimedia-v20230407-eng-zho_TW
  mono_train:
    - Statmt-news_crawl-2023-zho
    - Statmt-news_commentary-18.1-zho
    - Statmt-commoncrawl-wmt22-zho
    - Leipzig-web-2015_1m-zho_CN
    - Leipzig-news-2007_2009_1m-zho
    - Leipzig-news-2020_300k-zho
    - Leipzig-simp_twweb-2014_300k-zho
    - Leipzig-tradnewscrawl-2011_1m-zho
    - Leipzig-wikipedia-2018_1m-zho

########## ENG-DEU (new) ##########
- id: wmt26-eng-deu
  langs: eng-deu
  train:
    - EU-dcep-1-deu-eng
    - EU-eac_forms-1-deu-eng
    - EU-eac_reference-1-deu-eng
    - EU-ecdc-1-eng-deu
    - Facebook-wikimatrix-1-deu-eng
    - LinguaTools-wikititles-2014-deu-eng
    - Neulab-tedtalks_train-1-eng-deu
    - OPUS-bible_uedin-v1-deu-eng
    - OPUS-books-v1-deu-eng
    - OPUS-ccaligned-v1-deu-eng
    - OPUS-ccmatrix-v1-deu-eng
    - OPUS-dgt-v4-deu-eng
    - OPUS-ecb-v1-deu-eng
    - OPUS-ecdc-v20160316-deu-eng
    - OPUS-elitr_eca-v1-deu-eng
    - OPUS-elra_w0143-v1-deu-eng
    - OPUS-elra_w0197-v1-deu-eng_GB
    - OPUS-elra_w0198-v1-deu-eng_GB
    - OPUS-elra_w0199-v1-deu-eng_GB
    - OPUS-elra_w0200-v1-deu-eng_GB
    - OPUS-elra_w0201-v1-deu-eng
    - OPUS-elra_w0301-v1-deu-eng
    - OPUS-elrc_1077_euipo_law-v1-deu-eng
    - OPUS-elrc_1086_information_portal_g-v1-deu-eng
    - OPUS-elrc_1088_german_foreign_offic-v1-deu-eng
    - OPUS-elrc_1089_german_foreign_offic-v1-deu-eng
    - OPUS-elrc_1090_german_foreign_offic-v1-deu-eng
    - OPUS-elrc_1092_euipo_list-v1-deu-eng
    - OPUS-elrc_1117_cordis_news-v1-deu-eng
    - OPUS-elrc_1121_cordis_results_brief-v1-deu-eng
    - OPUS-elrc_1238_energy_report_city-v1-deu-eng
    - OPUS-elrc_1240_austrian_research_te-v1-deu-eng
    - OPUS-elrc_1241_2017_activity_report-v1-deu-eng
    - OPUS-elrc_1243_vienna_environmental-v1-deu-eng
    - OPUS-elrc_2014_euipo_2017-v1-deu-eng
    - OPUS-elrc_2410_portal_oficial_turis-v1-deu-eng
    - OPUS-elrc_2612_artigos_visitportuga-v1-deu-eng
    - OPUS-elrc_2614_localidades_2007-v1-deu-eng
    - OPUS-elrc_2616_museus_2007-v1-deu-eng
    - OPUS-elrc_2622_arquitectura_2007-v1-deu-eng
    - OPUS-elrc_2623_patrimnio_aores_2006-v1-deu-eng
    - OPUS-elrc_2638_monumentos_2007-v1-deu-eng
    - OPUS-elrc_2639_parques_e_reservas-v1-deu-eng
    - OPUS-elrc_2641_praias_2007-v1-deu-eng
    - OPUS-elrc_2682-v1-deu-eng
    - OPUS-elrc_2714_emea-v1-deu-eng
    - OPUS-elrc_2736_vaccination-v1-deu-eng
    - OPUS-elrc_2875_eu_publications_medi-v1-deu-eng
    - OPUS-elrc_2922-v1-deu-eng
    - OPUS-elrc_2923-v1-deu-eng
    - OPUS-elrc_3063_wikipedia_health-v1-deu-eng
    - OPUS-elrc_3202_antibiotic-v1-deu-eng
    - OPUS-elrc_3293_europarl_covid-v1-deu-eng
    - OPUS-elrc_3382-v1-deu-eng
    - OPUS-elrc_3464_ec_europa_covid-v1-deu-eng
    - OPUS-elrc_3565_eur_lex_covid-v1-deu-eng
    - OPUS-elrc_3606_presscorner_covid-v1-deu-eng
    - OPUS-elrc_3852_development_funds_re-v1-deu-eng
    - OPUS-elrc_401_swedish_labour_part2-v1-deu-eng
    - OPUS-elrc_403_rights_arrested-v1-deu-eng
    - OPUS-elrc_406_swedish_labour_part1-v1-deu-eng
    - OPUS-elrc_416_swedish_social_secur-v1-deu-eng
    - OPUS-elrc_417_swedish_work_environ-v1-deu-eng
    - OPUS-elrc_4992_customer_support_mt-v1-deu-eng
    - OPUS-elrc_5067_scipar-v1-deu-eng
    - OPUS-elrc_5220_information_crime_vi-v1-deu-eng
    - OPUS-elrc_621_federal_constitution-v1-deu-eng
    - OPUS-elrc_630_bmvi_publications-v1-deu-eng
    - OPUS-elrc_631_bmvi_website-v1-deu-eng
    - OPUS-elrc_632_bmi_brochure_civil-v1-deu-eng
    - OPUS-elrc_633_bmi_brochures_2016-v1-deu-eng
    - OPUS-elrc_634_bmi_brochures_2011-v1-deu-eng
    - OPUS-elrc_637_sip-v1-deu-eng
    - OPUS-elrc_638_luxembourg.lu-v1-deu-eng
    - OPUS-elrc_642_federal_foreign_berl-v1-deu-eng
    - OPUS-elrc_774_presidency-v1-deu-eng
    - OPUS-elrc_775_by_presidency_counci-v1-deu-eng
    - OPUS-elrc_776_by_presidency_counci-v1-deu-eng
    - OPUS-elrc_832_charter_values_citiz-v1-deu-eng
    - OPUS-elrc_antibiotic-v1-deu-eng
    - OPUS-elrc_arquitectura_2007-v1-deu-eng
    - OPUS-elrc_artigos_visitportuga-v1-deu-eng
    - OPUS-elrc_cordis_news-v1-deu-eng
    - OPUS-elrc_cordis_results-v1-deu-eng
    - OPUS-elrc_ec_europa-v1-deu-eng
    - OPUS-elrc_emea-v1-deu-eng
    - OPUS-elrc_eu_publications-v1-deu-eng
    - OPUS-elrc_euipo_2017-v1-deu-eng
    - OPUS-elrc_euipo_law-v1-deu-eng
    - OPUS-elrc_euipo_list-v1-deu-eng
    - OPUS-elrc_eur_lex-v1-deu-eng
    - OPUS-elrc_europarl_covid-v1-deu-eng
    - OPUS-elrc_federal_foreign-v1-deu-eng_GB
    - OPUS-elrc_german_foreign-v1-deu-eng_GB
    - OPUS-elrc_information_portal-v1-deu-eng
    - OPUS-elrc_localidades_2007-v1-deu-eng
    - OPUS-elrc_museus_2007-v1-deu-eng
    - OPUS-elrc_parques_e-v1-deu-eng
    - OPUS-elrc_patrimnio_aores-v1-deu-eng
    - OPUS-elrc_praias_2007-v1-deu-eng
    - OPUS-elrc_presscorner_covid-v1-deu-eng
    - OPUS-elrc_swedish_labour-v1-deu-eng
    - OPUS-elrc_termitur-v1-deu-eng
    - OPUS-elrc_vaccination-v1-deu-eng
    - OPUS-elrc_wikipedia_health-v1-deu-eng
    - OPUS-emea-v3-deu-eng
    - OPUS-eubookshop-v2-deu-eng
    - OPUS-euconst-v1-deu-eng
    - OPUS-europat-v3-deu-eng
    - OPUS-globalvoices-v2018q4-deu-eng
    - OPUS-gnome-v1-deu-eng
    - OPUS-gnome-v1-deu_CH-eng
    - OPUS-jrc_acquis-v3.0-deu-eng
    - OPUS-kdedoc-v1-deu-eng_GB
    - OPUS-mpc1-v1-deu-eng
    - OPUS-multiccaligned-v1-deu-eng
    - OPUS-multiparacrawl-v7.1-deu-eng
    - OPUS-multiun-v1-deu-eng
    - OPUS-nllb-v1-deu-eng
    - OPUS-openoffice-v3-deu-eng_GB
    - OPUS-opensubtitles-v2024-deu-eng
    - OPUS-opus100_train-1-deu-eng
    - OPUS-php-v1-deu-eng
    - OPUS-qed-v2.0a-deu-eng
    - OPUS-rf-v1-deu-eng
    - OPUS-salome-v1-deu-eng
    - OPUS-stanfordnlp_nmt-v1.0-eng-deu
    - OPUS-tanzil-v1-deu-eng
    - OPUS-tatoeba-v20230412-deu-eng
    - OPUS-ted2013-v1.1-deu-eng
    - OPUS-ted2020-v1-deu-eng
    - OPUS-tildemodel-v2018-deu-eng
    - OPUS-wikimedia-v20230407-deu-eng
    - OPUS-wikipedia-v1.0-deu-eng
    - OPUS-xlent-v1.2-deu-eng
    - ParaCrawl-paracrawl-9-eng-deu
    - Statmt-commoncrawl_wmt13-1-deu-eng
    - Statmt-europarl-9-deu-eng
    - Statmt-europarl_wmt13-7-deu-eng
    - Statmt-news_commentary-18.1-deu-eng
    - Statmt-news_commentary_wmt18-13-deu-eng
    - Statmt-wiki_titles-2-deu-eng
    - Statmt-wikititles-3-deu-eng
    - Tilde-airbaltic-1-deu-eng
    - Tilde-czechtourism-1-deu-eng
    - Tilde-ecb-2017-deu-eng
    - Tilde-eesc-2017-deu-eng
    - Tilde-ema-2016-deu-eng
    - Tilde-rapid-2019-deu-eng
    - OPUS-kde4-v2-deu-eng
    - OPUS-openoffice-v2-deu-eng
    - OPUS-tldr_pages-v20251124-deu-eng
    - OPUS-ubuntu-v14.10-deu-eng
  mono_train:
    - Statmt-news_crawl-2023-deu
    - Statmt-europarl-10-deu
    - Statmt-news_commentary-18.1-deu
    - Statmt-commoncrawl-wmt22-deu
    - Leipzig-web-2013_10k-deu_BE
    - Leipzig-web-2002_1m-deu_CH
    - Leipzig-comweb-2021_1m-deu
    - Leipzig-web-2021_1m-deu_DE
    - Leipzig-web_public-2019_1m-deu_DE
    - Leipzig-euweb-2015_300k-deu
    - Leipzig-euweb-2017_1m-deu
    - Leipzig-mixed_typical-2011_1m-deu
    - Leipzig-news-2022_1m-deu
    - Leipzig-newscrawl-2020_1m-deu
    - Leipzig-newscrawl_public-2019_1m-deu
    - Leipzig-web-2002_1m-deu
    - Leipzig-web-2011_1m-deu
    - Leipzig-wikipedia-2021_1m-deu

########## ENG-IND (new) ##########
- id: wmt26-eng-ind
  langs: eng-ind
  train:
    - ELRC-hrw_dataset_v1-1-eng-ind
    - ELRC-wikipedia_health-1-eng-ind
    - Facebook-wikimatrix-1-eng-ind
    - Neulab-tedtalks_train-1-eng-ind
    - OPUS-bible_uedin-v1-eng-ind
    - OPUS-ccmatrix-v1-eng-ind
    - OPUS-elrc_2922-v1-eng-ind
    - OPUS-elrc_3049_wikipedia_health-v1-eng-ind
    - OPUS-elrc_wikipedia_health-v1-eng-ind
    - OPUS-globalvoices-v2018q4-eng-ind
    - OPUS-gnome-v1-eng-ind
    - OPUS-kde4-v2-eng-ind
    - OPUS-multiccaligned-v1-eng-ind
    - OPUS-nllb-v1-eng-ind
    - OPUS-opensubtitles-v2024-eng-ind
    - OPUS-opus100_train-1-eng-ind
    - OPUS-paracrawl_bonus-v9-eng-ind
    - OPUS-qed-v2.0a-eng-ind
    - OPUS-tanzil-v1-eng-ind
    - OPUS-tatoeba-v20230412-eng-ind
    - OPUS-ted2020-v1-eng-ind
    - OPUS-tico_19-v20201028-eng-ind
    - OPUS-ubuntu-v14.10-eng-ind
    - OPUS-wikimedia-v20230407-eng-ind
    - OPUS-xlent-v1.2-eng-ind
    - Statmt-ccaligned-1-eng-ind_ID
    - Statmt-news_commentary-18.1-eng-ind
    - OPUS-alt-v20191206-eng-ind
    - OPUS-tldr_pages-v20251124-eng-ind
  mono_train:
    - Statmt-news_crawl-2023-ind
    - Statmt-news_commentary-18.1-ind
    - Leipzig-comweb-2018_1m-ind
    - Leipzig-web-2015_1m-ind_IN
    - Leipzig-mixed-2013_1m-ind
    - Leipzig-mixed_tufs4-2012_1m-ind
    - Leipzig-news-2022_1m-ind
    - Leipzig-newscrawl-2016_1m-ind
    - Leipzig-newscrawl_tufs6-2012_3m-ind
    - Leipzig-web_tufs13-2012_3m-ind
    - Leipzig-wikipedia-2010_300k-ind
    - Leipzig-wikipedia-2021_1m-ind

########## ENG-KAZ (new) ##########
- id: wmt26-eng-kaz
  langs: eng-kaz
  train:
    - ELRC-kazakh_legal_mt_test_set-1-eng-kaz
    - Facebook-wikimatrix-1-eng-kaz
    - Neulab-tedtalks_train-1-eng-kaz
    - OPUS-elrc_5042_kazakh_legal_mt-v1-eng-kaz
    - OPUS-gnome-v1-eng-kaz
    - OPUS-hplt-v2-eng-kaz
    - OPUS-kde4-v2-eng-kaz
    - OPUS-multiccaligned-v1-eng-kaz
    - OPUS-multihplt-v2-eng-kaz
    - OPUS-nllb-v1-eng-kaz
    - OPUS-opensubtitles-v2024-eng-kaz
    - OPUS-opus100_train-1-eng-kaz
    - OPUS-qed-v2.0a-eng-kaz
    - OPUS-tatoeba-v20230412-eng-kaz
    - OPUS-ted2020-v1-eng-kaz
    - OPUS-ubuntu-v14.10-eng-kaz
    - OPUS-wikimedia-v20230407-eng-kaz
    - OPUS-xlent-v1.2-eng-kaz
    - Statmt-ccaligned-1-eng-kaz_KZ
    - Statmt-news_commentary-18.1-eng-kaz
    - Statmt-wiki_titles-1-kaz-eng
    - OPUS-translatewiki-v20250101-eng-kaz
  mono_train:
    - Statmt-news_crawl-2023-kaz
    - Statmt-news_commentary-18.1-kaz
    - Leipzig-news-2020_30k-kaz
    - Leipzig-newscrawl-2016_1m-kaz
    - Leipzig-wikipedia-2021_300k-kaz

########## ENG-LLD (new) ##########
- id: wmt26-eng-lld
  langs: eng-lld
  train:
    - OPUS-qed-v2.0a-eng-lld
    - OPUS-tatoeba-v20230412-eng-lld
    - OPUS-ubuntu-v14.10-eng-lld
    - OPUS-ubuntu-v14.10-eng_AU-lld
    - OPUS-ubuntu-v14.10-eng_CA-lld
    - OPUS-ubuntu-v14.10-eng_GB-lld
    - OPUS-wikimedia-v20230407-eng-lld
    - OPUS-translatewiki-v20250101-eng-lld
    - OPUS-translatewiki-v20250101-eng_CA-lld
  mono_train:
    - Sfrontull-la_usc_valbadia_loresmt24-1-lld
    - Sfrontull-south_tyrol_weather_lld-1-lld

########## ENG-LIJ_LATN (new) ##########
- id: wmt26-eng-lij_Latn
  langs: eng-lij_Latn
  train:
    - AllenAi-nllb-1-eng-lij_Latn
    - Conseggioligure-zenamt_eng_train-1-eng-lij_Latn
    - Openlanguagedata-oldi_seed-1-eng-lij_Latn
  mono_train:
    - Conseggioligure-linc-1-lij_Latn

########## ENG-SME (new) ##########
- id: wmt26-eng-sme
  langs: eng-sme
  train:
    - OPUS-kde4-v2-eng-sme
    - OPUS-kde4-v2-eng_GB-sme
    - OPUS-opensubtitles-v2024-eng-sme
    - OPUS-opus100_train-1-eng-sme
    - OPUS-tatoeba-v20230412-eng-sme
    - OPUS-translatewiki-v20250101-eng_CA-sme
    - OPUS-ubuntu-v14.10-eng-sme
    - OPUS-ubuntu-v14.10-eng_AU-sme
    - OPUS-ubuntu-v14.10-eng_CA-sme
    - OPUS-ubuntu-v14.10-eng_GB-sme
    - OPUS-ubuntu-v14.10-eng_NZ-sme
    - OPUS-ubuntu-v14.10-eng_US-sme
    - OPUS-wikimedia-v20230407-eng-sme
  mono_train:
    - Leipzig-news-2015_10k-sme_NO
    - Leipzig-web-2013_10k-sme_NO
    - Leipzig-wikipedia-2021_10k-sme

########## ENG-THA (new) ##########
- id: wmt26-eng-tha
  langs: eng-tha
  train:
    - ELRC-hrw_dataset_v1-1-eng-tha
    - ELRC-wikipedia_health-1-eng-tha
    - Neulab-tedtalks_train-1-eng-tha
    - OPUS-bible_uedin-v1-eng-tha
    - OPUS-elrc_2922-v1-eng-tha
    - OPUS-elrc_3048_wikipedia_health-v1-eng-tha
    - OPUS-elrc_wikipedia_health-v1-eng-tha
    - OPUS-gnome-v1-eng-tha
    - OPUS-hplt-v2-eng-tha
    - OPUS-kde4-v2-eng-tha
    - OPUS-multiccaligned-v1-eng-tha
    - OPUS-multihplt-v2-eng-tha
    - OPUS-opensubtitles-v2024-eng-tha
    - OPUS-opus100_train-1-eng-tha
    - OPUS-paracrawl_bonus-v9-eng-tha
    - OPUS-qed-v2.0a-eng-tha
    - OPUS-scb_mt_en_th-v1.0-eng-tha
    - OPUS-tanzil-v1-eng-tha
    - OPUS-tatoeba-v20230412-eng-tha
    - OPUS-ted2020-v1-eng-tha
    - OPUS-ubuntu-v14.10-eng-tha
    - OPUS-wikimedia-v20230407-eng-tha
    - OPUS-xlent-v1.2-eng-tha
    - Statmt-ccaligned-1-eng-tha_TH
    - OPUS-alt-v20191206-eng-tha
    - OPUS-tldr_pages-v20251124-eng-tha
  mono_train:
    - Leipzig-community-2021-tha
    - Leipzig-news-2020_30k-tha
    - Leipzig-newscrawl-2011_100k-tha
    - Leipzig-web-2018_1m-tha_TH
    - Leipzig-wikipedia-2021_10k-tha
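
A quick sanity check on a recipe like those above is to confirm that every dataset ID ends in the language codes declared in `langs`. The sketch below is only illustrative: it assumes dataset IDs follow the hyphen-separated `Group-name-version-lang1[-lang2]` layout seen in this file, and that region or script suffixes such as `_GB` or `_Latn` can be ignored when matching. The helper names (`base_lang`, `id_langs`, `check_recipe`) are hypothetical and not part of mtdata.

```python
def base_lang(code: str) -> str:
    """Drop a region or script suffix: 'eng_GB' -> 'eng', 'lij_Latn' -> 'lij'."""
    return code.split("_", 1)[0]


def id_langs(dataset_id: str) -> tuple[str, ...]:
    """Return the base language codes at the end of a dataset ID.

    Assumes the layout Group-name-version-lang1[-lang2], so the
    language fields start at index 3 after splitting on hyphens.
    """
    parts = dataset_id.split("-")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected dataset ID layout: {dataset_id}")
    return tuple(base_lang(p) for p in parts[3:])


def check_recipe(recipe: dict) -> list[str]:
    """List dataset IDs whose languages disagree with the recipe's `langs`."""
    pair = {base_lang(code) for code in recipe["langs"].split("-")}
    problems = []
    for did in recipe.get("train", []):
        # Parallel data: two languages that match the pair, in either order.
        langs = id_langs(did)
        if len(langs) != 2 or set(langs) != pair:
            problems.append(did)
    for did in recipe.get("mono_train", []):
        # Monolingual data: a single language that belongs to the pair.
        langs = id_langs(did)
        if len(langs) != 1 or langs[0] not in pair:
            problems.append(did)
    return problems


# A few entries copied from the wmt26-eng-sme recipe above.
recipe = {
    "id": "wmt26-eng-sme",
    "langs": "eng-sme",
    "train": [
        "OPUS-kde4-v2-eng_GB-sme",
        "OPUS-opus100_train-1-eng-sme",
    ],
    "mono_train": ["Leipzig-news-2015_10k-sme_NO"],
}
print(check_recipe(recipe))  # prints []
```

To run the same check over the full recipes file, one option is to load it with a YAML parser such as PyYAML (`yaml.safe_load`), which should yield a list of recipe dictionaries in this shape, and call `check_recipe` on each.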

Issues / Bugs

Please report issues at https://github.com/thammegowda/mtdata/issues and include the relevant recipe ID (for example, wmt26-eng-deu).