ANNOUNCEMENTS
March 18, 2025 - Website released and task announced!
March 18, 2025 - Team Registration Open
April 20, 2025 - Team Registration Close
Please register your team to participate: REGISTER NOW!
We are still updating this page. Please keep an eye on it!
OVERVIEW AND TASK DESCRIPTION
Building upon the resounding success of the “Shared Task: Low-Resource Indic Language Translation” at WMT 2023 and WMT 2024, which saw enthusiastic participation from around the world, we are excited to announce the “Shared Task: Low-Resource Indic Language Translation” at WMT 2025. Recent advances in machine translation (MT) have significantly improved performance. Techniques such as multilingual translation and transfer learning are expanding MT’s reach beyond well-resourced languages. Yet extending coverage to diverse, low-resource languages remains a challenge due to the limited availability of parallel data for training robust systems. The WMT 2025 Indic Machine Translation Shared Task tackles this challenge by focusing on low-resource Indic languages from diverse language families.
The focus will be on languages of North-East India: Assamese (an Indo-Aryan language spoken mainly in Assam), Bodo (spoken in Assam), Mizo (a Sino-Tibetan language spoken primarily in Mizoram), Khasi (an Austroasiatic language spoken in Meghalaya), Manipuri (also known as Meiteilon, a Sino-Tibetan language and the official language of Manipur), Kokborok (a Tibeto-Burman language spoken primarily by the Tripuri people of Tripura), and Nyishi (a Sino-Tibetan language of Arunachal Pradesh).
This year’s task features two categories:
Category 1: (Moderate Training Data Available)
- en-as: English ⇔ Assamese
- en-lus: English ⇔ Mizo
- en-kha: English ⇔ Khasi
- en-mni: English ⇔ Manipuri
- en-njz: English ⇔ Nyishi
Category 2: (Very Limited Training Data)
- en-bodo: English ⇔ Bodo
- en-trp: English ⇔ Kokborok
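As a rough illustration of how the pairs above might be enumerated in an experiment script, the sketch below lists every translation direction. It assumes that ⇔ means both directions are in scope. The names `LANGUAGE_PAIRS` and `translation_directions` are hypothetical and not part of any official task tooling.

```python
# Pair codes and language names are copied from the two categories above.
# The dictionary layout and function name are illustrative assumptions.
LANGUAGE_PAIRS = {
    "category_1": {
        "en-as": "Assamese",
        "en-lus": "Mizo",
        "en-kha": "Khasi",
        "en-mni": "Manipuri",
        "en-njz": "Nyishi",
    },
    "category_2": {
        "en-bodo": "Bodo",
        "en-trp": "Kokborok",
    },
}


def translation_directions(pairs):
    """Return (category, source, target) tuples for both directions of each pair."""
    directions = []
    for category, codes in pairs.items():
        for code in codes:
            src, tgt = code.split("-")
            directions.append((category, src, tgt))  # e.g. en -> as
            directions.append((category, tgt, src))  # e.g. as -> en
    return directions


if __name__ == "__main__":
    for category, src, tgt in translation_directions(LANGUAGE_PAIRS):
        print(f"{category}: {src} -> {tgt}")
```

Under the bidirectional assumption, the seven pairs give fourteen directions in total.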
GOAL
The central objective is to develop MT systems that produce high-quality translations despite the constraints of data availability. Participants are encouraged to explore:
- Monolingual Data Utilization: Leveraging monolingual data effectively for improved translation.
- Multilingual Approaches: Investigating whether cross-lingual transfer benefits low-resource pairs.
- Transfer Learning: Adapting models trained on richer language pairs to the target languages.
- Innovative Techniques: Experimenting with novel methods specifically tailored for low-resource settings.
DEADLINES
March 18, 2025 | Website released and task announced
March 20, 2025 | Team registration opens
April 20, 2025 | Team registration closes
April 25, 2025 | Training data release (registered participants only)
May 25, 2025 | Test data release (registered participants only)
June 01, 2025 | Run submission deadline (AoE). Please don't forget to send a brief system description.
June 25, 2025 | Results declared to individual teams
TBD | System paper submission
November 5-9, 2025 | Workshop held under the EMNLP conference
All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2025.
DATA
- Assamese, Khasi, Mizo, Manipuri [DOWNLOAD LINK WILL BE ENABLED SOON]
- Nyishi, Bodo [DOWNLOAD LINK WILL BE ENABLED SOON]
CITATIONS
If you are using this data, please cite:
- Santanu Pal, Partha Pakray, Sahinur Rahman Laskar, Lenin Laitonjam, Vanlalmuansangi Khenglawt, Sunita Warjri, Pankaj Kundan Dadure, and Sandeep Kumar Dash. Findings of the WMT 2023 Shared Task on Low-Resource Indic Language Translation. In Proceedings of the Eighth Conference on Machine Translation (WMT), pages 682–694, 2023.
- Nabam Kakum, Sahinur Rahman Laskar, Koj Sambyo, and Partha Pakray. Neural machine translation for limited resources English-Nyishi pair. Sādhanā, Springer, 2023.
- Partha Pakray, Santanu Pal, Advaitha Vetagiri, Reddi Krishna, Arnab Kumar Maji, Sandeep Dash, Lenin Laitonjam, Lyngdoh Sarah, and Riyanka Manna. Findings of WMT 2024 Shared Task on Low-Resource Indic Languages Translation. In Proceedings of the Ninth Conference on Machine Translation (WMT), pages 654–668, 2024. Link: aclanthology.org/2024.wmt-1.54.pdf
SUBMISSION PROCESS
Details will be announced soon.
EVALUATION
Systems will be evaluated both automatically (using BLEU, TER, RIBES, COMET, and chrF) and by native speakers (human evaluation) for a comprehensive assessment of translation quality.
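To give a feel for what a character-level metric such as chrF measures, here is a minimal, simplified sketch in plain Python. It is not the official scorer, and the task page does not say which tool the organizers will use. The function names, the whitespace handling, and the defaults (`max_n=6`, `beta=2.0`) are assumptions made for illustration. For reported scores, an established tool such as sacreBLEU would normally be used instead.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 1] (illustrative, not official)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it works on characters rather than words, a score of this kind gives partial credit for near-miss word forms, which is one reason character-level metrics are often reported for morphologically rich languages.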
CONTACT
PAPER SUBMISSION
Your system paper should be prepared according to the WMT instructions and uploaded to START before the deadline (TBA, 2025). See the WMT main page for details.
ORGANIZERS
- Santanu Pal, Wipro AI Lab, Kolkata, India / London, UK
- Partha Pakray, National Institute of Technology, Silchar, India
- Sandeep Kumar Dash, National Institute of Technology, Mizoram, India
- Lenin Laitonjam, National Institute of Technology, Mizoram, India
- Arnab Maji, North-Eastern Hill University, India
- Saralin A Lyngdoh, North-Eastern Hill University, India
- Riyanka Manna, Amrita Vishwa Vidyapeetham, Andhra Pradesh, India
- Ajit Das, Bodoland University, India
- Anupam Jamatia, National Institute of Technology, Agartala, India
- Koj Sambyo, National Institute of Technology, Arunachal Pradesh, India
STUDENT COORDINATORS
- Advaitha Vetagiri, National Institute of Technology, Silchar, India
- Shyambabu Pandey, National Institute of Technology, Silchar, India
- Kshetrimayum Boynao Singh, National Institute of Technology, Silchar, India
- Annepaka Yadagiri, National Institute of Technology, Silchar, India
- Reddi Mohana Krishna, National Institute of Technology, Silchar, India
LANGUAGE RESOURCE CONTRIBUTORS
- Raju Narzary, Bodoland University, Assam, India
- Baneswar Baro, Bodoland University, Assam, India
- Shwdwmshri Basumatary, Bodoland University, Assam, India
- Jekolin Machahary, Bodoland University, Assam, India
- Khusbu Basumatary, Bodoland University, Assam, India
- Boney Moshahary, Bodoland University, Assam, India
- Dwima Basumatary, Bodoland University, Assam, India
- Jewel Basumatary, Bodoland University, Assam, India
- Eusebius Lawai Lyngdoh, North-Eastern Hill University, Meghalaya, India
- Ibadonbok Syiemlieh, North-Eastern Hill University, Meghalaya, India
- More contributors will be added soon.