Announcements
- March 22, 2023 - Website released!
- Results available in Evaluation Section.
Task Description
In the past few years, machine translation (MT) performance has improved significantly. With the development of new techniques such as multilingual translation and transfer learning, the use of MT is no longer a privilege for users of popular languages. Consequently, there has been increasing interest in the community in expanding coverage to more languages with different geographical presences, degrees of diffusion and digitalization. However, MT coverage for speakers of diverse languages remains limited because MT methods demand vast amounts of parallel data to train quality systems, which poses a significant obstacle for low-resource translation. Therefore, developing MT systems with relatively small parallel datasets is still highly desirable. This shared task considers four distinct low-resource Indic languages that belong to different language families, namely Assamese (Indo-Aryan), Mizo (Sino-Tibetan), Khasi (Austroasiatic) and Manipuri (Sino-Tibetan). The main challenge is how to efficiently utilize monolingual data, or techniques such as multilingual training, transfer learning, or language models, to improve translation performance for English-to-Assamese/Mizo/Khasi/Manipuri and Assamese/Mizo/Khasi/Manipuri-to-English. The evaluation will be carried out using automatic evaluation metrics (BLEU, TER, RIBES, COMET, ChrF) and human evaluation.
Language Pairs
We focus on the following language pairs (both directions for each):
- en-as: English ⇔ Assamese
- en-lus: English ⇔ Mizo
- en-kha: English ⇔ Khasi
- en-mni: English ⇔ Manipuri
There will be four subtasks:
- Subtask-1: English ⇔ Assamese (English-to-Assamese and Assamese-to-English Machine Translation)
- Subtask-2: English ⇔ Mizo (English-to-Mizo and Mizo-to-English Machine Translation)
- Subtask-3: English ⇔ Khasi (English-to-Khasi and Khasi-to-English Machine Translation)
- Subtask-4: English ⇔ Manipuri (English-to-Manipuri and Manipuri-to-English Machine Translation)
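The four subtasks above cover eight translation directions in total. As a minimal sketch, the directions can be enumerated from the language codes listed on this page; the names `LANGUAGES`, `translation_directions`, and `DIRECTIONS` are hypothetical and not part of any official task tooling.

```python
# Language codes as listed on this page; English is one side of every pair.
LANGUAGES = {"as": "Assamese", "lus": "Mizo", "kha": "Khasi", "mni": "Manipuri"}


def translation_directions(languages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) code pairs, both directions for each language."""
    directions = []
    for code in languages:
        directions.append(("en", code))  # English-to-X
        directions.append((code, "en"))  # X-to-English
    return directions


DIRECTIONS = translation_directions(LANGUAGES)

for src, tgt in DIRECTIONS:
    print(f"{src}-{tgt}")
```

A list like this can be handy for looping over all directions when training or scoring systems, so that no direction is skipped by accident.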
Parallel data
No additional parallel data is allowed for training. Constrained submissions only.
Monolingual data
You are encouraged to develop novel solutions to utilize monolingual corpora to improve translation quality.
Important Dates
Release of training/dev data | 25 May, 2023 (Please register) (Registration is closed.) |
Test data release | 13 July, 2023 (Please register) |
Run Submission deadline (Please upload a brief/abstract (mandatory) of your system description) | 28 July, 2023 (EXTENDED) |
System description/workshop paper submission deadline | 5 Sept, 2023 |
Notification of Acceptance | 6 Oct, 2023 |
Camera-ready | 18 Oct, 2023 |
Workshop Dates | December 6-7, 2023 |
Data
Data is available for download.
Citation
If you are using this data, please cite:
Findings of the WMT 2023 Shared Task on Low-Resource Indic Language Translation. Santanu Pal, Partha Pakray, Sahinur Rahman Laskar, Lenin Laitonjam, Vanlalmuansangi Khenglawt, Sunita Warjri, Pankaj Kundan Dadure and Sandeep Kumar Dash. pp. 680-692.
Test Set Submission
The test data is available at the same repository as the training data and it can be accessed using the same password sent via e-mail. You are allowed to submit 1 CONSTRAINT, 1 PRIMARY and up to 2 CONTRASTIVE systems for each language pair/translation direction.
You should submit your results by TBA, 2023 (anywhere in the world)
Evaluation
The evaluation was carried out automatically using BLEU (Papineni et al., 2002), TER (Snover et al., 2006), RIBES (Isozaki et al., 2010), COMET, and ChrF.
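To give an intuition for one of these metrics, here is a simplified, sentence-level sketch of a character n-gram F-score in the style of ChrF. It is an illustration only and not the scorer used by the organizers: the function names, the whitespace handling, and the defaults (`max_n=6`, `beta=2.0`) are assumptions, and official scores should be computed with an established evaluation toolkit.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (an assumption here)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified ChrF-style score in [0, 1]; not the official implementation."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("the cat sat on the mat", "the cat sat on a mat"))
```

Because it works on characters rather than words, a score of this kind gives partial credit for near-miss word forms, which is one reason character-level metrics are often reported for morphologically rich languages.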
PRIMARY
English-To-Assamese (PRIMARY)
Assamese-To-English (PRIMARY)
English-To-Mizo (PRIMARY)
Mizo-To-English (PRIMARY)
English-To-Khasi (PRIMARY)
Khasi-To-English (PRIMARY)
English-To-Manipuri (PRIMARY)
Manipuri-To-English (PRIMARY)
CONTRASTIVE
English-To-Assamese (CONTRASTIVE)
Assamese-To-English (CONTRASTIVE)
English-To-Mizo (CONTRASTIVE)
Mizo-To-English (CONTRASTIVE)
English-To-Khasi (CONTRASTIVE)
Khasi-To-English (CONTRASTIVE)
English-To-Manipuri (CONTRASTIVE)
Manipuri-To-English (CONTRASTIVE)
Contact
Paper Submission
Your system paper submission should be prepared according to the WMT instructions and uploaded to START before TBA, 2023.
Organizers
- Santanu Pal, Wipro AI Lab, London, UK
- Partha Pakray, National Institute of Technology, Silchar, India
- Sahinur Rahman Laskar, University of Petroleum and Energy Studies, Dehradun, India
- Sandeep Kumar Dash, National Institute of Technology, Mizoram, India
- Lenin Laitonjam, National Institute of Technology, Mizoram, India
- Vanlalmuansangi Khenglawt, Mizoram University, India
- Sunita Warjri, Gandhi Institute of Technology and Management, India
- Pankaj Kundan Dadure, University of Petroleum and Energy Studies, Dehradun, India