Shared Task: Low-Resource Indic Language Translation

Announcements

  • March 22, 2023 - Website released!

  • Results available in Evaluation Section.

Task Description

In the past few years, machine translation (MT) performance has improved significantly. With the development of new techniques such as multilingual translation and transfer learning, the use of MT is no longer a privilege for users of popular languages. Consequently, there has been increasing interest in the community in expanding coverage to more languages with different geographical presences, degrees of diffusion and digitalization. However, coverage of these languages remains limited because MT methods demand vast amounts of parallel data to train quality systems, which poses a significant obstacle for low-resource translation. Therefore, developing MT systems with relatively small parallel datasets is still highly desirable. This shared task considers four low-resource Indic languages that belong to different language families, namely Assamese (Indo-Aryan), Mizo (Sino-Tibetan), Khasi (Austroasiatic) and Manipuri (Sino-Tibetan). The main challenge is how to efficiently utilize monolingual data, or techniques such as multilingual training, transfer learning, or language models, to improve translation performance for English-to-Assamese/Mizo/Khasi/Manipuri and Assamese/Mizo/Khasi/Manipuri-to-English. Systems will be evaluated using automatic metrics (BLEU, TER, RIBES, COMET, ChrF) and human evaluation.
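To give a feel for what a character-level metric such as ChrF measures, here is a minimal sketch of a simplified sentence-level score. It is not the official scoring script for this task: the function names, the whitespace handling, the averaging over n-gram orders and the defaults (orders 1 to 6, beta = 2) are assumptions chosen for illustration, and established evaluation tools may differ in these details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it works on characters rather than words, a score of this kind gives partial credit for near-miss word forms, which is one reason character-level metrics are often reported for morphologically rich languages.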

Language Pairs

We focus on the following language pairs (both directions for each):
  • en-as: English ⇔ Assamese
  • en-lus: English ⇔ Mizo
  • en-kha: English ⇔ Khasi
  • en-mni: English ⇔ Manipuri

There will be four subtasks:

  • Subtask-1: English ⇔ Assamese (English-to-Assamese and Assamese-to-English Machine Translation)
  • Subtask-2: English ⇔ Mizo (English-to-Mizo and Mizo-to-English Machine Translation)
  • Subtask-3: English ⇔ Khasi (English-to-Khasi and Khasi-to-English Machine Translation)
  • Subtask-4: English ⇔ Manipuri (English-to-Manipuri and Manipuri-to-English Machine Translation)
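For participants scripting their experiments, the four subtasks expand to eight translation directions. The sketch below enumerates them from the pair codes listed above; the dictionary and function names are hypothetical and only serve as an illustration.

```python
# Pair codes and language names as listed on this page.
LANGUAGE_PAIRS = {
    "en-as": "Assamese",
    "en-lus": "Mizo",
    "en-kha": "Khasi",
    "en-mni": "Manipuri",
}


def translation_directions(pairs):
    """Return (source, target) code tuples for both directions of each pair."""
    directions = []
    for pair in pairs:
        src, tgt = pair.split("-")
        directions.append((src, tgt))  # e.g. English-to-Assamese
        directions.append((tgt, src))  # e.g. Assamese-to-English
    return directions


for src, tgt in translation_directions(LANGUAGE_PAIRS):
    print(f"{src} -> {tgt}")
```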

Parallel data

No additional parallel data is allowed for training. Constrained submissions only.

Monolingual data

You are encouraged to develop novel solutions to utilize monolingual corpora to improve translation quality.

Important Dates

Release of training/dev data: 25 May, 2023 (Please register) (Registration is closed.)
Test data release: 13 July, 2023 (Please register)
Run submission deadline: 28 July, 2023 (EXTENDED) (Please upload a brief/abstract (mandatory) of your system description)
System description/workshop paper submission deadline: 5 Sept, 2023
Notification of acceptance: 6 Oct, 2023
Camera-ready: 18 Oct, 2023
Workshop dates: December 6-7, 2023

Data

Data is available for download.

Citation

If you are using this data, please cite:

Findings of the WMT 2023 Shared Task on Low-Resource Indic Language Translation
Santanu Pal, Partha Pakray, Sahinur Rahman Laskar, Lenin Laitonjam, Vanlalmuansangi Khenglawt, Sunita Warjri, Pankaj Kundan Dadure and Sandeep Kumar Dash
pp. 680‑692

Test Set Submission

The test data is available at the same repository as the training data and it can be accessed using the same password sent via e-mail. You are allowed to submit 1 CONSTRAINT, 1 PRIMARY and up to 2 CONTRASTIVE systems for each language pair/translation direction.

You should submit your results by TBA, 2023 (anywhere in the world)
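Before uploading, it may help to check a planned set of runs against the per-direction limits stated above. The following is a minimal sketch under assumptions: the run labels are taken from the text of this page, while the data layout (a list of direction/label tuples) and the function name are hypothetical and not part of any official submission tool.

```python
from collections import Counter

# Maximum number of runs per language pair/translation direction,
# using the labels as they appear on this page.
LIMITS = {"CONSTRAINT": 1, "PRIMARY": 1, "CONTRASTIVE": 2}


def over_limit(submissions):
    """Return (direction, label, count) entries that exceed the limits.

    `submissions` is a list of (direction, label) tuples, one per run.
    Labels not listed in LIMITS are treated as not allowed.
    """
    counts = Counter(submissions)
    return sorted(
        (direction, label, count)
        for (direction, label), count in counts.items()
        if count > LIMITS.get(label, 0)
    )


planned_runs = [
    ("en-as", "PRIMARY"),
    ("en-as", "CONTRASTIVE"),
    ("en-as", "CONTRASTIVE"),
    ("as-en", "PRIMARY"),
    ("as-en", "PRIMARY"),  # second primary run for the same direction
]
print(over_limit(planned_runs))
```

An empty list means the planned runs respect the limits for every direction.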

Evaluation

Contact

lrilt.wmt23@gmail.com

Paper Submission

Your system paper submission should be prepared according to the WMT instructions and uploaded to START before TBA, 2023.

Organizers

  • Santanu Pal, Wipro AI Lab, London, UK
  • Partha Pakray, National Institute of Technology, Silchar, India
  • Sahinur Rahman Laskar, University of Petroleum and Energy Studies, Dehradun, India
  • Sandeep Kumar Dash, National Institute of Technology, Mizoram, India
  • Lenin Laitonjam, National Institute of Technology, Mizoram, India
  • Vanlalmuansangi Khenglawt, Mizoram University, India
  • Sunita Warjri, Gandhi Institute of Technology and Management, India
  • Pankaj Kundan Dadure, University of Petroleum and Energy Studies, Dehradun, India

Acknowledgements