EMNLP 2024

NINTH CONFERENCE ON
MACHINE TRANSLATION (WMT24)

November 12-13, 2024
Miami, Florida, USA
 

ANNOUNCEMENTS

Feb 25, 2024 - Website released, and the task is announced!

Please register your team to participate: REGISTER NOW!
This page is still being updated; please check back regularly.

OVERVIEW AND TASK DESCRIPTION

Building on the success of the “Shared Task: Low-Resource Indic Language Translation” at WMT 2023, which drew participation from around the world, we are excited to announce the “Shared Task: Low-Resource Indic Language Translation” at WMT 2024. Recent advances in machine translation (MT), including multilingual translation and transfer learning, have significantly improved performance and are extending MT’s reach beyond well-resourced languages. Yet covering diverse, low-resource languages remains a challenge because the parallel data needed to train robust systems is scarce.

The WMT 2024 Indic Machine Translation Shared Task tackles this challenge by focusing on low-resource Indic languages from diverse language families, including Assamese (an Indo-Aryan language spoken mainly in the north-eastern Indian state of Assam), Mizo (a Sino-Tibetan language spoken primarily in the Mizoram state of India), Khasi (an Austroasiatic language spoken in Meghalaya, India), Manipuri (also known as Meiteilon, a Sino-Tibetan language and the official language of Manipur, India), and Nyishi (a Sino-Tibetan language of Arunachal Pradesh, India).

This year’s task features two categories:

Category 1: (Moderate Training Data Available)

  • en-as: English ⇔ Assamese

  • en-lus: English ⇔ Mizo

  • en-kha: English ⇔ Khasi

  • en-mni: English ⇔ Manipuri

  • en-nshi: English ⇔ Nyishi

Category 2: (Very Limited Training Data)

  • en-bodo: English ⇔ Bodo

  • en-mrp: English ⇔ Mising

  • en-trp: English ⇔ Kokborok

The Category 2 language pairs listed above are subject to training data availability; the final set will be announced soon.

GOAL

The central objective is to develop MT systems that produce high-quality translations despite the constraints of data availability. Participants are encouraged to explore:

  • Monolingual Data Utilization: Leveraging monolingual data effectively for improved translation.

  • Multilingual Approaches: Investigating whether cross-lingual transfer benefits low-resource pairs.

  • Transfer Learning: Adapting models trained on richer language pairs to the target languages.

  • Innovative Techniques: Experimenting with novel methods specifically tailored for low-resource settings.

DEADLINES

  • Release of training/dev data: 25 May, 2024

  • Test data release: 13 July, 2024

  • Run submission deadline: 28 July, 2024 (NOTE: uploading a brief abstract of your system description is mandatory)

  • System description/workshop paper submission deadline: TBA, 2024 (follow EMNLP/WMT page)

  • Notification of acceptance: TBA, 2024 (follow EMNLP/WMT page)

  • Camera-ready: TBA, 2024 (follow EMNLP/WMT page)

  • Workshop dates: follow EMNLP/WMT main page

All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2024.

DATA

  • Assamese, Khasi, Mizo, Manipuri for WMT 2023: DOWNLOAD.

  • Nyishi: [DOWNLOAD LINK WILL BE ENABLED SOON]

CITATIONS

If you are using this data, please cite:

TEST SET OUTPUT SUBMISSION

The test data will be available in the same repository as the training data and can be accessed with the same password sent via e-mail. You may submit 1 CONSTRAINED, 1 PRIMARY, and up to 2 CONTRASTIVE systems for each language pair/translation direction.

Please submit your results by TBA, 2024 (AoE, Anywhere on Earth).

EVALUATION

Systems will undergo both automatic evaluation (using BLEU, TER, RIBES, COMET, ChrF) and human evaluation by native speakers for a comprehensive assessment of translation quality.
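
To give a feel for what a character-level metric such as ChrF measures, the following is a minimal, simplified sketch in Python. It is not the official scoring script for this task: the function names, the whitespace handling, and the defaults (character n-grams up to order 6, beta = 2) are assumptions made for illustration only, and official scores may be computed differently.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring all whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score on a 0-1 scale.

    Illustrative only; not the evaluation script used by the organizers.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(simple_chrf("the cat sat on the mat", "the cat sat on a mat"))
```

An output identical to its reference scores 1.0 and an output sharing no characters with it scores 0.0. Because the score is computed over characters rather than whole words, partial matches on inflected word forms still earn credit.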

CONTACT

PAPER SUBMISSION

Your system paper submission should be prepared according to the WMT instructions and uploaded to START by the deadline (TBA, 2024; see the WMT MAIN PAGE).

ORGANIZERS

  • Santanu Pal, Wipro AI Lab, London, UK

  • Partha Pakray, National Institute of Technology, Silchar, India

  • Sandeep Kumar Dash, National Institute of Technology, Mizoram, India

  • Lenin Laitonjam, National Institute of Technology, Mizoram, India

  • Pankaj Kundan Dadure, University of Petroleum and Energy Studies, Dehradun, India

  • Arnab Maji, North-Eastern Hill University, India

  • Lyngdoh Sarah, North-Eastern Hill University, India

  • Anupam Jamatia, National Institute of Technology Agartala, India

  • Koj Sambyo, National Institute of Technology Arunachal Pradesh, India

  • Riyanka Manna, Amrita Vishwa Vidyapeetham, Amaravati Campus, Andhra Pradesh, India

TECHNICAL MEMBERS

  • Pankaj Dadure, University of Petroleum and Energy Studies, India

  • Advaitha Vetagiri, National Institute of Technology, Silchar, India

  • Shyambabu Pandey, National Institute of Technology, Silchar, India

  • Annepaka Yadagiri, National Institute of Technology, Silchar, India