Low-Resource Arabic-Asian Language Translation

EMNLP 2026

ELEVENTH CONFERENCE ON
MACHINE TRANSLATION (WMT26)

28-29 October, 2026
Budapest, Hungary


ANNOUNCEMENTS

Please register your team to participate in the shared task: [REGISTRATION CLOSED!]

April 8, 2026 - Website released, and the task is announced!
June 20, 2026 - Evaluation results declared!

We are still updating the page. Please keep an eye on it!

OVERVIEW AND TASK DESCRIPTION

Arabic (عربي) is one of the most widely spoken languages in the world, serving as a primary means of communication across multiple countries and playing a crucial role in global domains such as education, commerce, culture, and international relations. Developing robust and accurate Arabic machine translation (MT) systems is therefore essential for enabling cross-lingual communication, improving information accessibility, and fostering digital inclusion for millions of speakers.

In recent years, machine translation has achieved significant progress due to advances in multilingual modeling and transfer learning. These innovations have expanded MT capabilities beyond high-resource languages, encouraging broader coverage of linguistically and geographically diverse languages. However, many language pairs involving Arabic still suffer from limited availability of parallel corpora, which remains a key bottleneck in building high-quality translation systems. This challenge is particularly evident in low-resource scenarios, where the lack of sufficient training data restricts the performance of conventional MT approaches. Consequently, developing efficient MT systems that can operate effectively with relatively small datasets is of great importance.

To address this need, this shared task focuses on Arabic-centric low-resource translation, covering both forward and backward translation directions between Arabic and selected Asian languages, namely, Bangla, Hindi, Indonesian, Urdu and English.

The Low-Resource Arabic-Asian Language Translation task features two categories of sub-tasks:

Task 1: Translation into Arabic

Sub-Task 1A: English → Arabic
Sub-Task 1B: Hindi → Arabic
Sub-Task 1C: Bangla → Arabic
Sub-Task 1D: Indonesian → Arabic
Sub-Task 1E: Urdu → Arabic

Task 2: Translation from Arabic

Sub-Task 2A: Arabic → English
Sub-Task 2B: Arabic → Hindi
Sub-Task 2C: Arabic → Bangla
Sub-Task 2D: Arabic → Indonesian
Sub-Task 2E: Arabic → Urdu

GOAL

The primary objective of this task is to develop machine translation (MT) systems capable of delivering high-quality translations under limited data conditions. Participants are encouraged to explore and experiment with the following approaches:

Transfer Learning: Adapting knowledge from models trained on high-resource language pairs to low-resource language pairs.
Multilingual Approaches: Examining the impact of cross-lingual transfer on improving performance for low-resource language pairs.
Monolingual Data Utilization: Effectively leveraging monolingual corpora to enhance translation quality when only limited parallel data is available.
Innovative Techniques: Designing and applying novel methods specifically suited for low-resource translation scenarios.

IMPORTANT DATES

April 8, 2026 - Website released, and the task is announced!
April 8, 2026 - Team Registration Open
May 20, 2026 - Team Registration (Closed)
May 16, 2026 - Training, Development/validation, and DevTest data release (registered participants only)
June 1, 2026 - Beginning of the evaluation cycle (Challenge Test Set release and run submission)
June 10, 2026 - End of the evaluation cycle (Extended)
June 20, 2026 - Result Declaration to individual teams
In line with WMT26 - System Paper Submission
November, 2026 - Under EMNLP Conference

DATA

Releasing Soon: [DOWNLOAD 2026 TASK DATA]

TEST DATA (Challenge Test Set)

Released to registered participants

System Submission Guidelines

📝 Test Set Output Submission Instructions

For each language pair (e.g., English → Arabic), you may submit up to:

Primary system:
- A constrained system (uses only official data), or
- A system that uses additional monolingual resources, pretrained models, etc., depending on your setup (these resources should be publicly available).
Contrastive systems (optional; up to 2 submissions allowed): may use external or additional data beyond the task set.

🚨 Maximum: 3 submissions per language pair.
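
The limits above can be checked before sending anything. The following is a minimal sketch, assuming the limits mean one primary system and at most two contrastive systems per language pair; the function name check_submission_limits and the (sub-task, type) pair representation are illustrative choices, not part of the official instructions.

```python
from collections import Counter

# Assumed reading of the limits above: one primary system and at most two
# contrastive systems per language pair (three submissions in total).
MAX_PER_TYPE = {"primary": 1, "contrastive": 2}


def check_submission_limits(submissions):
    """Return a list of human-readable problems; an empty list means OK.

    `submissions` is an iterable of (sub_task, submission_type) pairs,
    for example ("1A", "primary").
    """
    problems = []
    counts = Counter()
    for sub_task, kind in submissions:
        if kind not in MAX_PER_TYPE:
            problems.append(f"{sub_task}: unknown submission type {kind!r}")
            continue
        counts[(sub_task, kind)] += 1
    # Report every (sub-task, type) combination that exceeds its limit.
    for (sub_task, kind), n in sorted(counts.items()):
        if n > MAX_PER_TYPE[kind]:
            problems.append(
                f"{sub_task}: {n} {kind} submissions, limit is {MAX_PER_TYPE[kind]}"
            )
    return problems
```

For example, one primary and two contrastive entries for Sub-Task 1A produce an empty list, while two primary entries for the same sub-task produce one reported problem.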

📄 Submission File Naming Convention

Each submission file must be named using this format:

<TEAM_NAME>_<SUBMISSION_TYPE>_<SUB-Task No.>.txt

Where:

TEAM_NAME — Your registered team name (e.g., BAHASH-AI)
SUBMISSION_TYPE — One of: primary or contrastive
Sub-Task No. — Use format like Sub-Task-1A for English → Arabic

🔍 Example: For a primary system submission from team BAHASH-AI for English → Arabic:

BAHASH-AI_primary_Sub-Task-1A.txt

Repeat this pattern for all your submissions.
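
To avoid typos in file names, the convention above can be generated and checked programmatically. This is a small sketch rather than an official tool: the helper names build_filename and parse_filename are hypothetical, and the set of characters allowed in a team name is an assumption, since the page does not specify it.

```python
import re

# Sub-task identifiers as listed in the task description (1A-1E and 2A-2E).
VALID_SUB_TASKS = {f"{task}{letter}" for task in "12" for letter in "ABCDE"}
VALID_TYPES = {"primary", "contrastive"}

# The team name is matched greedily so that names containing underscores
# still parse. This is an assumption about permitted team names.
_PATTERN = re.compile(
    r"^(?P<team>.+)_(?P<kind>primary|contrastive)_Sub-Task-(?P<sub>[12][A-E])\.txt$"
)


def build_filename(team, kind, sub_task):
    """Build '<TEAM_NAME>_<SUBMISSION_TYPE>_Sub-Task-<ID>.txt'."""
    if kind not in VALID_TYPES:
        raise ValueError(f"submission type must be one of {sorted(VALID_TYPES)}")
    if sub_task not in VALID_SUB_TASKS:
        raise ValueError(f"unknown sub-task {sub_task!r}")
    return f"{team}_{kind}_Sub-Task-{sub_task}.txt"


def parse_filename(name):
    """Return (team, kind, sub_task), or None if the name does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return match.group("team"), match.group("kind"), match.group("sub")
```

Calling build_filename("BAHASH-AI", "primary", "1A") reproduces the example name shown above. The page does not say how two contrastive files for the same sub-task should be told apart, so the sketch does not attempt to number them.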

📄 System Description File (Mandatory!)

You must include a brief abstract describing your system in one of the following formats:

.pdf
.doc/.docx

This file is mandatory. Submissions without it will not be published in the final results.

📦 Packaging Your Submission

Place the following files in a single .zip archive:

All your system output files
The abstract system description file

Name the zip file using your team name:

<TEAM_NAME>.zip

📌 Example: For team BAHASH-AI, name the file:

BAHASH-AI.zip
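
The packaging step can be scripted with the Python standard library. The sketch below is one possible way to do it, under two assumptions not stated on this page: files are stored at the top level of the archive (no sub-folders), and the helper name package_submission is purely illustrative.

```python
import zipfile
from pathlib import Path

# Formats accepted for the system description, as listed above.
ALLOWED_DESCRIPTION_SUFFIXES = {".pdf", ".doc", ".docx"}


def package_submission(team, output_files, description_file, dest_dir="."):
    """Create '<TEAM_NAME>.zip' with all output files plus the description."""
    outputs = [Path(p) for p in output_files]
    if not outputs:
        raise ValueError("at least one system output file is required")
    description = Path(description_file)
    if description.suffix.lower() not in ALLOWED_DESCRIPTION_SUFFIXES:
        raise ValueError("system description must be .pdf, .doc or .docx")

    archive = Path(dest_dir) / f"{team}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in [*outputs, description]:
            # Assumption: store every file flat at the archive root.
            zf.write(path, arcname=path.name)
    return archive
```

For team BAHASH-AI this writes BAHASH-AI.zip into dest_dir and returns its path, which can then be attached to the submission email.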

📄 Email Submission

Send your .zip file to the following address:

Email: asianmt.wmt@gmail.com

Subject Line:

<TEAM_NAME>: Submission File for Shared Task: Low-Resource Arabic-Asian Language Translation

Example Subject Line:

BAHASH-AI: Submission File for Shared Task: Low-Resource Arabic-Asian Language Translation

Thank you for participating in the shared task! We look forward to your submissions and system descriptions.

EVALUATION

The evaluation was carried out using automatic evaluation metrics (BLEU, ChrF, TER, COMET, and BERT/mBERT) for a comprehensive assessment of translation quality.
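
To give a feel for what a character-level metric such as ChrF measures, here is a simplified, sentence-level sketch. It is not the organizers' scoring script, whose exact tooling and settings are not stated on this page. It assumes character n-grams up to order 6, beta = 2, and removal of plain spaces only; the function name simple_chrf is hypothetical. For real comparisons, an established implementation such as sacreBLEU should be used, and its scores may differ from this sketch.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after dropping spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level character n-gram F-score on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: with beta = 2, recall is weighted more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

An output identical to the reference scores 100, an output sharing no characters with it scores 0, and partial overlaps fall in between. Because it works on characters rather than words, this kind of score is often considered informative for morphologically rich languages such as Arabic.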

Task 1: PRIMARY

Sub-Task 1A: English → Arabic Result
Sub-Task 1B: Hindi → Arabic Result
Sub-Task 1C: Bangla → Arabic Result
Sub-Task 1D: Indonesian → Arabic Result
Sub-Task 1E: Urdu → Arabic Result

Task 2: PRIMARY

Sub-Task 2A: Arabic → English Result
Sub-Task 2B: Arabic → Hindi Result
Sub-Task 2C: Arabic → Bangla Result
Sub-Task 2D: Arabic → Indonesian Result
Sub-Task 2E: Arabic → Urdu Result

Task 1: CONTRASTIVE

Sub-Task 1A: English → Arabic Result
Sub-Task 1B: Hindi → Arabic Result
Sub-Task 1C: Bangla → Arabic Result
Sub-Task 1D: Indonesian → Arabic Result
Sub-Task 1E: Urdu → Arabic Result

Task 2: CONTRASTIVE

Sub-Task 2A: Arabic → English Result
Sub-Task 2B: Arabic → Hindi Result
Sub-Task 2C: Arabic → Bangla Result
Sub-Task 2D: Arabic → Indonesian Result
Sub-Task 2E: Arabic → Urdu Result

CONTACT

asianmt.wmt@gmail.com

PAPER SUBMISSION