After five years of various versions of the Multimodal Translation Task at the Workshop on Asian Translation (most recently WAT2024), WMT continues with a merged “English-to-Lowres Multimodal Translation Task”. The task relies on our “{Hindi, Bengali, Malayalam, Hausa} Visual Genome” datasets, all of which provide text and images suitable for English-{Hindi, Bengali, Malayalam, Hausa} machine translation tasks and multimodal research.
Hindi, Bengali, and Malayalam are Indic medium-to-low-resource languages, while Hausa is a low-resource African language.
Timeline
- August 10, 2024: Translations need to be submitted to the organizers
- August 30, 2024: System description paper submission deadline
- September 27, 2024: Review feedback for system descriptions
- October 4, 2024: Camera-ready
- November 15 or 16, 2024: Workshop takes place
Task Description
The setup of the task is as follows:
- Inputs:
  - An image,
  - A rectangular region in that image,
  - A short English caption of the rectangular region.
- Expected Output:
  - The caption of the rectangular region in {Hindi, Bengali, Malayalam, Hausa}.
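For concreteness, here is a minimal sketch of how one such task instance could be represented and how the rectangular region might be cropped out of the image. The field names, file names, and values are purely illustrative and not prescribed by the task:

```python
from PIL import Image  # pip install pillow

# One hypothetical task instance (field names and values are illustrative only).
example = {
    "image_path": "images/2409818.jpg",                           # the source image
    "region": {"x": 122, "y": 48, "width": 210, "height": 140},   # rectangular region
    "english_caption": "a man riding a bicycle",                  # input caption
    # expected output: the same caption in Hindi / Bengali / Malayalam / Hausa
}

# Crop the rectangular region so that a multimodal system can attend to it.
img = Image.open(example["image_path"]).convert("RGB")
r = example["region"]
region_img = img.crop((r["x"], r["y"], r["x"] + r["width"], r["y"] + r["height"]))
region_img.save("region.jpg")
```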
Types of Submissions Expected
Participants are welcome to submit outputs for any subset of the task languages (Hindi, Bengali, Malayalam, Hausa) in any subset of the following task modalities:
- Text-only translation (source image not used)
- Image captioning (English source text not used)
- Multi-modal translation (uses both the image and the text)
Participants must indicate to which track their translations belong:
- Text-only / Image-only / Multi-modal
  - See above.
- Domain-Aware / Domain-Unaware
  - Whether or not the full (English) Visual Genome was used in training.
- Bilingual / Multilingual
  - Whether you used a single multilingual model translating from English into all of the desired languages, or individual pairwise models.
- Constrained / Non-Constrained
  - The limitations for the constrained systems track are as follows:
    - Allowed pretrained LLMs: Llama-2-7B, Llama-2-13B, Mistral-7B
      - You may adapt / finetune these models in any way, using the provided constrained data only.
    - Allowed multimodal LLMs: CLIP, DALL-E, Gemini, LLaVA
      - You may adapt / finetune these models in any way, using the provided constrained data only. Mention if you use any other multimodal LLMs. (See the sketch after this list for one way to use CLIP.)
    - Allowed pretrained LMs: mBART, BERT, RoBERTa, XLM-RoBERTa, sBERT, LaBSE (each in all publicly available model sizes)
    - You may ONLY use the training data allowed for this year (linked below).
    - You may use any publicly available automatic quality metric during your development.
    - Any basic linguistic tools (taggers, parsers, morphology analyzers, etc.) are allowed.
  - Non-constrained submissions may use other data or pretrained models but need to specify what was used.
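To illustrate one way an allowed multimodal model could enter a constrained multimodal system, the sketch below encodes the cropped rectangular region with CLIP via Hugging Face Transformers and produces an image embedding that a constrained text model (e.g. mBART) could condition on. The checkpoint name and the integration strategy are assumptions, not part of the task rules:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP is on the allowed list for the constrained track; the checkpoint name
# below is an assumption -- any publicly available CLIP checkpoint would do.
model_name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

region_img = Image.open("region.jpg")  # the cropped rectangular region from above

inputs = processor(images=region_img, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)  # shape: (1, 512)

# image_features can now be projected and concatenated with the encoder states
# of a constrained text model to build a multimodal translation system.
print(image_features.shape)
```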
Training Data
The {Hindi, Bengali, Malayalam, Hausa} Visual Genome consists of:
- 29k training examples
- 1k dev set
- 1.6k evaluation set
All the datasets use the same underlying set of images with a handful of differences due to sanity checks which were carried out in each of the languages independently.
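A minimal loading sketch, assuming the tab-separated layout of the earlier Visual Genome releases (image id, region coordinates, English text, target text); please verify the exact column order and file names against the data you download:

```python
def load_visual_genome(path):
    """Read one split of a {Hindi, Bengali, Malayalam, Hausa} Visual Genome file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Assumed layout (check the released files):
            # image_id  X  Y  Width  Height  English text  target text
            image_id, x, y, w, h, english, target = line.rstrip("\n").split("\t")
            examples.append({
                "image_id": image_id,
                "region": (int(x), int(y), int(w), int(h)),
                "english": english,
                "target": target,
            })
    return examples

# Hypothetical file name; the actual names come with the released data.
train = load_visual_genome("hindi-visual-genome-train.txt")
print(len(train), train[0]["english"])
```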
Evaluation
WAT2024 Multi-Modal Task will be evaluated on:
- 1.6k evaluation set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome
- 1.4k challenge set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome
Means of evaluation:
- Automatic metrics: BLEU, CHRF3, and others
- Manual evaluation, subject to the availability of {Hindi, Bengali, Malayalam, Hausa} speakers
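For development, the automatic scores can be approximated locally, for example with sacrebleu; CHRF3 corresponds to the character F-score with beta set to 3. The file names below are placeholders:

```python
from sacrebleu.metrics import BLEU, CHRF  # pip install sacrebleu

# Hypothetical file names: one sentence per line, system output vs. reference.
with open("dev.hyp", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("dev.ref", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

bleu = BLEU()
chrf3 = CHRF(beta=3)  # CHRF3 = character F-score with beta = 3

print(bleu.corpus_score(hypotheses, [references]))
print(chrf3.corpus_score(hypotheses, [references]))
```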
Registration
Please register your team by sending an application following the WAT application procedure here.
Download Links
Paper and References
- Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation
- Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning
- Hausa Visual Genome: A Dataset for Multi-Modal English-to-Hausa Machine Translation
- IITP at WAT 2021: System description for English-Hindi Multimodal Translation Task
Organizers
- Shantipriya Parida (Silo AI, Finland)
- Ondřej Bojar (Charles University, Czech Republic)
- Idris Abdulmumin (University of Pretoria, South Africa)
- Shamsuddeen Hassan Muhammad (Imperial College London, UK)
- Ibrahim Said Ahmad (Institute for Experimental AI, Northeastern University, USA)
Contact
License
The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Acknowledgment
The datasets in this shared task were supported by the grant 19-26934X (Neural Representations in Multi-modal and Multi-lingual Modelling) of the Grant Agency of the Czech Republic.