After five years of various versions of the Multimodal Translation Task at the Workshop on Asian Translation (most recently WAT2024), WMT continues with a merged “English-to-Lowres Multimodal Translation Task”. The task relies on our “{Hindi, Bengali, Malayalam, Hausa} Visual Genome” datasets, all of which provide text and images suitable for English-{Hindi, Bengali, Malayalam, Hausa} machine translation tasks and multimodal research.
Hindi, Bengali, and Malayalam are Indic medium-to-low-resource languages, while Hausa is a low-resource African language.
Timeline
- August 10, 2024: Translations need to be submitted to the organizers
- August 30, 2024: System description paper submission deadline
- September 27, 2024: Review feedback for system descriptions
- October 4, 2024: Camera-ready
- November 15 or 16, 2024: Workshop takes place
Task Description
The setup of the task is as follows:
- Inputs:
  - An image,
  - A rectangular region in that image,
  - A short English caption of the rectangular region.
- Expected Output:
  - The caption of the rectangular region in {Hindi, Bengali, Malayalam, Hausa}.
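For concreteness, here is a minimal sketch of how one such task instance could be represented and how the rectangular region might be cropped out of the image. The field names, file names, and values are purely illustrative and not prescribed by the task:

```python
from PIL import Image  # pip install pillow

# One hypothetical task instance (field names and values are illustrative only).
example = {
    "image_path": "images/2409818.jpg",                           # the source image
    "region": {"x": 122, "y": 48, "width": 210, "height": 140},   # rectangular region
    "english_caption": "a man riding a bicycle",                  # input caption
    # expected output: the same caption in Hindi / Bengali / Malayalam / Hausa
}

# Crop the rectangular region so that a multimodal system can attend to it.
img = Image.open(example["image_path"]).convert("RGB")
r = example["region"]
region_img = img.crop((r["x"], r["y"], r["x"] + r["width"], r["y"] + r["height"]))
region_img.save("region.jpg")
```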
Types of Submissions Expected
Participants are welcome to submit outputs for any subset of the task languages (Hindi, Bengali, Malayalam, Hausa) in any subset of the following task modalities:
- Text-only translation (source image not used)
- Image captioning (English source text not used)
- Multi-modal translation (uses both the image and the text)
Participants must indicate to which track their translations belong:
- Text-only / Image-only / Multi-modal
  - See above.
- Domain-Aware / Domain-Unaware
  - Whether or not the full (English) Visual Genome was used in training.
- Bilingual / Multilingual
  - Whether you used a single multilingual model translating from English into all of the desired languages, or individual pairwise models.
- Constrained / Non-Constrained
  - The limitations for the constrained systems track are as follows:
    - Allowed pretrained LLMs: Llama-2-7B, Llama-2-13B, Mistral-7B
      - You may adapt / finetune these models in any way, using the provided constrained data only.
    - Allowed multimodal LLMs: CLIP, DALL-E, Gemini, LLaVA
      - You may adapt / finetune these models in any way, using the provided constrained data only. Mention if you use any other multimodal LLMs. (See the sketch after this list for one way to use CLIP.)
    - Allowed pretrained LMs: mBART, BERT, RoBERTa, XLM-RoBERTa, sBERT, LaBSE (each in all publicly available model sizes)
    - You may ONLY use the training data allowed for this year (linked below).
    - You may use any publicly available automatic quality metric during your development.
    - Any basic linguistic tools (taggers, parsers, morphology analyzers, etc.) are allowed.
  - Non-constrained submissions may use other data or pretrained models but need to specify what was used.
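To illustrate one way an allowed multimodal model could enter a constrained multimodal system, the sketch below encodes the cropped rectangular region with CLIP via Hugging Face Transformers and produces an image embedding that a constrained text model (e.g. mBART) could condition on. The checkpoint name and the integration strategy are assumptions, not part of the task rules:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP is on the allowed list for the constrained track; the checkpoint name
# below is an assumption -- any publicly available CLIP checkpoint would do.
model_name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

region_img = Image.open("region.jpg")  # the cropped rectangular region from above

inputs = processor(images=region_img, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)  # shape: (1, 512)

# image_features can now be projected and concatenated with the encoder states
# of a constrained text model to build a multimodal translation system.
print(image_features.shape)
```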
Training Data
The {Hindi, Bengali, Malayalam, Hausa} Visual Genome consists of:
- 29k training examples
- 1k dev set
- 1.6k evaluation set
All the datasets use the same underlying set of images with a handful of differences due to sanity checks which were carried out in each of the languages independently.
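A minimal loading sketch, assuming the tab-separated layout of the earlier Visual Genome releases (image id, region coordinates, English text, target text); please verify the exact column order and file names against the data you download:

```python
def load_visual_genome(path):
    """Read one split of a {Hindi, Bengali, Malayalam, Hausa} Visual Genome file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Assumed layout (check the released files):
            # image_id  X  Y  Width  Height  English text  target text
            image_id, x, y, w, h, english, target = line.rstrip("\n").split("\t")
            examples.append({
                "image_id": image_id,
                "region": (int(x), int(y), int(w), int(h)),
                "english": english,
                "target": target,
            })
    return examples

# Hypothetical file name; the actual names come with the released data.
train = load_visual_genome("hindi-visual-genome-train.txt")
print(len(train), train[0]["english"])
```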
Evaluation
WAT2024 Multi-Modal Task will be evaluated on:
- 1.6k evaluation set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome
- 1.4k challenge set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome
Means of evaluation:
- Automatic metrics: BLEU, CHRF3, and others
- Manual evaluation, subject to the availability of {Hindi, Bengali, Malayalam, Hausa} speakers
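For development, the automatic scores can be approximated locally, for example with sacrebleu; CHRF3 corresponds to the character F-score with beta set to 3. The file names below are placeholders:

```python
from sacrebleu.metrics import BLEU, CHRF  # pip install sacrebleu

# Hypothetical file names: one sentence per line, system output vs. reference.
with open("dev.hyp", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("dev.ref", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

bleu = BLEU()
chrf3 = CHRF(beta=3)  # CHRF3 = character F-score with beta = 3

print(bleu.corpus_score(hypotheses, [references]))
print(chrf3.corpus_score(hypotheses, [references]))
```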
Registration
Please register your team by sending an application following the WAT application procedure here.
Download Links
Paper and References
- Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation
- Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning
- Hausa Visual Genome: A Dataset for Multi-Modal English-to-Hausa Machine Translation
- IITP at WAT 2021: System description for English-Hindi Multimodal Translation Task
Organizers
- Shantipriya Parida (Silo AI, Finland)
- Ondřej Bojar (Charles University, Czech Republic)
- Idris Abdulmumin (University of Pretoria, South Africa)
- Shamsuddeen Hassan Muhammad (Imperial College London, UK)
- Ibrahim Said Ahmad (Institute for Experimental AI, Northeastern University, USA)
Contact
License
The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Acknowledgment
The datasets in this shared task were supported by the grant 19-26934X (Neural Representations in Multi-modal and Multi-lingual Modelling) of the Grant Agency of the Czech Republic.