EMNLP 2024

NINTH CONFERENCE ON
MACHINE TRANSLATION (WMT24)

November 15-16, 2024
Miami, Florida, USA
 

After five years of various versions of the Multimodal Translation Task at the Workshop on Asian Translation (WAT), WMT continues with a merger: the “English-to-Lowres Multimodal Translation Task”. The task relies on our “{Hindi, Bengali, Malayalam, Hausa} Visual Genome” datasets, all of which provide text and images suitable for English-{Hindi, Bengali, Malayalam, Hausa} machine translation and multimodal research.

Hindi, Bengali, and Malayalam are Indic medium-to-low-resource languages, while Hausa is a low-resource African language.

Timeline

  • August 10, 2024: Translations need to be submitted to the organizers

  • August 30, 2024: System description paper submission deadline

  • September 27, 2024: Review feedback for system description

  • October 4, 2024: Camera-ready

  • November 15 or 16, 2024: Workshop takes place

Task Description

The setup of the task is as follows (a schematic example record is sketched after the list):

  • Inputs:

    • An image,

    • A rectangular region in that image,

    • A short English caption of the rectangular region.

  • Expected Output:

    • The caption of the rectangular region in {Hindi, Bengali, Malayalam, Hausa}.
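
To make the input/output contract concrete, here is a minimal Python sketch of how a single task instance could be represented. The class and field names (BoundingBox, TaskInstance, etc.) are illustrative assumptions, not a prescribed data format, and the example values are made up.

    from dataclasses import dataclass

    @dataclass
    class BoundingBox:
        """Rectangular region inside the source image (pixel coordinates)."""
        x: int       # left edge
        y: int       # top edge
        width: int
        height: int

    @dataclass
    class TaskInstance:
        """One example of the English-to-Lowres multimodal translation task."""
        image_path: str        # the source image
        region: BoundingBox    # the rectangular region within that image
        english_caption: str   # short English caption of the region (input)
        target_caption: str    # caption in Hindi/Bengali/Malayalam/Hausa (expected output)

    # Illustrative instance; a system receives the first three fields and
    # must produce target_caption.
    example = TaskInstance(
        image_path="images/example.jpg",
        region=BoundingBox(x=120, y=45, width=200, height=80),
        english_caption="a man riding a bicycle",
        target_caption="",  # filled in by the participating system
    )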

Types of Submissions Expected

Participants are welcome to submit outputs for any subset of the task languages (Hindi, Bengali, Malayalam, Hausa) in any subset of the following task modalities:

  • Text-only translation (Source image not used)

  • Image captioning (English source text not used)

  • Multi-modal translation (uses both the image and the text)

Participants must indicate to which track their translations belong:

  • Text-only / Image-only / Multi-modal

    • see above

  • Domain-Aware / Domain-Unaware

    • Whether or not the full (English) Visual Genome was used in training.

  • Bilingual / Multilingual

    • Whether your model is multilingual (translating from English into all of the target languages) or whether you used individual pairwise models.

  • Constrained / Non-Constrained

    • The limitations for the constrained systems track are as follows:

      • Allowed pretrained LLMs:

        • Llama-2-7B, Llama-2-13B, Mistral-7B

        • You may adapt / fine-tune / otherwise modify these models using the provided constrained data only.

      • Allowed multimodal LLMs:

        • CLIP, DALL-E, Gemini, LLaVA

        • You may adapt / fine-tune / otherwise modify these models using the provided constrained data only. Mention if you use any other multimodal LLMs.

      • Allowed pretrained LMs:

        • mBART, BERT, RoBERTa, XLM-RoBERTa, sBERT, LaBSE (each in all publicly available model sizes)

      • You may ONLY use the training data allowed for this year (linked below).

      • You may use any publicly available automatic quality metric during your development.

      • You may use any basic linguistic tools (taggers, parsers, morphological analyzers, etc.).

  • Non-constrained submissions may use other data or pretrained models but need to specify what was used.

Training Data

The {Hindi, Bengali, Malayalam, Hausa} Visual Genome consists of:

  • 29k training examples

  • 1k development examples

  • 1.6k evaluation examples

All the datasets use the same underlying set of images, with a handful of differences due to sanity checks that were carried out independently for each language.
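
A minimal loading sketch for these files is given below. It assumes the tab-separated layout used in the Hindi Visual Genome release (image id, region coordinates, English text, target-language text); verify the exact column order against the README of the dataset you download, and note that the file name is illustrative.

    from pathlib import Path

    def load_split(path: Path):
        """Read one tab-separated split file into a list of dictionaries."""
        examples = []
        with path.open(encoding="utf-8") as f:
            for line in f:
                # Assumed columns: image_id, X, Y, Width, Height, English text, target text
                image_id, x, y, w, h, english, target = line.rstrip("\n").split("\t")
                examples.append({
                    "image_id": image_id,
                    "region": (int(x), int(y), int(w), int(h)),
                    "english": english,
                    "target": target,
                })
        return examples

    # Illustrative file name; roughly 29k lines are expected in the training split.
    train = load_split(Path("hindi-visual-genome-train.txt"))
    print(len(train), "training examples")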

Evaluation

The English-to-Lowres Multimodal Translation Task will be evaluated on:

  • 1.6k evaluation set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome

  • 1.4k challenge set of {Hindi, Bengali, Malayalam, Hausa} Visual Genome

Means of evaluation:

  • Automatic metrics: BLEU, CHRF3, and others (a local scoring sketch follows this list)

  • Manual evaluation, subject to the availability of {Hindi, Bengali, Malayalam, Hausa} speakers
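
For development, the automatic metrics can be approximated locally with the sacreBLEU package, as in the following sketch. File names are illustrative (one segment per line), and the settings used in the official evaluation may differ.

    # pip install sacrebleu
    from sacrebleu.metrics import BLEU, CHRF

    # System outputs and reference translations, one segment per line.
    with open("dev.hyp", encoding="utf-8") as f:
        hypotheses = [line.rstrip("\n") for line in f]
    with open("dev.ref", encoding="utf-8") as f:
        references = [line.rstrip("\n") for line in f]

    bleu = BLEU()           # default BLEU settings
    chrf3 = CHRF(beta=3)    # chrF with beta = 3, i.e. CHRF3

    print(bleu.corpus_score(hypotheses, [references]))
    print(chrf3.corpus_score(hypotheses, [references]))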

Registration

Please register your team by sending an application following the WAT application process here.

Paper and References

Organizers

  • Shantipriya Parida (Silo AI, Finland)

  • Ondřej Bojar (Charles University, Czech Republic)

  • Idris Abdulmumin (University of Pretoria, South Africa)

  • Shamsuddeen Hassan Muhammad (Imperial College London, UK)

  • Ibrahim Said Ahmad (Institute for Experimental AI, Northeastern University, USA)

Contact

License

The data is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Acknowledgment

The datasets in this shared task were supported by the grant 19-26934X (Neural Representations in Multi-modal and Multi-lingual Modelling) of the Grant Agency of the Czech Republic.