Shared Task: Biomedical Translation Task

Task description

This task aims to evaluate systems on the translation of documents from the biomedical domain. The test data will consist of biomedical abstracts and will cover the following language pairs:

Data

ATTENTION!! We ask participants not to download the Medline database themselves in order to retrieve training data. Submissions derived from a model that was trained on the whole of PubMed will not be considered in the evaluation.

Participants can rely on training (and development) data from various sources, for instance:

Participants are also free to use out-of-domain data.

Evaluation

Evaluation will be carried out both automatically and manually. Automatic evaluation will make use of standard machine translation metrics, such as BLEU.

Native speakers of each of the languages will manually check the quality of the translation for a small sample of the submissions. If necessary, we also expect participants to support us in the manual evaluation (in proportion to the number of submissions).

We plan to release test sets for the following language pairs and sources:

Scientific abstracts:

The test set of Medline abstracts will be released as plain text files in the following format:

DOC_ID	SENT_ID	SENTENCE_TEXT
The three values are separated by a TAB character:
doc1	1	sentence_1
doc2	2	sentence_2
doc2	3	sentence_3
doc2	4	sentence_4
doc2	5	sentence_5
...
doc2	n	sentence_n
doc4	1	sentence_1
doc4	2	sentence_2
...
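
As an illustration of the format above, the following is a minimal sketch of a reader for such a file. It is not an official tool: the names `SentenceRecord` and `read_records` are hypothetical, and the choice to skip empty lines and to keep any extra TAB inside the sentence text is an assumption.

```python
import io
from typing import Iterator, NamedTuple


class SentenceRecord(NamedTuple):
    """One line of a test file (hypothetical helper type)."""
    doc_id: str
    sent_id: str
    text: str


def read_records(stream) -> Iterator[SentenceRecord]:
    """Yield one record per non-empty DOC_ID<TAB>SENT_ID<TAB>SENTENCE_TEXT line."""
    for line_no, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines carry no data
        # Split on the first two TABs only, so a TAB inside the text is kept.
        parts = line.split("\t", 2)
        if len(parts) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 TAB-separated fields, got {len(parts)}"
            )
        yield SentenceRecord(*parts)


# Usage with an in-memory sample; a real file would be opened with
# open(path, encoding="utf-8") instead.
sample = io.StringIO("doc1\t1\tsentence_1\ndoc2\t2\tsentence_2\n")
records = list(read_records(sample))
```

Keeping `sent_id` as a string avoids assuming that the identifiers are always numeric.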
The submission format will be the same, as in the example below. Participants should keep the sentences in the same order as in the original test set file.
doc1	1	translated_sentence_1
doc2	2	translated_sentence_2
doc2	3	translated_sentence_3
doc2	4	translated_sentence_4
doc2	5	translated_sentence_5
...
doc2	n	translated_sentence_n
doc4	1	translated_sentence_1
doc4	2	translated_sentence_2
...
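
Before submitting, it may help to check that a submission file lines up with the test set file. The sketch below is one possible local check, not the organisers' validation: `check_submission` and `split_fields` are hypothetical names, and the specific checks (same number of lines, same DOC_ID and SENT_ID on each line, non-empty translation) are assumptions derived from the format described above.

```python
from itertools import zip_longest


def split_fields(line):
    """Split one line into at most three TAB-separated fields."""
    return line.rstrip("\r\n").split("\t", 2)


def check_submission(test_lines, submission_lines):
    """Return a list of problems; an empty list means the two files align."""
    problems = []
    pairs = zip_longest(test_lines, submission_lines)
    for line_no, (src, hyp) in enumerate(pairs, start=1):
        if src is None or hyp is None:
            problems.append(f"line {line_no}: files differ in length")
            break
        src_fields, hyp_fields = split_fields(src), split_fields(hyp)
        if len(hyp_fields) != 3:
            problems.append(f"line {line_no}: expected 3 TAB-separated fields")
        elif hyp_fields[:2] != src_fields[:2]:
            # Same order as the test set means the same IDs on the same line.
            problems.append(
                f"line {line_no}: expected IDs {src_fields[:2]}, "
                f"found {hyp_fields[:2]}"
            )
        elif not hyp_fields[2].strip():
            problems.append(f"line {line_no}: empty translation")
    return problems


# Usage with in-memory lines; open file objects can be passed the same way.
test_set = ["doc1\t1\tsentence_1\n", "doc2\t2\tsentence_2\n"]
good = ["doc1\t1\ttranslated_sentence_1\n", "doc2\t2\ttranslated_sentence_2\n"]
swapped = list(reversed(good))
```

Here `check_submission(test_set, good)` returns an empty list, while `check_submission(test_set, swapped)` reports an ID mismatch on both lines.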

Submission Requirements

Please note that, following general WMT policy explicitly enforced in other tasks, we will release all participants' submissions after this year's edition of the task to promote further studies.

Please register your team using this form. You will receive an email confirming your registration, which will include the link to the submission site. Please register your team as soon as possible.

The Medline test files are available in the WMT'23 biomedical task Google Drive folder.

The name of each submission file should consist of the original test file name preceded by the team identifier (as registered in the form above) and the run number, following this example for the abstracts:

Each team will be allowed to submit up to 3 runs per test set. Please note that the submission form will include questions about the details of your methods. Please provide as much detail as possible, as this is important for us.

Results

Initial results for the biomedical task are available here.

Important dates

Release of test data: July 13th, 2023
Results submission deadline: July 25th, 2023 (originally July 20th, 2023)
Paper submission deadline: TBA (September 2023)
Paper notification: TBA (October 2023)
Camera-ready version due: 20th October, 2023
Conference (EMNLP): 6-7 December, 2023

All deadlines are in AoE (Anywhere on Earth).

Organisers

Rachel Bawden (University of Edinburgh, UK)
Giorgio Maria Di Nunzio (University of Padua, Italy)
Cristian Grozea (Fraunhofer Institute, Germany)
Antonio Jimeno Yepes (University of Melbourne, Australia)
Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
Mariana Neves (German Federal Institute for Risk Assessment, Germany)
Roland Roller (DFKI, Germany)
Amy Siu (Beuth University of Applied Sciences, Germany)
Philippe Thomas (DFKI, Germany)
Federica Vezzani (University of Padua, Italy)
Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
Dina Wiemann (Novartis, Switzerland)
Lana Yeganova (NCBI/NLM/NIH, USA)


Please contact us by email at wmtbiomedical@gmail.com. Please join our discussion forum.