Shared task: Discourse-Level Literary Translation

EMNLP 2024

NINTH CONFERENCE ON
MACHINE TRANSLATION (WMT24)

November 15-16, 2024
Miami, Florida, USA

[HOME] [PROGRAM] [PAPERS] [AUTHORS]
TRANSLATION TASKS: [GENERAL MT (NEWS)] [LOW-RESOURCE LANGUAGES OF SPAIN] [INDIC MT] [CHAT TASK] [BIOMEDICAL] [MULTIINDIC22MT TASK] [ENGLISH-TO-LOWRES MULTIMODAL MT TASK] [NON-REPETITIVE] [PATENT] [LITERARY]
EVALUATION TASKS: [METRICS TASK] [MT TEST SUITES] [QUALITY ESTIMATION]
OTHER TASKS: [OPEN LANGUAGE DATA INITIATIVE]

ANNOUNCEMENTS

27th June 2024 - The Final Testing Datasets are released! Please go to the Testing Data Download ⏬ Section. ❤️
22th May 2024 - The Valid Datasets for Chinese→German and Chinese→Russian are released! Please go to the Training Data Download ⏬ Section. ❤️
22th May 2024 - The Webnovel Corpus Github is released (Keep data updated). 💯
22th May 2024 - The Tutorial of LLM-based discourse and literary translation is public (for new hand in this topic). 💯
20th May 2024 - The Chinese-Russian dataset is released (new in this year). 🔥
14th May 2024 - The Chinese-German dataset is released (new in this year). 🔥
12th May 2024 - The Chinese-English dataset is released (same as last year). 📚
10th May 2024 - The shared task is announced. 🌍
The New in This Year - Two more language pairs; A/B Testing evaluation method. ⚠️

DEADLINES

All deadlines are in AoE (Anywhere on Earth). Please note that the submission process for system papers follows the paper submission policy outlined by WMT. For further details, please refer to the IMPORTANT DATES on the WMT homepage.

Release of training and validation data

May, 2024

Test data released

27th June, 2024

Translation submission deadline

4th July, 2024

System description abstract paper

11th July, 2024

Paper submission deadline

TBA

OVERVIEW

Machine translation (MT) faces significant challenges when dealing with literary texts due to their complex nature. In general, literary MT is bottlenecked by several factors:

😢 Limited Training Data: Most existing document-level datasets are comprised of news articles and technical documents, with limited availability of high-quality, discourse-level parallel data in the literary domain. This scarcity of data makes it difficult to develop systems that can handle the complexities of literary translation.
😱 Rich Linguistic Phenomena: Literary texts contain more complex linguistic knowledge than non-literary ones, especially with regard to discourse. To generate a cohesive and coherent output, MT models require an understanding of the intended meaning and structure of the text at the discourse level.
😅 Long-Range Contex: Literary works, such as novels, have much longer contexts than texts in other domains, such as news articles. Translation models must acquire the ability to model long-range context in order to learn translation consistency and appropriate lexical choices.
😔 Unreliable Evaluation Methods: Evaluating literary translations requires measuring the meaning and structure of the text, as well as the nuances and complexities of the source language. A single automatic evaluation using a single reference is often unreliable. Thus, professional translators with well-defined scoring standards and targeted evaluation methods are considered a complement.

GOALS

The main goals of the task are to:

😊 Encourage research in machine translation, large language model and language agent for document modelling, discourse knowledge itegration and literary translation.
🤗 Provide a platform for researchers to evaluate and compare the performance of different methods and systems on this challenging dataset.
😃 Advance the state of the art in machine translation for practical application scenarios.

LANGUAGE PAIRS

✨ Chinese-English (document-level with cross-sentence alignment information)
🆕 Chinese-German (document-level without alignment information, may contain some translation errors)
🆕 Chinese-Russian (document-level without alignment information, may contain some translation errors)

TASK DESCRIPTION

The shared task will be the translation of web fiction texts in three directions: Chinese→English, Chinese→German, Chinese→Russian.

First, Participants will be provided with two types of training dataset:

GuoFeng Webnovel Corpus v1: we release a in-domain, discourse-level and human-translated training dataset with sentence-level alignment for Chinese→English (same as last year).
GuoFeng Webnovel Corpus v2: we release two in-domain and discourse-level training dataset for Chinese→German and Chinese→Russian (new in this year).
General MT Track Parallel Training Data: you can use all sentence-/document-level parallel training data of the general translation task (please go to General MT).

Second, we provide two types of validation/testing datasets:

Simple Set contains unseen chapters in the same web novels as the training data;
Difficult Set contains chapters in different web novels from the training data.

Third, we provide two types of in-domain pretrained models for Chinese→English (same as last year). You can also use the WMT allowed general-domain LMs/LLMs:

Chinese-Llama-2 (7B): The Llama-2 model is continiously pretrained on 400GB Chinese and English literary texts, and then finetuned on Chinese instruction dataset (BAAI/COIG) and Chinese-English Document-level translation dataset, without changging the vocabulary.
In-domain RoBERTa (base): 12 layer encoder, hidden size 768, vocabulary size 21,128, whole word masking. It was originally pretrained on Chinese Wikipedia. We continously train it with Chinese literary texts (84B tokens).
In-domain mBART (CC25): 12 layer encoder and 12 layer decoder, hidden size 1024, vocabulary size 250,000. It was originally trained with 25 language web corpus. We continously train it with English and Chinese literary texts (114B tokens).
General-domain LMs/LLMs: Llama-2-7B, Llama-2-13B, Mistral-7B; mBART, BERT, RoBERTa, XLM-RoBERTa, sBERT, LaBSE (please go to The limitations for the constrained systems track).

In the final testing stage, participants use their systems to translate an official testing set (mixed with Simple and Difficult unseen testsets). The translation quality is measured as follows. All systems will be ranked by human judgement or A/B testing according to our prefessional guidlines.

manual evaluation by human translators, e.g. MQM;
automatic evaluation metrics with two references;
A/B testing by web fiction readers (new in this year).

Besides, The task has Constrained Track and Unconstrained Track with different constraints on the training of the models. Participants can submit either constrained or unconstrained systems with flags, and we will distinguish their submissions. For example, if you finetuned Llama-2-7B on the above data, it is Constrained Tack.

Constrained Tack. You may ONLY use the training data specified above.
Unconstrained Tack. It allows the participation with a system trained without any limitations.

DATA

Copyright and Licence

Copyright is a crucial consideration when it comes to releasing literary texts, and we (Tencent AI Lab and China Literature Ltd.) are the rightful copyright owners of the web fictions included in this dataset. We are pleased to make this data available to the research community, subject to certain terms and conditions.

🔔 GuoFeng Webnovel Corpus are copyrighted by Tencent AI Lab and China Literature Limited.
🚦 After completing the registration process with your institute information, WMT participants or researchers are granted permission to use the dataset solely for non-commercial research purposes and must comply with the principles of fair use (CC-BY 4.0).
🔒 Modifying or redistributing the dataset is strictly prohibited. If you plan to make any changes to the dataset, such as adding more annotations, with the intention of publishing it publicly, please contact us first to obtain written consent.
🚧 By using this dataset, you agree to the terms and conditions outlined above. We take copyright infringement very seriously and will take legal action against any unauthorized use of our data.

Citation

📝 If you use the GuoFeng Webnovel Corpus, please cite the following papers and claim the original download link:

[1] Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, Qingsong Ma, Chenyang Lyu, Liting Zhou, Chao-Hong Liu, Yufeng Ma, Weiyu Chen, Yvette Graham, Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi. Findings of the WMT 2023 shared task on discourse-level literary translation: A fresh orb in the cosmos of LLMs. Proceedings of the Eighth Conference on Machine Translation (WMT). 2023.
[2] Longyue Wang, Siyou Liu, Minghao Wu, Wenxiang Jiao, Xing Wang, Jiahao Xu, Zhaopeng Tu, Liting Zhou, Yan Gu, Weiyu Chen, Philipp Koehn, Andy Way, Yulin Yuan. Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation. Proceedings of the Ninth Conference on Machine Translation (WMT). 2024.
[3] Download Link: www2.statmt.org/wmt23/literary-translation-task.html.

Data Description (GuoFeng Webnovel Corpus V1)

💌 The web novels are originally written in Chinese by novel writers and then translated into English by professional translators. The processing steps are detailed in [1]. Note that (1) some sentences may have no aliged translations, because human translators translate novels in a document way; (2) we keep the all document-level information such as continous chapters and sentences.

🎈 Chinese→English: We release 22,567 continuous chapters from 179 web novels, covering 14 genres such as fantasy science and romance. The data statistics are listed as follows.

Subset	# Book	# Chapter	# Sentence	Notes
Train	179	22,567	1,939,187	covering 14 genres
Valid 1	22	22	755	same books with Train
Test 1	26	22	697	same books with Train
Valid 2	10	10	853	different books with Train
Test 2	12	12	917	different books with Train
Testing Input	12	239	16,742	different books with Train and long documents

Data Format: Taking "train.en" for exaple, the data format is shown as follows: <BOOK id=""> </BOOK> indicates a book boundary, which contains a number of continous chapters with the tag <CHAPTER id=""> </CHAPTER>. The contents are splited into sentences and manually aligned to Chinese sentences in "train.zh".

<BOOK id="100-jdxx">
<CHAPTER id="jdxx_0001">
Chapter 1 Make Your Choice, Youth
"Implode reality, pulverize thy spirit. By banishing this world, comply with the blood pact, I will summon forth thee, O' young Demon King!"
At a park during sunset, a childlike, handsome youth placed his left hand on his chest, while his right hand was stretched out with his fingers wide open, as though he was about to release something amazing from his palm. He looked serious and solemn.
... ...
</CHAPTER>
<CHAPTER id="jdxx_0002">
....
</CHAPTER>
</BOOK>

Data Description (GuoFeng Webnovel Corpus V2)

💌 We release ~19K continuous chapters from ~120 web novels, covering 14 genres such as fantasy science and romance. The data are document-level without alignment information. The data statistics are listed as follows (X-Lang means other langauge wihle CH-Lang indicates Chinese language).

🎈 Chinese→German

Subset

# Book

# Chapter

# X-Lang Word / CH-Lang Char

Notes

Train

118

19,101

25,562,039 / 36,790,017

14 genres

Valid

16,686 / 85,330

11 genres

Testing Input

** / 91,981

13 genres

🎈 Chinese→Russian

Subset

# Book

# Chapter

# X-Lang Word / CH-Lang Char

Notes

Train

122

19,971

23,521,169 / 39,074,007

14 genres

Valid

14,514 / 85,330

11 genres

Testing Input

** / 91,981

13 genres

Data Format: Taking Chinese-German for exaple, the data format is shown as follows: (1) 1-ac, 2-ccg, …… indicates book-level folders. (2) In "1-ac" folder, 15-jlws_0001-CH.txt, 15-jlws_0001-DE.txt, …. are continous chapters in Chinese and German languages. (3) In each file, there is no tags and sentence-level alignment information.

.
    ├── 1-ac                       # Book ID - English Title
    │   ├── 15-jlws_0001-CH.txt    # Chapter ID - Chinese
    │   ├── 15-jlws_0001-DE.txt    # Chapter ID - German
    │   ├── ......                 # more chapters
    ├── 2-ccg                      # Book ID - English Title
    │   ├── 62-xzltq_0002-CH.txt   # Chapter ID - Chinese
    │   ├── 62-xzltq_0002-DE.txt   # Chapter ID - German
    │   ├── ......                 # more chapters
    ├── ......                     # more books

15-jlws_0001-CH.txt
第一章 李戴
李戴走出考场，穿梭在密密麻麻的人群当中。看着周围那一张张春风得意的脸，耳边响起路人兴高采烈的讨论声，李戴心中却愈加的沮丧。
“哎，考砸了！想进入到面试是肯定没戏了。”李戴揉了揉太阳穴，头脑中那种沉甸甸的感觉却愈发的浓郁。
15-jlws_0001-DE.txt
Kapitel 1: Li Dai
Li Dai verließ das Prüfungszentrum und bewegte sich durch die dichte Menschenmenge. Er sah die triumphierenden Gesichter um ihn herum und hörte die enthusiastischen Diskussionen der Passanten, doch in seinem Herzen wurde er immer deprimierter.
"Oh, ich habe die Prüfung vergeigt! Eine Chance auf ein Vorstellungsgespräch gibt es sicherlich nicht mehr." Li Dai massierte seine Schläfen, das schwere Gefühl in seinem Kopf wurde immer intensiver.

Testing Data Download ⏬

The follows are the Chinese source sides of final testsets:

Valid Data Download ⏬

The follows are the Chinese→German and Chinese→Russian Valid Datasets. The Chinese→English Valid Datasets are placed with its Training Data:

Training Data Download ⏬

The GuoFeng Webnovel Corpus V1 and V2 can be download via Github: (1) Go to "Download" Section and click the buttion; (2) Fill out the registration form and you will get the link in the Final Page.

Pretrained Models ⏬

We provide three types of in-domain pretrained models (same as last year) and large language models (new in this year):

Version

Layer

Hidden Size

Vocabulary Size

Continuous Train

Chinese-Llama-2 7B

4,096

32,000

Chinese and English literary texts (115B tokens)

RoBERTa base

12 enc

768

21,128

Chinese literary texts (84B tokens)

mBART CC25

12 enc + 12 dec

1,024

250,000

English and Chinese literary texts (114B tokens)

EVALUATION METHODS

🤖 Automatic Evaluation: To evaluate the performance of the well-trained models, we will report multiple evaluation metrics, including d-BLEU (document-level sacreBLEU), d-COMET (document-level COMET) to measure the overall accuracy and fluency of the translations.
👩‍🏫 Human Evaluation: Besides, we provide professional translators to assess the translations based on more subjective criteria, such as the preservation of literary style and the overall coherence and cohesiveness of the translated texts. Based on our experience with this project, we designed a fine-grained error typology and marking MQM criteria for literary MT.
👨‍👩‍👧‍👦 A/B Testing: Acknowledging the concern that there is no single, universally preferred translation for literary texts, we ask human readers or LLMs to select their preferred contents in practical application scenarios.

RESULTS SUBMISSION

Participants can submit either constrained or unconstrained systems with flags, and we will distinguish their submissions.
Each team can submit at most 3 MT outputs per language pair direction, one primary and up to two contrastive.
Submissions will be done by sending us an email to [Longyue Wang](mailto:vincentwang0229@gmail.com).
The requirements of submission format are KEEP the SAME FORMAT with the INPUT. Taking Chinese→English for example: (1) Keep N output files that are identical to the testing input files. (2) In the output files, ensure that each line is aligned with the corresponding input line. If a particular input line is blank, the corresponding output line should also be blank.

Subject: WMT2024 Literary Translation Submission (Team Name)
Basic Information: your team name, affiliations, team member names.
System Flag: constrained or unconstrained / en-zh or de-zh or ru-zh.
System Description: main techniques and toolkits used in your three submission systems.
Attachment: File names associated with testing input IDs (primary-1.en-zh.out, primary-2.en-zh.out, ..., contrastive1-1.en-zh.out, ..., contrastive2-12.en-zh.out)

TUTORIAL

If you are new to discourse-level translation or literary translation, we recommend you browse through our related work in the LLM era to quickly get up to speed with the latest advancements and background (data, methods, linguistic knowledge).

🚂 Comprehensive evaluation of LLMs' performance on document-level translation and discourse phenomena:

@inproceedings{wang2023document,
  title={Document-Level Machine Translation with Large Language Models},
  author={Wang, Longyue and Lyu, Chenyang and Ji, Tianbo and Zhang, Zhirui and Yu, Dian and Shi, Shuming and Tu, Zhaopeng},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={16646--16661},
  year={2023}
}

🚃 Comprehensive summary of document-level MT datasets and proposal of training strategies on ultra-long texts:

@inproceedings{duextrapolation,
  title={On Extrapolation of Long-Text Translation with Large Language Models},
  author={Du, Zefeng and Jiao, Wenxiang and Wang, Longyue and Lyu, Chenyang and Pang, Jianhui and Cui, Leyang and Song, Kaiqiang and Wong, Derek F and Shi, Shuming and Tu, Zhaopeng}
  booktitle={Findings of the Association for Computational Linguistics},
  year={2024}
}

🚄 A new method on reducing the keys/values cache and evaluation on document-level MT:

@inproceedings{pang2024anchor,
  title={Anchor-based Large Language Models},
  author={Pang, Jianhui and Ye, Fanghua and Wong, Derek F and Wang, Longyue},
  booktitle={Findings of the Association for Computational Linguistics},
  year={2024}
}

🚅 A exploration on translating ultra-long literary texts with LLM Agents:

@article{wu2024perhaps,
  title={(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts},
  author={Wu, Minghao and Yuan, Yulin and Haffari, Gholamreza and Wang, Longyue},
  journal={arXiv preprint arXiv:2405.11804},
  year={2024}
}

COMMITTEE

Organization Team

Longyue Wang (vincentwang0229@gmail.com) (Tencent AI Lab)
Zhaopeng Tu (Tencent AI Lab)
Wenxiang Jiao (Tencent AI Lab)
Xing Wang (Tencent AI Lab)
Jiahao Xu (Tencent AI Lab)
Yan Gu (China Literature Limited)
Weiyu Chen (China Literature Limited)

Evaluation Team

Siyou Liu (helen.liu103@gmail.com) (University of Macau)
Minghao Wu (Monash University)
Liting Zhou (Dublin City University)

Advisory Committee

Philipp Koehn (Johns Hopkins University)
Andy Way (Dublin City University)
Yulin Yuan (University of Macau)

Contact

If you have any further questions or suggestions, please do not hesitate to send an email to Longyue Wang (vincentwang0229@gmail.com).

NINTH CONFERENCE ON MACHINE TRANSLATION (WMT24)