Word-level Autocompletion Task - EMNLP Eighth Conference on Machine Translation

Shared Task: Word-Level AutoCompletion

Important dates

Release of training and dev data	May 20, 2023
Release of test data	July 20, 2023
Result submission deadline	July 27, 2023
System paper submission deadline	September 1, 2023
Paper notification	October 6, 2023
Camera-ready version due	October 18, 2023

Note that system paper submissions follow the WMT paper submission policy; please see the paper submission information section on the WMT homepage for details.

Overview

Word-level autocompletion (WLAC) aims to predict a target word given a source sentence, a translation context, and a human-typed character sequence. WLAC plays an important role in computer-aided translation (CAT) systems by improving translation efficiency. The WLAC shared task was first held at WMT 2022; participants can find details in WLAC22-Findings.

Task Definition

Fig 1: Illustration of the WLAC task: predicting a word given a source sentence, a translation context, and a human-typed character sequence.

Suppose x = (x₁, x₂, ..., x_m) is a source sequence, s = (s₁, s₂, ..., s_k) is a sequence of human-typed characters, and the translation context is denoted by c = (c_l, c_r), where c_l = (c_l,1, c_l,2, ..., c_l,i) and c_r = (c_r,1, c_r,2, ..., c_r,j). The translation pieces c_l and c_r lie to the left and right of s, respectively. Fig 1 illustrates the basic idea: the translation context c consists of the left context c_l and the right context c_r, the underlined text "sp" is the human-typed character sequence s, and the words in the rounded rectangles are word-level autocompletion candidates.

Formally, given a source sequence x, a typed character sequence s, and a context c, the word-level autocompletion (WLAC) task aims to predict a target word w that is placed between c_l and c_r to form a partial translation. Note that in the partial translation consisting of c_l, w, and c_r, w need not be adjacent to c_l,i or c_r,1. For example, in Fig 1, c_l = ("We"), c_r = ("opinions"), and s = "sp"; the WLAC task is expected to predict w = "specialists", yielding the partial translation "We ··· specialists ··· opinions", where "···" stands for zero, one, or more words (i.e., the words on either side of it are not necessarily consecutive).
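As a rough illustration of the definition above, the following Python sketch represents one WLAC instance and assembles the partial translation from c_l, w, and c_r. The class name, field names, and helper function are hypothetical and do not reflect the official data format; the source sentence is a placeholder because the one in Fig 1 is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Tuple

GAP = "···"  # stands for zero, one, or more unspecified words


@dataclass(frozen=True)
class WLACExample:
    """One WLAC instance (illustrative field names, not the official format)."""
    source: Tuple[str, ...]         # x: source sentence tokens
    left_context: Tuple[str, ...]   # c_l: translation pieces to the left of s
    right_context: Tuple[str, ...]  # c_r: translation pieces to the right of s
    typed: str                      # s: human-typed character sequence


def partial_translation(example: WLACExample, word: str) -> str:
    """Place the word w between c_l and c_r, marking the possible gaps."""
    pieces = []
    if example.left_context:
        pieces.extend(example.left_context)
        pieces.append(GAP)
    pieces.append(word)
    if example.right_context:
        pieces.append(GAP)
        pieces.extend(example.right_context)
    return " ".join(pieces)


example = WLACExample(
    source=("<source sentence tokens>",),  # placeholder, not the Fig 1 sentence
    left_context=("We",),
    right_context=("opinions",),
    typed="sp",
)
print(partial_translation(example, "specialists"))
# prints: We ··· specialists ··· opinions
```

The gap marker is emitted only next to a non-empty context, so an instance with no context renders as the bare word.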

To cover more real-world scenarios, the left context c_l and the right context c_r are each allowed to be empty, which leads to the following four types of context:

Zero-context: both c_l and c_r are empty;
Suffix: c_l is empty and c_r is not;
Prefix: c_r is empty and c_l is not;
Bi-context: neither c_l nor c_r is empty.
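
The four context types can be told apart by checking which side of the context is empty. The sketch below assumes the contexts are given as token sequences; the function name and the returned labels are illustrative choices, not part of the task specification.

```python
from typing import Sequence


def context_type(left_context: Sequence[str], right_context: Sequence[str]) -> str:
    """Classify a translation context c = (c_l, c_r) into one of four types."""
    if not left_context and not right_context:
        return "zero-context"  # both c_l and c_r are empty
    if not left_context:
        return "suffix"        # only the right context c_r is available
    if not right_context:
        return "prefix"        # only the left context c_l is available
    return "bi-context"        # both c_l and c_r are available


print(context_type(["We"], ["opinions"]))  # prints: bi-context
print(context_type([], ["opinions"]))      # prints: suffix
```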

Task description

This year's word-level autocompletion task addresses the following language pairs:

English-Chinese and Chinese-English (en/zh, zh/en)
English-German and German-English (en/de, de/en)

Data

ATTENTION!! Participants must use only the data provided in train/dev/test. Please fill in the registration form before participating.

Evaluation

Automatic Evaluation: We use accuracy as the evaluation metric for trained models: Acc = N_match / N_all, where N_match is the number of predicted words that are identical to the human-desired word and N_all is the number of test examples.
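
A minimal sketch of this metric in Python follows. It assumes that "identical" means exact string equality with no case folding or other normalization, which the task description does not specify; the function name and the sample words are made up for illustration and this is not the official scoring script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Acc = N_match / N_all, using exact string match (an assumption)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one test example is required")
    # N_match: predictions identical to the human-desired word
    n_match = sum(p == r for p, r in zip(predictions, references))
    # N_all: total number of test examples
    return n_match / len(references)


preds = ["specialists", "special", "opinions"]
golds = ["specialists", "specialists", "opinions"]
print(accuracy(preds, golds))  # 2 of 3 predictions match
```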

Human Evaluation: Given a source sentence x, a context c, and a typed sequence s, multiple ground-truth words w may satisfy the constraint imposed by s, especially when s is short. However, only one of them is provided in the test datasets, so automatic evaluation can count a valid prediction as wrong. We will therefore additionally hire professional translators for human evaluation, who will manually check whether each predicted word is correct.

Organizers

The organizers are listed alphabetically by surname:

Francisco Casacuberta (Universitat Politècnica de València)
George Foster (Google)
Guoping Huang (Tencent)
Philipp Koehn (Johns Hopkins University)
Geza Kovacs (LILT)
Lemao Liu (Tencent)
Shuming Shi (Tencent)
Taro Watanabe (Nara Institute of Science and Technology)
Chengqing Zong (Institute of Automation, Chinese Academy of Sciences)

Contact

For any further questions or suggestions, please contact us through the WLAC Google group or email Lemao Liu.

Supported by TBA.